4 posts with this tag.
Research framework for crafting and evaluating constitutional principles that actually work in AI alignment
Anthropic's training methodology that uses AI feedback instead of human feedback to align models
Extracting implicit principles from preference data by inverting the Constitutional AI process
Democratic approach to AI alignment in which principles are sourced from the public through deliberation