What is alignment verification (alignment-verification)?

Question

Accepted Answer

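Alignment verification is the process of technically confirming that an AI system operates in accordance with its designers' intentions and values. It aims to go beyond simple behavioral testing and directly verify that the model's internal mechanisms are safe.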

alignment-verification