CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization | AI Trends