What is FP8 quantization (fp8-quantization)?

Question

Accepted Answer

It is an optimization technique that converts a model's weights to an 8-bit floating-point format, reducing memory usage and speeding up inference.
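
To make the idea concrete, here is a minimal pure-Python sketch of per-tensor weight quantization. It assumes the E4M3 variant of FP8 (1 sign bit, 4 exponent bits, 3 mantissa bits, largest finite value 448) and a single scale factor per tensor. The function names `round_to_e4m3`, `quantize_fp8`, and `dequantize_fp8` are made up for illustration and are not any library's API. Real frameworks store the result in a native 8-bit dtype and run on hardware FP8 kernels; this sketch only simulates the rounding to show where the precision loss comes from.

```python
import math

# Assumed format: FP8 E4M3 (1 sign, 4 exponent, 3 mantissa bits).
E4M3_MAX = 448.0       # largest finite magnitude in E4M3
MANTISSA_BITS = 3
MIN_NORMAL_EXP = -6    # smallest normal exponent; below this, values are subnormal


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value representable in E4M3 (simulated in Python floats)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)           # saturate out-of-range values
    _, e = math.frexp(mag)                # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, MIN_NORMAL_EXP)      # clamp so tiny values use the subnormal spacing
    step = 2.0 ** (exp - MANTISSA_BITS)   # gap between neighbouring representable values
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_fp8(weights):
    """Scale weights so the largest magnitude maps to E4M3_MAX, then round each one."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    quantized = [round_to_e4m3(w / scale) for w in weights]
    return quantized, scale


def dequantize_fp8(quantized, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.8131, -0.0427, 0.3302, -1.2750]
    q, scale = quantize_fp8(weights)
    restored = dequantize_fp8(q, scale)
    for w, r in zip(weights, restored):
        print(f"original={w:+.4f}  restored={r:+.4f}  error={abs(w - r):.4f}")
```

Each weight ends up as one of the 8-bit representable values plus one shared scale, which is where the memory saving comes from. The printed error shows the accuracy cost of keeping only 3 mantissa bits. Per-tensor scaling is the simplest choice; finer-grained scaling (per channel or per block) is a common refinement but is not shown here.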

fp8-quantization