What is disaggregated serving?

Question

Accepted Answer

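Disaggregated serving runs the two phases of LLM inference, prefill and decode, on separate GPU pools that operate independently. Prefill is compute-bound, while decode is memory-bandwidth-bound. Separating them keeps the two phases from contending for the same resources, so throughput and latency can be optimized across the whole system.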
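
To make the split concrete, here is a minimal toy sketch in Python. It is not taken from any real serving framework: the names `KVCache`, `PrefillWorker`, `DecodeWorker`, and `Router` are hypothetical, the "model" is a stand-in arithmetic rule, and round-robin routing is just an assumption for illustration. It only shows the shape of the flow: a prefill worker builds the KV cache from the prompt, the cache is handed off, and a worker from a separate decode pool generates tokens one at a time.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class KVCache:
    """Stand-in for the per-request attention key/value cache."""
    request_id: int
    tokens: list  # one entry per cached position


class PrefillWorker:
    """Processes the whole prompt in one pass (the compute-bound phase)."""

    def __init__(self, name):
        self.name = name

    def prefill(self, request_id, prompt):
        # A real worker would run a full forward pass over the prompt here.
        return KVCache(request_id, list(prompt))


class DecodeWorker:
    """Generates one token per step, rereading the cache each time
    (the memory-bandwidth-bound phase)."""

    def __init__(self, name):
        self.name = name

    def decode(self, cache, max_new_tokens):
        output = []
        for _ in range(max_new_tokens):
            # Toy rule, NOT a real model: next token = sum of cache mod 100.
            next_token = sum(cache.tokens) % 100
            cache.tokens.append(next_token)
            output.append(next_token)
        return output


class Router:
    """Sends each request to one prefill worker, then one decode worker.
    Round-robin selection is an assumption made for this sketch."""

    def __init__(self, prefill_pool, decode_pool):
        self._prefill = cycle(prefill_pool)
        self._decode = cycle(decode_pool)

    def serve(self, request_id, prompt, max_new_tokens):
        prefill_worker = next(self._prefill)
        decode_worker = next(self._decode)
        cache = prefill_worker.prefill(request_id, prompt)
        # In a real deployment the cache would be transferred between
        # GPUs or nodes at this point; here it is just passed by reference.
        tokens = decode_worker.decode(cache, max_new_tokens)
        return prefill_worker.name, decode_worker.name, tokens


if __name__ == "__main__":
    router = Router(
        prefill_pool=[PrefillWorker("p0"), PrefillWorker("p1")],
        decode_pool=[DecodeWorker("d0")],
    )
    print(router.serve(1, [1, 2, 3], max_new_tokens=3))
    # prints ('p0', 'd0', [6, 12, 24])
```

Because the two pools are separate lists, each can be sized on its own: for example, more prefill workers for long-prompt traffic, or more decode workers for long generations. The cache handoff between the pools is the extra cost this design pays for that flexibility.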

disaggregated-serving