AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving
Bin Gao, Zhuomin He, Puru Sharma, Qingxuan Kang, Djordje Jevdjic, Junbo Deng, Xingkun Yang, Zhou Yu, and Pengfei Zuo
In 2024 USENIX Annual Technical Conference, 2024