FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Sheng, Ying; Zheng, Lianmin; Yuan, Binhang; Li, Zhuohan; Ryabinin, Max; Fu, Daniel Y.; Xie, Zhiqiang; Chen, Beidi; Barrett, Clark; Gonzalez, Joseph E.; Liang, Percy; Ré, Christopher; Stoica, Ion; and Zhang, Ce
2023