Re: How vLLM Model Handler Works (Plus a Summary of Model Memory Management in Beam ML)

2025-01-31 Thread Kenneth Knowles

Great read! Thanks for sharing. Highly recommend.

On Fri, Jan 31, 2025 at 11:57 AM Danny McCormick via dev <dev@beam.apache.org> wrote:

> Late last year, I added support for vLLM in RunInference. I ended up being
> able to go from prototyping to checked in code quickly enough that I didn't
> put

How vLLM Model Handler Works (Plus a Summary of Model Memory Management in Beam ML)

2025-01-31 Thread Danny McCormick via dev

Late last year, I added support for vLLM in RunInference. I ended up being able to go from prototyping to checked in code quickly enough that I didn't put together/share a full design, but in retrospect I thought it might be helpful to have a record of what I did since others might want to do simil