mitiskuma opened a new pull request, #18871: URL: https://github.com/apache/tvm/pull/18871
## Summary

- Batch compute dispatches into a single GPUCommandEncoder, flushing on sync/readback instead of submitting per dispatch, to reduce JS↔GPU transition overhead during LLM decode
- Cache uniform buffers (FIFO, 512 entries), bind groups (FIFO, 256 entries), and shape tuples, and pool MAP_READ staging buffers to eliminate redundant GPU object creation
- Fix a padding self-assignment bug in `deviceCopyToGPU`
