Re: [PR] Batched GPU dispatch and object caching for WebGPU runtime [tvm]

via GitHub Wed, 04 Mar 2026 09:27:30 -0800


mitiskuma commented on code in PR #18871:
URL: https://github.com/apache/tvm/pull/18871#discussion_r2885096090



##########
web/src/runtime.ts:
##########
@@ -1674,8 +1679,17 @@ export class Instance implements Disposable {
    * @returns The created shape tuple.
    */
   makeShapeTuple(shape: Array<number>): TVMObject {
+    const key = shape.toString();
+    const cached = this.shapeTupleCache.get(key);
+    if (cached !== undefined) {
+      return cached;
+    }
     const shapeArray = shape.map((value) => new Scalar(value, "int"));
-    return this.ctx.makeShapeTuple(...shapeArray);
+    const tuple = this.ctx.makeShapeTuple(...shapeArray);
+    // Detach from scope so the cached object survives across scopes.
+    this.detachFromCurrentScope(tuple);
+    this.shapeTupleCache.set(key, tuple);
+    return tuple;

Review Comment:
   The number of unique shapes in an LLM is small and bounded (typically 
a few dozen at most), so this cache won't grow without bound in practice. 
Adding FIFO eviction here would be over-engineering for a problem that doesn't 
occur in real workloads. 
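
To illustrate the boundedness argument, here is a minimal standalone sketch of the same keying scheme. `ShapeTuple` and `ShapeTupleCache` are hypothetical stand-ins for illustration only, not the TVM runtime API, and the shapes are made-up examples. The cache holds one entry per distinct shape, no matter how many times each shape is requested:

```typescript
// Hypothetical stand-in for the runtime's shape-tuple object (not the TVM type).
interface ShapeTuple {
  readonly dims: readonly number[];
}

// Hypothetical cache mirroring the keying scheme used in the diff.
class ShapeTupleCache {
  private readonly cache = new Map<string, ShapeTuple>();
  // Number of tuples actually constructed, i.e. cache misses.
  created = 0;

  get(shape: number[]): ShapeTuple {
    // Same key as in the diff: Array.prototype.toString joins elements with commas.
    const key = shape.toString();
    const cached = this.cache.get(key);
    if (cached !== undefined) {
      return cached;
    }
    const tuple: ShapeTuple = { dims: [...shape] };
    this.created += 1;
    this.cache.set(key, tuple);
    return tuple;
  }

  get size(): number {
    return this.cache.size;
  }
}

// Five requests, but only two distinct shapes (illustrative values).
const shapeCache = new ShapeTupleCache();
const requests: number[][] = [
  [1, 4096],
  [1, 32, 128],
  [1, 4096],
  [1, 32, 128],
  [1, 4096],
];
const results = requests.map((shape) => shapeCache.get(shape));
console.log(shapeCache.size, shapeCache.created);
```

Repeated requests return the same object, so the entry count tracks the number of distinct shapes rather than the number of calls.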



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


