Re: [PR] feat: Add Native Support for In-Memory Cache [datafusion-comet]

via GitHub Wed, 01 Jul 2026 00:19:32 -0700


pchintar commented on PR #4591:
URL: 
https://github.com/apache/datafusion-comet/pull/4591#issuecomment-4851412494


   Hi @andygrove , thanks again for the detailed review! I've addressed your 
review comments in the latest update:
   
   * Reworked the planner logic to determine whether the native cache path 
should be used from the configured cache serializer together with the feature 
flag, eliminating the planning-time Spark job previously used to inspect cached 
batches.
   
   * Removed the unnecessary driver-side `collect().distinct()` from the 
serializer read path
   
   * Replaced the length-based identity projection shortcut with an explicit 
identity-index check so only true identity projections bypass projection, while 
reordered projections continue to project columns correctly.
   
   * Updated the test suite to remove the Spark 4.x-specific `classic.Dataset` 
dependency from the shared test code, keeping the suite cross-version 
compatible.
   
   * For the ownership concern, I took another look at the current 
implementation but didn't make any further changes. The current implementation 
already decodes only the requested columns through 
`convertCachedBatchToColumnarBatch`, and I couldn't identify a way to simplify 
that path further without changing its behavior.
   
   I also added the new `CometInMemoryCacheBenchmark` under 
`spark/src/test/scala/org/apache/spark/sql/benchmark/`, following the existing 
benchmark style. It compares repeated scans of a cached table with the native 
cache path enabled and disabled, together with a selective-filter variant to 
exercise the pruning path. On my local machine, I observed modest improvements 
for both workloads. When you have a chance, could you please take a look at the 
benchmark? I tried to follow the requested benchmark structure but if you'd 
prefer any further changes to the workload/setup, I would be happy to update it 
further.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: Add Native Support for In-Memory Cache [datafusion-comet]

Reply via email to