Re: [I] Improve performance on ClickBench [datafusion-comet]

2025-07-22 (via GitHub)
Iskander14yo commented on issue #2035 (https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3104690263): @parthchandra `query.py` in the original PR contains the whole SparkSession configuration. Here you can see the following: ```python df = spark.read.parquet("hits.par

2025-07-22 (via GitHub)
parthchandra commented on issue #2035 (https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3104670999): Thanks @Iskander14yo. I'll try that. Do you know how the values for EventDate are meant to be interpreted? I tried days since the Unix epoch, but that also filtered out all t
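One way to sanity-check the days-since-epoch interpretation is to translate the date bounds used in the query into integer day counts and compare them with the raw column values. This is a minimal sketch that assumes `EventDate` is stored as an integer number of days since 1970-01-01; the comment indicates this interpretation did not resolve the filtering problem, so treat it as a debugging aid, not a confirmed encoding. The helpers `to_epoch_days` and `from_epoch_days` are hypothetical.

```python
from datetime import date, timedelta

# ASSUMPTION: EventDate holds days since the Unix epoch (1970-01-01).
EPOCH = date(1970, 1, 1)

def to_epoch_days(d: date) -> int:
    """Number of days between the Unix epoch and d."""
    return (d - EPOCH).days

def from_epoch_days(n: int) -> date:
    """Inverse of to_epoch_days: decode a day count back to a calendar date."""
    return EPOCH + timedelta(days=n)

# Integer bounds equivalent to EventDate >= '2013-07-01' AND EventDate <= '2013-07-31'
lo = to_epoch_days(date(2013, 7, 1))
hi = to_epoch_days(date(2013, 7, 31))
print(lo, hi)
```

If the column's minimum and maximum raw values never fall between these two bounds, the days-since-epoch reading cannot be the whole story.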

2025-07-21 (via GitHub)
Iskander14yo commented on issue #2035 (https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3101264542): > I did not see this message when I reproduced the error. You can run `./benchmark.sh` (to make it faster, remove all queries from `queries.sql` except the one t
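The trimming step suggested here (keeping only the query under investigation in `queries.sql`) can be scripted. This is a minimal sketch that assumes `queries.sql` holds one query per line; the function `keep_single_query` and its 1-based index are hypothetical, and it writes a backup of the original file before overwriting it.

```python
import shutil
from pathlib import Path

def keep_single_query(path: str, query_number: int) -> str:
    """Rewrite `path` so it contains only the query on line `query_number` (1-based).

    ASSUMPTION: one query per line; blank lines are ignored. A backup copy is
    written to `<path>.bak` first so the full query list can be restored.
    """
    p = Path(path)
    shutil.copyfile(p, p.with_name(p.name + ".bak"))
    queries = [line for line in p.read_text().splitlines() if line.strip()]
    if not 1 <= query_number <= len(queries):
        raise IndexError(f"query {query_number} out of range 1..{len(queries)}")
    chosen = queries[query_number - 1]
    p.write_text(chosen + "\n")
    return chosen
```

After calling `keep_single_query("queries.sql", n)`, running `./benchmark.sh` exercises only that query; copy `queries.sql.bak` back afterwards to restore the full list.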

2025-07-21 (via GitHub)
parthchandra commented on issue #2035 (https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3099856269): > It seems that Comet falls back to Spark execution (`Comet native execution is disabled due to: unsupported Spark partitioning: ArrayBuffer(PageViews#463L DESC NULLS
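To see how often and why Comet falls back to Spark during a run, one can scan the driver log for the fallback message quoted in this comment. This is a minimal sketch: it assumes each fallback is reported on a single log line containing the prefix `Comet native execution is disabled due to:` exactly as quoted, and the function `fallback_reasons` is hypothetical, not part of Comet.

```python
from collections import Counter
from typing import Iterable

# Prefix taken verbatim from the fallback message quoted in the thread.
FALLBACK_PREFIX = "Comet native execution is disabled due to:"

def fallback_reasons(log_lines: Iterable[str]) -> Counter:
    """Count fallback reasons (the text after the prefix) across log lines."""
    reasons: Counter = Counter()
    for line in log_lines:
        idx = line.find(FALLBACK_PREFIX)
        if idx != -1:
            reason = line[idx + len(FALLBACK_PREFIX):].strip()
            reasons[reason] += 1
    return reasons
```

For example, passing an open log file handle to `fallback_reasons` and calling `.most_common(5)` on the result lists the most frequent fallback causes.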

2025-07-21 (via GitHub)
Iskander14yo commented on issue #2035 (https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3099044466): Thanks for the feedback! **On the failing query:** I appreciate the reminder; I had forgotten that Comet can use different readers. To avoid extra tuning, I’ll

2025-07-18 (via GitHub)
parthchandra commented on issue #2035 (https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3090666597): The query succeeds with the `native_iceberg_compat` reader and `--conf spark.comet.scan.allowIncompatible=true`.
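The working combination from this comment can be expressed as `--conf` flags for `spark-submit` or `pyspark`. In this sketch, selecting the `native_iceberg_compat` reader through the key `spark.comet.scan.impl` is an assumption, so check it against the Comet configuration guide for the version in use; `spark.comet.scan.allowIncompatible=true` is taken verbatim from the comment. The helper `to_conf_args` is hypothetical.

```python
from typing import Dict, List

# Settings for the run that succeeded in the comment.
# ASSUMPTION: the reader is selected via spark.comet.scan.impl.
COMET_CONF: Dict[str, str] = {
    "spark.comet.scan.impl": "native_iceberg_compat",
    "spark.comet.scan.allowIncompatible": "true",
}

def to_conf_args(conf: Dict[str, str]) -> List[str]:
    """Flatten a config dict into ['--conf', 'key=value', ...], sorted by key."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args

print(" ".join(to_conf_args(COMET_CONF)))
```

Keeping the settings in one dictionary makes it easy to reuse the same values in a `SparkSession` builder, such as the one in `query.py`, instead of duplicating them on the command line.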

2025-07-18 (via GitHub)
parthchandra commented on issue #2035 (https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3090665542): I can reproduce the above problem only after removing the clause `EventDate >= '2013-07-01' AND EventDate <= '2013-07-31'` from the query. (With the clause, all records

2025-07-18 (via GitHub)
parthchandra commented on issue #2035 (https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3090665794): With the `EventDate` clause, we may have a second problem. ``` scala> spark.sql("SELECT TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID =

2025-07-18 (via GitHub)
parthchandra commented on issue #2035 (https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3090174632): Thank you for trying to add Comet to ClickBench! Your configuration looks OK. The distribution of memory between heap and off-heap is a little tricky and really
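Because the heap/off-heap split is hard to get right, it can help to derive both settings from a single memory budget so they are always tuned together. This is a minimal sketch: the 50/50 default fraction and the helper `split_memory` are hypothetical, while `spark.driver.memory`, `spark.memory.offHeap.enabled`, and `spark.memory.offHeap.size` are standard Spark settings. It assumes a single-node `local[*]` run, where the driver JVM also hosts the executor.

```python
from typing import Dict

def split_memory(total_gib: int, offheap_fraction: float = 0.5) -> Dict[str, str]:
    """Split a total memory budget (GiB) between JVM heap and off-heap memory.

    HYPOTHETICAL policy: off-heap gets `offheap_fraction` of the budget
    (rounded down to whole GiB) and the JVM heap gets the remainder.
    """
    if not 0.0 < offheap_fraction < 1.0:
        raise ValueError("offheap_fraction must be strictly between 0 and 1")
    offheap = int(total_gib * offheap_fraction)
    heap = total_gib - offheap
    if heap < 1 or offheap < 1:
        raise ValueError("budget too small for this split")
    return {
        "spark.driver.memory": f"{heap}g",  # local mode: the executor runs inside the driver JVM
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": f"{offheap}g",
    }
```

Benchmarking a few fractions on the same machine, for example `split_memory(64, 0.5)` and `split_memory(64, 0.75)`, is one way to find a workable split empirically.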