alamb commented on issue #13983:
URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2612064902

   > Q23 might be improved if it can utilize filter pushdown? I think a >5x improvement might come from that.
   
   
   Running without filter pushdown (the default)
   
   ```sql
   set datafusion.execution.parquet.pushdown_filters = false;
   
   SELECT "SearchPhrase", MIN("URL"), MIN("Title"), COUNT(*) AS c,
          COUNT(DISTINCT "UserID")
   FROM hits_partitioned
   WHERE "Title" LIKE '%Google%'
     AND "URL" NOT LIKE '%.google.%'
     AND "SearchPhrase" <> ''
   GROUP BY "SearchPhrase"
   ORDER BY c DESC
   LIMIT 10;
   ```
   I get:
   
   Elapsed 2.232 seconds.
   Elapsed 2.252 seconds.
   Elapsed 2.236 seconds.
   
   When I enable filter pushdown, it runs about 12% faster (mean elapsed time drops from roughly 2.24 s to 1.97 s).
   
   ```sql
   set datafusion.execution.parquet.pushdown_filters = true;
   
   SELECT "SearchPhrase", MIN("URL"), MIN("Title"), COUNT(*) AS c,
          COUNT(DISTINCT "UserID")
   FROM hits_partitioned
   WHERE "Title" LIKE '%Google%'
     AND "URL" NOT LIKE '%.google.%'
     AND "SearchPhrase" <> ''
   GROUP BY "SearchPhrase"
   ORDER BY c DESC
   LIMIT 10;
   ```
   
   I get:
   Elapsed 1.981 seconds.
   Elapsed 1.953 seconds.
   Elapsed 1.966 seconds.
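
For anyone who wants to check the arithmetic, here is a small sketch that averages the two sets of timings above and derives the relative improvement. The variable names are mine and purely illustrative; only the six elapsed times come from the runs above.

```python
from statistics import mean

# Elapsed times in seconds, copied from the runs above.
pushdown_off = [2.232, 2.252, 2.236]
pushdown_on = [1.981, 1.953, 1.966]

mean_off = mean(pushdown_off)
mean_on = mean(pushdown_on)

# Fraction of wall-clock time saved by enabling pushdown.
time_reduction = (mean_off - mean_on) / mean_off

# Speedup factor (how many times faster the query runs).
speedup = mean_off / mean_on

print(f"mean without pushdown: {mean_off:.3f} s")
print(f"mean with pushdown:    {mean_on:.3f} s")
print(f"time reduction:        {time_reduction:.1%}")
print(f"speedup factor:        {speedup:.2f}x")
```

Depending on whether you quote the time reduction or the speedup factor, the improvement comes out a little above or below the mid-teens in percent, and either way it is far from 5x.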
   
   
   Still not 5x though 🤔 
   
   Though it gives me new motivation to help @XiangpengHao get the pushdown improvements over the line in
   
   https://github.com/apache/arrow-rs/pull/6921


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

