korowa commented on issue #14238: URL: https://github.com/apache/datafusion/issues/14238#issuecomment-2614450912
A simple embedding of the coalescer into the filter ([branch](https://github.com/korowa/arrow-datafusion/tree/coalesce-filter), [commit](https://github.com/korowa/arrow-datafusion/commit/ee583d5d54f5c5aa227bf2a65c3ae1ace086de0b)) gave the following TPC-H results:

<details>
<summary>iterations = 50, target_partitions = 1</summary>

```
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃ main-50-1 ┃ coalesce-filter-50-1 ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1     │  627.47ms │             629.93ms │ no change │
│ QQuery 2     │   70.65ms │              68.79ms │ no change │
│ QQuery 3     │  205.84ms │             211.65ms │ no change │
│ QQuery 4     │  142.87ms │             142.49ms │ no change │
│ QQuery 5     │  260.97ms │             261.31ms │ no change │
│ QQuery 6     │  112.58ms │             114.68ms │ no change │
│ QQuery 7     │  448.47ms │             442.84ms │ no change │
│ QQuery 8     │  292.73ms │             292.71ms │ no change │
│ QQuery 9     │  445.51ms │             440.39ms │ no change │
│ QQuery 10    │  302.97ms │             304.37ms │ no change │
│ QQuery 11    │   52.44ms │              50.49ms │ no change │
│ QQuery 12    │  220.78ms │             220.20ms │ no change │
│ QQuery 13    │  257.90ms │             255.94ms │ no change │
│ QQuery 14    │  180.84ms │             181.05ms │ no change │
│ QQuery 15    │  223.13ms │             224.52ms │ no change │
│ QQuery 16    │   46.47ms │              46.51ms │ no change │
│ QQuery 17    │  444.51ms │             446.87ms │ no change │
│ QQuery 18    │  673.91ms │             658.09ms │ no change │
│ QQuery 19    │  355.00ms │             353.86ms │ no change │
│ QQuery 20    │  218.29ms │             219.16ms │ no change │
│ QQuery 21    │  502.70ms │             499.70ms │ no change │
│ QQuery 22    │   75.02ms │              73.65ms │ no change │
└──────────────┴───────────┴──────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                   ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main-50-1)              │ 6161.05ms │
│ Total Time (coalesce-filter-50-1)   │ 6139.19ms │
│ Average Time (main-50-1)            │  280.05ms │
│ Average Time (coalesce-filter-50-1) │  279.05ms │
│ Queries Faster                      │         0 │
│ Queries Slower                      │         0 │
│ Queries with No Change              │        22 │
└─────────────────────────────────────┴───────────┘
```

</details>

<details>
<summary>iterations = 50, target_partitions = 4</summary>

```
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main-50-4 ┃ coalesce-filter-50-4 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  271.38ms │             305.43ms │  1.13x slower │
│ QQuery 2     │   85.21ms │              83.54ms │     no change │
│ QQuery 3     │  132.98ms │             138.93ms │     no change │
│ QQuery 4     │   99.76ms │              98.72ms │     no change │
│ QQuery 5     │  221.97ms │             211.67ms │     no change │
│ QQuery 6     │   47.04ms │              46.85ms │     no change │
│ QQuery 7     │  291.19ms │             301.14ms │     no change │
│ QQuery 8     │  168.84ms │             181.85ms │  1.08x slower │
│ QQuery 9     │  277.33ms │             278.66ms │     no change │
│ QQuery 10    │  216.29ms │             216.12ms │     no change │
│ QQuery 11    │   56.61ms │              59.36ms │     no change │
│ QQuery 12    │  145.40ms │             135.61ms │ +1.07x faster │
│ QQuery 13    │  199.95ms │             189.47ms │ +1.06x faster │
│ QQuery 14    │   90.87ms │              85.90ms │ +1.06x faster │
│ QQuery 15    │  122.24ms │             140.18ms │  1.15x slower │
│ QQuery 16    │   56.68ms │              57.13ms │     no change │
│ QQuery 17    │  280.48ms │             284.55ms │     no change │
│ QQuery 18    │  521.43ms │             517.77ms │     no change │
│ QQuery 19    │  141.59ms │             144.49ms │     no change │
│ QQuery 20    │  152.08ms │             150.74ms │     no change │
│ QQuery 21    │  368.20ms │             356.05ms │     no change │
│ QQuery 22    │   55.01ms │              55.71ms │     no change │
└──────────────┴───────────┴──────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                   ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main-50-4)              │ 4002.52ms │
│ Total Time (coalesce-filter-50-4)   │ 4039.86ms │
│ Average Time (main-50-4)            │  181.93ms │
│ Average Time (coalesce-filter-50-4) │  183.63ms │
│ Queries Faster                      │         3 │
│ Queries Slower                      │         3 │
│ Queries with No Change              │        16 │
└─────────────────────────────────────┴───────────┘
```

</details>

I read these numbers as showing no meaningful difference between the two approaches, so I don't (yet) see a rationale for embedding the coalescer into the operators; leaving it as it is seems fine.
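
For anyone who wants to sanity-check the "no difference" reading, here is a minimal Python sketch that recomputes the relative change of the total times and re-labels a few timing pairs. The 5% noise threshold is an assumption inferred from the Change column above (the largest "no change" gap in the tables is just under 5%); it is not taken from the benchmark tooling, and `classify` and `relative_change_pct` are hypothetical helper names.

```python
# Sketch only: NOISE_THRESHOLD is an assumption inferred from the tables,
# not a value read from the benchmark comparison script.
NOISE_THRESHOLD = 0.05


def classify(baseline_ms: float, candidate_ms: float,
             threshold: float = NOISE_THRESHOLD) -> str:
    """Label a timing pair in the style of the Change column."""
    if candidate_ms > baseline_ms * (1 + threshold):
        return f"{candidate_ms / baseline_ms:.2f}x slower"
    if baseline_ms > candidate_ms * (1 + threshold):
        return f"+{baseline_ms / candidate_ms:.2f}x faster"
    return "no change"


def relative_change_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Signed change of candidate vs. baseline, in percent (positive = slower)."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100


# Total times copied from the two Benchmark Summary tables: (main, coalesce-filter).
totals = {1: (6161.05, 6139.19), 4: (4002.52, 4039.86)}

for partitions, (main_ms, branch_ms) in totals.items():
    change = relative_change_pct(main_ms, branch_ms)
    label = classify(main_ms, branch_ms)
    print(f"target_partitions={partitions}: {change:+.2f}% ({label})")
```

Under that assumed threshold, both totals land in "no change" (roughly -0.35% and +0.93%), and with target_partitions = 4 the three faster and three slower queries roughly cancel out in the total.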