zhuqi-lucas opened a new issue, #16241: URL: https://github.com/apache/datafusion/issues/16241
### Is your feature request related to a problem or challenge?

Currently, reading from CSV defaults to UTF8. When setting it to UTF8, performance improved a lot. See the results:

```text
./bench.sh compare main default_utf8_for_unkown_type
Comparing main and default_utf8_for_unkown_type
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/clickbench_1.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/clickbench_1.json does not exist
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/clickbench_partitioned.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/clickbench_partitioned.json does not exist
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/h2o_join.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/h2o_join.json does not exist
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/sort_tpch.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/sort_tpch.json does not exist
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/sort_tpch1.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/sort_tpch1.json does not exist
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/sort_tpch10.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/sort_tpch10.json does not exist
--------------------
Benchmark tpch_mem_sf10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ default_utf8_for_unkown_type ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  328.67ms │                     321.92ms │     no change │
│ QQuery 2     │   63.01ms │                      61.09ms │     no change │
│ QQuery 3     │  115.07ms │                     115.89ms │     no change │
│ QQuery 4     │   65.51ms │                      65.96ms │     no change │
│ QQuery 5     │  226.31ms │                     228.79ms │     no change │
│ QQuery 6     │   49.78ms │                      55.67ms │  1.12x slower │
│ QQuery 7     │  500.94ms │                     491.28ms │     no change │
│ QQuery 8     │  169.84ms │                     170.33ms │     no change │
│ QQuery 9     │  376.36ms │                     377.73ms │     no change │
│ QQuery 10    │  173.76ms │                     176.28ms │     no change │
│ QQuery 11    │   44.19ms │                      44.36ms │     no change │
│ QQuery 12    │  177.45ms │                     176.37ms │     no change │
│ QQuery 13    │  120.58ms │                     119.20ms │     no change │
│ QQuery 14    │   23.83ms │                      22.58ms │ +1.06x faster │
│ QQuery 15    │   56.57ms │                      55.66ms │     no change │
│ QQuery 16    │   51.25ms │                      53.85ms │  1.05x slower │
│ QQuery 17    │  419.65ms │                     398.08ms │ +1.05x faster │
│ QQuery 18    │ 2142.91ms │                    1926.17ms │ +1.11x faster │
│ QQuery 19    │   80.08ms │                      80.44ms │     no change │
│ QQuery 20    │  110.32ms │                     108.41ms │     no change │
│ QQuery 21    │  835.77ms │                     776.81ms │ +1.08x faster │
│ QQuery 22    │   51.87ms │                      50.27ms │     no change │
└──────────────┴───────────┴──────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                           ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                           │ 6183.72ms │
│ Total Time (default_utf8_for_unkown_type)   │ 5877.13ms │
│ Average Time (main)                         │  281.08ms │
│ Average Time (default_utf8_for_unkown_type) │  267.14ms │
│ Queries Faster                              │         4 │
│ Queries Slower                              │         2 │
│ Queries with No Change                      │        16 │
└─────────────────────────────────────────────┴───────────┘
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃    main ┃ default_utf8_for_unkown_type ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 53.89ms │                      55.31ms │     no change │
│ QQuery 2     │ 18.83ms │                      18.69ms │     no change │
│ QQuery 3     │ 27.53ms │                      28.45ms │     no change │
│ QQuery 4     │ 19.24ms │                      20.74ms │  1.08x slower │
│ QQuery 5     │ 38.84ms │                      38.58ms │     no change │
│ QQuery 6     │ 18.38ms │                      17.62ms │     no change │
│ QQuery 7     │ 49.14ms │                      50.69ms │     no change │
│ QQuery 8     │ 38.30ms │                      39.04ms │     no change │
│ QQuery 9     │ 70.32ms │                      46.85ms │ +1.50x faster │
│ QQuery 10    │ 58.20ms │                      39.86ms │ +1.46x faster │
│ QQuery 11    │ 20.48ms │                      13.67ms │ +1.50x faster │
│ QQuery 12    │ 36.34ms │                      29.02ms │ +1.25x faster │
│ QQuery 13    │ 30.98ms │                      27.47ms │ +1.13x faster │
│ QQuery 14    │ 22.34ms │                      22.23ms │     no change │
│ QQuery 15    │ 33.72ms │                      33.16ms │     no change │
│ QQuery 16    │ 12.58ms │                      12.55ms │     no change │
│ QQuery 17    │ 57.71ms │                      56.33ms │     no change │
│ QQuery 18    │ 67.58ms │                      68.15ms │     no change │
│ QQuery 19    │ 33.12ms │                      36.06ms │  1.09x slower │
│ QQuery 20    │ 27.81ms │                      28.32ms │     no change │
│ QQuery 21    │ 57.20ms │                      58.21ms │     no change │
│ QQuery 22    │ 12.38ms │                      12.75ms │     no change │
└──────────────┴─────────┴──────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                           ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)                           │ 804.91ms │
│ Total Time (default_utf8_for_unkown_type)   │ 753.77ms │
│ Average Time (main)                         │  36.59ms │
│ Average Time (default_utf8_for_unkown_type) │  34.26ms │
│ Queries Faster                              │        5 │
│ Queries Slower                              │        2 │
│ Queries with No Change                      │       15 │
└─────────────────────────────────────────────┴──────────┘
```

### Describe the solution you'd like

_No response_

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
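
As a rough check on the overall effect, the total times reported in the two benchmark summaries above can be turned into speedup ratios. The sketch below is only illustrative arithmetic on those reported totals; the `speedup` helper is a hypothetical name introduced here and is not part of DataFusion or `bench.sh`.

```rust
/// Hypothetical helper: ratio of baseline time to candidate time.
/// Values above 1.0 mean the candidate branch took less total time.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

fn main() {
    // (benchmark, total on main, total on default_utf8_for_unkown_type),
    // copied from the "Total Time" rows of the benchmark summaries.
    let totals = [
        ("tpch_mem_sf10", 6183.72, 5877.13),
        ("tpch_sf1", 804.91, 753.77),
    ];

    for &(name, main_ms, branch_ms) in totals.iter() {
        println!("{}: {:.2}x", name, speedup(main_ms, branch_ms));
    }
}
```

On these totals the ratios come out to roughly 1.05x for `tpch_mem_sf10` and 1.07x for `tpch_sf1`, so the aggregate gain is modest while individual `tpch_sf1` queries (9, 10, 11) show the larger per-query improvements listed in the table.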