zhuqi-lucas opened a new issue, #16241:
URL: https://github.com/apache/datafusion/issues/16241

   ### Is your feature request related to a problem or challenge?
   
   Currently, reading from CSV defaults to UTF8. When setting to UTF8 (the `default_utf8_for_unkown_type` branch), performance improves a lot on several TPC-H queries.
   
   See the result:
   
   ```text
   ./bench.sh compare  main  default_utf8_for_unkown_type
   Comparing main and default_utf8_for_unkown_type
   Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/clickbench_1.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/clickbench_1.json does not exist
   Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/clickbench_partitioned.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/clickbench_partitioned.json does not exist
   Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/h2o_join.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/h2o_join.json does not exist
   Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/sort_tpch.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/sort_tpch.json does not exist
   Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/sort_tpch1.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/sort_tpch1.json does not exist
   Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/sort_tpch10.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/sort_tpch10.json does not exist
   --------------------
   Benchmark tpch_mem_sf10.json
   --------------------
   ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query        ┃      main ┃ default_utf8_for_unkown_type ┃        Change ┃
   ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1     │  328.67ms │                     321.92ms │     no change │
   │ QQuery 2     │   63.01ms │                      61.09ms │     no change │
   │ QQuery 3     │  115.07ms │                     115.89ms │     no change │
   │ QQuery 4     │   65.51ms │                      65.96ms │     no change │
   │ QQuery 5     │  226.31ms │                     228.79ms │     no change │
   │ QQuery 6     │   49.78ms │                      55.67ms │  1.12x slower │
   │ QQuery 7     │  500.94ms │                     491.28ms │     no change │
   │ QQuery 8     │  169.84ms │                     170.33ms │     no change │
   │ QQuery 9     │  376.36ms │                     377.73ms │     no change │
   │ QQuery 10    │  173.76ms │                     176.28ms │     no change │
   │ QQuery 11    │   44.19ms │                      44.36ms │     no change │
   │ QQuery 12    │  177.45ms │                     176.37ms │     no change │
   │ QQuery 13    │  120.58ms │                     119.20ms │     no change │
   │ QQuery 14    │   23.83ms │                      22.58ms │ +1.06x faster │
   │ QQuery 15    │   56.57ms │                      55.66ms │     no change │
   │ QQuery 16    │   51.25ms │                      53.85ms │  1.05x slower │
   │ QQuery 17    │  419.65ms │                     398.08ms │ +1.05x faster │
   │ QQuery 18    │ 2142.91ms │                    1926.17ms │ +1.11x faster │
   │ QQuery 19    │   80.08ms │                      80.44ms │     no change │
   │ QQuery 20    │  110.32ms │                     108.41ms │     no change │
   │ QQuery 21    │  835.77ms │                     776.81ms │ +1.08x faster │
   │ QQuery 22    │   51.87ms │                      50.27ms │     no change │
   └──────────────┴───────────┴──────────────────────────────┴───────────────┘
   ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
   ┃ Benchmark Summary                           ┃           ┃
   ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
   │ Total Time (main)                           │ 6183.72ms │
   │ Total Time (default_utf8_for_unkown_type)   │ 5877.13ms │
   │ Average Time (main)                         │  281.08ms │
   │ Average Time (default_utf8_for_unkown_type) │  267.14ms │
   │ Queries Faster                              │         4 │
   │ Queries Slower                              │         2 │
   │ Queries with No Change                      │        16 │
   └─────────────────────────────────────────────┴───────────┘
   --------------------
   Benchmark tpch_sf1.json
   --------------------
   ┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query        ┃    main ┃ default_utf8_for_unkown_type ┃        Change ┃
   ┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1     │ 53.89ms │                      55.31ms │     no change │
   │ QQuery 2     │ 18.83ms │                      18.69ms │     no change │
   │ QQuery 3     │ 27.53ms │                      28.45ms │     no change │
   │ QQuery 4     │ 19.24ms │                      20.74ms │  1.08x slower │
   │ QQuery 5     │ 38.84ms │                      38.58ms │     no change │
   │ QQuery 6     │ 18.38ms │                      17.62ms │     no change │
   │ QQuery 7     │ 49.14ms │                      50.69ms │     no change │
   │ QQuery 8     │ 38.30ms │                      39.04ms │     no change │
   │ QQuery 9     │ 70.32ms │                      46.85ms │ +1.50x faster │
   │ QQuery 10    │ 58.20ms │                      39.86ms │ +1.46x faster │
   │ QQuery 11    │ 20.48ms │                      13.67ms │ +1.50x faster │
   │ QQuery 12    │ 36.34ms │                      29.02ms │ +1.25x faster │
   │ QQuery 13    │ 30.98ms │                      27.47ms │ +1.13x faster │
   │ QQuery 14    │ 22.34ms │                      22.23ms │     no change │
   │ QQuery 15    │ 33.72ms │                      33.16ms │     no change │
   │ QQuery 16    │ 12.58ms │                      12.55ms │     no change │
   │ QQuery 17    │ 57.71ms │                      56.33ms │     no change │
   │ QQuery 18    │ 67.58ms │                      68.15ms │     no change │
   │ QQuery 19    │ 33.12ms │                      36.06ms │  1.09x slower │
   │ QQuery 20    │ 27.81ms │                      28.32ms │     no change │
   │ QQuery 21    │ 57.20ms │                      58.21ms │     no change │
   │ QQuery 22    │ 12.38ms │                      12.75ms │     no change │
   └──────────────┴─────────┴──────────────────────────────┴───────────────┘
   ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
   ┃ Benchmark Summary                           ┃          ┃
   ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
   │ Total Time (main)                           │ 804.91ms │
   │ Total Time (default_utf8_for_unkown_type)   │ 753.77ms │
   │ Average Time (main)                         │  36.59ms │
   │ Average Time (default_utf8_for_unkown_type) │  34.26ms │
   │ Queries Faster                              │        5 │
   │ Queries Slower                              │        2 │
   │ Queries with No Change                      │       15 │
   └─────────────────────────────────────────────┴──────────┘
   ```
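
For anyone reading the Change column: the labels look consistent with dividing the branch time by the `main` time and treating anything within about 5% as noise. The sketch below reproduces a few rows and the total-time savings under that assumption. The names `classify` and `total_saving_pct` and the 5% threshold are illustrative assumptions, not taken from the benchmark script; the timings are copied from the tables above.

```python
def classify(main_ms: float, branch_ms: float, noise: float = 0.05) -> str:
    """Label a timing pair like the Change column (assumes a 5% noise band)."""
    change = branch_ms / main_ms
    if 1.0 - noise <= change <= 1.0 + noise:
        return "no change"
    if change < 1.0:
        return f"+{1.0 / change:.2f}x faster"
    return f"{change:.2f}x slower"


def total_saving_pct(main_total_ms: float, branch_total_ms: float) -> float:
    """Percentage of total time saved by the branch relative to main."""
    return (main_total_ms - branch_total_ms) / main_total_ms * 100.0


# (main, default_utf8_for_unkown_type) timings in ms from the tpch_sf1 table
tpch_sf1 = {
    1: (53.89, 55.31),
    9: (70.32, 46.85),
    10: (58.20, 39.86),
    11: (20.48, 13.67),
    19: (33.12, 36.06),
}

for query, (main_ms, branch_ms) in tpch_sf1.items():
    print(f"Query {query}: {classify(main_ms, branch_ms)}")

# Total Time rows from the two Benchmark Summary tables
totals = {
    "tpch_mem_sf10": (6183.72, 5877.13),
    "tpch_sf1": (804.91, 753.77),
}

for name, (main_total, branch_total) in totals.items():
    print(f"{name}: {total_saving_pct(main_total, branch_total):.1f}% less total time")
```

Read this way, the large wins are concentrated in a handful of TPC-H SF1 queries (9 through 13), while the total-time saving for each benchmark is in the single-digit percent range.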
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_

