alamb commented on code in PR #16599:
URL: https://github.com/apache/datafusion/pull/16599#discussion_r2173214983


##########
datafusion-cli/src/main.rs:
##########
@@ -171,7 +171,13 @@ async fn main_inner() -> Result<()> {
         env::set_current_dir(p).unwrap();
     };
 
-    let session_config = get_session_config(&args)?;
+    let mut session_config = get_session_config(&args)?;
+
+    let parquet_options = &mut session_config.options_mut().execution.parquet;
+    // Consistent with the clickbench benchmark:
+    // The hits_partitioned dataset specifies string columns
+    // as binary due to how it was written. Force it to strings
+    parquet_options.binary_as_string = true;

Review Comment:
   Ah, this makes sense -- I found the equivalent ClickBench option here:
https://github.com/ClickHouse/ClickBench/blob/main/datafusion/create_partitioned.sql#L4
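
For anyone reading along, here is a rough, std-only sketch of what "binary as string" means at the level of a single value: the bytes stored in a binary column are reinterpreted as UTF-8 text. The helper name `binary_as_str` is hypothetical and is not DataFusion's implementation; the real option changes how the Parquet schema is mapped, which this sketch does not attempt to reproduce.

```rust
use std::str::Utf8Error;

/// Hypothetical helper (not a DataFusion API): view a raw binary value
/// as a string slice, failing if the bytes are not valid UTF-8.
fn binary_as_str(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

fn main() {
    // A value as it might sit in a binary column that really holds text.
    let raw: &[u8] = b"https://example.com/";
    match binary_as_str(raw) {
        Ok(text) => println!("as string: {text}"),
        Err(err) => println!("not valid UTF-8: {err}"),
    }
}
```

The point is only that the underlying bytes are unchanged; what differs is whether they are exposed to queries as binary or as strings.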



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

