Re: [PR] Add support for Utf8View, Boolean, Date32/64, int32/64 for writing hive style partitions [datafusion]

via GitHub Fri, 06 Sep 2024 15:12:29 -0700


alamb commented on code in PR #12283:
URL: https://github.com/apache/datafusion/pull/12283#discussion_r1747767088



##########
datafusion/core/src/datasource/file_format/write/demux.rs:
##########
@@ -320,9 +324,11 @@ async fn hive_style_partitions_demuxer(
 fn compute_partition_keys_by_row<'a>(
     rb: &'a RecordBatch,
     partition_by: &'a [(String, DataType)],
-) -> Result<Vec<Vec<&'a str>>> {
+) -> Result<Vec<Vec<String>>> {

Review Comment:
   🤔 I wonder if computing a new string for each row will be unnecessarily slow.
The current code only allocates a string for each distinct partition value
(in the final take map), but this code now creates a new string for every row in
the output record batch, just to match rows up with their partitions.
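
To make the concern concrete, here is a minimal sketch of the "allocate once per distinct value" pattern, using only the standard library. The function `group_rows_by_key` and its slice-of-`&str` input are hypothetical stand-ins for illustration, not DataFusion's actual types or the code in `demux.rs`; it assumes the per-row keys can still be borrowed as `&str`, which may not hold for the newly supported non-string types.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of DataFusion): groups row indices by
/// partition value, returning a "take map" from value to row indices.
fn group_rows_by_key(values: &[&str]) -> HashMap<String, Vec<usize>> {
    let mut take_map: HashMap<String, Vec<usize>> = HashMap::new();
    for (row, key) in values.iter().enumerate() {
        // Look up with the borrowed &str first: no allocation when the
        // partition value has already been seen.
        if let Some(rows) = take_map.get_mut(*key) {
            rows.push(row);
        } else {
            // Allocate an owned String only the first time a value appears,
            // so allocations scale with distinct values rather than rows.
            take_map.insert((*key).to_string(), vec![row]);
        }
    }
    take_map
}
```

Calling `group_rows_by_key(&["a", "b", "a"])` allocates two strings rather than three. Returning `Vec<Vec<String>>` per row, by contrast, pays one allocation per row per partition column before any grouping happens.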



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Add support for Utf8View, Boolean, Date32/64, int32/64 for writing hive style partitions [datafusion]
