zhuqi-lucas commented on PR #15348: URL: https://github.com/apache/datafusion/pull/15348#issuecomment-2747123093
I compared the `sort_partition` flamegraphs from the utf8 and utf8view benchmarks for the high-cardinality case.

The utf8_view flamegraph:

<img width="1725" alt="image" src="https://github.com/user-attachments/assets/ccd77866-93c0-4274-a55c-5b966487f19b" />

The utf8 flamegraph:

<img width="1725" alt="image" src="https://github.com/user-attachments/assets/4b7fc76e-c58c-4086-a340-f09a1fb78410" />

It looks like the utf8 sort partition reserves less memory than the utf8view one. Its reservation therefore stays below `sort_in_place_threshold_bytes`, so it takes the `concat_batches` path and sorts everything in place:

```rust
// If less than sort_in_place_threshold_bytes, concatenate and sort in place
if self.reservation.size() < self.sort_in_place_threshold_bytes {
    // Concatenate memory batches together and sort
    let batch = concat_batches(&self.schema, &self.in_mem_batches)?;
    self.in_mem_batches.clear();
    self.reservation
        .try_resize(get_reserved_byte_for_record_batch(&batch))?;
    let reservation = self.reservation.take();
    return self.sort_batch_stream(batch, metrics, reservation);
}
```

That is why the utf8 case is much faster.
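To make the threshold decision concrete, here is a minimal std-only sketch of the two paths. It is not DataFusion code: `utf8_bytes`, `utf8_view_bytes`, `SortPlan`, `choose_plan` and `sort_batches` are hypothetical names, batches are modeled as `Vec<String>` instead of Arrow arrays, and the byte estimates are simplified from the Arrow layouts (i32 offsets plus string bytes for Utf8; a 16-byte view per row, with strings longer than 12 bytes also stored in a data buffer, for Utf8View). The estimates ignore validity bitmaps, buffer over-allocation, and whatever `get_reserved_byte_for_record_batch` actually counts. The "sort each batch, then merge" branch is a rough stand-in for the alternative path that the snippet skips when the reservation is too large.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Simplified Utf8 size: (n + 1) i32 offsets plus the string bytes.
fn utf8_bytes(values: &[&str]) -> usize {
    (values.len() + 1) * 4 + values.iter().map(|s| s.len()).sum::<usize>()
}

/// Simplified Utf8View size: one 16-byte view per row; strings longer than
/// 12 bytes are additionally stored in a separate data buffer.
fn utf8_view_bytes(values: &[&str]) -> usize {
    values.len() * 16
        + values
            .iter()
            .filter(|s| s.len() > 12)
            .map(|s| s.len())
            .sum::<usize>()
}

#[derive(Debug, PartialEq)]
enum SortPlan {
    /// Concatenate all buffered batches and run one sort.
    ConcatAndSortInPlace,
    /// Sort every batch on its own, then k-way merge the sorted runs.
    SortEachThenMerge,
}

/// Mirrors the `reservation.size() < sort_in_place_threshold_bytes` check.
fn choose_plan(reserved_bytes: usize, sort_in_place_threshold_bytes: usize) -> SortPlan {
    if reserved_bytes < sort_in_place_threshold_bytes {
        SortPlan::ConcatAndSortInPlace
    } else {
        SortPlan::SortEachThenMerge
    }
}

fn sort_batches(batches: &[Vec<String>], plan: &SortPlan) -> Vec<String> {
    match plan {
        SortPlan::ConcatAndSortInPlace => {
            // One concatenation, then a single sort over all rows.
            let mut all: Vec<String> = batches.concat();
            all.sort_unstable();
            all
        }
        SortPlan::SortEachThenMerge => {
            // Sort each batch independently.
            let sorted: Vec<Vec<String>> = batches
                .iter()
                .map(|b| {
                    let mut b = b.clone();
                    b.sort_unstable();
                    b
                })
                .collect();
            // Merge the sorted runs with a min-heap of (value, batch index, row index).
            let mut heap = BinaryHeap::new();
            for (i, b) in sorted.iter().enumerate() {
                if let Some(first) = b.first() {
                    heap.push(Reverse((first.clone(), i, 0usize)));
                }
            }
            let mut out = Vec::with_capacity(sorted.iter().map(Vec::len).sum());
            while let Some(Reverse((value, i, row))) = heap.pop() {
                out.push(value);
                if let Some(next) = sorted[i].get(row + 1) {
                    heap.push(Reverse((next.clone(), i, row + 1)));
                }
            }
            out
        }
    }
}
```

Both plans produce the same sorted output; they differ only in cost. Under this simplified model, `utf8_bytes(&["apple", "banana-split-sundae"])` is 36 while `utf8_view_bytes` for the same values is 51, so with a hypothetical threshold of 40 bytes the utf8 data would take the concatenate-and-sort path and the view data would fall through to sort-and-merge.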