zhuqi-lucas commented on PR #15348:
URL: https://github.com/apache/datafusion/pull/15348#issuecomment-2747123093

   I compared the `sort_partition` benchmark flamegraphs for utf8 and utf8view 
at high cardinality:
   
   The utf8_view:
   
   <img width="1725" alt="image" src="https://github.com/user-attachments/assets/ccd77866-93c0-4274-a55c-5b966487f19b" />
   
   
   
   The utf8:
   <img width="1725" alt="image" src="https://github.com/user-attachments/assets/4b7fc76e-c58c-4086-a340-f09a1fb78410" />
   
   
   
   
   It looks like the utf8 sort partition reserves less memory than utf8view, 
so it stays under `sort_in_place_threshold_bytes` and takes the optimized 
`concat_batches` path:
   
   ```rust
   // If less than sort_in_place_threshold_bytes, concatenate and sort in place
   if self.reservation.size() < self.sort_in_place_threshold_bytes {
       // Concatenate memory batches together and sort
       let batch = concat_batches(&self.schema, &self.in_mem_batches)?;
       self.in_mem_batches.clear();
       self.reservation
           .try_resize(get_reserved_byte_for_record_batch(&batch))?;
       let reservation = self.reservation.take();
       return self.sort_batch_stream(batch, metrics, reservation);
   }
   ```
   
   So the utf8 case is much faster.
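
   To make the branch easier to reason about, here is a rough standalone sketch of the decision. It is not the DataFusion code: `SortStrategy`, `choose_strategy`, `sort_buffered` and `merge_two` are hypothetical names, batches are modeled as plain `Vec<String>`, and I am assuming the path above the threshold sorts each batch separately and then merges the sorted runs.

   ```rust
   /// Which path the sorter takes for the buffered batches (hypothetical type).
   #[derive(Debug, PartialEq)]
   enum SortStrategy {
       ConcatAndSortInPlace,
       SortEachThenMerge,
   }

   /// Stand-in for the `self.reservation.size() < self.sort_in_place_threshold_bytes` check.
   fn choose_strategy(reserved_bytes: usize, sort_in_place_threshold_bytes: usize) -> SortStrategy {
       if reserved_bytes < sort_in_place_threshold_bytes {
           SortStrategy::ConcatAndSortInPlace
       } else {
           SortStrategy::SortEachThenMerge
       }
   }

   /// Merge two already-sorted runs into one sorted run.
   fn merge_two(a: Vec<String>, b: Vec<String>) -> Vec<String> {
       let (mut i, mut j) = (0, 0);
       let mut out = Vec::with_capacity(a.len() + b.len());
       while i < a.len() && j < b.len() {
           if a[i] <= b[j] {
               out.push(a[i].clone());
               i += 1;
           } else {
               out.push(b[j].clone());
               j += 1;
           }
       }
       // At most one of these two tails is non-empty.
       out.extend_from_slice(&a[i..]);
       out.extend_from_slice(&b[j..]);
       out
   }

   /// Sort the buffered batches using the strategy picked from the reservation size.
   fn sort_buffered(
       batches: Vec<Vec<String>>,
       reserved_bytes: usize,
       sort_in_place_threshold_bytes: usize,
   ) -> Vec<String> {
       match choose_strategy(reserved_bytes, sort_in_place_threshold_bytes) {
           // Small reservation: one concatenation followed by one sort.
           SortStrategy::ConcatAndSortInPlace => {
               let mut all: Vec<String> = batches.into_iter().flatten().collect();
               all.sort();
               all
           }
           // Large reservation (assumed behaviour): sort every batch, then merge the runs.
           SortStrategy::SortEachThenMerge => batches
               .into_iter()
               .map(|mut batch| {
                   batch.sort();
                   batch
               })
               .fold(Vec::new(), merge_two),
       }
   }
   ```

   Both paths give the same sorted output, so only the cost differs. In this model the reserved size alone decides the path, which would match utf8 and utf8view landing on different sides of the threshold for the same data.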
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

