GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files
The above results were obtained with the following setup:

* `datafusion-cli -m 8G -d 50G --top-memory-consumers 25`
* The default `datafusion.execution.parquet.max_row_group_size` of 1048576
* No other configs were modified

GitHub link: https://github.com/apache/datafusion/discussions/16776#discussioncomment-13780281