andygrove opened a new issue, #1186: URL: https://github.com/apache/datafusion-comet/issues/1186
### What is the problem the feature request solves?

We use Arrow IPC to write shuffle output. We create a new writer for each batch, which means that we serialize the schema for each batch.

```rust
let mut arrow_writer = StreamWriter::try_new(zstd::Encoder::new(output, 1)?, &batch.schema())?;
arrow_writer.write(batch)?;
arrow_writer.finish()?;
```

The schema is guaranteed to be the same for every batch, so we should be able to use a single writer for all batches and avoid the cost of serializing the schema each time. Based on one benchmark in https://github.com/apache/datafusion-comet/pull/1180, I am seeing a 4x speedup in encoding time by re-using the writer.

### Describe the potential solution

_No response_

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
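The cost being amortized here can be illustrated with a small std-only sketch. `BatchWriter` below is a hypothetical stand-in for Arrow's `StreamWriter` (with string "schema"/"batch" payloads standing in for the real IPC-encoded schema and `RecordBatch` data); the point is only that creating one writer per batch re-emits the schema header every time, while a single reused writer emits it once:

```rust
use std::io::Write;

/// Hypothetical stand-in for Arrow's `StreamWriter`: serializes a "schema"
/// header once at construction, then appends batches without repeating it.
struct BatchWriter<W: Write> {
    inner: W,
}

impl<W: Write> BatchWriter<W> {
    fn try_new(mut inner: W, schema: &str) -> std::io::Result<Self> {
        // The schema is written exactly once per writer instance.
        inner.write_all(schema.as_bytes())?;
        Ok(Self { inner })
    }

    fn write(&mut self, batch: &str) -> std::io::Result<()> {
        self.inner.write_all(batch.as_bytes())
    }

    fn finish(self) -> std::io::Result<W> {
        Ok(self.inner)
    }
}

/// Current approach: a fresh writer per batch, so the schema is
/// re-serialized for every batch.
fn encode_per_batch(schema: &str, batches: &[&str]) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    for batch in batches {
        let mut w = BatchWriter::try_new(&mut out, schema)?;
        w.write(batch)?;
        w.finish()?;
    }
    Ok(out)
}

/// Proposed approach: one writer reused for all batches, so the schema is
/// serialized only once.
fn encode_reused(schema: &str, batches: &[&str]) -> std::io::Result<Vec<u8>> {
    let mut w = BatchWriter::try_new(Vec::new(), schema)?;
    for batch in batches {
        w.write(batch)?;
    }
    w.finish()
}

fn main() -> std::io::Result<()> {
    let schema = "schema:[a:i32,b:utf8];";
    let batches = ["batch1;", "batch2;", "batch3;"];

    let per_batch = encode_per_batch(schema, &batches)?;
    let reused = encode_reused(schema, &batches)?;

    // Reusing the writer saves (n - 1) copies of the schema header.
    assert_eq!(per_batch.len() - reused.len(), 2 * schema.len());
    println!(
        "per-batch: {} bytes, reused: {} bytes",
        per_batch.len(),
        reused.len()
    );
    Ok(())
}
```

In the real code the saving would also apply to the CPU cost of re-encoding the schema and re-creating the zstd encoder for each batch, not just the bytes written.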