Hi Artem,

I looked into this and opened an issue [1]. Rob Young, Alexander Fedulov and I discussed it there, and we also think this performance issue can be solved by flushing manually. I have opened a PR [2]. You can cherry-pick it, build the package locally, and replace the jar in the lib folder.
I would be glad to hear your thoughts on this.

1. https://issues.apache.org/jira/browse/FLINK-35240
2. https://github.com/apache/flink/pull/24730

Best,
Zhongqiang Gong

Robert Young <robertyoun...@gmail.com> wrote on Fri, Apr 26, 2024 at 13:25:

> Hi Artem,
>
> I debugged Flink 1.17.1 (running CsvFilesystemBatchITCase) and I see
> the same behaviour. It's the same on master too. Jackson flushes [1] the
> underlying stream after every `writeValue` call. I experimented with
> disabling the flush by disabling Jackson's FLUSH_PASSED_TO_STREAM [2]
> feature, but this broke the integration tests. This is because Jackson wraps
> the stream in its own Writer that buffers data. We depend on the flush to
> flush the Jackson writer and eventually write the bytes to the stream.
>
> One workaround I found [3] is to wrap the stream in an implementation that
> ignores flush calls, and pass that to Jackson. Jackson will then flush its
> writer buffers and write the bytes to the underlying stream, then try to
> flush the underlying stream, but that will be a no-op. The CsvBulkWriter will
> continue to flush/sync the underlying stream. Unfortunately this required
> code changes in Flink CSV, so it might not be helpful for you.
>
> 1. https://github.com/FasterXML/jackson-dataformats-text/blob/8700b5489090f81b4b8d2636f9298ac47dbf14a3/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvGenerator.java#L504
> 2. https://fasterxml.github.io/jackson-core/javadoc/2.13/com/fasterxml/jackson/core/JsonGenerator.Feature.html#FLUSH_PASSED_TO_STREAM
> 3. https://github.com/robobario/flink/commit/ae3fdb1ca9de748df791af232bba57d6d7289a79
>
> Rob Young
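
To illustrate the workaround described in the quoted message (wrapping the stream in an implementation that ignores flush calls), here is a minimal sketch using only the JDK. It is not the code from the linked commit or PR. The class names `FlushIgnoringDemo`, `FlushIgnoringOutputStream` and `CountingOutputStream` are hypothetical, and a plain `OutputStreamWriter` stands in for Jackson's buffering writer as an assumption, so that the example runs without Jackson or Flink on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class FlushIgnoringDemo {

    /** Hypothetical wrapper: forwards writes to the delegate but swallows flush(). */
    static final class FlushIgnoringOutputStream extends FilterOutputStream {
        FlushIgnoringOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            // Delegate in bulk; FilterOutputStream's default writes byte by byte.
            out.write(b, off, len);
        }

        @Override
        public void flush() {
            // Intentionally a no-op: the owner of the real stream decides when to flush.
        }
    }

    /** Test sink (hypothetical) that counts how often flush() actually reaches it. */
    static final class CountingOutputStream extends ByteArrayOutputStream {
        int flushCalls = 0;

        @Override
        public void flush() {
            flushCalls++;
        }
    }

    /** Writes the given number of rows and returns {bytesWritten, flushesSeenBySink}. */
    public static int[] run(int records) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        // Assumption: OutputStreamWriter plays the role of Jackson's internal buffering writer.
        Writer writer = new OutputStreamWriter(
                new FlushIgnoringOutputStream(sink), StandardCharsets.UTF_8);

        for (int i = 0; i < records; i++) {
            writer.write("a,b,c\n");
            // Stands in for the flush issued after every writeValue call:
            // the writer's buffer is drained into the sink, but the sink is not flushed.
            writer.flush();
        }

        // The owner of the stream flushes once, when it chooses to.
        sink.flush();
        return new int[] {sink.size(), sink.flushCalls};
    }

    public static void main(String[] args) throws IOException {
        int[] result = run(3);
        System.out.println("bytes=" + result[0] + " flushes=" + result[1]);
    }
}
```

The point of the design is that the per-record flush still empties the writer's buffer, so no data is held back, while the potentially expensive flush of the underlying stream happens only when its owner calls it.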