Hi Artem,

I looked into this and opened an issue [1]. Rob Young, Alexander Fedulov and
I discussed it there, and we also think this performance issue can be solved
with a manual flush. I have opened a PR [2]. You can cherry-pick it, build
the package locally, and replace the jar in the lib folder.

I'd be glad to hear your feedback on this.

1. https://issues.apache.org/jira/browse/FLINK-35240
2. https://github.com/apache/flink/pull/24730


Best,
Zhongqiang Gong

Robert Young <robertyoun...@gmail.com> wrote on Fri, Apr 26, 2024 at 13:25:

> Hi Artem,
>
> I debugged Flink 1.17.1 (running CsvFilesystemBatchITCase) and I see
> the same behaviour. It's the same on master too. Jackson flushes [1] the
> underlying stream after every `writeValue` call. I experimented with
> disabling the flush by disabling Jackson's FLUSH_PASSED_TO_STREAM [2]
> feature, but this broke the integration tests. This is because Jackson wraps
> the stream in its own Writer that buffers data. We depend on the flush to
> flush the Jackson writer and eventually write the bytes to the stream.
>
> One workaround I found [3] is to wrap the stream in an implementation that
> ignores flush calls, and pass that to Jackson. Jackson will then flush its
> writer buffers and write the bytes to the underlying stream, then try to
> flush the underlying stream, but that will be a no-op. The CsvBulkWriter
> continues to flush/sync the underlying stream. Unfortunately this required
> code changes in Flink CSV, so it might not be helpful for you.
>
> 1.
> https://github.com/FasterXML/jackson-dataformats-text/blob/8700b5489090f81b4b8d2636f9298ac47dbf14a3/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvGenerator.java#L504
> 2.
> https://fasterxml.github.io/jackson-core/javadoc/2.13/com/fasterxml/jackson/core/JsonGenerator.Feature.html#FLUSH_PASSED_TO_STREAM
> 3.
> https://github.com/robobario/flink/commit/ae3fdb1ca9de748df791af232bba57d6d7289a79
>
> Rob Young
>
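
For anyone following the thread, the flush-ignoring wrapper idea described
in the quoted message could be sketched roughly as below. This is only an
illustration under assumptions: the class name FlushIgnoringOutputStream is
made up here, it uses plain java.io rather than Flink's stream types, and it
is not the code from the linked commit or PR.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical wrapper (name invented for this sketch): forwards all writes
 * to the wrapped stream but turns flush() into a no-op, so a caller that
 * flushes after every record cannot force the underlying stream to flush.
 */
final class FlushIgnoringOutputStream extends FilterOutputStream {

    FlushIgnoringOutputStream(OutputStream delegate) {
        super(delegate);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Delegate in bulk; FilterOutputStream's default writes byte by byte.
        out.write(b, off, len);
    }

    @Override
    public void flush() {
        // Intentionally empty: the owner of the underlying stream decides
        // when to flush it, by calling flush() on that stream directly.
    }
}
```

With a wrapper like this handed to the serializer, bytes still reach the
underlying stream on every write, and whoever owns that stream stays
responsible for flushing it directly at the points it chooses.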
