[ 
https://issues.apache.org/jira/browse/FLINK-35536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Nadberezny updated FLINK-35536:
---------------------------------------
    Attachment: RecordWiseFileCompactorSpecificAvroReaderFactory.java

> FileSystem sink on S3 produces invalid Avros when compaction is turned off
> --------------------------------------------------------------------------
>
>                 Key: FLINK-35536
>                 URL: https://issues.apache.org/jira/browse/FLINK-35536
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem
>    Affects Versions: 1.19.0
>            Reporter: Juliusz Nadberezny
>            Priority: Major
>         Attachments: FileSink.java, 
> RecordWiseFileCompactorSpecificAvroReaderFactory.java
>
>
> Compaction in the FileSystem sink on S3 uses the multipart upload process.
> When compaction is turned on, everything works as expected and the sink 
> produces correct files.
> The problem occurs when you disable compaction for a sink that previously had 
> it enabled. In this case, files that were being held by a multipart upload and 
> are then "released" with CompleteMultipartUpload are broken.
> The broken Avro files seem to have the Avro schema duplicated at the beginning 
> of the file.
>  
> Attached please find:
> 1. Implementation of RecordWiseFileCompactor.Reader.Factory that we are using.
> 2. FileSink definition
>  
> Steps to reproduce:
> 1. Deploy the job with the FileSystem sink writing to S3/MinIO, with 
> compaction enabled.
> 2. Wait for the job to produce some output.
> 3. Redeploy the job with compaction disabled.
> 4. Wait for the multipart upload to complete and verify the released files.
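
For step 4, one rough way to check a released file (after downloading it locally) is to count how often the Avro container magic appears near the start of the file. This is only a sketch and not part of the attachments: the class name AvroHeaderCheck, the method countMagic and the 64 KiB scan window are made up for illustration. The one fact it relies on is that an Avro object container file starts with the four bytes 'O', 'b', 'j', 1. More than one hit in the header region would be consistent with the duplicated schema described above, but the same bytes can also occur by chance inside record data, so treat a hit as a hint rather than proof.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not taken from the attached files.
class AvroHeaderCheck {
    // Avro object container files begin with these four bytes: 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Counts occurrences of the Avro magic within the first `limit` bytes of `data`. */
    static int countMagic(byte[] data, int limit) {
        int end = Math.min(limit, data.length) - MAGIC.length;
        int count = 0;
        for (int i = 0; i <= end; i++) {
            boolean match = true;
            for (int j = 0; j < MAGIC.length; j++) {
                if (data[i + j] != MAGIC[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AvroHeaderCheck <local-avro-file>");
            return;
        }
        // Reads the whole file into memory; acceptable for a quick manual check.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // Assumption: 64 KiB is enough to cover the header and any duplicate of it.
        int hits = countMagic(data, 64 * 1024);
        System.out.println("Avro magic occurrences in header region: " + hits);
        if (hits > 1) {
            System.out.println("Possible duplicated header, inspect the file manually.");
        }
    }
}
```

Run it as `java AvroHeaderCheck part-file.avro` on a file produced before and after the redeploy and compare the counts. A file that looks suspicious should still be confirmed with a real Avro reader.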



