Hello,

We are currently using Flink 1.5 with the BucketingSink to save the results of our job processing to HDFS. The data is in JSON format, and we store one object per line in the resulting files.
We are planning to upgrade to Flink 1.6, and we see there is a new StreamingFileSink. From its description it looks very similar to BucketingSink when used with a row-encoded output format. Should we consider moving to StreamingFileSink? I would like to better understand the suggested use cases for each of the two options.

We are also considering additionally outputting the data in Parquet format for our data scientists (to be stored in HDFS as well). For this I see some utilities that work with StreamingFileSink, so I guess that option is the recommended one for this case? Is it possible to use the Parquet writers even when the schema of the data may evolve?

Thanks in advance for your help. (Sorry for putting so many questions in one message.)

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
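For context, this is roughly the wiring we are comparing. It is only a sketch: the HDFS paths and the `Event` POJO are made-up placeholders, and `ParquetAvroWriters` is from the optional flink-parquet module.

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class SinkSketch {

    // Hypothetical POJO standing in for our JSON records.
    public static class Event {
        public String user;
        public long timestamp;
    }

    static void wireSinks(DataStream<String> jsonLines, DataStream<Event> events) {
        // Row-encoded output: one JSON string per line, similar to what
        // BucketingSink gives us today.
        StreamingFileSink<String> rowSink = StreamingFileSink
            .forRowFormat(new Path("hdfs:///data/json"),
                          new SimpleStringEncoder<String>("UTF-8"))
            .build();
        jsonLines.addSink(rowSink);

        // Bulk-encoded output: Parquet files, with the Avro schema derived
        // from the POJO via reflection.
        StreamingFileSink<Event> parquetSink = StreamingFileSink
            .forBulkFormat(new Path("hdfs:///data/parquet"),
                           ParquetAvroWriters.forReflectRecord(Event.class))
            .build();
        events.addSink(parquetSink);
    }
}
```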