Hi, I share a concern:
Although we now support an ORC writer, it was not easy to support: we need to override some things in the ORC classes.

Note that we are using a newer version of ORC, which is not forward compatible. Therefore, data written by users with the Flink ORC writer may not be readable by other engines, such as older versions of Hive.

It is also not easy for users to make the StreamingFileSink write an older ORC version by themselves. An alternative may be `HadoopPathBasedBulkFormatBuilder`, which was added in Flink 1.11.

Best,
Jingsong

On Tue, Oct 13, 2020 at 7:16 PM Chesnay Schepler <ches...@apache.org> wrote:

> How easy is the migration to the StreamingFileSink?
>
> On 10/13/2020 1:01 PM, Aljoscha Krettek wrote:
> > On 13.10.20 11:18, David Anderson wrote:
> >> I think the pertinent question is whether there are interesting cases where
> >> the BucketingSink is still a better choice. One case I'm not sure about is
> >> the situation described in docs for the StreamingFileSink under Important
> >> Note 2 [1]:
> >>
> >> ... upon normal termination of a job, the last in-progress files will
> >> not be transitioned to the “finished” state.
> >>
> >> I know this confuses and frustrates users, but I don't know if the
> >> BucketingSink has any advantages in this regard.
> >
> > The BucketingSink suffers from the same problem. It's caused by the
> > fact that we don't do a "final" checkpoint before shutting down a
> > pipeline. We're trying to resolve that with FLIP-147 [1].
> >
> > [1] https://cwiki.apache.org/confluence/x/mw-ZCQ

--
Best, Jingsong Lee