Re: [DISCUSS] Remove flink-connector-filesystem module.

Jingsong Li Tue, 13 Oct 2020 04:39:20 -0700

Hi,

I have one concern:

Although we now support an ORC writer, it is not easy to support: we need to
override parts of the ORC classes.

Note that we are using a newer version of ORC, which is not forward
compatible. As a result, data written by users with the Flink ORC writer may
not be readable by other engines, such as older versions of Hive.
It is also not easy for users to make the streaming file sink support
lower versions of ORC by themselves.

A possible replacement is `HadoopPathBasedBulkFormatBuilder`, which was added
in Flink 1.11.

Best,
Jingsong

On Tue, Oct 13, 2020 at 7:16 PM Chesnay Schepler <ches...@apache.org> wrote:

> How easy is the migration to the StreamingFileSink?
>
> On 10/13/2020 1:01 PM, Aljoscha Krettek wrote:
> > On 13.10.20 11:18, David Anderson wrote:
> >> I think the pertinent question is whether there are interesting cases
> >> where the BucketingSink is still a better choice. One case I'm not sure
> >> about is the situation described in docs for the StreamingFileSink under
> >> Important Note 2 [1]:
> >>
> >>      ... upon normal termination of a job, the last in-progress files
> >>      will not be transitioned to the “finished” state.
> >>
> >> I know this confuses and frustrates users, but I don't know if the
> >> BucketingSink has any advantages in this regard.
> >
> > The BucketingSink suffers from the same problem. It's caused by the
> > fact that we don't do a "final" checkpoint before shutting down a
> > pipeline. We're trying to resolve that with FLIP-147 [1].
> >
> > [1] https://cwiki.apache.org/confluence/x/mw-ZCQ
> >
> >
>
>

-- 
Best, Jingsong Lee
