Re: Greetings and question

Fabian Hueske Mon, 17 Jul 2017 04:21:16 -0700

Hi Robert,

I don't think anybody is working on an ORC file sink.
Are you interested in a sink for data streams or a batch sink?


Implementing a batch sink shouldn't be very hard.
You can either implement an OutputFormat that internally uses the ORC Java
API, or try Flink's HadoopOutputFormat, which can wrap Hadoop
OutputFormats.
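
To make the first option a bit more concrete, here is a minimal sketch of
the lifecycle such a format goes through (open, writeRecord, close). It is
not Flink code: SketchOutputFormat is a hypothetical stand-in that only
mirrors the shape of Flink's OutputFormat so the example runs without any
dependencies, and a plain BufferedWriter stands in for where an ORC writer
would go. The class names and the "part-N" file naming are assumptions made
for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the lifecycle of Flink's OutputFormat.
// It is not the real interface; it only mirrors open/writeRecord/close.
interface SketchOutputFormat<T> {
    void open(int taskNumber, int numTasks) throws IOException;
    void writeRecord(T record) throws IOException;
    void close() throws IOException;
}

// Each parallel task writes its own file, so no two tasks share a writer.
class PerTaskFileFormat implements SketchOutputFormat<String> {
    private final Path outputDir;
    private BufferedWriter writer; // an ORC writer would take this place

    PerTaskFileFormat(Path outputDir) {
        this.outputDir = outputDir;
    }

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        // Called once per parallel task before any record is written.
        Files.createDirectories(outputDir);
        writer = Files.newBufferedWriter(outputDir.resolve("part-" + taskNumber));
    }

    @Override
    public void writeRecord(String record) throws IOException {
        // Called once per record; here one record becomes one line.
        writer.write(record);
        writer.newLine();
    }

    @Override
    public void close() throws IOException {
        // Called after the last record; flushes and releases the file.
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }
}
```

Because a batch job either finishes or is rerun as a whole, the format
does not need to track any state beyond the open writer.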

If you need a streaming ORC sink, things become a bit more challenging
because you would need to integrate the sink with Flink's checkpointing
mechanism.
I would recommend having a look at the BucketingSink and its JavaDocs.
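
To illustrate what integrating with checkpointing roughly means, here is a
small in-memory sketch of the in-progress / pending / finished idea that
file sinks of this kind rely on. It is a simplification under stated
assumptions, not BucketingSink's actual implementation: the class and
method names are hypothetical, "files" are just lists of records, and the
sketch rolls the current file on every checkpoint, which the real sink
does not necessarily do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a checkpoint-aware file sink (no Flink APIs used).
class CheckpointedFileSinkSketch {
    // Records written since the last checkpoint; not yet safe to expose.
    private List<String> inProgress = new ArrayList<>();
    // Checkpoint id -> files closed at that checkpoint, awaiting confirmation.
    private final TreeMap<Long, List<List<String>>> pending = new TreeMap<>();
    // Files that belong to a completed checkpoint and are visible to readers.
    private final List<List<String>> finished = new ArrayList<>();

    void write(String record) {
        inProgress.add(record);
    }

    // Checkpoint is taken: close the current file and mark it pending.
    void snapshot(long checkpointId) {
        if (!inProgress.isEmpty()) {
            pending.computeIfAbsent(checkpointId, k -> new ArrayList<>())
                   .add(inProgress);
            inProgress = new ArrayList<>();
        }
    }

    // Checkpoint is confirmed: publish everything pending up to that id.
    void checkpointComplete(long checkpointId) {
        Map<Long, List<List<String>>> done = pending.headMap(checkpointId, true);
        for (List<List<String>> files : done.values()) {
            finished.addAll(files);
        }
        done.clear(); // headMap is a view, so this also removes from pending
    }

    // Recovery: keep what the restored checkpoint covers, drop the rest.
    // Dropped records are expected to be replayed by the source.
    void restore(long restoredCheckpointId) {
        checkpointComplete(restoredCheckpointId);
        pending.clear();
        inProgress = new ArrayList<>();
    }

    List<List<String>> finishedFiles() {
        return finished;
    }
}
```

The point of the pending state is that output only becomes visible once
the checkpoint covering it has completed, so a failure and replay does not
leave duplicate or partial files behind.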

Best,
Fabian

2017-07-17 6:55 GMT+02:00 Tzu-Li (Gordon) Tai <tzuli...@apache.org>:

> Hi Robert,
>
> Thanks for your interest in contributing that.
> AFAIK, there aren’t any ongoing efforts on an ORC table sink yet.
> I’ll loop in Fabian (CC'ed) who might know more about this.
> The main complication in designing a sink is deciding which delivery
> guarantees it will provide and how to provide them using Flink’s
> checkpointing mechanism.
> I would suggest opening a JIRA (if there isn’t one already) and elaborating
> the details there to collect feedback before jumping right in.
>
> Cheers,
> Gordon
>
> On 17 July 2017 at 3:47:02 AM, Robert Rapplean (
> rrappl...@altitudedigital.com) wrote:
>
> Hey, everyone.
>
> I have a need for Flink to write to ORCFile tables in the near future.
> Could someone educate me on the current challenges that might make that
> hard to do? I've worked quite a bit with the HCat libraries, and may be
> overconfident about how complicated this is. Is anyone currently working
> on the issue?
>
> I'd go ahead and submit a Jira ticket for this, but am deterred by the
> thought that someone should have already created such a ticket, and
> wondering why it isn't already there. It may be a priority thing, but this
> is my personal priority at the moment.
>
> Best,
>
> Robert
>
>
