On a similar note, I just checked that Flink currently uses ORC 1.4.3 in its
dependencies. IMO, it is a little outdated. Can we bump the ORC version to
something newer - maybe 1.5.x or even 1.6.0?

-
Sivaprasanna

On Tue, Apr 14, 2020 at 1:42 PM Jingsong Li <jingsongl...@gmail.com> wrote:

> Hi,
>
> Maybe you should use flink-orc, and depend on plain orc-core rather than
> orc-core with the nohive classifier. We can provide a nohive version in the
> future.
>
> Because ORC and Hive are so closely related, ORC still relies on some Hive
> classes.
> The nohive classifier of Apache ORC creates a variant of the core and
> mapreduce jars that does not conflict with Hive 1.x [1].
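>
> For illustration, the two dependencies differ only in the classifier (the
> version number here is just an example):
>
>   org.apache.orc:orc-core:1.4.3          <- plain core jar
>   org.apache.orc:orc-core:1.4.3:nohive   <- relocated "nohive" variant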
>
> So orc and orc-nohive contain the same class names, but orc-nohive
> shades/relocates many of them, like "ColumnVector" and
> "VectorizedRowBatch".
> Currently flink-orc-nohive depends on flink-orc, and they share a lot of code.
> They cannot be unified into a single module; there would be a lot of
> conflicts.
>
> [1] https://issues.apache.org/jira/browse/ORC-174
>
> Best,
> Jingsong Lee
>
> On Tue, Apr 14, 2020 at 3:36 PM Sivaprasanna <sivaprasanna...@gmail.com>
> wrote:
>
> > Hello,
> >
> > I'm working on an implementation of ORC BulkWriter[1]. As of now, I have
> > the entire implementation in a separate module called "flink-orc-compress"
> > under "flink-formats" since I'm not entirely sure whether it should go into
> > the existing ORC modules, i.e. flink-orc & flink-orc-nohive.
> >
> > So my questions are:
> > 1. What's the difference between these two ORC modules?
> > 2. Should the ORC BulkWriter implementation go into one of these existing
> > modules? If yes, which one? Or can we keep it in a separate module to avoid
> > duplicating or causing any conflicts?
> >
> > Note: My current implementation of the ORC BulkWriter (a rough sketch is
> > included below) uses orc-core with the nohive classifier as the dependency.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-10114
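> >
> > For context, a very rough sketch of the writer I have in mind is below. It
> > is only an illustration: the Vectorizer interface is a placeholder I made up
> > for converting records into column vectors, and the creation of the
> > underlying org.apache.orc.Writer is left out.
> >
> >   import java.io.IOException;
> >
> >   import org.apache.flink.api.common.serialization.BulkWriter;
> >   import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
> >   // (with the nohive classifier this would be the relocated package instead)
> >   import org.apache.orc.TypeDescription;
> >   import org.apache.orc.Writer;
> >
> >   public class OrcBulkWriter<T> implements BulkWriter<T> {
> >
> >       private final Writer orcWriter;             // underlying ORC writer
> >       private final VectorizedRowBatch rowBatch;  // reused batch buffer
> >       private final Vectorizer<T> vectorizer;     // record -> column vectors
> >
> >       public OrcBulkWriter(TypeDescription schema, Writer orcWriter,
> >                            Vectorizer<T> vectorizer) {
> >           this.orcWriter = orcWriter;
> >           this.rowBatch = schema.createRowBatch();  // default size (1024 rows)
> >           this.vectorizer = vectorizer;
> >       }
> >
> >       @Override
> >       public void addElement(T element) throws IOException {
> >           // Add one record to the current batch; write the batch out when full.
> >           vectorizer.vectorize(element, rowBatch);
> >           if (rowBatch.size == VectorizedRowBatch.DEFAULT_SIZE) {
> >               orcWriter.addRowBatch(rowBatch);
> >               rowBatch.reset();
> >           }
> >       }
> >
> >       @Override
> >       public void flush() throws IOException {
> >           // Write out whatever is still buffered in the current batch.
> >           if (rowBatch.size > 0) {
> >               orcWriter.addRowBatch(rowBatch);
> >               rowBatch.reset();
> >           }
> >       }
> >
> >       @Override
> >       public void finish() throws IOException {
> >           flush();
> >           orcWriter.close();
> >       }
> >
> >       /** Placeholder: fills one record into the batch and increments batch.size. */
> >       public interface Vectorizer<T> {
> >           void vectorize(T element, VectorizedRowBatch batch) throws IOException;
> >       }
> >   }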
> >
>
>
> --
> Best, Jingsong Lee
>
