Re: flink-orc or flink-orc-nohive

Jingsong Li Tue, 14 Apr 2020 01:13:50 -0700

Hi,

You should probably use flink-orc, and depend on orc-core rather than on
orc-core with the nohive classifier. We can provide a nohive version in the
future.

Because ORC and Hive are so closely related, ORC currently still relies on
some Hive classes.
The nohive classifier of Apache ORC exists to create a variant of the core and
mapreduce jars that doesn't conflict with Hive 1.x [1].

So orc and orc-nohive contain classes with the same names, but orc-nohive
shades/relocates many of them, such as "ColumnVector" and
"VectorizedRowBatch".
Currently flink-orc-nohive depends on flink-orc, and they share a lot of code.
They cannot be merged into a single module, because that would cause a lot of
conflicts.
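
To make the relocation concrete, here is a rough sketch that checks which
variant of the vector classes is visible on a classpath. The class
OrcVariantProbe and its methods are hypothetical names made up for this
illustration. The two package prefixes are my understanding of the ORC build
(plain orc-core uses Hive's org.apache.hadoop.hive package, the nohive jar
relocates it under org.apache.orc.storage), so please verify them against the
jars you actually use.

```java
import java.util.function.Predicate;

// Hypothetical helper, not part of Flink or ORC.
public class OrcVariantProbe {
    // Package of the vector classes when using plain orc-core (Hive storage-api).
    static final String HIVE_PKG = "org.apache.hadoop.hive.ql.exec.vector.";
    // Assumed relocated package inside the nohive classifier jar.
    static final String NOHIVE_PKG = "org.apache.orc.storage.ql.exec.vector.";

    // Reports which variant is present, given a way to test for a class name.
    static String detect(Predicate<String> classPresent) {
        boolean hive = classPresent.test(HIVE_PKG + "VectorizedRowBatch");
        boolean nohive = classPresent.test(NOHIVE_PKG + "VectorizedRowBatch");
        if (hive && nohive) return "both";
        if (hive) return "hive";
        if (nohive) return "nohive";
        return "none";
    }

    // Checks the real classpath without initializing the class.
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, OrcVariantProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(detect(OrcVariantProbe::onClasspath));
    }
}
```

Code compiled against one package cannot be handed a VectorizedRowBatch from
the other, even though the simple class names match, which is why the two
Flink modules stay separate.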

[1] https://issues.apache.org/jira/browse/ORC-174

Best,
Jingsong Lee

On Tue, Apr 14, 2020 at 3:36 PM Sivaprasanna <sivaprasanna...@gmail.com>
wrote:

> Hello,
>
> I'm working on an implementation of ORC BulkWriter[1]. As of now, I have
> the entire implementation in a separate module called "flink-orc-compress"
> under "flink-formats" since I'm not entirely sure whether it should go into
> the existing ORC modules, i.e. flink-orc & flink-orc-nohive.
>
> So my questions are:
> 1. What's the difference between these two ORC modules?
> 2. Should the ORC BulkWriter implementation go into one of these existing
> modules? If yes, which one? Or can we keep it in a separate module to avoid
> duplicating or causing any conflicts?
>
> Note: My current implementation of ORC BulkWriter uses orc-core with nohive
> classifier as the dependency.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10114
>

-- 
Best, Jingsong Lee
