Hi,

I'd suggest putting this in flink-orc, and depending on plain orc-core instead of orc-core with the nohive classifier. We can provide a nohive version in the future.

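For reference, the two dependency variants would look roughly like this in a pom.xml. This is only a sketch: the ${orc.version} property is a placeholder I'm assuming here, so use whatever version the build already manages.

```xml
<!-- Plain orc-core: uses the Hive storage-api classes under their
     original org.apache.hadoop.hive.* package names. -->
<dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-core</artifactId>
    <!-- assumed placeholder property, not taken from this thread -->
    <version>${orc.version}</version>
</dependency>

<!-- nohive variant: same artifact, selected with a classifier.
     Classes such as ColumnVector and VectorizedRowBatch are relocated
     (to org.apache.orc.storage.*, as far as I know). -->
<dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-core</artifactId>
    <classifier>nohive</classifier>
    <!-- assumed placeholder property, not taken from this thread -->
    <version>${orc.version}</version>
</dependency>
```

A single module would normally declare only one of the two, since code compiled against one variant imports different packages than code compiled against the other.
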
Because ORC and Hive are so closely related, ORC currently still relies on some Hive classes. Apache ORC's nohive classifier exists to create a variant of the core and mapreduce jars that doesn't conflict with Hive 1.x [1]. So orc and orc-nohive have the same class names, but orc-nohive shades/relocates lots of classes, such as "ColumnVector" and "VectorizedRowBatch".

Currently flink-orc-nohive depends on flink-orc, and they share a lot of code. They cannot be unified into a single module, because there would be a lot of conflicts.

[1] https://issues.apache.org/jira/browse/ORC-174

Best,
Jingsong Lee

On Tue, Apr 14, 2020 at 3:36 PM Sivaprasanna <sivaprasanna...@gmail.com> wrote:

> Hello,
>
> I'm working on an implementation of ORC BulkWriter[1]. As of now, I have
> the entire implementation in a separate module called "flink-orc-compress"
> under "flink-formats" since I'm not entirely sure whether it should go into
> the existing ORC modules i.e flink-orc & flink-orc-nohive.
>
> So my questions are:
> 1. What's the difference between these two ORC modules?
> 2. Should the ORC BulkWriter implementation go into one of these existing
> modules? If yes, which one? Or can we keep it in a separate module to avoid
> duplicating or causing any conflicts?
>
> Note: My current implementation of ORC BulkWriter uses orc-core with nohive
> classifier as the dependency.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10114
>