On a similar note, I just checked and Flink currently uses ORC 1.4.3 in its dependencies. IMO, that is a little outdated. Can we bump the ORC version to something newer - maybe 1.5.x or even 1.6.0?
- Sivaprasanna

On Tue, Apr 14, 2020 at 1:42 PM Jingsong Li <jingsongl...@gmail.com> wrote:

> Hi,
>
> Maybe you should use flink-orc, and use orc-core instead of orc-core with
> the nohive classifier. We can provide a nohive version in the future.
>
> Because ORC and Hive are so closely tied, ORC currently still relies on
> some Hive classes. The Apache ORC artifact with the nohive classifier
> exists to provide a variant of the core and mapreduce jars that doesn't
> conflict with Hive 1.x [1].
>
> So orc and orc-nohive have the same class names, but orc-nohive
> shades/relocates lots of classes, like "ColumnVector" and
> "VectorizedRowBatch".
>
> Right now flink-orc-nohive depends on flink-orc, and they share a lot of
> code. They cannot be unified into a single module; there would be a lot
> of conflicts.
>
> [1] https://issues.apache.org/jira/browse/ORC-174
>
> Best,
> Jingsong Lee
>
> On Tue, Apr 14, 2020 at 3:36 PM Sivaprasanna <sivaprasanna...@gmail.com>
> wrote:
>
> > Hello,
> >
> > I'm working on an implementation of an ORC BulkWriter [1]. As of now, I
> > have the entire implementation in a separate module called
> > "flink-orc-compress" under "flink-formats", since I'm not entirely sure
> > whether it should go into one of the existing ORC modules, i.e.
> > flink-orc or flink-orc-nohive.
> >
> > So my questions are:
> > 1. What's the difference between these two ORC modules?
> > 2. Should the ORC BulkWriter implementation go into one of these
> > existing modules? If yes, which one? Or can we keep it in a separate
> > module to avoid duplication or conflicts?
> >
> > Note: My current implementation of the ORC BulkWriter uses orc-core
> > with the nohive classifier as the dependency.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-10114
> >
>
> --
> Best, Jingsong Lee
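
For readers following the thread, here is a minimal sketch (not the FLINK-10114 design itself) of what wrapping an ORC Writer in Flink's BulkWriter interface could look like. The class name OrcBatchBulkWriter is hypothetical, the element type is simply VectorizedRowBatch, and the sketch assumes the org.apache.orc.Writer has already been created against the target file. The import also illustrates the relocation Jingsong describes: with the nohive classifier, VectorizedRowBatch lives under org.apache.orc.storage.*, while plain orc-core uses org.apache.hadoop.hive.*.

import java.io.IOException;

import org.apache.flink.api.common.serialization.BulkWriter;

import org.apache.orc.Writer;
// nohive relocation: with plain orc-core this same class is
// org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
import org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch;

// Hypothetical sketch: wraps an already-created ORC Writer in Flink's
// BulkWriter interface, which a BulkWriter.Factory would produce for
// StreamingFileSink.forBulkFormat(...).
public class OrcBatchBulkWriter implements BulkWriter<VectorizedRowBatch> {

    private final Writer orcWriter;

    public OrcBatchBulkWriter(Writer orcWriter) {
        this.orcWriter = orcWriter;
    }

    @Override
    public void addElement(VectorizedRowBatch batch) throws IOException {
        // Hand a filled batch to the ORC writer; ORC buffers rows into stripes.
        orcWriter.addRowBatch(batch);
    }

    @Override
    public void flush() throws IOException {
        // ORC has no cheap per-record flush; a real implementation must decide
        // what to do on checkpoints here.
    }

    @Override
    public void finish() throws IOException {
        // Closing the writer flushes the remaining stripes and the file footer.
        orcWriter.close();
    }
}

One practical wrinkle the sketch glosses over: OrcFile.createWriter expects a Hadoop Path rather than the Flink FSDataOutputStream that BulkWriter.Factory#create receives, so an actual implementation needs additional plumbing at that boundary.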