On 4/10/15, 8:05 PM, "Xuefu Zhang" <xzh...@cloudera.com> wrote:
>To Owen's explanation - Thanks. I guess my major concern is that we
>seemingly are breaking apart Hive's integrity and making it hard to
>release and maintain due to increasing number of external dependents.
>Let's say that Hive depends on a certain version of ORC (as TLP) and
>it's found that ORC has a bug that seriously impacts Hive users. We
>cannot release Hive as fast as we can, since doing so would need ORC
>community to fix the problem and make a release, for which Hive PMC
>has no control. On the contrary, Hive community can quickly fix the
>problem and make a release without waiting for other projects to make
>a release. I'm not sure this move (ORC as TLP) will be beneficial to
>vast Hive users.

You need to understand exactly what this brings about for Hive, and in
fact for those who do not use ORC today.

With the proposed changes, competing formats like Parquet might be able
to compete with ORC in terms of Hive features. That is the direct impact
of standardizing a Storage-API implementation.

Once ORC is an independent project, new ORC features can no longer rely
on ORC living inside the ql/ source tree to introduce circular
dependencies between ql.exec -> orc -> ql.exec.vector classes.

As far as your concern about risks goes, I would ask for a comparison
against the bugs/release cycles of "STORED AS PARQUET".

As a Hive contributor, I'm certain that if I find a core issue in
Parquet, my patches would be welcome there.

That should be beneficial to the Parquet community, but it might not be
aligned entirely along employer lines: my patch might be good, but my
intention would be to migrate warehouses with
parquet.hive.DeprecatedParquetInputFormat Impala tables to Hive.

Resolving that conflict should ideally be left to the Parquet IPMC & the
ASF rather than the Hive PMC (or let's do a bias check *to* Hive?).

Now - reverse that argument and replay it, except instead we're talking
about the C++ ORC reader plus a non-ASF SQL competitor to Hive.

>If this is not convincing, let me propose that we spin off metastore
>also as TLP tomorrow!

http://incubator.apache.org/projects/hcatalog.html

Cheers,
Gopal