Hi Daniel et al,

Thanks for bringing up this topic and for the detailed status update.
Please find my thoughts below, point by point.

> 1) How to get a new Kite release? Maybe we should remove the Kite
> dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?

I think making a new Kite release would be a huge effort: it would require upgrading the versions, making the necessary code modifications, testing thoroughly, and then making the release itself. Kite is also a very passively maintained project with minimal activity, so getting this done would definitely take a lot of work. It would also depend on the Solr community, since Kite's Morphlines module is heavily used and somewhat actively developed by them. In addition, there is a shorter/longer term goal to get rid of the Kite dependency in Sqoop entirely, so any release effort would become throwaway work very soon. Focusing on the Kite removal seems more reasonable to me. However, it would be great to see an estimate for this effort; @Szabolcs, could you share your thoughts on this?

> 2) Should we drop support for Hadoop 2?

I think we can drop support for Hadoop 2, especially if we use straightforward versioning with the new release.

> 3) What version number should we use? To avoid confusion with Sqoop2 I'd go
> with 3.0.

I like this idea, +1 for making a 3.0 release containing these changes.

> 4) Does (should?) this affect the 1.5 release?

I think the answer is yes.
Currently the following breaking changes are on the horizon and could be part of the next Sqoop release:

* com.cloudera package removal (done)
* Gradle introduction (in progress)
* Hadoop/Hive/HBase version upgrade (in progress)
* Kite deprecation/removal (planned)
* Bump Java version to 8 (planned)

Looking at this list, I would say that a Sqoop 1.5 release containing only the com.cloudera package removal, the Gradle introduction and the Java version bump would have a fairly small scope that is not very relevant from a user perspective, so having two releases (1.5 and 3.0) would be a bit of overkill. I would instead suggest going with a Sqoop 3.0 release containing all the changes listed above. What do you think?

To summarize, I currently see the following dependencies for the next Sqoop release:

* Finishing up the Gradle patch
* Hive 3 release
* Kite removal - this could be the next common effort in the community

In any case, I would be happy to take the Release Manager role for the next release; please let me know if everyone would be OK with that.

I am looking forward to seeing others' thoughts on this too.

Many thanks,
Bogi

On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös <daniel.vo...@gmail.com> wrote:

> Dear All,
>
> After some development towards supporting Hadoop 3 (and latest version of
> downstream components) I'd like to summarize the current state of the
> upgrade and start the conversation about releasing a new version of Sqoop
> with Hadoop 3 support.
>
> Here's what happened so far:
> - Upgraded Hadoop dependency to 3.0.0
> - Hive had to be upgraded, since old Hive didn't work with Hadoop 3.
> - HBase had to be upgraded since Hive 3 depends on HBase 2(alpha)
> - Dealt with a bunch of minor issues like changed Hadoop configuration
> names and different packaging of Maven artifacts.
>
> For details please refer to this ticket and the attached review request:
> https://issues.apache.org/jira/browse/SQOOP-3305
>
> Remaining work:
> - Parquet importing doesn't work. It was broken by a standalone-metastore
> change in Hive and fixing would require a new Kite version to be built
> against Hive 3.
> - Hive 3 is going to enable ACID tables by default. We should support
> importing into these. Details:
> https://issues.apache.org/jira/browse/SQOOP-3311
>
> Other blocking issues:
> - There's no Hive 3 release (no alpha/beta) yet.
>
> I'd like to kindly ask you all to share any other tasks/issues you know of
> that we should address to support the latest versions. Also, there are a
> couple open questions:
> 1) How to get a new Kite release? Maybe we should remove the Kite
> dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
> 2) Should we drop support for Hadoop 2?
> 3) What version number should we use? To avoid confusion with Sqoop2 I'd
> go with 3.0.
> 4) Does (should?) this affect the 1.5 release?
>
> Regards,
> Daniel
>