Hi all,

I also think that completely eliminating the Kite dependency from Sqoop would be the easiest way forward. I will try to analyze this topic a bit more next week and come up with subtasks so that we could potentially work on it in parallel.

I am happy with the Sqoop 3.0 scope proposal too and Bogi being the release manager of it.

Szabolcs

On Fri, Apr 13, 2018 at 2:37 PM, Boglarka Egyed <b...@apache.org> wrote:

> Hi Daniel et al,
>
> Thanks for bringing up this topic and the detailed status update.
>
> I am sharing my thoughts point by point, please find them below.
>
> > 1) How to get a new Kite release? Maybe we should remove the Kite
> > dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
>
> I think making a new Kite release would be a huge effort as it would
> require upgrading the versions, making the necessary code modifications,
> testing it thoroughly, etc. then making the release itself meanwhile Kite
> is a very passively handled tool having minimal activity on it thus it
> would definitely mean a lot of effort to get it done. It would have a
> dependency on Solr community too as the Morphlines module of Kite is
> heavily used and somewhat actively developed by them. Also indeed there is
> a shorter/longer term goal to get rid of Kite dependency in Sqoop entirely,
> i.e. all release efforts would become throw-away very soon.
>
> Focusing on the Kite removal seems to be more reasonable to me. However it
> would be great to see an estimation regarding this effort, @Szabolcs could
> you maybe share your thoughts on this?
>
> > 2) Should we drop support for Hadoop 2?
>
> I think we can drop support for Hadoop 2 especially if we use
> straightforward versioning with the new release.
>
> > 3) What version number should we use? To avoid confusion with Sqoop2 I'd
> > go with 3.0.
>
> I like this idea, +1 for making a 3.0 release containing these changes.
>
> > 4) Does (should?) this affect the 1.5 release?
>
> I think the answer is yes. Currently the following breaking changes are on
> the horizon which could be part of a next Sqoop release:
> * com.cloudera package removal (done)
> * Gradle introduction (in progress)
> * Hadoop/Hive/HBase version upgrade (in progress)
> * Kite deprecation/removal (planned)
> * Bump Java version to 8 (planned)
>
> Looking at this list I would say that making a Sqoop 1.5 release containing
> only the com.cloudera package removal, the Gradle introduction and the Java
> version bump would mean a somewhat small and irrelevant scope from a user
> perspective so maybe having two releases (1.5 and 3.0) would be a little
> bit overkill. I would instead suggest to go with a Sqoop 3.0 release
> containing all the changes listed above. What do you think?
>
> Summarizing it up I see the following dependencies for a next Sqoop release
> currently:
> * Finishing up the Gradle patch
> * Hive 3 release
> * Kite removal - this could be the next common effort in the community
>
> Anyhow I would be happy to take the Release Manager role for the next
> release, please let me know if everyone would be OK with that.
>
> I am looking forward to see others thoughts on this too.
>
> Many thanks,
> Bogi
>
> On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös <daniel.vo...@gmail.com>
> wrote:
>
> > Dear All,
> >
> > After some development towards supporting Hadoop 3 (and latest version of
> > downstream components) I'd like to summarize the current state of the
> > upgrade and start the conversation about releasing a new version of Sqoop
> > with Hadoop 3 support.
> >
> > Here's what happened so far:
> > - Upgraded Hadoop dependency to 3.0.0
> > - Hive had to be upgraded, since old Hive didn't work with Hadoop 3.
> > - HBase had to be upgraded since Hive 3 depends on HBase 2(alpha)
> > - Dealt with a bunch of minor issues like changed Hadoop configuration
> >   names and different packaging of Maven artifacts.
> >
> > For details please refer to this ticket and the attached review request:
> > https://issues.apache.org/jira/browse/SQOOP-3305
> >
> > Remaining work:
> > - Parquet importing doesn't work. It was broken by a standalone-metastore
> >   change in Hive and fixing would require a new Kite version to be built
> >   against Hive 3.
> > - Hive 3 is going to enable ACID tables by default. We should support
> >   importing into these. Details:
> >   https://issues.apache.org/jira/browse/SQOOP-3311
> >
> > Other blocking issues:
> > - There's no Hive 3 release (no alpha/beta) yet.
> >
> > I'd like to kindly ask you all to share any other tasks/issues you know of
> > that we should address to support the latest versions. Also, there are a
> > couple open questions:
> > 1) How to get a new Kite release? Maybe we should remove the Kite
> >    dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
> > 2) Should we drop support for Hadoop 2?
> > 3) What version number should we use? To avoid confusion with Sqoop2 I'd
> >    go with 3.0.
> > 4) Does (should?) this affect the 1.5 release?
> >
> > Regards,
> > Daniel