Hi All,

We are currently scoping the next release(s), and no official release process has started yet.

I got JIRA admin rights to take some of the load off the PMC members on this front. Creating version 3.0 in JIRA was a purely administrative step, just as creating 1.5 was before.

Discussions are still ongoing about:
* the version(s): 1.5 and/or 3.0
* whether to drop or keep support for Hadoop 2

No explicit decisions have been made, and no PMC members have expressed their concerns about these yet. Thanks for your thoughts, Attila; based on them, I think we are on the path to a compromise on these questions and to a decision that everyone can accept.

Regards,
Bogi

On Fri, May 11, 2018 at 2:51 AM, Attila Szabó <mau...@apache.org> wrote:
> Dear Sqoop community,
>
> Am I the only one who is missing some formal decision-making, announcement and process here?
>
> - When did the PMC make the decision that we're going for version 3.0 (instead of any other version alternatives)? When and where was it announced? How is it possible that some of the contributors knew this earlier than the rest of the community?
>
> - When did Bogi become a PMC member, and why was that fact not announced to the community as it used to be (it is still not visible on the project site)? Of course this is just an assumption, but according to some emails from the PMC chair back in February, JIRA administrator privileges are restricted to PMC members, and I guess we still follow this rule, as nothing else was announced. So I guess the reason Bogi was able to administer the available versions in JIRA is that she has been promoted (and here I'd like to send my congrats, if this is true and my assumption was valid).
>
> - When did the PMC make the decision about dropping Hadoop 2 compatibility? When and where was it announced?
>
> If, as a community that is officially part of the ASF, we do have rules, why does it look like we are not following them?
>
> Regards,
> Attila
>
> ps:
> My objection to dropping 1.5 is that having 1.5 was decided as a community, and yet we're still not sure whether those changes would only be delivered in the next major version, or how they would be backported, cherry-picked, etc. And as the majority of users are still on 2.x, I think we cannot force people to upgrade just to be able to use some of the changes originally planned for 1.5.
> So this item is absolutely -1 on my side!
>
> ps2:
> Daniel! I've provided some comments on your ORC Jira.
>
> On Thu, May 10, 2018 at 4:00 PM, Boglarka Egyed <b...@apache.org> wrote:
>
> > Hi All,
> >
> > Thank you Daniel for the update! I was also writing one when your email arrived, so I'm just adding a couple of comments to it.
> >
> > New major version in JIRA:
> > Version 3.0.0 has been created in JIRA <https://issues.apache.org/jira/projects/SQOOP/summary>; please feel free to use it on the corresponding JIRAs from now on. As per my previous email, I see no point in doing a 1.5.0 release currently, so I'm OK with moving all the JIRAs that have a fix/target version of 1.5.0 to 3.0.0. Any objections?
> >
> > Update on the dependencies of the release:
> > * The Gradle patch needs some finalization and can be committed soon: https://reviews.apache.org/r/66067/
> > * The Kite removal effort has been started: SQOOP-3313 <https://issues.apache.org/jira/browse/SQOOP-3313>
> > * The Hive 3.0.0 release is still in an early phase based on this email thread <https://mail-archives.apache.org/mod_mbox/hive-dev/201804.mbox/%3c2ec60da6-0a2e-4f3a-92f2-e3ce9d497...@hortonworks.com%3E> and has no ETA yet
> >
> > Thanks Daniel for looking into the Hadoop compatibility question; please let us know your findings.
> >
> > Cheers,
> > Bogi
> >
> > On Thu, May 10, 2018 at 3:27 PM, Dániel Vörös <daniel.vo...@gmail.com> wrote:
> >
> > > Dear All,
> > >
> > > After Bogi created the 3.0.0 version in Jira, I applied it to a couple of tickets that don't make sense on the 1.x line (without Hadoop 3/Hive 3).
> > >
> > > However, as Bogi mentioned in her previous email, it probably doesn't make sense to work on a 1.5 release in parallel with 3.0.0. How would you feel if we were to move all 1.5 issues [1] to 3.0.0?
> > >
> > > In the meantime I've experimented with running Sqoop 1.4.7 against Hadoop 3.1.0, and I'm planning to do the opposite: running Sqoop 3.0.0-SNAPSHOT against Hadoop 2.x. That way we'd be able to better assess Attila's question about backward compatibility. Please note that the hard part will be Hive integration, I'm afraid, and until there's a Hive 3.0 release it's hard to test. If anyone's interested in this topic, check out [2].
> > >
> > > Regards,
> > > Daniel
> > >
> > > [1] https://issues.apache.org/jira/issues?jql=project%20%3D%20SQOOP%20and%20fixVersion%20%3D%201.5.0%20and%20resolutionDate%20is%20not%20%20empty%20order%20by%20resolutiondate%20desc
> > > [2] https://github.com/dvoros/docker-sqoop
> > >
> > > On Mon, Apr 16, 2018 at 2:20 PM Szabolcs Vasas <va...@apache.org> wrote:
> > >
> > > > Hi All,
> > > >
> > > > Sqoop NG/Sqoop 3:
> > > > As far as I remember, Sqoop NG was an alternative name suggested for Sqoop 2, which has a totally different architecture from Sqoop 1. I would not use it now, since this release does not include changes affecting the architecture, only bumps to the versions of the dependencies. However, since the dependencies are bumped to new major releases, I think we should also change the major version number of Sqoop.
> > > >
> > > > Hadoop 2 support:
> > > > I agree with Daniel that we should not introduce extra complexity to support Hadoop 2 as well. However, even if we don't support Hadoop 2 in our next major Sqoop release, vendors could still backport features that do not require Hadoop 3 to their earlier releases. I think introducing a 1.x branch upstream would increase the complexity of committing bug fixes, and I am not sure the community wants to make a release from a Sqoop 1.x branch. Even if at some point somebody wants to do this, they could cut the branch and cherry-pick the necessary bug fixes right before the release.
> > > >
> > > > Kite removal:
> > > > I agree that this is quite a complex task on its own, but we can't bump the Hadoop/Hive/HBase dependencies without deciding what to do with Kite. One option is to bump these dependencies in Kite too, create a new Kite release, and bump Sqoop's Kite dependency to this new release. Another option is to get rid of the Kite dependency before we bump the Hadoop/Hive/HBase versions. In my opinion the latter makes more sense: we wanted to eliminate the Kite dependency anyway, and the Kite project seems to be dead, so bumping its dependencies, making the necessary code changes, fixing tests and creating the release might be overkill.
> > > >
> > > > Szabolcs
> > > >
> > > > On Mon, Apr 16, 2018 at 11:50 AM, Dániel Vörös <daniel.vo...@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I believe we're all on the same page on removing Kite, so I've opened SQOOP-3313 to track that. @Attila I'm glad to see your interest in the ORC part. It would be highly appreciated if you could take a look at this review request [1].
> > > > >
> > > > > I'm not that familiar with Flume, but it seems they added NG after architectural changes and released FlumeNG 1.0 after Flume 0.9.4 [2]. Even if we go with NG, I'd suggest calling it 3.0 to avoid confusion with earlier releases.
> > > > >
> > > > > I think the biggest part of keeping Hadoop 2 (and previous versions of downstream projects like Hive) supported would be testing against those versions. It would also require at least one more build profile to build against them, and probably another layer of abstraction in the code (like the Hadoop shims in Hive).
> > > > > Not sure about vendors, but I think they usually don't add new features to older release lines. In my opinion we should branch off from the current trunk to track the 1.x release line (where we keep supporting Hadoop 2) and keep adding bug fixes there, but add new features to trunk only and not worry about Hadoop 2 there.
> > > > >
> > > > > I agree with Attila on the dependencies. We shouldn't release based on non-final releases. We might bump the dependencies to some alpha/beta during development, but we must not forget to move to the final versions in the end.
> > > > >
> > > > > +1 for Bogi as release manager.
> > > > >
> > > > > Regards,
> > > > > Daniel
> > > > >
> > > > > [1] https://reviews.apache.org/r/66548/
> > > > > [2] https://blogs.apache.org/flume/entry/flume_ng_architecture
> > > > >
> > > > > On Fri, Apr 13, 2018 at 5:24 PM Szabó Attila <mau...@inf.elte.hu> wrote:
> > > > >
> > > > > > Hello everyone,
> > > > > >
> > > > > > I'd also like to add my thoughts:
> > > > > >
> > > > > > New Sqoop version: The last time I had the chance to talk about this with some of the PMC members (e.g. Jarcec, Kate), we were leaning towards creating Sqoop-NG (NG == Next Generation), much the same as the Flume community did (and AFAIK from Mike Percy, it was quite a successful move from their POV). Don't get me wrong, I'm totally NOT against 3.0, though IMHO Sqoop-NG 1.0 would be a better choice.
> > > > > >
> > > > > > Kite: I would split this effort into two subtasks. First, I would get in contact with the Parquet team and create a Kite-independent execution path in Sqoop for Parquet-backed tables (Hive/Impala/etc.). As part of this effort I would also add direct support for the ORC format (in the past few years I've found it very useful in several different situations, and it's usually quite inconvenient that Sqoop does not support it "out of the box").
> > > > > >
> > > > > > As the second subtask I would start removing every Kite-based dependency (but my gut feeling is that this could break the codebase in too many places, and it might not be that easy to succeed on that front).
> > > > > >
> > > > > > Hadoop 2:
> > > > > > Could anyone please point out the pros/cons on this front? AFAIK several vendors (including Cloudera, Hortonworks, MapR, EMR, etc.) still support Hadoop 2, and to the best of my knowledge most of the user base is tied to their releases. I'd like to give those users the chance to use the newest features of Sqoop, so I would vote for keeping compatibility for a bit more time/a few more versions.
> > > > > >
> > > > > > Dependencies:
> > > > > > I'd like to cast my very direct and LOUD vote against any alpha dependencies (including HBase or anything else!). IMHO Sqoop is already a stable component of the Apache Software Foundation, and users can depend on it, so I'd like to avoid any kind of issues caused by "immature" dependencies. Of course this is just my own opinion, but I think that as a community we must not undermine our stability.
> > > > > >
> > > > > > On the other fronts I totally agree and +1 the planned efforts.
> > > > > >
> > > > > > Best regards,
> > > > > > Attila
> > > > > >
> > > > > > ________________________________
> > > > > > From: Szabolcs Vasas <va...@apache.org>
> > > > > > Sent: Friday, April 13, 2018 3:43 PM
> > > > > > To: dev@sqoop.apache.org
> > > > > > Subject: Re: Release to support Hadoop 3
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I also think that completely eliminating the Kite dependency from Sqoop would be the easiest way forward. I will try to analyze this topic a bit more next week and come up with subtasks so we could potentially work on it in parallel.
> > > > > >
> > > > > > I am happy with the Sqoop 3.0 scope proposal too, and with Bogi being its release manager.
> > > > > >
> > > > > > Szabolcs
> > > > > >
> > > > > > On Fri, Apr 13, 2018 at 2:37 PM, Boglarka Egyed <b...@apache.org> wrote:
> > > > > >
> > > > > > > Hi Daniel et al,
> > > > > > >
> > > > > > > Thanks for bringing up this topic and for the detailed status update.
> > > > > > >
> > > > > > > I am sharing my thoughts point by point; please find them below.
> > > > > > >
> > > > > > > > 1) How to get a new Kite release? Maybe we should remove the Kite dependency altogether (as Szabolcs hinted in the comments on SQOOP-3171)?
> > > > > > >
> > > > > > > I think making a new Kite release would be a huge effort, as it would require upgrading the versions, making the necessary code modifications, testing it thoroughly, etc., and then making the release itself. Meanwhile, Kite is a very passively maintained tool with minimal activity, so getting it done would definitely take a lot of effort. It would also depend on the Solr community, as the Morphlines module of Kite is heavily used and somewhat actively developed by them. There is also a shorter/longer-term goal to get rid of the Kite dependency in Sqoop entirely, i.e. any release effort would soon become throw-away work.
> > > > > > >
> > > > > > > Focusing on the Kite removal seems more reasonable to me. However, it would be great to see an estimate of this effort. @Szabolcs, could you maybe share your thoughts on this?
> > > > > > >
> > > > > > > > 2) Should we drop support for Hadoop 2?
> > > > > > >
> > > > > > > I think we can drop support for Hadoop 2, especially if we use straightforward versioning with the new release.
> > > > > > >
> > > > > > > > 3) What version number should we use? To avoid confusion with Sqoop2 I'd go with 3.0.
> > > > > > >
> > > > > > > I like this idea, +1 for making a 3.0 release containing these changes.
> > > > > > >
> > > > > > > > 4) Does (should?) this affect the 1.5 release?
> > > > > > >
> > > > > > > I think the answer is yes. Currently the following breaking changes are on the horizon and could be part of the next Sqoop release:
> > > > > > > * com.cloudera package removal (done)
> > > > > > > * Gradle introduction (in progress)
> > > > > > > * Hadoop/Hive/HBase version upgrade (in progress)
> > > > > > > * Kite deprecation/removal (planned)
> > > > > > > * Bump Java version to 8 (planned)
> > > > > > >
> > > > > > > Looking at this list, I would say that a Sqoop 1.5 release containing only the com.cloudera package removal, the Gradle introduction and the Java version bump would have a rather small and irrelevant scope from a user perspective, so having two releases (1.5 and 3.0) might be a bit of overkill. I would instead suggest going with a Sqoop 3.0 release containing all the changes listed above. What do you think?
> > > > > > >
> > > > > > > To sum up, I currently see the following dependencies for the next Sqoop release:
> > > > > > > * Finishing up the Gradle patch
> > > > > > > * Hive 3 release
> > > > > > > * Kite removal - this could be the next common effort in the community
> > > > > > >
> > > > > > > Anyhow, I would be happy to take the Release Manager role for the next release; please let me know if everyone is OK with that.
> > > > > > >
> > > > > > > I am looking forward to seeing others' thoughts on this too.
> > > > > > >
> > > > > > > Many thanks,
> > > > > > > Bogi
> > > > > > >
> > > > > > > On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös <daniel.vo...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Dear All,
> > > > > > > >
> > > > > > > > After some development towards supporting Hadoop 3 (and the latest versions of downstream components), I'd like to summarize the current state of the upgrade and start the conversation about releasing a new version of Sqoop with Hadoop 3 support.
> > > > > > > >
> > > > > > > > Here's what has happened so far:
> > > > > > > > - Upgraded the Hadoop dependency to 3.0.0
> > > > > > > > - Hive had to be upgraded, since old Hive didn't work with Hadoop 3
> > > > > > > > - HBase had to be upgraded, since Hive 3 depends on HBase 2 (alpha)
> > > > > > > > - Dealt with a bunch of minor issues like changed Hadoop configuration names and different packaging of Maven artifacts
> > > > > > > >
> > > > > > > > For details please refer to this ticket and the attached review request:
> > > > > > > > https://issues.apache.org/jira/browse/SQOOP-3305
> > > > > > > >
> > > > > > > > Remaining work:
> > > > > > > > - Parquet importing doesn't work. It was broken by a standalone-metastore change in Hive, and fixing it would require a new Kite version built against Hive 3.
> > > > > > > > - Hive 3 is going to enable ACID tables by default. We should support importing into these. Details:
> > > > > > > > https://issues.apache.org/jira/browse/SQOOP-3311
> > > > > > > >
> > > > > > > > Other blocking issues:
> > > > > > > > - There's no Hive 3 release (not even an alpha/beta) yet.
> > > > > > > >
> > > > > > > > I'd like to kindly ask you all to share any other tasks/issues you know of that we should address to support the latest versions. Also, there are a couple of open questions:
> > > > > > > > 1) How to get a new Kite release? Maybe we should remove the Kite dependency altogether (as Szabolcs hinted in the comments on SQOOP-3171)?
> > > > > > > > 2) Should we drop support for Hadoop 2?
> > > > > > > > 3) What version number should we use? To avoid confusion with Sqoop2 I'd go with 3.0.
> > > > > > > > 4) Does (should?) this affect the 1.5 release?
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Daniel