Attila, I think improving the build infrastructure for Sqoop makes a lot of sense. I think focusing on the build system and tests works. Please be cognizant of the following:
- Docs for Sqoop1 and Sqoop2 are built in different ways. It might be interesting to update how docs are built in Sqoop1.
- Apache Rat and Cobertura are being used in Sqoop1, I believe. Please make sure these aren't left out.

Also, would the build system change or stay the same? cf. https://builds.apache.org/job/Sqoop-hadoop100/1036/search/?q=Sqoop.

Abe

On Wed, Nov 30, 2016 at 12:27 PM Attila Szabo <mau...@apache.org> wrote:

> Dear PMC, dear community,
>
> In the past few months some of us have already identified a need to
> improve the test coverage, clean up the CI system, and make the build
> system much more straightforward for the trunk version of Sqoop.
>
> I think our goals are very simple here:
>
> - It's very difficult to onboard new committers to the community if the
>   component itself is very difficult to test/learn to test, and the
>   CI/JIRA system sends out false failures after each commit.
> - It would also be nice to benefit again from the safety belt of CI, and
>   not push the whole testing burden onto the shoulders of the committer
>   alone (of course contributors have to run tests, but as said it's not
>   straightforward now to execute all tests, and it's still the
>   committer's responsibility to ensure nothing is broken after a commit,
>   so every safety belt is needed).
>
> It has also been identified that it would be good to release a new
> version from trunk, including the improvements of the past 8-9 months
> and showing the outside world that the component is alive.
>
> Most probably because of lack of time we didn't achieve all of these
> goals in the past few months. So right now I'd like to take the
> initiative and share my thoughts and goals with you, trying to push
> forward the above-mentioned things, but of course I'd first like to
> collect the invaluable input and wisdom of the community.
>
> The plan would be the following:
>
> - A few weeks ago I opened three JIRAs upstream, each depending on the
>   previous one (SQOOP-3050, SQOOP-3051 and SQOOP-3052).
> - SQOOP-3050 is about fixing the current test cases (mainly failing
>   because of configuration issues).
> - SQOOP-3051 is about cleaning the obsolete profiles out of build.xml
>   (e.g. currently 20, 23 and 100 fail with tons of test cases, and even
>   200 is not able to correctly run the HCatalog tests, for example,
>   because of incompatible class changes).
> - SQOOP-3052 is about creating a new and more robust build system for
>   Sqoop (e.g. Maven/Gradle, maybe both).
> - Anna Szonyi (a quite new contributor, but quite active in the past
>   month) has jumped on SQOOP-3050 and it seems she has finished those
>   efforts (so all unit + third_party test cases are running with the
>   recently created hadoop260 profile).
> - AFAIK Anna is also willing to jump on SQOOP-3051, which would be about
>   cleaning the old profiles out of build.xml and ivysettings, and
>   keeping only one profile that is capable of compiling/packaging a
>   correct version of Sqoop against which we can run all of the available
>   test cases.
> - In connection with the goals of SQOOP-3052, I remember that Sowmya and
>   Venkat have indicated their willingness to do that. If they currently
>   have time for it (after SQOOP-3051 is committed, which is about to
>   happen by the end of the week), I would be very glad to see it
>   achieved through their contribution. If for any reason they do not
>   have time for it or are no longer interested, I'm also willing to find
>   volunteers to do it.
> - Personally I would be very eager to drive these build-related changes
>   through the CI system + JIRA + JIRA bots + etc.
> - On top of these things I'd also like to make one more thing happen
>   before the next release: creating proper quoting support for MySQL and
>   PostgreSQL as well. If I'm not mistaken the Oracle quoting originated
>   from Sowmya, so it would also be a good candidate for her to
>   design + create a generalized and centralized quoting mechanism for
>   Sqoop, with 2-3 different strategies/db (e.g. Oracle/MySQL/PostgreSQL,
>   etc.). I would be happy to provide my help (by creating the related
>   JIRA tasks, sharing my insights, and coaching) to her, or to anyone
>   willing to do this feature if Sowmya doesn't have time right now.
> - If we're aiming for 1.4.7, I think this scope should be enough.
> - If we're aiming for 1.5, I think we should also include the
>   elimination of the com.cloudera classes + packages.
> - I would also volunteer to drive + own the release and the release
>   process.
>
> What do you think about this plan? Could you please share your thoughts?
>
> Many thanks in advance,
> Attila
>