a) Multi-module patches are always troublesome because they make the test system do significantly more work. For Yetus, we've pared it down as far as we can go to get *some* speed increases, but if a patch does something like hit every pom.xml file, there's nothing that can be done to make it better other than splitting up the patch.
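As a rough sketch of what that pruning looks like at the Maven level (the module path here is a hypothetical example, not the exact invocation Yetus generates):

    # Build only the module the patch touches, plus its dependencies
    # (-am) and anything downstream that depends on it (-amd), instead
    # of running the full reactor:
    mvn install -pl hadoop-hdfs-project/hadoop-hdfs -am -amd

When a patch touches every pom.xml, that module list degenerates into the whole reactor, and the pruning buys nothing.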
b) It's worth noting that it happens more often to HDFS patches because HDFS unit tests take too damn long. Some individual tests take 10 minutes! They invariably collide with the various full builds (NOT precommit! Those other things that Steve pointed out that we're ignoring). While Yetus has support for running unit tests in parallel, Hadoop does not.

c) mvn install is pretty much required for a not insignificant number of multi-module patches, especially if they hit hadoop-common. For a large chunk of "oh, just make it one patch", it's effectively a death sentence on the Jenkins side.

d) I'm a big fan of d.

e) File a bug against Yetus and we'll add the ability to set ant/gradle/maven args from the command line. I thought I had it in there when I rewrote the support for multiple build tools, gradle, etc., but I clearly dropped it on the floor.

f) Any time you "give the option to the patch submitter", you generate a not insignificant amount of work on the test infrastructure to determine intent, because it effectively means implementing some parsing of a comment. It's not particularly easy, because humans rarely follow the rules. Just see how well we do at following the Hadoop Compatibility Guidelines. Har har. No, really: people still struggle with filling in JIRA headers correctly and with naming patches to trigger the appropriate branch for the test.

g) It's worth noting that Hadoop trunk is *not* using the latest test-patch code. So there are some significant improvements on the way as soon as we get a release out the door.

On Sep 18, 2015, at 7:56 PM, Ming Ma <min...@twitter.com.INVALID> wrote:

> The increase in frequency might be due to the refactoring of
> hadoop-hdfs-client-*.jar out of the main hadoop-hdfs-*.jar. I don't
> have the overall metrics on how often this happens when anyone changes
> protobuf, but based on HDFS-9004, 4 of 5 runs have this issue, which
> is a lot for any patch that changes APIs. This isn't limited to HDFS;
> there are cases of YARN API changes causing MR unit tests to fail.
>
> So far, the workaround I use is to keep resubmitting the build until
> it succeeds. Another approach we could consider is to provide an
> option for the patch submitter to use its own local repo when it
> submits the patch. That way, the majority of patches can still use the
> shared local repo.
>
> On Fri, Sep 18, 2015 at 3:14 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Okay, some browsing of the Jenkins docs [1] says that we could key
>> maven.repo.local off of $EXECUTOR_NUMBER to do per-executor repos
>> like Bernd recommended, but that still requires some hook into
>> test-patch.sh.
>>
>> Regarding install, I thought all we needed to install was
>> hadoop-maven-plugins, but we do more than that now in test-patch.sh.
>> Not sure if we can reduce that.
>>
>> [1]
>> https://wiki.jenkins-ci.org/display/JENKINS/Building+a+software+project#Buildingasoftwareproject-JenkinsSetEnvironmentVariables
>>
>> On Fri, Sep 18, 2015 at 2:42 PM, Allen Wittenauer <a...@altiscale.com>
>> wrote:
>>
>>> The collisions have been happening for about a year now. The
>>> frequency is increasing, but not enough to be particularly
>>> worrisome. (So I'm slightly amused that one blowing up is suddenly a
>>> major freakout.)
>>>
>>> Making changes to the configuration without knowing what one is
>>> doing is probably a bad idea.
>>> For example, if people are removing the shared cache, I hope
>>> they're also prepared for the bitching that is going to go with the
>>> extremely significant slowdown caused by downloading the java
>>> prereqs for building for every test...
>>>
>>> As far as Yetus goes, we've got a JIRA open to provide per-instance
>>> caches when using the docker container code. I've got it in my head
>>> how I think we can do it, but just haven't had a chance to code it.
>>> So once that gets written up + containers get turned on, the problem
>>> should go away without any significant impact on test time. Of
>>> course, that won't help the scheduled builds, but those happen at an
>>> even smaller rate.
>>>
>>> On Sep 18, 2015, at 12:19 PM, Andrew Wang <andrew.w...@cloudera.com>
>>> wrote:
>>>
>>>> Sangjin, you should have access to the precommit jobs if you log in
>>>> with your Apache credentials, even as a branch committer.
>>>>
>>>> https://builds.apache.org/job/PreCommit-HDFS-Build/configure
>>>>
>>>> The actual maven invocation is managed by test-patch.sh, though.
>>>> test-patch.sh has a MAVEN_ARGS which looks like what we want, but I
>>>> don't think we can just set it before calling test-patch, since
>>>> it'd get squashed by setup_defaults.
>>>>
>>>> Allen/Chris/Yetus folks, any guidance here?
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>> On Fri, Sep 18, 2015 at 11:55 AM, <e...@zusammenkunft.net> wrote:
>>>>
>>>>> You can use one per build processor; that reduces concurrent
>>>>> updates but still keeps the cache function. And then try to avoid
>>>>> using install.
>>>>>
>>>>> --
>>>>> http://bernd.eckenfels.net
>>>>>
>>>>> -----Original Message-----
>>>>> From: Andrew Wang <andrew.w...@cloudera.com>
>>>>> To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
>>>>> Cc: Andrew Bayer <andrew.ba...@gmail.com>, Sangjin Lee
>>>>> <sj...@twitter.com>, Lei Xu <l...@cloudera.com>,
>>>>> infrastruct...@apache.org
>>>>> Sent: Fr., 18 Sep. 2015 20:42
>>>>> Subject: Re: Local repo sharing for maven builds
>>>>>
>>>>> I think each job should use a maven.repo.local within its
>>>>> workspace like abayer said. This means lots of downloading, but
>>>>> it's isolated.
>>>>>
>>>>> If we care about download time, we could also bootstrap with a
>>>>> tarred .m2/repository after we've run a `mvn compile`, so before
>>>>> it installs the hadoop artifacts.
>>>>>
>>>>> On Fri, Sep 18, 2015 at 11:02 AM, Ming Ma
>>>>> <min...@twitter.com.invalid> wrote:
>>>>>
>>>>>> +hadoop common dev. Any suggestions?
>>>>>>
>>>>>> On Fri, Sep 18, 2015 at 10:41 AM, Andrew Bayer
>>>>>> <andrew.ba...@gmail.com> wrote:
>>>>>>
>>>>>>> You can change your maven call to use a different repository - I
>>>>>>> believe you do that with -Dmaven.repo.local=path/to/repo
>>>>>>> On Sep 18, 2015 19:39, "Ming Ma" <min...@twitter.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We are seeing some strange behavior in the HDFS precommit
>>>>>>>> build. It seems like it is caused by the local repo on the same
>>>>>>>> machine being used by different concurrent jobs.
>>>>>>>>
>>>>>>>> In HDFS, the build and test of "hadoop-hdfs-project/hdfs"
>>>>>>>> depend on "hadoop-hdfs-project/hdfs-client"'s
>>>>>>>> hadoop-hdfs-client-3.0.0-SNAPSHOT.jar. HDFS-9004 adds some new
>>>>>>>> methods to hadoop-hdfs-client-3.0.0-SNAPSHOT.jar.
>>>>>>>> In the precommit build for HDFS-9004, unit tests for
>>>>>>>> "hadoop-hdfs-project/hdfs" complain that the method isn't
>>>>>>>> defined:
>>>>>>>> https://builds.apache.org/job/PreCommit-HDFS-Build/12522/testReport/
>>>>>>>> Interestingly, sometimes it just works fine:
>>>>>>>> https://builds.apache.org/job/PreCommit-HDFS-Build/12507/testReport/
>>>>>>>>
>>>>>>>> So we suspect that another job running at the same time
>>>>>>>> published a different version of
>>>>>>>> hadoop-hdfs-client-3.0.0-SNAPSHOT.jar, one that doesn't have
>>>>>>>> the new methods defined, to the local repo that is shared by
>>>>>>>> all jobs on that machine.
>>>>>>>>
>>>>>>>> If the above analysis is correct, what is the best way to fix
>>>>>>>> the issue so that different jobs can use their own maven local
>>>>>>>> repo for build and test?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Ming
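For reference, a minimal sketch of the per-executor repository idea discussed in the thread above. $EXECUTOR_NUMBER is the standard Jenkins per-executor environment variable; the repo path and where this would hook into the job are hypothetical illustrations, not something test-patch.sh supports out of the box:

    # Give each Jenkins executor its own local Maven repo, so that
    # concurrent jobs can't clobber each other's SNAPSHOT artifacts,
    # while downloads stay cached across builds on that executor:
    REPO="${HOME}/.m2/repo-executor-${EXECUTOR_NUMBER}"
    mvn install -Dmaven.repo.local="${REPO}"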
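And a sketch of the tarball-bootstrap variant for fully isolated per-job repos (all paths hypothetical): seed a fresh repo inside the workspace from a pre-built archive of third-party dependencies, so isolation doesn't mean re-downloading everything:

    # Seed an isolated per-job repo from a cached tarball, then build
    # against it; only the Hadoop SNAPSHOT artifacts get (re)installed:
    REPO="${WORKSPACE}/m2repo"
    mkdir -p "${REPO}"
    tar -xzf /var/cache/m2-seed.tar.gz -C "${REPO}"
    mvn install -Dmaven.repo.local="${REPO}"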