TestNegativeCliDriver failures are concerning. We have had practically no negative test coverage for the last couple of weeks. I think we should treat this as a blocker for the Hive 3.0.0 release. Thoughts?
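For context, the memory-increase and dump-on-OOM requests discussed in the thread below would typically land in the surefire argLine of the test build. A minimal sketch of such a fragment; the heap size and dump path here are placeholders, not Hive's actual ptest settings:

```xml
<!-- Hypothetical maven-surefire-plugin fragment. -Xmx value and
     HeapDumpPath are illustrative assumptions, not Hive's real config. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>
      -Xmx2048m
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:HeapDumpPath=${project.build.directory}/tmp
    </argLine>
  </configuration>
</plugin>
```

With `-XX:+HeapDumpOnOutOfMemoryError` the JVM writes an hprof file on OOM, which could then be archived alongside the test logs for investigation.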
On Thu, Apr 5, 2018 at 9:50 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:
> It may be because of a legitimate memory issue/leak. An alternative is to
> decrease the batch size of the negative cli driver on the ptest cluster,
> but then we wouldn't know whether there is an actual memory issue.
>
> Thanks
> Prasanth
>
> On Thu, Apr 5, 2018 at 1:36 PM -0700, "Vineet Garg" <vg...@hortonworks.com> wrote:
>
> TestNegativeCliDriver tests are still failing with
> java.lang.OutOfMemoryError: GC overhead limit exceeded.
> Can we increase the amount of memory for the tests?
>
> Vineet
>
> On Mar 5, 2018, at 11:35 AM, Sergey Shelukhin wrote:
>
> On a semi-related note, I noticed recently that the negative tests seem
> to OOM in setup from time to time.
> Can we increase the amount of memory for the tests a little, and/or add
> a heap dump on OOM, saved to the test logs directory, so we could
> investigate?
>
> On 18/3/5, 11:07, "Vineet Garg" wrote:
>
> +1 for a nightly build. We could generate reports to identify both
> frequent and sporadic test failures, plus other interesting bits like
> average build time, Yetus failures, etc. It would also help narrow the
> range of culprit commits down to one day.
> If you decide to go ahead with this, I would like to help.
>
> Vineet
>
> On Mar 5, 2018, at 8:50 AM, Sahil Takiar wrote:
>
> Wow, that HBase UI looks super useful. +1 to having something like that.
>
> If not, +1 to having a proper nightly build; it would help devs identify
> which commits break which tests. I find that git-bisect can take a long
> time to run and can be difficult to use (e.g. finding a known good
> commit isn't always easy).
>
> On Mon, Mar 5, 2018 at 9:03 AM, Peter Vary wrote:
>
> Without a nightly build, and with this many flaky tests, it is very hard
> to identify the breaking commits. We can use something like bisect and
> multiple test runs.
> There is a more elegant way to do this with nightly test runs:
> https://issues.apache.org/jira/browse/HBASE-15917
> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html
>
> This also helps identify the flaky tests, and it creates a continuous,
> updated list of them.
>
> On Feb 23, 2018, at 6:55 PM, Sahil Takiar wrote:
>
> +1
>
> Does anyone have suggestions on how to efficiently identify which commit
> is breaking a test? Is it just git-bisect, or is there an easier way?
> Hive QA isn't always that helpful: it will say a test has been failing
> for the past "x" builds, but that doesn't help much since Hive QA isn't
> a nightly build.
>
> On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:
>
> +1
> Commenting on the JIRA and giving a 24-hour heads-up (excluding
> weekends) would be good.
>
> On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates wrote:
>
> +1.
>
> Alan.
>
> On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair wrote:
>
> +1
> I agree, this makes sense. The number of failures keeps increasing.
> A 24-hour heads-up in either case before a revert would be good.
>
> On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary wrote:
>
> I agree with Zoltan. The continuously breaking tests make it very hard
> to spot real issues.
> Any thoughts on doing this automatically?
>
> On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich wrote:
>
> Hello,
>
> In the last couple of weeks the number of broken tests has started to go
> up... and even though I run bisect etc. from time to time, sometimes
> people don't react to my comments/tickets.
>
> Because keeping this many failing tests makes it easier for a new one to
> slip in, I think reverting the patch that introduced the test failures
> would also help in some cases.
> I think it would help a lot, to prevent further test breaks, to revert
> the patch if either of the following conditions is met:
>
> C1) the notification/comment about the fact that the patch indeed broke
> a test has gone unanswered for at least 24 hours.
>
> C2) the patch has been in for 7 days but the test failure is still not
> addressed (note that in this case there might be a conversation about
> fixing it, but enabling other people to work in a cleaner environment is
> more important than a single patch; and if it can't be fixed in 7 days,
> well, it might not get fixed in a month).
>
> I would also like to note that I've seen a few tickets picked up by
> people who were not involved in creating the original change, and
> although the intention was good, they might miss the context of the
> original patch and may "fix" the tests the wrong way: accept a q.out
> that is inappropriate, or ignore the test...
>
> Would it be OK to implement this from now on? It makes my efforts
> practically useless if people are not reacting...
>
> Note: just so we are on the same page, this is only about a single test
> that fails on its own; I feel that flaky tests are an entirely different
> topic.
>
> Cheers,
> Zoltan
>
>
> --
> Sahil Takiar
> Software Engineer
> takiar.sa...@gmail.com | (510) 673-0309
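Since git-bisect comes up several times above as the fallback for finding a breaking commit, here is a self-contained sketch of automating that search with `git bisect run`. The toy repository and the `grep` check are stand-ins; in a real run the test command would be something like `mvn test -Dtest=TestFoo`:

```shell
#!/bin/sh
# Build a toy repo with four commits; the third one ("c3") breaks the "test".
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.com
git config user.name dev

echo pass > state && git add state && git commit -qm c1
git commit -q --allow-empty -m c2
echo fail > state && git add state && git commit -qm c3   # the breaking commit
git commit -q --allow-empty -m c4

# Mark the endpoints: HEAD (c4) is known bad, HEAD~3 (c1) is known good.
git bisect start HEAD HEAD~3
# git bisect run treats exit 0 as "good" and 1-124 as "bad"; a real run
# would invoke the failing qtest here instead of grep.
git bisect run sh -c 'grep -qx pass state'
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset
echo "first bad commit: $(git log -1 --format=%s "$first_bad")"
```

The main caveats raised in the thread still apply: each bisect step is a full test run, and a known good starting commit is needed, which is exactly what a nightly build would provide.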