Having posted about our success with testing Alpha 7, I need to ‘fess up about the end result of all this palaver from July:
With a bit of instrumentation in our build pipeline, it turned out that we were thrashing on heap space in a few places and, for whatever reason, that was causing all the ClassNotFoundException problems. Sigh. I bumped the heap size up in the JVM options for Boot and everything ran flawlessly…

…the hint that pushed me in this direction was that, even on Clojure 1.9.0, we _occasionally_ saw problems loading instant18 (which I’d assumed was a weird edge case in Boot’s async pod setup/refresh)… and that started cropping up more often, and then we started seeing the same CNFE problems with 1.9.0 and that WebDriver test when run in a long Boot pipeline…

…so clearly the issue WASN’T the new Alpha build: that just happened to push us nearer the heap/GC issue.

Sean Corfield -- (970) FOR-SEAN -- (904) 302-SEAN
An Architect's View -- http://corfield.org/

"If you're not annoying somebody, you're not really alive."
-- Margaret Atwood

From: Sean Corfield <s...@corfield.org>
Sent: Saturday, July 21, 2018 8:22:21 PM
To: clojure@googlegroups.com
Subject: RE: [ANN] Clojure 1.10.0-alpha5

Things tried so far:

* Update JDK to 1.8.0_144 (to match one of our other environments). No effect on the problem (well, I haven’t seen any JVM crashes since!).
* Update Boot to 2.8.1. No effect. Ghadi wondered if Boot’s class loading might have changed since 2.7.2 and that might be a root cause.
* Change our multi-version testing so we’re only loading Clojure 1.10.0-alpha6 (we were trying both alpha 4 and alpha 6 before). No effect. I wondered if loading Clojure based on two different versions of ASM (even isolated via pods) might cause issues.

We can reliably run the Boot task on its own – clj-webdriver loads and all our tests pass as expected. All of our tasks combined up until this point run just fine. All tasks individually run just fine.
The only time we see a failure is when we combine the WebDriver-based test task with all the other tasks – and it reliably fails to load some class (usually a clj-webdriver class, but on one run it failed to load clojure.tools.logging.impl.Logger) originating from Boot’s pod refresh, preparing to load clj-webdriver, as far as I can tell.

I haven’t been able to produce a smaller combination of tasks that exhibits the problem.

I will observe that I have _occasionally_ seen CNFEs during pod refresh in the past, when Components are being stopped (and unloaded?) asynchronously, so it may be that this is an intermittent/non-deterministic bug in Boot pods that is exacerbated by Clojure 1.10.0-alpha5 and later?

For now we’re staying on Clojure 1.9.0 (and running just our unit tests against both that and master-SNAPSHOT). As far as we can tell, all our apps run fine on Clojure 1.10.0-alpha6, so this seems to impact only our Boot-based build toolchain and we can probably work around that. We’ll probably switch to 1.10 at some point (before release) so we can test it in production and provide feedback.

From: Sean Corfield <s...@corfield.org>
Sent: Thursday, July 19, 2018 4:55:47 PM
To: clojure@googlegroups.com
Subject: RE: [ANN] Clojure 1.10.0-alpha5

Progress so far… with Alpha 6.
I’ve encountered a number of (random) JVM SEGV fatal errors running our Boot build pipeline and, if I don’t hit any of those, I hit an exception like this fairly reliably: https://gist.github.com/seancorfield/f29bdb948a2a533c14a07ff6ffd6548a (the missing class seems to vary from run to run but the exception is always in the same place).

It _seems_ to be an interaction between something new in Alpha 5 and Boot’s pod machinery, since all of the failures I’m encountering seem to happen when Boot is attempting to refresh pods in its pool of pods. We rely heavily on pods to isolate various parts of our build pipeline (since we load in different sets of dependencies for different sets of tests).

Ghadi suggested I update my local JDK to a more recent version and try again, so that’s next on my list.

Narrowing this down is going to be hard: if I run the build pipeline in separate “chunks” – which means less interaction between pods – each chunk always passes.

So, overall, no failures from our test suite itself for any of our application components (good). Just random failures within the build tool itself ☹

From: Sean Corfield <s...@corfield.org>
Sent: Thursday, July 19, 2018 1:08:33 PM
To: clojure@googlegroups.com
Subject: RE: [ANN] Clojure 1.10.0-alpha5

Yes, which allowed us to actually _try_ to run our build pipeline – so the problems we’re seeing are fallout from the big changes in Alpha 5… I just haven’t nailed them down yet 😊

Everything works fine on Alpha 4.

From: clojure@googlegroups.com <clojure@googlegroups.com> on behalf of Alex Miller <a...@puredanger.com>
Sent: Wednesday, July 18, 2018 10:50:59 AM
To: Clojure
Subject: RE: [ANN] Clojure 1.10.0-alpha5

The only change in alpha6 was the asm fix (your patch!).... :)
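
The first message in this thread traces the ClassNotFoundException failures back to heap thrashing found "with a bit of instrumentation", but the thread does not show that instrumentation. For anyone chasing a similar symptom, a minimal sketch of one way to check heap pressure from inside a JVM build step might look like the following. It uses only the standard `java.lang.management` API. The class name `HeapReport`, its methods, and the "after-task" label are hypothetical names chosen for illustration; they are not what was used in the pipeline discussed above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not from the thread): reports heap usage and
// cumulative GC time so a long build pipeline can log them between tasks.
public class HeapReport {

    // Fraction of the heap limit currently in use (0.0 to 1.0).
    // Falls back to the committed size when the JVM reports no maximum (-1).
    public static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return (double) heap.getUsed() / limit;
    }

    // Total milliseconds spent in garbage collection since JVM start,
    // summed over all collectors that report a value.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // One-line summary suitable for printing after each pipeline task.
    public static String report(String label) {
        return String.format("[%s] heap used: %.1f%%, GC time: %d ms",
                label, heapUsedFraction() * 100.0, totalGcMillis());
    }

    public static void main(String[] args) {
        System.out.println(report("after-task"));
    }
}
```

If the used fraction stays close to 100% while GC time climbs quickly between tasks, that is consistent with the heap/GC pressure described above. The remedy reported in the thread was raising the heap size in the JVM options for Boot; the standard HotSpot flag for the maximum heap is `-Xmx`, and the right value depends on the pipeline.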