Summary of IRC Meeting in #aurora at Mon Oct 27 18:02:11 2014: Attendees: davmclau, wickman, jcohen, wfarner, Yasumoto, kts, mkhutornenko, zmanji, dlester
- Preface
- 0.6.0 release
- Client bug bash
- Mesos eggs for more platforms
  - Action: kts to reach out to mesos dev list about providing more eggs
- Client stack trace logging
- Review bot
- All Things Hadoop podcast Aurora episode
- CI builds
  - Action: davmclau to investigate contributing to pex

IRC log follows:

## Preface ##

[Mon Oct 27 18:02:26 2014] <wfarner>: welcome, folks. kicking off the weekly community meeting
[Mon Oct 27 18:02:36 2014] <wfarner>: Let's start with roll call
[Mon Oct 27 18:02:38 2014] <wfarner>: here
[Mon Oct 27 18:02:39 2014] <jcohen>: here
[Mon Oct 27 18:03:18 2014] <dlester>: present
[Mon Oct 27 18:03:24 2014] <mkhutornenko>: here
[Mon Oct 27 18:03:28 2014] <wfarner>: while we give that a few minutes, i'd like to restructure this a bit by gathering topics at the beginning
[Mon Oct 27 18:04:06 2014] <Yasumoto>: howdy howdy
[Mon Oct 27 18:04:06 2014] <wfarner>: topics i would like to discuss: 0.6.0 release, new review bot, client bug bash, and the 'all things hadoop' podcast about Aurora
[Mon Oct 27 18:04:19 2014] <wfarner>: please offer up any other topics now
[Mon Oct 27 18:04:20 2014] <jcohen>: I'd like to discuss whether it's worthwhile to provide mesos eggs for platforms other than used by the vagrant image
[Mon Oct 27 18:04:28 2014] <wfarner>: jcohen: added
[Mon Oct 27 18:04:49 2014] <Yasumoto>: Can we discuss reverting the client logging to a file on stack-trace?
[Mon Oct 27 18:04:56 2014] <wfarner>: Yasumoto: added
[Mon Oct 27 18:05:44 2014] <wickman>: here
[Mon Oct 27 18:06:22 2014] <wfarner>: any other topics?
[Mon Oct 27 18:07:43 2014] <wfarner>: if you come up with any as we proceed, feel free to PM me

## 0.6.0 release ##

[Mon Oct 27 18:07:48 2014] <wfarner>: AURORA-711
[Mon Oct 27 18:08:34 2014] <wfarner>: We've finally cleared out all the planned work for 0.6.0, i'll begin cutting the release today and if all goes well, i will kick off a vote by EOD
[Mon Oct 27 18:09:01 2014] <jcohen>: great :)
[Mon Oct 27 18:09:07 2014] <wfarner>: Once the vote has started, please help out by running the build through the courses so we can flush out any issues early.
[Mon Oct 27 18:10:05 2014] <wfarner>: this is a good segue to the next topic...

## Client bug bash ##

[Mon Oct 27 18:10:35 2014] <wfarner>: to avoid scope creep of 0.6.0, we removed a bunch of client-related work
[Mon Oct 27 18:11:25 2014] <wfarner>: to catch up on this, we have committed to a client bug-fix sprint at twitter
[Mon Oct 27 18:11:41 2014] <wfarner>: you can see the somewhat-prioritized backlog here: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=37&view=planning&quickFilter=156
[Mon Oct 27 18:12:36 2014] <wfarner>: we've made an effort to surface issues actively causing pain in the client, which we intend to pave the way for an 0.7.0 release
[Mon Oct 27 18:13:19 2014] <wfarner>: 0.7.0 is shaping up to primarily be removal of many minor deprecated features, and deprecation of the 'v1' client

## Mesos eggs for more platforms ##

[Mon Oct 27 18:13:58 2014] <wfarner>: jcohen: the floor is yours
[Mon Oct 27 18:14:40 2014] <jcohen>: So, folks trying to install Aurora on other platforms are running into missing eggs it seems
[Mon Oct 27 18:15:02 2014] <jcohen>: I'm wondering if it makes sense for us to commit to publication of a known set of eggs to make life easier
[Mon Oct 27 18:15:34 2014] <jcohen>: In theory this shouldn't be our responsibility (I'd imagine Mesos themselves would want to do this)?
[Mon Oct 27 18:15:49 2014] <kts>: +1, but for a different reason
[Mon Oct 27 18:15:51 2014] <jcohen>: But if the need to build an egg is a blocker for people trying out Aurora it's probably in our best interest?
[Mon Oct 27 18:15:59 2014] <kts>: I'd like to start running CI against multiple platforms
[Mon Oct 27 18:16:05 2014] <jcohen>: that'd be great as well
[Mon Oct 27 18:16:43 2014] <wfarner>: kts: do you have any plans for how to accomplish that?
[Mon Oct 27 18:17:16 2014] <kts>: None yet, ideally we'd have a CI environment that provided root in a container of the target OS
[Mon Oct 27 18:17:21 2014] <mkhutornenko>: It's a bit unusual to build eggs ourselves though. I'd expect Mesos to be a more logical owner of that process.
[Mon Oct 27 18:17:55 2014] <jcohen>: Could we provide multiple vagrant configs with different base boxes?
[Mon Oct 27 18:18:09 2014] <kts>: jcohen: that's what the make-mesos-eggs.sh script does
[Mon Oct 27 18:18:14 2014] <dlester>: has anyone brought this up on the Mesos mailing list?
[Mon Oct 27 18:18:17 2014] <kts>: at least to build the eggs
[Mon Oct 27 18:18:22 2014] <jcohen>: (I meant for ci)
[Mon Oct 27 18:18:34 2014] <kts>: yeah that's the most likely approach
[Mon Oct 27 18:18:48 2014] <mkhutornenko>: dlester: +1 on bringing this up with Mesos first
[Mon Oct 27 18:18:49 2014] <wfarner>: kts: do you mind if i tag you to open the discussion on mesos' dev list?
[Mon Oct 27 18:19:27 2014] <wickman>: alternately we push harder for pure bindings.
[Mon Oct 27 18:19:44 2014] <kts>: wfarner: sure, but I don't think that should be a blocker to improving our own CI
[Mon Oct 27 18:19:50 2014] <wfarner>: wickman: i think that's the right long-term approach, but we're actively losing users with the current state
[Mon Oct 27 18:19:59 2014] <jcohen>: that'd be great as well, but it's a longer term solution I suspect?
[Mon Oct 27 18:20:11 2014] <kts>: wickman: +1, ultimately we want to push this to pure-language bindings
[Mon Oct 27 18:20:18 2014] <wfarner>: kts: i agree, but let's not get too cozy with doing all of this
[Mon Oct 27 18:20:32 2014] <wfarner>: #action kts to reach out to mesos dev list about providing more eggs
[Mon Oct 27 18:20:41 2014] <wfarner>: s/more //
[Mon Oct 27 18:21:14 2014] <kts>: will do

## Client stack trace logging ##

[Mon Oct 27 18:21:24 2014] <wfarner>: Yasumoto: floor is yours
[Mon Oct 27 18:21:48 2014] <Yasumoto>: We've attempted to clean up the process for end-users when the client presents a stack trace
[Mon Oct 27 18:22:23 2014] <Yasumoto>: In practice, I've found that as a user it actually leads to more confusion, and then I'm winding up with a directory of files that stick around for a while
[Mon Oct 27 18:22:47 2014] <Yasumoto>: While running some tests, stack traces were being caught by the re-routing, which led to https://reviews.apache.org/r/26802/
[Mon Oct 27 18:23:09 2014] <Yasumoto>: but I've discarded that review, as that highlighted the level of patching we're really doing to make it work
[Mon Oct 27 18:23:36 2014] <Yasumoto>: I'm proposing we remove the log redirection for now, and re-consider the approach so we don't leave quite as many edge-cases hanging
[Mon Oct 27 18:25:08 2014] <wfarner>: Relevant dev@ thread that took us down this road: http://mail-archives.apache.org/mod_mbox/incubator-aurora-dev/201410.mbox/%3CCAFGkSCm%2B5jJZPXmEm1%3DWNz2tSh8Ld%2BEiO2KYE6Yco%3DpB_chekQ%40mail.gmail.com%3E
[Mon Oct 27 18:26:10 2014] <mkhutornenko>: +1 on rolling it back and rethinking the approach to at least not hinder unit test failures.
[Mon Oct 27 18:26:57 2014] <wfarner>: let's not have lazy consensus here - i know there are more stakeholders on this
[Mon Oct 27 18:27:39 2014] <kts>: +1 on rolling back - my position (http://mail-archives.apache.org/mod_mbox/incubator-aurora-dev/201410.mbox/%3ccaaath-aoyz3srtypwi+bu5p7xvtg3+8ybfydiuz2fwuweij...@mail.gmail.com%3E) hasn't changed here
[Mon Oct 27 18:28:17 2014] <jcohen>: I'm +1 on rolling it back and getting a better implementation in place (I thought we already *had* rolled it back tbh).
[Mon Oct 27 18:28:54 2014] <wfarner>: I'm also +1, i would rather tackle the causes of uncaught exceptions
[Mon Oct 27 18:29:03 2014] <Yasumoto>: I just filed https://issues.apache.org/jira/browse/AURORA-896 if anyone wants to discuss afterward
[Mon Oct 27 18:29:35 2014] <Yasumoto>: Sounds like there's mainly a majority- at the very least to remove it in the short-term so we can re-think the implementation
[Mon Oct 27 18:29:41 2014] <Yasumoto>: I'll have a review out later this week

## Review bot ##

[Mon Oct 27 18:29:57 2014] <wfarner>: AURORA-883
[Mon Oct 27 18:30:20 2014] <wfarner>: last week i added a jenkins job that replies to code reviews with build results, so don't be surprised when you see these review replies
[Mon Oct 27 18:30:42 2014] <wfarner>: for example: https://reviews.apache.org/r/27058/
[Mon Oct 27 18:31:17 2014] <mkhutornenko>: wfarner: thanks for doing that! it's already proved itself useful for catching python style issues.
[Mon Oct 27 18:31:26 2014] <wfarner>: the current implementation will build every diff, feel free to hack on the code: https://github.com/apache/incubator-aurora/blob/master/build-support/jenkins/review_feedback.py
[Mon Oct 27 18:31:37 2014] <mkhutornenko>: wfarner: any chance we could suppress emails from it though?
[Mon Oct 27 18:32:18 2014] <wfarner>: your best bet is a client-side filter, reviewboard is configured to email the group on every reply, and i don't think it has more control than that
[Mon Oct 27 18:32:37 2014] <mkhutornenko>: that's what I thought but wanted to give it a shot anyway :)

## All Things Hadoop podcast Aurora episode ##

[Mon Oct 27 18:36:21 2014] <wfarner>: Joe Stein (creator of All Things Hadoop podcast) published an episode in which he and i are chatting about Aurora. You might find it interesting: https://twitter.com/allthingshadoop/status/526763573964701697
[Mon Oct 27 18:36:59 2014] <wfarner>: That's all i have for today, any other last-minute topics?

## CI builds ##

[Mon Oct 27 18:37:28 2014] <zmanji>: @wfarner: You might want to send an email to the dev list about the podcast
[Mon Oct 27 18:37:52 2014] <wfarner>: zmanji: will do
[Mon Oct 27 18:38:05 2014] <kts>: it looks like CI reliability has improved a bit since switching to pip-bootstrapped pants
[Mon Oct 27 18:38:14 2014] <kts>: however, we're still seeing python timeout errors: https://builds.apache.org/job/Aurora/
[Mon Oct 27 18:38:44 2014] <kts>: (but now in a later stage of the build)
[Mon Oct 27 18:39:26 2014] <kts>: we've got essentially 3 options to improve this
[Mon Oct 27 18:39:44 2014] <kts>: 1) make a python sdist vendor cache somewhere that CI can see it (probably svn.apache.org)
[Mon Oct 27 18:40:18 2014] <kts>: 2) switch to a tool that gives better control over the timeouts in play here
[Mon Oct 27 18:40:42 2014] <kts>: 3) contribute upstream to pants/pex to get better control of these timeouts
[Mon Oct 27 18:41:11 2014] <kts>: im personally leaning toward 1)
[Mon Oct 27 18:41:48 2014] <kts>: does anyone have opinions here?
[Mon Oct 27 18:41:49 2014] <Yasumoto>: I feel like that bandaids the issue.. we might want to consider improving the tool reliability more
[Mon Oct 27 18:41:55 2014] <Yasumoto>: (aka #3)
[Mon Oct 27 18:42:14 2014] <wfarner>: i recall some interest in using requirements.txt for all python dependencies, am i crossing wires or is that rolled up in (2)?
[Mon Oct 27 18:42:32 2014] <kts>: that would be a prerequisite to 2)
[Mon Oct 27 18:43:04 2014] <kts>: AURORA-617
[Mon Oct 27 18:43:08 2014] <davmclau>: just to clarify - since switching to pip-bootstrapped pants, we have seen no more problems with that part of the build? or just less?
[Mon Oct 27 18:43:42 2014] <kts>: builds 686 through 693 all passed with that change
[Mon Oct 27 18:44:13 2014] <wfarner>: our build queue, for context: https://builds.apache.org/job/Aurora
[Mon Oct 27 18:44:21 2014] <kts>: nothing has failed due to that
[Mon Oct 27 18:44:48 2014] <davmclau>: for (1) would we have to manually maintain that cache as dependencies change?
[Mon Oct 27 18:44:54 2014] <kts>: yes we would
[Mon Oct 27 18:44:55 2014] <wfarner>: some data from the other side - we had a build failure this morning in pex resolution
[Mon Oct 27 18:45:00 2014] <davmclau>: I'm in favor of (2) or (3) then
[Mon Oct 27 18:45:40 2014] <kts>: there's an automated solution to 1) as well - we could ask infra to setup a devpi instance http://doc.devpi.net/latest/
[Mon Oct 27 18:45:59 2014] <kts>: no idea how much work/what that would entail
[Mon Oct 27 18:46:07 2014] <wfarner>: kts: is AURORA-617 potentially an immediate improvement, without necessarily going all the way to pip?
[Mon Oct 27 18:46:09 2014] <jcohen>: Any of those options seem reasonable to me. I guess in order I'd say 1 (via devpi), 3, 2
[Mon Oct 27 18:46:22 2014] <kts>: wfarner: it's a functional noop
[Mon Oct 27 18:46:25 2014] <davmclau>: yes, I agree with that
[Mon Oct 27 18:46:26 2014] <wfarner>: ok
[Mon Oct 27 18:46:30 2014] <jcohen>: AURORA-617
[Mon Oct 27 18:46:42 2014] <davmclau>: (with jcohen)
[Mon Oct 27 18:46:53 2014] <kts>: basically instead of a BUILD file in 3rdparty there'd be a requirements.txt file
[Mon Oct 27 18:46:57 2014] <kts>: it's mostly a trivial change
[Mon Oct 27 18:48:05 2014] <davmclau>: Do we have any idea how much work (3) is?
[Mon Oct 27 18:48:20 2014] <kts>: I don't
[Mon Oct 27 18:48:36 2014] <davmclau>: okay, I can investigate
[Mon Oct 27 18:48:40 2014] <wickman>: davmclau: it will be a matter of adding retries to the pex resolvers
[Mon Oct 27 18:48:46 2014] <wickman>: davmclau: and then upgrading pants to use an upgraded version of pex
[Mon Oct 27 18:48:58 2014] <davmclau>: and then upgrading pants in our repo
[Mon Oct 27 18:49:02 2014] <wickman>: correct
[Mon Oct 27 18:50:08 2014] <kts>: #action davmaclau to investigate contributing to pex
[Mon Oct 27 18:50:25 2014] <kts>: *davmclau
[Mon Oct 27 18:50:32 2014] <wfarner>: Sounds like we've quiesced, going to close up
[Mon Oct 27 18:50:45 2014] <wfarner>: ASFBot702: meeting stop

Meeting ended at Mon Oct 27 18:50:45 2014