Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-17 Thread Josh Rosen
I think that the fix was applied. Take a look at https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21874/consoleFull. Here, I see a fetch command that mentions this specific PR branch rather than the wildcard that we had before: > git fetch --tags --progress https://github.com

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-17 Thread Davies Liu
How can we know the changes have been applied? I checked several recent builds, and they all use the original configs. Davies On Fri, Oct 17, 2014 at 6:17 PM, Josh Rosen wrote: > FYI, I edited the Spark Pull Request Builder job to try this out. Let’s see > if it works (I’ll be around to revert i

Re: Spark on Mesos 0.20

2014-10-17 Thread Fairiz Azizi
' and also see that it loads the SNAPPY library). I ran the example against a directory of ApacheLog files containing about 4.4GB, and things seem to work fine. time MASTER="mesos://*:5050*" /opt/spark/current/bin/run-example LogQuery "maprfs:///user/hive/warehouse/apac

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-17 Thread Josh Rosen
FYI, I edited the Spark Pull Request Builder job to try this out.  Let’s see if it works (I’ll be around to revert if it doesn’t). On October 17, 2014 at 5:26:56 PM, Davies Liu (dav...@databricks.com) wrote: One finding is that all the timeout happened with this command: git fetch --tags --pr

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-17 Thread Davies Liu
One finding is that all the timeouts happened with this command: git fetch --tags --progress https://github.com/apache/spark.git +refs/pull/*:refs/remotes/origin/pr/* I'm thinking that this may be an expensive call; we could try a cheaper one: git fetch --tags --progress https://gi
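
A hedged illustration of the contrast being drawn here, with <PR> as a placeholder rather than the refspec from the actual job config (which is truncated above): GitHub exposes per-pull-request refs under refs/pull/<PR>/, so a job that only needs one PR can fetch that single ref instead of every pull-request ref in the repository.

    # wildcard fetch -- mirrors every pull-request ref, which can be slow on a
    # repository with thousands of PRs:
    git fetch --tags --progress https://github.com/apache/spark.git \
        '+refs/pull/*:refs/remotes/origin/pr/*'

    # narrower fetch -- only the head ref of the PR under test (<PR> is a
    # placeholder for the pull request number):
    git fetch --tags --progress https://github.com/apache/spark.git \
        "+refs/pull/<PR>/head:refs/remotes/origin/pr/<PR>"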

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-17 Thread shane knapp
yep, and i will tell you guys ONLY if you promise to NOT try this yourselves... checking the rate limit also counts as a hit and increments our numbers: # curl -i https://api.github.com/users/whatever 2> /dev/null | egrep ^X-Rate X-RateLimit-Limit: 60 X-RateLimit-Remaining: 51 X-RateLimit-Reset:
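
For reference, a minimal sketch of a more direct check: the dedicated rate_limit endpoint, which the GitHub API docs describe as not counting against the quota (worth verifying before running it repeatedly from the Jenkins masters, since everything else clearly does count):

    # print the current X-RateLimit-* headers for whichever IP/credentials
    # make the call; -s silences curl's progress output, -i keeps the headers
    curl -si https://api.github.com/rate_limit | egrep -i '^x-ratelimit'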

Raise Java dependency from 6 to 7

2014-10-17 Thread Andrew Ash
Hi Spark devs, I've heard a few times that keeping support for Java 6 is a priority for Apache Spark. Given that Java 6 has been publicly EOL'd since Feb 2013 and the last public update was Apr 2013

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-17 Thread shane knapp
actually, nvm, you have to run that command from our servers to affect our limit. run it all you want from your own machines! :P On Fri, Oct 17, 2014 at 4:59 PM, shane knapp wrote: > yep, and i will tell you guys ONLY if you promise to NOT try this > yourselves... checking the rate limit a

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-17 Thread Nicholas Chammas
Wow, thanks for this deep dive, Shane. Is there a way to check if we are getting hit by rate limiting directly, or do we need to contact GitHub for that? On Friday, October 17, 2014, shane knapp wrote: > quick update: > > here are some stats i scraped over the past week of ALL pull request > builder pro

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-17 Thread shane knapp
quick update: here are some stats i scraped over the past week of ALL pull request builder projects and timeout failures. due to the large number of spark ghprb jobs, i don't have great records earlier than oct 7th. the data is current up until ~230pm today: spark and new spark ghprb total buil

sampling broken in PySpark with recent NumPy

2014-10-17 Thread Jeremy Freeman
Hi all, I found a significant bug in PySpark's sampling methods, due to a recent NumPy change (as of v1.9). I created a JIRA (https://issues.apache.org/jira/browse/SPARK-3995), but wanted to share here as well in case anyone hits it. Steps to reproduce are: > foo = sc.parallelize(range(1000),
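
A minimal sketch of the kind of check implied here, assuming RDD.sample is among the affected sampling methods; it is not the original repro (which is truncated above), and sc is the usual PySpark SparkContext:

    # confirm which NumPy version is installed, then exercise RDD.sample
    import numpy
    print(numpy.__version__)               # the bug is reported against NumPy 1.9

    foo = sc.parallelize(range(1000), 8)   # small RDD spread over 8 partitions
    sampled = foo.sample(False, 0.1, seed=42).collect()
    print(len(sampled))                    # with the bug, sample sizes are reported to be off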

Re: accumulators

2014-10-17 Thread Reynold Xin
It certainly makes sense for a single streaming job. But it is definitely non-trivial to make this useful to all Spark programs. If I were to have a long-running SparkContext and submit a wide variety of jobs to it, this would make the list of accumulators very, very large. Maybe the solution is

Using Docker to Parallelize Tests

2014-10-17 Thread Nicholas Chammas
https://news.ycombinator.com/item?id=8471812 The parent thread has lots of interesting use cases for Docker, and the linked comment seems most relevant to our testing predicament. I might look into this after I finish something presentable with Packer and our EC2 scripts, but if anyone else is in