Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-18 Thread Davies Liu
Cool, the 4 most recent builds used the new configs, thanks!

Let's run more builds.

Davies

On Fri, Oct 17, 2014 at 11:06 PM, Josh Rosen  wrote:
> I think that the fix was applied.  Take a look at
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21874/consoleFull
>
> Here, I see a fetch command that mentions this specific PR branch rather
> than the wildcard that we had before:
>
>  > git fetch --tags --progress https://github.com/apache/spark.git
> +refs/pull/2840/*:refs/remotes/origin/pr/2840/* # timeout=15
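>
> An easy way to spot-check which refspec a given build used (untested
> sketch -- swap in any build number):
>
> curl -s "https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21874/consoleFull" \
>     | grep 'git fetch'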
>
>
> Do you have an example of a Spark PRB build that’s still failing with the
> old fetch failure?
>
> - Josh
>
> On October 17, 2014 at 11:03:14 PM, Davies Liu (dav...@databricks.com)
> wrote:
>
> How can we know the changes have been applied? I checked several
> recent builds; they all use the original configs.
>
> Davies
>
> On Fri, Oct 17, 2014 at 6:17 PM, Josh Rosen  wrote:
>> FYI, I edited the Spark Pull Request Builder job to try this out. Let's
>> see if it works (I'll be around to revert if it doesn't).
>>
>> On October 17, 2014 at 5:26:56 PM, Davies Liu (dav...@databricks.com)
>> wrote:
>>
>> One finding is that all the timeouts happened with this command:
>>
>> git fetch --tags --progress https://github.com/apache/spark.git
>> +refs/pull/*:refs/remotes/origin/pr/*
>>
>> I'm thinking this may be an expensive call; we could try a cheaper
>> one:
>>
>> git fetch --tags --progress https://github.com/apache/spark.git
>> +refs/pull/XXX/*:refs/remotes/origin/pr/XXX/*
>>
>> where XXX is the pull request ID.
>>
>> The configuration supports parameters [1], so we could put this in:
>>
>> +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
>>
>> I have not tested this yet, could you give this a try?
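>>
>> A quick way to compare the cost of the two fetches locally would be
>> something like this (untested sketch, run from a local spark clone;
>> 2840 is just an example pull request id):
>>
>> time git fetch --tags --progress https://github.com/apache/spark.git \
>>     "+refs/pull/*:refs/remotes/origin/pr/*"
>> time git fetch --tags --progress https://github.com/apache/spark.git \
>>     "+refs/pull/2840/*:refs/remotes/origin/pr/2840/*"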
>>
>> Davies
>>
>>
>> [1]
>>
>> https://wiki.jenkins-ci.org/display/JENKINS/GitHub+pull+request+builder+plugin
>>
>> On Fri, Oct 17, 2014 at 5:00 PM, shane knapp  wrote:
>>> actually, nvm, you have to run that command from our servers to affect
>>> our limit. run it all you want from your own machines! :P
>>>
>>> On Fri, Oct 17, 2014 at 4:59 PM, shane knapp  wrote:
>>>
 yep, and i will tell you guys ONLY if you promise to NOT try this
 yourselves... checking the rate limit also counts as a hit and increments
 our numbers:

 # curl -i https://api.github.com/users/whatever 2> /dev/null | egrep ^X-Rate
 X-RateLimit-Limit: 60
 X-RateLimit-Remaining: 51
 X-RateLimit-Reset: 1413590269

 (yes, that is the exact url that they recommended on the github site
 lol)
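
 (side note: if i remember right, there's also a dedicated rate_limit
 endpoint that is NOT supposed to count against the quota, so it might be
 a safer way to keep an eye on this -- untested sketch:

 # per the github api docs, /rate_limit itself shouldn't count as a hit
 curl -i https://api.github.com/rate_limit 2> /dev/null | egrep ^X-Rate
 )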

 so, earlier today, we had a spark build fail w/a git timeout at 10:57am,
 but there were only ~7 builds run that hour, so that points to us NOT
 hitting the rate limit... at least for this fail. whee!

 is it beer-thirty yet?

 shane



 On Fri, Oct 17, 2014 at 4:52 PM, Nicholas Chammas <
 nicholas.cham...@gmail.com> wrote:

> Wow, thanks for this deep dive Shane. Is there a way to check if we are
> getting hit by rate limiting directly, or do we need to contact GitHub
> for that?
>
> On Friday, October 17, 2014, shane knapp wrote:
>
>> quick update:
>>
>> here are some stats i scraped over the past week of ALL pull request
>> builder projects and timeout failures. due to the large number of spark
>> ghprb jobs, i don't have great records earlier than oct 7th. the data is
>> current up until ~2:30pm today:
>>
>> spark and new spark ghprb total builds vs git fetch timeouts:
>> $ for x in 10-{09..17}; do
>>     passed=$(grep $x SORTED.passed | grep -i spark | wc -l)
>>     failed=$(grep $x SORTED | grep -i spark | wc -l)
>>     let total=passed+failed
>>     fail_percent=$(echo "scale=2; $failed/$total" | bc | sed "s/^\.//g")
>>     echo -e "$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
>> done
>> 10-09 -- total builds: 140 p/f: 92/48 fail%: 34%
>> 10-10 -- total builds: 65 p/f: 59/6 fail%: 09%
>> 10-11 -- total builds: 29 p/f: 29/0 fail%: 0%
>> 10-12 -- total builds: 24 p/f: 21/3 fail%: 12%
>> 10-13 -- total builds: 39 p/f: 35/4 fail%: 10%
>> 10-14 -- total builds: 7 p/f: 5/2 fail%: 28%
>> 10-15 -- total builds: 37 p/f: 34/3 fail%: 08%
>> 10-16 -- total builds: 71 p/f: 59/12 fail%: 16%
>> 10-17 -- total builds: 26 p/f: 20/6 fail%: 23%
>>
>> all other ghprb builds vs git fetch timeouts:
>> $ for x in 10-{09..17}; do
>>     passed=$(grep $x SORTED.passed | grep -vi spark | wc -l)
>>     failed=$(grep $x SORTED | grep -vi spark | wc -l)
>>     let total=passed+failed
>>     fail_percent=$(echo "scale=2; $failed/$total" | bc | sed "s/^\.//g")
>>     echo -e "$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
>> done
>> 10-09 -- total builds: 16 p/f: 16/0 fail%: 0%
>> 10-10 -- total bu

Oryx + Spark mllib

2014-10-18 Thread Debasish Das
Hi,

Is anyone working on a project to integrate the Oryx model serving layer
with Spark? Models would be built using either streaming data or batch
data in HDFS and cross-validated with MLlib APIs, but the model serving
layer would expose API endpoints like Oryx and read the models, possibly
from HDFS/Impala/Spark SQL.

One of the requirements is that the API layer should be scalable and
elastic... as requests grow we should be able to add more nodes... using
Play and the Akka clustering module...

If there is an ongoing project on GitHub, please point me to it...

Is there a plan to add a model serving and experimentation layer to MLlib?

Thanks.
Deb


Re: Oryx + Spark mllib

2014-10-18 Thread Rajiv Abraham
Oryx 2 seems to be geared for Spark

https://github.com/OryxProject/oryx

2014-10-18 11:46 GMT-04:00 Debasish Das :

> Hi,
>
> Is anyone working on a project to integrate the Oryx model serving layer
> with Spark? Models would be built using either streaming data or batch
> data in HDFS and cross-validated with MLlib APIs, but the model serving
> layer would expose API endpoints like Oryx and read the models, possibly
> from HDFS/Impala/Spark SQL.
>
> One of the requirements is that the API layer should be scalable and
> elastic... as requests grow we should be able to add more nodes... using
> Play and the Akka clustering module...
>
> If there is an ongoing project on GitHub, please point me to it...
>
> Is there a plan to add a model serving and experimentation layer to MLlib?
>
> Thanks.
> Deb
>



-- 
Take care,
Rajiv


Joining the spark dev community

2014-10-18 Thread Saurabh Wadhawan
How can I become a Spark contributor?
What's a good path I can follow to go from newbie to active code
contributor for Spark?

Regards
- Saurabh



Re: Raise Java dependency from 6 to 7

2014-10-18 Thread Koert Kuipers
my experience is that there are still a lot of java 6 clusters out there.
also distros that bundle spark still support java 6
On Oct 17, 2014 8:01 PM, "Andrew Ash"  wrote:

> Hi Spark devs,
>
> I've heard a few times that keeping support for Java 6 is a priority for
> Apache Spark.  Given that Java 6 has been publicly EOL'd since Feb 2013
> and the last public update was Apr 2013, why are we still maintaining
> support for 6?  The only people using it now must be paying for the
> extended support to continue receiving security fixes.
>
> Bumping the lower bound of Java versions up to Java 7 would allow us to
> upgrade from Jetty 8 to 9, which currently conflicts with the Dropwizard
> framework and is a personal pain point.
>
> Java 6 vs 7 for Spark links:
> Try with resources for SparkContext et al
> Upgrade to Jetty 9
> Warn when not compiling with Java6
>
>
> Who are the people out there that still need Java 6 support?
>
> Thanks!
> Andrew
>


Re: Raise Java dependency from 6 to 7

2014-10-18 Thread Matei Zaharia
I'd also wait a bit until these are gone. Jetty is unfortunately a much hairier 
topic by the way, because the Hadoop libraries also depend on Jetty. I think it 
will be hard to update. However, a patch that shades Jetty might be nice to 
have, if that doesn't require shading a lot of other stuff.

Matei

> On Oct 18, 2014, at 4:37 PM, Koert Kuipers  wrote:
> 
> my experience is that there are still a lot of java 6 clusters out there.
> also distros that bundle spark still support java 6
> On Oct 17, 2014 8:01 PM, "Andrew Ash"  wrote:
> 
>> Hi Spark devs,
>> 
>> I've heard a few times that keeping support for Java 6 is a priority for
>> Apache Spark.  Given that Java 6 has been publicly EOL'd since Feb 2013
>> and the last public update was Apr 2013, why are we still maintaining
>> support for 6?  The only people using it now must be paying for the
>> extended support to continue receiving security fixes.
>>
>> Bumping the lower bound of Java versions up to Java 7 would allow us to
>> upgrade from Jetty 8 to 9, which currently conflicts with the Dropwizard
>> framework and is a personal pain point.
>>
>> Java 6 vs 7 for Spark links:
>> Try with resources for SparkContext et al
>> Upgrade to Jetty 9
>> Warn when not compiling with Java6
>> 
>> 
>> Who are the people out there that still need Java 6 support?
>> 
>> Thanks!
>> Andrew
>> 





Re: Raise Java dependency from 6 to 7

2014-10-18 Thread Marcelo Vanzin
Hadoop, for better or worse, depends on an ancient version of Jetty
(6), which is even in a different package namespace. So Spark (or anyone
trying to use a newer Jetty) is lucky on that front...

IIRC Hadoop is planning to move to Java 7-only starting with 2.7. Java
7 is also supposed to reach EOL some time next year, so a plan to move to
Java 7 and, eventually, Java 8 would be nice.

On Sat, Oct 18, 2014 at 5:44 PM, Matei Zaharia  wrote:
> I'd also wait a bit until these are gone. Jetty is unfortunately a much 
> hairier topic by the way, because the Hadoop libraries also depend on Jetty. 
> I think it will be hard to update. However, a patch that shades Jetty might 
> be nice to have, if that doesn't require shading a lot of other stuff.
>
> Matei
>
>> On Oct 18, 2014, at 4:37 PM, Koert Kuipers  wrote:
>>
>> my experience is that there are still a lot of java 6 clusters out there.
>> also distros that bundle spark still support java 6
>> On Oct 17, 2014 8:01 PM, "Andrew Ash"  wrote:
>>
>>> Hi Spark devs,
>>>
>>> I've heard a few times that keeping support for Java 6 is a priority for
>>> Apache Spark.  Given that Java 6 has been publicly EOL'd since Feb 2013
>>> and the last public update was Apr 2013, why are we still maintaining
>>> support for 6?  The only people using it now must be paying for the
>>> extended support to continue receiving security fixes.
>>>
>>> Bumping the lower bound of Java versions up to Java 7 would allow us to
>>> upgrade from Jetty 8 to 9, which currently conflicts with the Dropwizard
>>> framework and is a personal pain point.
>>>
>>> Java 6 vs 7 for Spark links:
>>> Try with resources for SparkContext et al
>>> Upgrade to Jetty 9
>>> Warn when not compiling with Java6
>>>
>>>
>>> Who are the people out there that still need Java 6 support?
>>>
>>> Thanks!
>>> Andrew
>>>
>
>



-- 
Marcelo




Submissions open for Spark Summit East 2015

2014-10-18 Thread Matei Zaharia
After successful events in the past two years, the Spark Summit conference has 
expanded for 2015, offering both an event in New York on March 18-19 and one in 
San Francisco on June 15-17. The conference is a great chance to meet people 
from throughout the Spark community and see the latest news, tips and use cases.

Submissions are now open for Spark Summit East 2015, to be held in New York on 
March 18-19. If you’d like to give a talk on use cases, neat applications, or 
ongoing Spark development, submit your talk online today at 
http://prevalentdesignevents.com/sparksummit2015/east/speaker/. Submissions 
will be open until December 6th, 2014.

If you missed this year’s Spark Summit, you can still find videos from all 
talks online at http://spark-summit.org/2014.

Hope to see you there,

Matei



Re: Oryx + Spark mllib

2014-10-18 Thread Sean Owen
Yes, that is exactly what the next 2.x version does. It is still in
progress, but the recommender app and framework are code-complete. It is
not even specific to MLlib and could plug in other model-building
functions.

The current 1.x version will not use MLlib. Neither uses Play; both are
intended to scale just by adding web servers, however you usually do that.

See graphflow too.
On Oct 18, 2014 5:06 PM, "Rajiv Abraham"  wrote:

> Oryx 2 seems to be geared for Spark
>
> https://github.com/OryxProject/oryx
>
> 2014-10-18 11:46 GMT-04:00 Debasish Das :
>
> > Hi,
> >
> > Is anyone working on a project to integrate the Oryx model serving
> > layer with Spark? Models would be built using either streaming data or
> > batch data in HDFS and cross-validated with MLlib APIs, but the model
> > serving layer would expose API endpoints like Oryx and read the models,
> > possibly from HDFS/Impala/Spark SQL.
> >
> > One of the requirements is that the API layer should be scalable and
> > elastic... as requests grow we should be able to add more nodes...
> > using Play and the Akka clustering module...
> >
> > If there is an ongoing project on GitHub, please point me to it...
> >
> > Is there a plan to add a model serving and experimentation layer to
> > MLlib?
> >
> > Thanks.
> > Deb
> >
>
>
>
> --
> Take care,
> Rajiv
>


Re: Oryx + Spark mllib

2014-10-18 Thread Nick Pentreath
We've built a model server internally, based on Scalatra and Akka
Clustering. Our use case is more geared towards serving possibly thousands
of smaller models.

It's actually very basic: it just reads models from S3 as strings (!!)
(it uses the HDFS FileSystem API, so it can read from local, HDFS, or S3)
and uses Breeze for the linear algebra. (Technically it is also not
dependent on Spark; it could be reading models generated by any
computation layer.)
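
As a rough illustration of that storage pluggability (untested sketch
with hypothetical model paths; the same Hadoop FileSystem machinery backs
the command line too):

hadoop fs -cat s3n://our-bucket/models/model-001.txt
hadoop fs -cat hdfs:///models/model-001.txt
hadoop fs -cat file:///tmp/models/model-001.txt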

It's designed to scale via cluster sharding, by adding nodes (but it
could also support a load-balanced approach). We're not using persistent
actors, since reloading a model on node failure is not a disaster; we have
multiple levels of fallback.

Currently it is a bit specific to our setup (and only focused on
recommendation models for now), but it could be made generic with some
work. I'm certainly considering whether we can find the time to make it a
releasable project.

One major difference from Oryx is that it only handles the model loading
and vector computations, not the filtering and other pieces that come as
part of a recommender system (those are handled elsewhere in our system).
It also does not handle the ingesting of data at all.

On Sun, Oct 19, 2014 at 7:10 AM, Sean Owen  wrote:

> Yes, that is exactly what the next 2.x version does. It is still in
> progress, but the recommender app and framework are code-complete. It is
> not even specific to MLlib and could plug in other model-building
> functions.
>
> The current 1.x version will not use MLlib. Neither uses Play; both are
> intended to scale just by adding web servers, however you usually do
> that.
>
> See graphflow too.
> On Oct 18, 2014 5:06 PM, "Rajiv Abraham"  wrote:
>
> > Oryx 2 seems to be geared for Spark
> >
> > https://github.com/OryxProject/oryx
> >
> > 2014-10-18 11:46 GMT-04:00 Debasish Das :
> >
> > > Hi,
> > >
> > > Is anyone working on a project to integrate the Oryx model serving
> > > layer with Spark? Models would be built using either streaming data
> > > or batch data in HDFS and cross-validated with MLlib APIs, but the
> > > model serving layer would expose API endpoints like Oryx and read the
> > > models, possibly from HDFS/Impala/Spark SQL.
> > >
> > > One of the requirements is that the API layer should be scalable and
> > > elastic... as requests grow we should be able to add more nodes...
> > > using Play and the Akka clustering module...
> > >
> > > If there is an ongoing project on GitHub, please point me to it...
> > >
> > > Is there a plan to add a model serving and experimentation layer to
> > > MLlib?
> > >
> > > Thanks.
> > > Deb
> > >
> >
> >
> >
> > --
> > Take care,
> > Rajiv
> >
>