Re: Hyper Parameter Tuning Algorithms

2014-10-06 Thread Ameet Talwalkar
Hi Lochana, This post is also referring to the MLbase project I mentioned in my previous email. We have not open-sourced this work, but plan to do so. Moreover, you might want to check out the following JIRA ticket that includes the design doc fo

Re: Pull Requests

2014-10-06 Thread Bill Bejeck
Can someone review patch #2309 (JIRA task SPARK-3178)? Thanks. On Mon, Oct 6, 2014 at 10:41 PM, Patrick Wendell wrote: > Hey Bill, > > Automated testing is just one small part of the process that performs > basic sanity checks on code. All patches need to be championed and > merged by a committer

Pull Requests

2014-10-06 Thread Bill Bejeck
Once a PR has been tested and verified, when does it get pulled back into the trunk?

Re: What is the best way to build my developing Spark for testing on EC2?

2014-10-06 Thread Yu Ishikawa
Hi Evan, Sorry for my late reply, and thank you for your comment. > As far as cluster set up goes, I usually launch spot instances with the > spark-ec2 scripts, > and then check out a repo which contains a simple driver application for > my code. > Then I have something crude like bash scripts

Re: Spark on Mesos 0.20

2014-10-06 Thread Fairiz Azizi
That's what's great about Spark, the community is so active! :) I compiled Mesos 0.20.1 from the source tarball. Using the Mapr3 Spark 1.1.0 distribution from the Spark downloads page (spark-1.1.0-bin-mapr3.tgz). I see no problems for the workloads we are trying. However, the cluster is small (l

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-06 Thread David Rowe
I agree with this - there is also the issue of different sized masters and slaves, and numbers of executors for hefty machines (e.g. r3.8xlarges), tagging of instances and volumes (we use this for cost attribution at my workplace), and running in VPCs. I think it might be useful to take a la

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-06 Thread Nicholas Chammas
FYI: I've created SPARK-3821: Develop an automated way of creating Spark images (AMI, Docker, and others) On Mon, Oct 6, 2014 at 4:48 PM, Daniil Osipov wrote: > I've also been looking at this. Basically, the Spark EC2 script is > excellent for s

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-06 Thread Daniil Osipov
I've also been looking at this. Basically, the Spark EC2 script is excellent for small development clusters of several nodes, but isn't suitable for production. It handles instance setup in a single-threaded manner, though it could easily be parallelized. It also doesn't handle failure well, e.g. when a

Re: Spark on Mesos 0.20

2014-10-06 Thread Timothy Chen
Ok I created SPARK-3817 to track this, will try to repro it as well. Tim On Mon, Oct 6, 2014 at 6:08 AM, RJ Nowling wrote: > I've recently run into this issue as well. I get it from running Spark > examples such as log query. Maybe that'll help reproduce the issue. > > > On Monday, October 6, 2

Re: Parquet schema migrations

2014-10-06 Thread Cody Koeninger
Sorry, by "raw parquet" I just meant there is no external metadata store, only the schema written as part of the parquet format. We've done several different kinds of changes, including column rename and widening the data type of an existing column. I don't think it's feasible to support those.

Re: Spark on Mesos 0.20

2014-10-06 Thread RJ Nowling
I've recently run into this issue as well. I get it from running Spark examples such as log query. Maybe that'll help reproduce the issue. On Monday, October 6, 2014, Gurvinder Singh wrote: > The issue does not occur if the task at hand has small number of map > tasks. I have a task which has 9

TorrentBroadcast slow performance

2014-10-06 Thread Guillaume Pitel
Hi, I've had no answer to this on u...@spark.apache.org, so I'm posting it on dev before filing a JIRA (in case the problem or solution is already identified). We've had some performance issues since switching to 1.1.0, and we finally found the origin: TorrentBroadcast seems to be very slow in our

Re: Spark on Mesos 0.20

2014-10-06 Thread Gurvinder Singh
The issue does not occur if the task at hand has a small number of map tasks. I have a task which has 978 map tasks and I see this error as 14/10/06 09:34:40 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140711-081617-711206558-5050-2543-5 Here is the log from th

Re: Spark on Mesos 0.20

2014-10-06 Thread Timothy Chen
(Hit enter too soon...) What is your setup and steps to repro this? Tim On Mon, Oct 6, 2014 at 12:30 AM, Timothy Chen wrote: > Hi Gurvinder, > > I tried fine grain mode before and didn't get into that problem. > > > On Sun, Oct 5, 2014 at 11:44 PM, Gurvinder Singh > wrote: >> On 10/06/2014 08:

Re: Spark on Mesos 0.20

2014-10-06 Thread Timothy Chen
Hi Gurvinder, I tried fine-grained mode before and didn't run into that problem. On Sun, Oct 5, 2014 at 11:44 PM, Gurvinder Singh wrote: > On 10/06/2014 08:19 AM, Fairiz Azizi wrote: >> The Spark online docs indicate that Spark is compatible with Mesos 0.18.1 >> >> I've gotten it to work just fin