Unsubscribe

2014-10-27 Thread Ian Ferreira
unsubscribe

Is Hadoop MR now comparable with Spark?

2014-06-02 Thread Ian Ferreira
http://hortonworks.com/blog/ddm/#.U4yn3gJgfts.twitter

RE: Announcing Spark 1.0.0

2014-05-30 Thread Ian Ferreira
Congrats Sent from my Windows Phone From: Dean Wampler Sent: 5/30/2014 6:53 AM To: user@spark.apache.org Subject: Re: Announcing Spark 1.0.0 Congratulations!! On Fri, May 30, 2014 at 5:12 AM, Patrick

Re: Debugging Spark AWS S3

2014-05-16 Thread Ian Ferreira
Did you check the executor stderr logs? On 5/16/14, 2:37 PM, "Robert James" wrote: >I have Spark code which runs beautifully when MASTER=local. When I >run it with MASTER set to a spark ec2 cluster, the workers seem to >run, but the results, which are supposed to be put to AWS S3, don't >appear

Real world

2014-05-15 Thread Ian Ferreira
Folks, I keep getting questioned on real world experience of Spark as in mission critical production deployments. Does anyone have some war stories to share or know of resources to review? Cheers - Ian

Re: Easy one

2014-05-07 Thread Ian Ferreira
xport SPARK_WORKER_MEMORY=4g On Tue, May 6, 2014 at 5:29 PM, Ian Ferreira wrote: > Hi there, > > Why can't I seem to kick the executor memory higher? See below from EC2 > deployment using m1.large > > > And in the spark-env.sh > export SPARK_MEM=6154m > > > And in th

Easy one

2014-05-06 Thread Ian Ferreira
Hi there, Why can't I seem to kick the executor memory higher? See below from EC2 deployment using m1.large And in the spark-env.sh export SPARK_MEM=6154m And in the spark context sconf.setExecutorEnv("spark.executor.memory", "4g") Cheers - Ian
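
Editor's sketch for the question above: `spark.executor.memory` is a Spark property, normally set with `SparkConf.set(...)`, whereas `setExecutorEnv` sets environment variables on the executors. In standalone mode the requested executor memory also has to fit within what each worker offers (the `SPARK_WORKER_MEMORY` setting mentioned in the reply). The plain-Python check below only illustrates that size comparison and does not call Spark; `to_mb` and `executor_fits` are hypothetical helper names, and only the `m` and `g` suffixes are handled.

```python
# Hypothetical helpers, not part of Spark: compare JVM-style memory sizes.
UNITS_MB = {"m": 1, "g": 1024}

def to_mb(size):
    """Convert a size such as '4g' or '6154m' to megabytes."""
    size = size.strip().lower()
    number, unit = size[:-1], size[-1]
    if unit not in UNITS_MB:
        raise ValueError("unsupported unit in %r" % size)
    return int(number) * UNITS_MB[unit]

def executor_fits(executor_memory, worker_memory):
    """True when the executor request does not exceed the worker's memory."""
    return to_mb(executor_memory) <= to_mb(worker_memory)

print(to_mb("6154m"))             # 6154
print(executor_fits("4g", "4g"))  # True
```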

Getting the following error using EC2 deployment

2014-05-01 Thread Ian Ferreira
I have a custom app that was compiled with scala 2.10.3 which I believe is what the latest spark-ec2 script installs. However running it on the master yields this cryptic error which according to the web implies incompatible jar versions. Exception in thread "main" java.lang.NoClassDefFoundError:

Setting the Scala version in the EC2 script?

2014-05-01 Thread Ian Ferreira
Is this possible? It is very annoying to have such a great script but still have to manually update stuff afterwards.

Re: Can't be built on MAC

2014-05-01 Thread Ian Ferreira
Hi Zhige, I had the same issue and reverted to using JDK 1.7.055 From: Zhige Xin Reply-To: Date: Thursday, May 1, 2014 at 12:32 PM To: Subject: Can't be built on MAC Hi dear all, When I tried to build Spark 0.9.1 on my Mac OS X 10.9.2 with Java 8, I found the following errors: [error] err

Running parallel jobs in the same driver with Futures?

2014-04-28 Thread Ian Ferreira
I recall asking about this, and I think Matei suggested it was, but is the scheduler thread safe? I am running mllib libraries as futures in the same driver using the same dataset as input and this error 14/04/28 08:29:48 ERROR TaskSchedulerImpl: Exception in statusUpdate java.util.concurrent.Reje

Failed to run count?

2014-04-23 Thread Ian Ferreira
I am getting this cryptic error running LinearRegressionWithSGD Data sample LabeledPoint(39.0, [144.0, 1521.0, 20736.0, 59319.0, 2985984.0]) 14/04/23 15:15:34 INFO SparkContext: Starting job: first at GeneralizedLinearAlgorithm.scala:121 14/04/23 15:15:34 INFO DAGScheduler: Got job 2 (first at G

Adding to an RDD

2014-04-21 Thread Ian Ferreira
Feels like a silly question, but what if I wanted to apply a map to each element in an RDD, but instead of replacing it, I wanted to add new columns of the manipulated value, i.e. res0: Array[String] = Array(1 2, 1 3, 1 4, 2 1, 3 1, 4 1) Becomes res0: Array[String] = Array(1 2 2 4, 1 3 1 6,
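
Editor's sketch for the question above: one common approach is to have the function passed to `map` return the original fields plus the derived ones, so nothing is replaced. The snippet is cut off before the exact derivation is clear, so doubling each field is an assumption here, and `append_doubled` is a hypothetical name. The code is plain Python; with PySpark the same function could be passed as `rdd.map(append_doubled)`.

```python
def append_doubled(line):
    """Return the original space-separated fields followed by each field doubled."""
    fields = line.split()
    # Assumed derivation: double every value (the original post is truncated)
    derived = [str(int(f) * 2) for f in fields]
    return " ".join(fields + derived)

rows = ["1 2", "1 3", "1 4", "2 1", "3 1", "4 1"]

# A list comprehension stands in for rdd.map(append_doubled).collect()
result = [append_doubled(r) for r in rows]
print(result[0])  # 1 2 2 4
```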

Combining RDD's columns

2014-04-18 Thread Ian Ferreira
This may seem contrived but, suppose I wanted to create a collection of "single column" RDDs that contain calculated values, so I want to cache these to avoid re-calc. i.e. rdd1 = {Names} rdd2 = {Star Sign} rdd3 = {Age} Then I want to create a new virtual RDD that is a collection of thes
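
Editor's sketch for the question above: Spark's `RDD.zip` pairs two RDDs element by element, and it expects both sides to have the same number of partitions and the same number of elements in each partition. The plain-Python version below shows the same column-wise pairing with the built-in `zip`; the sample values and the `combine_columns` helper are made up for illustration.

```python
# Made-up sample values standing in for the three single-column RDDs
names = ["Ann", "Bob", "Cy"]
star_signs = ["Leo", "Aries", "Virgo"]
ages = [31, 45, 27]

def combine_columns(*columns):
    """Zip equally long columns into row tuples, refusing mismatched lengths."""
    lengths = {len(c) for c in columns}
    if len(lengths) > 1:
        raise ValueError("columns must have the same length")
    return list(zip(*columns))

rows = combine_columns(names, star_signs, ages)
print(rows[0])  # ('Ann', 'Leo', 31)
```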

RE: Multi-tenant?

2014-04-15 Thread Ian Ferreira
ook at http://spark.apache.org/docs/latest/job-scheduling.html, which includes scheduling concurrent jobs within the same driver. Matei On Apr 15, 2014, at 4:08 PM, Ian Ferreira wrote: > What is the support for multi-tenancy in Spark. > > I assume more than one driver can share the same clus

Multi-tenant?

2014-04-15 Thread Ian Ferreira
What is the support for multi-tenancy in Spark? I assume more than one driver can share the same cluster, but can a driver run two jobs in parallel?
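
Editor's sketch for the question above: the Spark job scheduling documentation says that jobs submitted from different threads of one application can run at the same time. The plain-Python example below shows only the submission pattern with a thread pool; `count_even` and `total` are hypothetical stand-ins for Spark actions, and no Spark API is called.

```python
from concurrent.futures import ThreadPoolExecutor

def count_even(values):
    # Stand-in for one Spark action, e.g. a filter followed by a count
    return sum(1 for v in values if v % 2 == 0)

def total(values):
    # Stand-in for a second, independent action on the same data
    return sum(values)

data = list(range(10))

# Submit both "jobs" from separate threads and wait for both results
with ThreadPoolExecutor(max_workers=2) as pool:
    even_future = pool.submit(count_even, data)
    total_future = pool.submit(total, data)
    results = (even_future.result(), total_future.result())

print(results)  # (5, 45)
```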

Re: Scala vs Python performance differences

2014-04-15 Thread Ian Ferreira
This would be super useful. Thanks. On 4/15/14, 1:30 AM, "Jeremy Freeman" wrote: >Hi Andrew, > >I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on >ML algorithms, as I'm particularly curious about the relative performance >of >MLlib in Scala vs the Python MLlib API vs pur

Pyspark with Cython

2014-04-14 Thread Ian Ferreira
Has anyone used Cython closures with Spark? We have a large investment in Python code that we don't want to port to Scala. Curious about any performance issues with the interop between the Scala engine and the Cython closures. I believe it is sockets on the driver and pipe on the executors?

Re: Spark resilience

2014-04-14 Thread Ian Ferreira
t does not affect currently-running jobs. Workers can fail and will simply cause jobs to lose their current Executors. New Workers can be added at any point. On Mon, Apr 14, 2014 at 11:00 AM, Ian Ferreira wrote: > Folks, > > I was wondering what the failure support modes where for Spark

Spark resilience

2014-04-14 Thread Ian Ferreira
Folks, I was wondering what the failure support modes were for Spark while running jobs 1. What happens when a master fails? 2. What happens when a slave fails? 3. Can you add and remove slaves mid-job? Regarding the install on Mesos, if I understand correctly the Spark master is behind a Zookeeper

Re: Spark - ready for prime time?

2014-04-10 Thread Ian Ferreira
Do you have the link to the Cloudera comment? Sent from Windows Mail From: Dean Wampler Sent: Thursday, April 10, 2014 7:39 AM To: Spark Users Cc: Daniel Darabos, Andras Barjak Spark has been endorsed by Cloudera as the successor to MapReduce. That says a lot... On

Re: Error when run Spark on mesos

2014-04-02 Thread Ian Ferreira
I think this is related to a known issue (regression) in 0.9.0. Try using an explicit IP other than loopback. Sent from a mobile device > On Apr 2, 2014, at 8:53 PM, "panfei" wrote: > > any advice ? > > > 2014-04-03 11:35 GMT+08:00 felix : >> I deployed mesos and test it using the exmaple/tes

Protobuf 2.5 Mesos

2014-04-01 Thread Ian Ferreira
From what I can tell I need to use mesos 0-17 to support protobuf 2.5 which is required for hadoop 2.3.0. However I still run into the JVM error which appears to be related to protobuf compatibility. Any recommendations?

Mllib in pyspark for 0.8.1

2014-04-01 Thread Ian Ferreira
Hi there, For some reason the distribution and build for 0.8.1 does not include the MLlib libraries for pyspark, i.e. import from mllib fails. Seems to be addressed in 0.9.0, but that has other issues running on mesos in standalone mode :) Any pointers? Cheers - Ian