Re: Spark job submission REST API

2015-12-30 Thread Fernando O.
One of the advantages of using spark-jobserver is that it lets you reuse your contexts (create one context and run multiple jobs on it). Since you can run multiple jobs in one context, you can also share RDDs (NamedRDD) between jobs, i.e. create an MLlib model and share it without the need to persist it.
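
The sharing pattern described above can be sketched with spark-jobserver's NamedRddSupport trait. This is a sketch against the 0.5.x-era jobserver API, not Spark core; the job names and the `(Int, Double)` payload are placeholders:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation, NamedRddSupport}
import com.typesafe.config.Config

// First job: compute an RDD once and publish it under a name in the shared context.
object BuildFeaturesJob extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid
  override def runJob(sc: SparkContext, config: Config): Any = {
    val features: RDD[(Int, Double)] = sc.parallelize(Seq((1, 0.5), (2, 0.7)))
    this.namedRdds.update("features", features) // cached for later jobs in this context
    features.count()
  }
}

// Later job, submitted against the same context: look the RDD up instead of recomputing it.
object ScoreJob extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid
  override def runJob(sc: SparkContext, config: Config): Any = {
    val features = this.namedRdds.get[(Int, Double)]("features")
      .getOrElse(sys.error("run BuildFeaturesJob first"))
    features.values.sum()
  }
}
```

Both jobs must be submitted with the same `context=` parameter for the named RDD to be visible.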

Problem building master on 2.11

2015-05-16 Thread Fernando O.
Is anyone else having issues when building spark from git? I created a jira ticket with a Docker file that reproduces the issue. The error: /spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56: error: not found: type Type protected Type type() { retu

Re: Error while running SparkPi in Hadoop HA

2015-04-21 Thread Fernando O.
Solved. Looks like it's some incompatibility in the build when using -Phadoop-2.4; made the distribution with -Phadoop-provided and that fixed the issue. On Tue, Apr 21, 2015 at 2:03 PM, Fernando O. wrote: > Hi all, > > I'm wondering if SparkPi works with hadoop HA
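
The fix described above, as a sketch of the distribution build (flags per the Spark 1.3 build docs; add whatever other profiles your cluster needs):

```shell
# Build a runnable distribution that defers to the cluster's Hadoop jars
# (-Phadoop-provided) instead of bundling them via -Phadoop-2.4:
./make-distribution.sh --name hadoop-provided --tgz \
  -Pyarn -Phadoop-provided -DskipTests
```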

Error while running SparkPi in Hadoop HA

2015-04-21 Thread Fernando O.
Hi all, I'm wondering if SparkPi works with hadoop HA (I guess it should). Hadoop's pi example works great on my cluster, so with that done I installed spark, and in the worker log I'm seeing two problems that might be related. Versions: Hadoop 2.6.0 Spark 1.3.1 I'm runn

Re: Running Spark on Gateway - Connecting to Resource Manager Retries

2015-04-20 Thread Fernando O.
I'm experiencing the same issue with spark 1.3.1. I verified that hadoop works (i.e. running hadoop's pi example). It seems like hadoop conf is in the classpath (/opt/test/service/hadoop/etc/hadoop ) SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell --master yarn-client Spark Command: /usr/lib/jvm/jr
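
One thing worth ruling out in this situation is that Spark is picking up the Hadoop config by accident rather than explicitly. A sketch, reusing the conf path from the classpath above:

```shell
# Point Spark at the cluster's Hadoop/YARN configuration explicitly
# (path taken from the classpath shown in the message above):
export HADOOP_CONF_DIR=/opt/test/service/hadoop/etc/hadoop
export YARN_CONF_DIR="$HADOOP_CONF_DIR"

# Then relaunch with the same debug flag (shown commented; needs a cluster):
# SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell --master yarn-client
echo "$HADOOP_CONF_DIR"
```

If yarn-site.xml under that directory still points the ResourceManager at 0.0.0.0/0.0.0.0:8032, that explains the connection retries.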

Re: Building Spark 1.3 for Scala 2.11 using Maven

2015-03-12 Thread Fernando O.
Just FYI: what @Marcelo said fixed the issue for me. On Fri, Mar 6, 2015 at 7:11 AM, Sean Owen wrote: > -Pscala-2.11 and -Dscala-2.11 will happen to do the same thing for this > profile. > > Why are you running "install package" and not just "install"? Probably > doesn't matter. > > This sounds
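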

Re: Removing JARs from spark-jobserver

2015-01-12 Thread Fernando O.
just an FYI: you can configure that using spark.jobserver.filedao.rootdir On Mon, Jan 12, 2015 at 1:52 AM, abhishek wrote: > Nice! Good to know > On 11 Jan 2015 21:10, "Sasi [via Apache Spark User List]" <[hidden email] > > wrote: > >> Thank
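
The setting mentioned above goes in the jobserver's HOCON config file; the directory below is a placeholder, not a required path:

```
spark {
  jobserver {
    filedao {
      # where uploaded jar files and job metadata are kept
      rootdir = /var/spark-jobserver/filedao/data
    }
  }
}
```

Deleting old jars from that directory (with the server stopped) is the blunt way to remove them.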

Re: [MLLib] storageLevel in ALS

2015-01-07 Thread Fernando O.
; >> >> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L202 >> >> -Xiangrui >> >> On Tue, Jan 6, 2015 at 12:57 PM, Fernando O. wrote: >> >>> Hi, >>>I was doing a tests wit

Re: [MLLib] storageLevel in ALS

2015-01-07 Thread Fernando O.
his configurable in 1.1: > > > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L202 > > -Xiangrui > > On Tue, Jan 6, 2015 at 12:57 PM, Fernando O. wrote: > >> Hi, >>I was doing a tests with AL

[MLLib] storageLevel in ALS

2015-01-06 Thread Fernando O.
Hi, I was doing tests with ALS and I noticed that if I persist the inner RDDs from a MatrixFactorizationModel the RDD is not replicated; it seems like the storage level is hardcoded to MEMORY_AND_DISK. Do you think it makes sense to make that configurable?
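
As a workaround until the level is configurable, the model's factor RDDs can be unpersisted and re-persisted at a replicated level. A sketch against the 1.2-era MLlib API (the `_2` levels keep two replicas of each partition; rank and iteration values are arbitrary):

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.storage.StorageLevel

val ratings = sc.parallelize(Seq(
  Rating(1, 1, 5.0), Rating(1, 2, 1.0), Rating(2, 1, 4.0)))

val model = ALS.train(ratings, /* rank = */ 10, /* iterations = */ 5)

// userFeatures/productFeatures come back persisted at MEMORY_AND_DISK;
// drop that level and re-persist with replication.
model.userFeatures.unpersist()
model.userFeatures.persist(StorageLevel.MEMORY_AND_DISK_2)
model.productFeatures.unpersist()
model.productFeatures.persist(StorageLevel.MEMORY_AND_DISK_2)
```

Note the unpersist discards the cached copies, so the next action recomputes the factors once before caching them at the new level.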

Re: Trying to make spark-jobserver work with yarn

2015-01-01 Thread Fernando O.
uster details while creating the > SparkContext. > > Thanks > Best Regards > > On Wed, Dec 31, 2014 at 10:54 PM, Fernando O. wrote: > >> Before jumping into a sea of dependencies and bash files: >> Does anyone have an example of how to run a spark job without using

Re: limit vs sample for indexing a small amount of data quickly?

2014-12-31 Thread Fernando O.
There's a take method that might do what you need: def take(num: Int): Array[T] Take the first num elements of the RDD. On Jan 1, 2015 12:02 AM, "Kevin Burton" wrote: > Is there a limit function which just returns the first N records? > > Sample is nice but I’m trying to do this so it
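
`RDD.take` has the same contract as `take` on a plain Scala collection: the first num elements, in order, without touching the rest of the dataset. A minimal illustration on a local collection (same semantics, no cluster needed):

```scala
// A plain collection standing in for an RDD; take has the same contract.
val records = (1 to 100).toList

// take(n): just the first n elements, in order.
val firstTen = records.take(10)
assert(firstTen == (1 to 10).toList)

// Unlike sample(), the result is deterministic and cheap for small n.
println(firstTen.mkString(","))
```

On an RDD, take only scans as many partitions as needed to collect num elements, which is why it is much cheaper than sample for grabbing a small slice.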

Re: Trying to make spark-jobserver work with yarn

2014-12-31 Thread Fernando O.
Before jumping into a sea of dependencies and bash files: Does anyone have an example of how to run a spark job without using spark-submit or shell ? On Tue, Dec 30, 2014 at 3:23 PM, Fernando O. wrote: > Hi all, > I'm investigating spark for a new project and I'm tryin
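
For reference, the minimal programmatic alternative to spark-submit is constructing the context yourself. A sketch only: yarn-client mode additionally needs HADOOP_CONF_DIR set and the Spark assembly on the classpath, and the app name is a placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object EmbeddedJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("yarn-client")   // or "local[*]" for a quick sanity check
      .setAppName("embedded-job")
    val sc = new SparkContext(conf)
    try {
      // Any job logic; count of even numbers as a stand-in.
      val n = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
      println(s"even numbers: $n")
    } finally {
      sc.stop()
    }
  }
}
```

This is essentially what spark-jobserver does under the hood when it creates a context for you.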

Re: FlatMapValues

2014-12-31 Thread Fernando O.
Hi Sanjay, Doing an if inside a map sounds like a bad idea; it seems like you actually want to filter and then apply map. On Wed, Dec 31, 2014 at 9:54 AM, Kapil Malik wrote: > Hi Sanjay, > > > > I tried running your code on spark shell piece by piece – > > > > // Setup > > val line1 = "025126,C
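
The refactor being suggested, shown on a plain collection (the same combinators exist on RDDs; the Record shape and values are made up for illustration):

```scala
case class Record(id: String, amount: Double)

val lines = Seq(
  Record("025126", 10.0),
  Record("025127", -3.0),
  Record("025128", 7.5))

// Instead of branching inside map (which forces an awkward "else" value),
// drop the unwanted rows first, then transform what's left:
val doubledPositives = lines
  .filter(_.amount > 0)          // keep only the rows we care about
  .map(r => r.id -> r.amount * 2) // then do the real transformation

assert(doubledPositives == Seq("025126" -> 20.0, "025128" -> 15.0))
```

The filter-then-map version also composes better: each step states one intent, and neither needs a sentinel value for the rejected rows.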

Trying to make spark-jobserver work with yarn

2014-12-30 Thread Fernando O.
Hi all, I'm investigating spark for a new project and I'm trying to use spark-jobserver because... I need to reuse and share RDDs and from what I read in the forum that's the "standard" :D Turns out that spark-jobserver doesn't seem to work on yarn, or at least it does not on 1.1.1 My config

Re: MLlib + Streaming

2014-12-23 Thread Fernando O.
Hey Xiangrui, Is there any plan to have a streaming compatible ALS version? Or if it's currently doable, is there any example? On Tue, Dec 23, 2014 at 4:31 PM, Xiangrui Meng wrote: > We have streaming linear regression (since v1.1) and k-means (v1.2) in > MLlib. You can check the user gu

Newbie Question

2014-12-11 Thread Fernando O.
Hi guys, I'm planning to use spark on a project and I'm facing a problem: I couldn't find a log that explains what's wrong with what I'm doing. I have 2 vms that run a small hadoop (2.6.0) cluster. I added a file that has 50 lines of json data. Compiled spark, all tests passed, I run some si