Re: IDF model error

2014-11-26 Thread Shivani Rao
IndexedRow(2L, new SSV(22, Array(1, 2, 4, 13), Array(0.0, 1.0, 2.0, 0.0))) val doc3s = new IndexedRow(3L, new SSV(22, Array(10, 14, 20, 21), Array(2.0, 0.0, 2.0, 1.0))) val doc4s = new IndexedRow(4L, new SSV(22, Array(3, 7, 13, 20), Array(2.0, 0.0, 2.0, 1.0))) 2014-11-2

IDF model error

2014-11-25 Thread Shivani Rao
Hello Spark fans, I am trying to use the IDF model available in Spark MLlib to create a tf-idf representation of an RDD[Vector]. Below I have attached my MWE. I get the following error: "java.lang.IndexOutOfBoundsException: 7 not in [-4,4) at breeze.linalg.DenseVector.apply$mcI$sp(DenseVect
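
For context, a minimal sketch (not the poster's actual MWE) of the MLlib IDF workflow the message describes, assuming Spark 1.1+ APIs; the vector size and indices below are illustrative. A breeze error like "7 not in [-4,4)" typically means some feature index is not smaller than the declared vector size.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.feature.IDF
    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    object TfIdfSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("tf-idf-sketch"))

        // Every term-frequency vector must declare the same size (22 here),
        // and every index must be strictly less than that size.
        val tf = sc.parallelize(Seq[Vector](
          Vectors.sparse(22, Array(1, 2, 4, 13), Array(0.0, 1.0, 2.0, 0.0)),
          Vectors.sparse(22, Array(10, 14, 20, 21), Array(2.0, 0.0, 2.0, 1.0))
        ))

        val idfModel = new IDF().fit(tf)    // document frequencies over the corpus
        val tfidf = idfModel.transform(tf)  // scale each term frequency by its IDF
        tfidf.collect().foreach(println)

        sc.stop()
      }
    }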

Jobs get stuck at reduceByKey stage with spark 1.0.1

2014-08-12 Thread Shivani Rao
Hello Spark aficionados, we upgraded from Spark 1.0.0 to 1.0.1 when the new release came out and started noticing some weird errors. Even a simple operation like "reduceByKey" or "count" on an RDD gets stuck in "cluster mode". This issue does not occur with Spark 1.0.0 (in cluster or local mode)
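
A minimal job of the kind described, as a hedged reproduction sketch (Spark 1.0.x-era APIs; the data and key function are placeholders). If this hangs at the reduceByKey/count stage in cluster mode but finishes in local mode, the problem is in the deployment rather than the application code.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._   // pair-RDD implicits on pre-1.3 Spark

    object ReduceByKeyRepro {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("reduceByKey-repro"))
        val counts = sc.parallelize(1 to 1000000)
          .map(i => (i % 100, 1))
          .reduceByKey(_ + _)
        println("distinct keys: " + counts.count())   // forces the shuffle stage
        sc.stop()
      }
    }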

Spark, Logging Issues: slf4j or log4j

2014-07-02 Thread Shivani Rao
Hello Spark fans, I am unable to figure out how Spark decides which logger to use. I know that Spark decides upon this at the time of initialization of the SparkContext. From the Spark documentation it is clear that Spark uses log4j, and not slf4j, but I have been able to successfully get spark t

Re: Bug in Spark REPL

2014-06-23 Thread Shivani Rao
Actually I figured it out. The problem was that I was loading the "sbt package"-ed jar into the classpath and not the "sbt assembly"-ed jar. Once I put the right jar in, the import for package a.b.c.d.z and everything else worked. Thanks, Shivani On Mon, Jun 23, 2014 at 4:38 PM, Shivani R

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-23 Thread Shivani Rao
have SPARK_CLASSPATH var but it does not distribute the code, it is only used to compute the driver classpath. BTW, you are not supposed to change the compute_classpath script. 2014-06-20 19:45 GMT+02:00 Shivani Rao : Hello Eugene, You

Bug in Spark REPL

2014-06-23 Thread Shivani Rao
I have two jars with the following packages: package a.b.c.d.z found in jar1, and package a.b.e found in jar2. In the Scala REPL (no Spark) both imports work just fine, but in the Spark REPL I found that import a.b.c.d.z gives me the following error: object "c" is not a member of package a.b. Has a

Re: Worker dies while submitting a job

2014-06-20 Thread Shivani Rao
That error typically means that there is a communication error (wrong ports) between master and worker. Also check if the worker has "write" permissions to create the "work" directory. We were getting this error due to one of the above two reasons. On Tue, Jun 17, 2014 at 10:04 AM, Luis Ángel Vicent

Re: How do you run your spark app?

2014-06-20 Thread Shivani Rao
; And I run as mentioned below. LOCALLY : 1) sbt 'run AP1z4IYraYm5fqWhITWArY53x Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014' If you want to submit on the cluster

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Shivani Rao
package, deploy, run, iterate). Try for example 1) read the lines (without any processing) and count them 2) apply processing and count 2014-06-20 17:15 GMT+02:00 Shivani Rao : Hello Abhi, I did try that and it did not work And Eugene,

Re: How do you run your spark app?

2014-06-20 Thread Shivani Rao
Hello Michael, I have a quick question for you. Can you clarify the statement "build fat JAR's and build dist-style TAR.GZ packages with launch scripts, JAR's and everything needed to run a Job"? Can you give an example? I am using sbt assembly as well to create a fat jar, and supplying the spa

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Shivani Rao
Hello Abhi, I did try that and it did not work. And Eugene, yes, I am assembling the argonaut libraries in the fat jar. So how did you overcome this problem? Shivani On Fri, Jun 20, 2014 at 1:59 AM, Eugen Cepoi wrote: On 20 Jun 2014 01:46, "Shivani Rao" wrote:

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-19 Thread Shivani Rao
e doing some intense processing on every line but just writing parsed case classes back to disk sounds very lightweight. I On Wed, Jun 18, 2014 at 5:17 PM, Shivani Rao wrote: I am trying to process a file that contains 4 log lines (not very long) and th

Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-18 Thread Shivani Rao
I am trying to process a file that contains 4 log lines (not very long) and then write my parsed out case classes to a destination folder, and I get the following error: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java

Re: Adding external jar to spark-shell classpath in spark 1.0

2014-06-12 Thread Shivani Rao
@Marcelo: The command ./bin/spark-shell --jars jar1,jar2,etc,etc did not work for me on a Linux machine. What I did was to append to the classpath in the bin/compute-classpath.sh file. I ran the script, then started the spark shell, and that worked. Thanks, Shivani On Wed, Jun 11, 2014 at 10:52 AM, A

Re: Hanging Spark jobs

2014-06-12 Thread Shivani Rao
I learned this from my co-worker, but it is relevant here. Spark evaluates transformations lazily, which means that none of your code actually runs until you call an action such as "saveAsTextFile", so the failure does not tell you much about where the problem is occurring. In order to debug this better, you might
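
As an illustration of that debugging approach, a hedged sketch (the paths and the parse function are placeholders): force each intermediate RDD with a cheap action so the stage that actually fails becomes visible.

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalizeFailure {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("localize-failure"))

        val raw = sc.textFile("hdfs:///path/to/input")    // placeholder path
        println("raw lines: " + raw.count())              // action 1: does reading work?

        val parsed = raw.map(parse)
        println("parsed records: " + parsed.count())      // action 2: does parsing work?

        parsed.saveAsTextFile("hdfs:///path/to/output")   // the original action
        sc.stop()
      }

      // Stand-in for the real per-line parsing logic.
      def parse(line: String): String = line.trim
    }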

Re: using Log4j to log INFO level messages on workers

2014-06-04 Thread Shivani Rao
4717af68bbba81 Alex On Mon, Jun 2, 2014 at 7:18 PM, Shivani Rao wrote: Hello Spark fans, I am trying to log messages from my spark application. When the main() function attempts to log, using log.info() it works great, but w

using Log4j to log INFO level messages on workers

2014-06-02 Thread Shivani Rao
Hello Spark fans, I am trying to log messages from my Spark application. When the main() function attempts to log using log.info() it works great, but when I try the same command from the code that probably runs on the worker, I initially got a serialization error. To solve that, I created a new
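
A common pattern for this situation (a sketch of one way to do it, not necessarily what the poster ended up with): keep the logger out of the serialized closure by marking it @transient and recreating it lazily on each worker JVM.

    import org.apache.log4j.Logger
    import org.apache.spark.rdd.RDD

    object WorkerLogging extends Serializable {
      // @transient + lazy: the logger is never shipped inside task closures;
      // each executor JVM builds its own instance on first use.
      @transient lazy val log: Logger = Logger.getLogger(getClass.getName)

      def process(lines: RDD[String]): Long = {
        lines.map { line =>
          log.info("processing line of length " + line.length)  // runs on the workers
          line.length
        }.count()
      }
    }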

Re: logging in pyspark

2014-05-22 Thread Shivani Rao
I am having trouble adding logging to the class that does serialization and deserialization. Where is the code for org.apache.spark.Logging located? And is it serializable? On Mon, May 12, 2014 at 10:02 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: Ah, yes, that is correct. You

Imports that need to be specified in a Spark application jar?

2014-05-20 Thread Shivani Rao
Hello all, I am learning that there are certain imports done by the Spark REPL, which is used to invoke and run code in a Spark shell, that I would have to add explicitly if I need the same functionality in a Spark jar run from the command line. I keep running into a serialization error of an RDD
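
For readers hitting the same thing: in Spark of that era (pre-1.3), the shell pre-imports implicit conversions that a standalone application has to pull in itself. A sketch of the usual suspects (the exact set depends on what the code uses):

    import org.apache.spark.{SparkConf, SparkContext}
    // The shell already has this in scope; a standalone jar must add it so an
    // RDD of pairs picks up PairRDDFunctions (reduceByKey, groupByKey, join, ...).
    import org.apache.spark.SparkContext._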

Re: Job failed: java.io.NotSerializableException: org.apache.spark.SparkContext

2014-05-15 Thread Shivani Rao
This is something that I have bumped into time and again. The object that contains your main() should also be serializable; then you won't have this issue. For example: object Test extends Serializable { def main() { // set up spark context // read your data // create your RDDs (grouped by key) /
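
The same pattern filled out as a compilable sketch (paths and transformations are placeholders; the other common fix, not shown here, is to avoid referencing the SparkContext inside closures at all):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._   // pair-RDD implicits on pre-1.3 Spark

    object Test extends Serializable {
      def main(args: Array[String]): Unit = {
        // set up the Spark context
        val sc = new SparkContext(new SparkConf().setAppName("not-serializable-demo"))

        // read your data (placeholder path)
        val lines = sc.textFile("hdfs:///path/to/input")

        // create your RDDs, grouped by key; the closure never touches sc itself
        val grouped = lines.map(line => (line.take(1), line)).groupByKey()

        grouped.saveAsTextFile("hdfs:///path/to/output")
        sc.stop()
      }
    }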

Re: Unable to load native-hadoop library problem

2014-05-14 Thread Shivani Rao
Hello Sophia, you are only providing the Spark jar here (admittedly a Spark jar that contains Hadoop libraries in it, but that is not sufficient). Where is your Hadoop installed? (Most probably /usr/lib/hadoop/*.) So you need to add that to your classpath (by using -cp), I guess. Let me know if

Re: is it possible to initiate Spark jobs from Oozie?

2014-05-02 Thread Shivani Rao
I have mucked around with this a little bit. The first step to make this happen is to build a fat jar. I wrote a quick blog post documenting my learning curve w.r.t. that. The next step is to schedule this as a java action. Since y

Re: Spark: issues with running a sbt fat jar due to akka dependencies

2014-05-02 Thread Shivani Rao
t 5:21 AM, Koert Kuipers wrote: not sure why applying concat to reference.conf didn't work for you. since it simply concatenates the files the key akka.version should be preserved. we had the same situation for a while without issues. On May 1, 2014 8:46 PM, "Shivani Rao

Re: Spark: issues with running a sbt fat jar due to akka dependencies

2014-05-01 Thread Shivani Rao
d for for spark itself: case "reference.conf" => MergeStrategy.concat On Tue, Apr 29, 2014 at 3:32 PM, Shivani Rao wrote: Hello folks, I was going to post this question to spark user group as well. If you have any leads on how
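
The quoted suggestion, spelled out as a build setting (a sketch assuming the sbt-assembly 0.11-era keys that were current around Spark 0.9/1.0; newer plugin versions spell the key assemblyMergeStrategy instead). Concatenating every reference.conf keeps akka.version and the other Akka defaults that a "pick first" strategy would drop from the fat jar.

    // build.sbt, with the sbt-assembly plugin on the build classpath
    import sbtassembly.Plugin._
    import AssemblyKeys._

    assemblySettings

    mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
      {
        case "reference.conf" => MergeStrategy.concat   // merge, don't overwrite
        case x                => old(x)                 // keep the default for everything else
      }
    }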

Running Spark jobs via oozie

2014-05-01 Thread Shivani Rao
Hello Spark fans, I am trying to run a Spark job via Oozie as a java action. The Spark code is packaged as MySparkJob.jar, compiled using sbt assembly (excluding the Spark and Hadoop dependencies). I am able to invoke the Spark job from any client using java -cp lib/MySparkJob.jar:lib/spark-0.9-ass

Spark: issues with running a sbt fat jar due to akka dependencies

2014-04-29 Thread Shivani Rao
Hello folks, I was going to post this question to the Spark user group as well. If you have any leads on how to solve this issue, please let me know: I am trying to build a basic Spark project (Spark depends on Akka) and I am trying to create a fat jar using sbt assembly. The goal is to run the fat jar

Trouble getting hadoop and spark run along side on my vm

2014-03-17 Thread Shivani Rao
From what I understand, getting Spark to run alongside a Hadoop cluster requires the following: a) a working Hadoop, b) a compiled Spark, c) configuration parameters that point Spark to the right Hadoop conf files. i) Can you let me know the specific steps to take after Spark was compiled (via sbt a

Re: Problem when execute spark-shell

2014-03-17 Thread Shivani Rao
I am new and I don't know much either, but this is what helped me: a) Check if the compiled jar is in /spark-0.9.0-incubating/assembly/target/scala-2.10.1/. b) Try the sbt package command. c) spark-shell will only run from the root of the spark-0.9.0-incubating directory. I think the path of the shell s