Hello Eugene,
Thanks for your patience and answers. The issue was that one of the third-party
libraries was not built with "sbt assembly" but just packaged with "sbt
package", so it did not contain all of its dependencies.
Thanks for all your help
Shivani
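A minimal sketch of the build difference described above (assuming the
sbt-assembly plugin is configured in project/plugins.sbt; output jar contents
described in the comments are the general behavior, not specific to this job):

```shell
# "sbt package" builds only this project's classes; third-party
# dependencies are NOT included in the resulting jar.
sbt package

# "sbt assembly" (sbt-assembly plugin) bundles the project classes
# together with all of its library dependencies into one fat jar,
# which is what the Spark workers need on their classpath.
sbt assembly
```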
On Fri, Jun 20, 2014 at 1:46 PM, Eugen Cepoi wrote:
In short, ADD_JARS will add the jar to your driver classpath and also send
it to the workers (similar to what you are doing when you call sc.addJars).
Ex: MASTER=master/url ADD_JARS=/path/to/myJob.jar ./bin/spark-shell
You also have the SPARK_CLASSPATH variable, but it does not distribute the
code; it only applies locally.
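The two options Eugen describes can be illustrated like this (the master URL
and jar path are placeholders):

```shell
# ADD_JARS puts the jar on the driver classpath AND ships it to the
# workers (similar to calling sc.addJar programmatically):
MASTER=spark://master:7077 ADD_JARS=/path/to/myJob.jar ./bin/spark-shell

# SPARK_CLASSPATH only affects the local classpath; the jar is NOT
# distributed to the workers:
SPARK_CLASSPATH=/path/to/myJob.jar ./bin/spark-shell
```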
Hello Eugene,
You are right about this. I did encounter the "PermGen space" error in the
spark-shell. Can you tell me a little more about ADD_JARS? To ensure my
spark-shell has all required jars, I added the jars to $CLASSPATH in the
compute-classpath.sh script. Is there another way of doing this?
In my case it was due to a case class I was defining in the spark-shell not
being available on the workers. Packaging it in a jar and adding it with
ADD_JARS solved the problem. Note that I don't exactly remember whether it
was an out-of-heap-space exception or PermGen space. Make sure your jars
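If the error really is "java.lang.OutOfMemoryError: PermGen space" rather
than heap space, one common workaround on pre-Java-8 JVMs is raising the
permanent-generation limit (a hedged sketch; the 256m value is illustrative):

```shell
# Raise the PermGen limit for the JVM started by spark-shell.
SPARK_JAVA_OPTS="-XX:MaxPermSize=256m" ./bin/spark-shell
```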
Hello Abhi, I did try that and it did not work
And Eugene, Yes I am assembling the argonaut libraries in the fat jar. So
how did you overcome this problem?
Shivani
On Fri, Jun 20, 2014 at 1:59 AM, Eugen Cepoi wrote:
On Jun 20, 2014 at 01:46, "Shivani Rao" wrote:
>
> Hello Andrew,
>
> I wish I could share the code, but for proprietary reasons I can't. I can
> give some idea of what I am trying to do, though. The job reads a file and
> processes each line. I am not doing anythin
1. Once you have generated the final RDD, before submitting it to the
reducer, try to repartition it into a known number of partitions using
coalesce(numPartitions) or repartition().
2. Rule of thumb for the number of partitions: 3 * num_executors *
cores_per_executor.
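The rule of thumb above can be sketched in the shell (the cluster numbers
below are illustrative, not taken from this thread):

```scala
// Rule of thumb: roughly 3 partitions per executor core.
val numExecutors = 4       // illustrative cluster size
val coresPerExecutor = 2   // illustrative
val numPartitions = 3 * numExecutors * coresPerExecutor  // 24

// Applied to the final RDD before the reduce:
// finalRdd.repartition(numPartitions)  // full shuffle, grow or shrink
// finalRdd.coalesce(numPartitions)     // avoids a shuffle when shrinking
```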
Hello Andrew,
I wish I could share the code, but for proprietary reasons I can't. I can
give some idea of what I am trying to do, though. The job reads a file and
processes each line. I am not doing anything intense in the "processLogs"
function
import argonaut._
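A hedged sketch of what a lightweight per-line processLogs might look like;
the LogEntry fields and the parsing logic are hypothetical, since the real
code is proprietary:

```scala
import argonaut._, Argonaut._  // Argonaut JSON library, as in the thread
                               // (unused in this sketch; shown for context)

// Hypothetical schema; the real fields are not shown in the thread.
case class LogEntry(timestamp: String, level: String, message: String)

// Lightweight per-line parsing; malformed lines are skipped.
def processLogs(line: String): Option[LogEntry] =
  line.split(" ", 3) match {
    case Array(ts, level, msg) => Some(LogEntry(ts, level, msg))
    case _                     => None
  }

// In the job this would run roughly as:
// sc.textFile("input.log").flatMap(processLogs).saveAsTextFile("parsed")
```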
Wait, so the file only has four lines and the job is running out of heap
space? Can you share the code you're running that does the processing?
I'd guess that you're doing some intense processing on every line, but just
writing parsed case classes back to disk sounds very lightweight.
I
On Wed, Ju