Dear all,
I recompiled Spark on Windows and it seems to work better. My problem with
PySpark remains:
https://issues.apache.org/jira/browse/SPARK-12261
I do not know how to debug this; it seems to be linked with Pickle or the
garbage collector... I would like to clear the Spark context to see if I
can gain memory.
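Something like this is what I mean by clearing the context (a sketch,
assuming the default `sc` from the PySpark shell; note that
spark.driver.memory itself has to be set before launch, it cannot be
changed on a running context):

    from pyspark import SparkConf, SparkContext

    sc.stop()  # tear down the current context and its cached RDDs
    sc = SparkContext(conf=SparkConf().setAppName("repro"))  # fresh context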
Here is a picture of the memory usage.
If I pass --conf spark.driver.memory=3g, it increases the displayed memory,
but the problem remains... for a file that is only 13 MB.
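For reference, this is how the flag was passed and how the effective value
can be checked from inside the shell (a sketch; `_conf` is an internal
attribute of the PySpark SparkContext, but it is present in 1.x):

    # Launched as:  pyspark --conf spark.driver.memory=3g
    # Inside the shell, print what the driver actually received:
    print(sc._conf.get("spark.driver.memory"))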
Christopher Bourez
06 17 17 50 60
On Mon, Jan 25, 2016 at 10:06 PM, Christopher Bourez <christopher.bou...@gmail.com> wrote:
The same problem occurs on my desktop at work.
What's great about AWS Workspace is that you can easily reproduce it.
I created the test file with these commands:
for i in {0..30}; do
  VALUE="$RANDOM"
  for j in {0..6}; do
    VALUE="$VALUE;$RANDOM"
  done
  echo $VALUE >> test.csv
done
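A sketch of the PySpark commands that then trigger the problem on such a
file (the read-then-collect pattern is my assumption about the repro; the
exact commands are in the video linked below):

    rdd = sc.textFile("test.csv")                     # read the generated file
    rows = rdd.map(lambda l: l.split(";")).collect()  # pull everything to the driver
    print(len(rows))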
Christopher
Josh,
Thanks a lot!
You can download a video I created:
https://s3-eu-west-1.amazonaws.com/christopherbourez/public/video.mov
I created a sample file of 13 MB as explained:
https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv
Here are the commands I ran:
I created an AWS Workspace...
Hi Christopher,
What would be super helpful here is a standalone reproduction. Ideally this
would be a single Scala file or set of commands that I can run in
`spark-shell` in order to reproduce this; the code would generate
a giant file, then try to read it in a way that demonstrates the problem.
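A sketch of such a reproduction, written here as PySpark rather than Scala
since the crash is PySpark-specific (the file name, row count, and the
assumption that a plain textFile + collect exhibits the SPARK-12261 crash
are mine):

    import random

    from pyspark import SparkContext

    sc = SparkContext(appName="spark-12261-repro")

    # Generate a larger version of the test file from the script above:
    # rows of 8 random integers in the $RANDOM range, separated by ";".
    with open("test.csv", "w") as f:
        for _ in range(200000):  # raise the row count to grow the file
            f.write(";".join(str(random.randint(0, 32767)) for _ in range(8)) + "\n")

    rdd = sc.textFile("test.csv")
    print(rdd.count())                                      # reported failure point
    rows = rdd.map(lambda line: line.split(";")).collect()  # collect on the driver
    print(len(rows))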