Dear all,

I recompiled Spark on Windows and it seems to work better. My problem with PySpark remains: https://issues.apache.org/jira/browse/SPARK-12261

I do not know how to debug this; it seems to be linked to Pickle and the garbage collector. I would like to clear the Spark context to see if I can gain anything.
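For reference, here is roughly what I mean by clearing the context (a minimal sketch only, assuming the plain local session started by bin\pyspark; the "repro" app name is just illustrative):

    sc.stop()  # stop the current SparkContext and release its resources

    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setMaster("local[1]").setAppName("repro")
    sc = SparkContext(conf=conf)  # recreate a fresh context before re-running the test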
Christopher Bourez
06 17 17 50 60

On Mon, Jan 25, 2016 at 10:14 PM, Christopher Bourez <christopher.bou...@gmail.com> wrote:

> Here is a pic of the memory.
> If I add --conf spark.driver.memory=3g, it increases the displayed memory,
> but the problem remains... for a file that is only 13 MB.
>
> Christopher Bourez
> 06 17 17 50 60
>
> On Mon, Jan 25, 2016 at 10:06 PM, Christopher Bourez <christopher.bou...@gmail.com> wrote:
>
>> The same problem occurs on my desktop at work.
>> What's great with AWS WorkSpaces is that you can easily reproduce it.
>>
>> I created the test file with these commands:
>>
>> for i in {0..300000}; do
>>   VALUE="$RANDOM"
>>   for j in {0..6}; do
>>     VALUE="$VALUE;$RANDOM";
>>   done
>>   echo $VALUE >> test.csv
>> done
>>
>> Christopher Bourez
>> 06 17 17 50 60
>>
>> On Mon, Jan 25, 2016 at 10:01 PM, Christopher Bourez <christopher.bou...@gmail.com> wrote:
>>
>>> Josh,
>>>
>>> Thanks a lot!
>>>
>>> You can download a video I created:
>>> https://s3-eu-west-1.amazonaws.com/christopherbourez/public/video.mov
>>>
>>> I created a 13 MB sample file as explained:
>>> https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv
>>>
>>> Here are the steps I followed:
>>>
>>> I created an AWS WorkSpace with Windows 7 (which I can share with you if
>>> you'd like), Standard instance, 2 GiB RAM.
>>> On this instance I:
>>> downloaded Spark (1.5 or 1.6, same problem) with Hadoop 2.6,
>>> installed the Java 8 JDK,
>>> downloaded Python 2.7.8,
>>> downloaded the sample file
>>> https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv
>>>
>>> Then the commands I launch are:
>>> bin\pyspark --master local[1]
>>> sc.textFile("test.csv").take(1)
>>>
>>> As you can see, sc.textFile("test.csv", 2000).take(1) works well.
>>>
>>> Thanks a lot!
>>>
>>> Christopher Bourez
>>> 06 17 17 50 60
>>>
>>> On Mon, Jan 25, 2016 at 8:02 PM, Josh Rosen <joshro...@databricks.com> wrote:
>>>
>>>> Hi Christopher,
>>>>
>>>> What would be super helpful here is a standalone reproduction. Ideally
>>>> this would be a single Scala file or set of commands that I can run in
>>>> `spark-shell` in order to reproduce this. Ideally, this code would generate
>>>> a giant file, then try to read it in a way that demonstrates the bug. If
>>>> you have such a reproduction, could you attach it to that JIRA ticket?
>>>> Thanks!
>>>>
>>>> On Mon, Jan 25, 2016 at 7:53 AM Christopher Bourez <christopher.bou...@gmail.com> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I would like to reopen a case for a potential bug (its current status is
>>>>> resolved, but it seems it is not):
>>>>>
>>>>> https://issues.apache.org/jira/browse/SPARK-12261
>>>>>
>>>>> I believe there is something wrong with the memory management under
>>>>> Windows.
>>>>>
>>>>> It does not make sense to only be able to work with files smaller than
>>>>> a few MB...
>>>>>
>>>>> Do not hesitate to ask me questions if you try to help and reproduce
>>>>> the bug.
>>>>>
>>>>> Best,
>>>>>
>>>>> Christopher Bourez
>>>>> 06 17 17 50 60
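PS: For the standalone reproduction requested above by Josh, something like the following is what I have in mind (a rough sketch only, in Python rather than Scala; the file size and the 2000 partition count come from my tests above; run it with bin\spark-submit repro.py):

    # repro.py - sketch of the reproduction described above (Spark 1.5/1.6 on Windows)
    import random

    # generate a ~13 MB CSV: 300001 lines of 8 random integers separated by ';'
    with open("test.csv", "w") as f:
        for i in range(300001):
            f.write(";".join(str(random.randint(0, 32767)) for _ in range(8)) + "\n")

    from pyspark import SparkConf, SparkContext
    sc = SparkContext(conf=SparkConf().setMaster("local[1]").setAppName("SPARK-12261-repro"))

    print(sc.textFile("test.csv").take(1))        # fails for me with the memory error
    print(sc.textFile("test.csv", 2000).take(1))  # works when forcing many small partitions

    sc.stop()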