The same problem occurs on my desktop at work. What's great with AWS Workspace is that you can easily reproduce it.
I created the test file with these commands:

for i in {0..300000}; do
  VALUE="$RANDOM"
  for j in {0..6}; do
    VALUE="$VALUE;$RANDOM"
  done
  echo "$VALUE" >> test.csv
done

Christopher Bourez
06 17 17 50 60

On Mon, Jan 25, 2016 at 10:01 PM, Christopher Bourez <christopher.bou...@gmail.com> wrote:

> Josh,
>
> Thanks a lot !
>
> You can download a video I created :
> https://s3-eu-west-1.amazonaws.com/christopherbourez/public/video.mov
>
> I created a sample file of 13 MB as explained :
> https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv
>
> Here are the commands I did :
>
> I created an Aws Workspace with Windows 7 (that I can share you if you'd
> like) with Standard instance, 2GiB RAM
> On this instance :
> I downloaded spark (1.5 or 1.6 same pb) with hadoop 2.6
> installed java 8 jdk
> downloaded python 2.7.8
>
> downloaded the sample file
> https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv
>
> And then the command lines I launch are :
> bin\pyspark --master local[1]
> sc.textFile("test.csv").take(1)
>
> As you can see, sc.textFile("test.csv", 2000).take(1) works well
>
> Thanks a lot !
>
>
> Christopher Bourez
> 06 17 17 50 60
>
> On Mon, Jan 25, 2016 at 8:02 PM, Josh Rosen <joshro...@databricks.com>
> wrote:
>
>> Hi Christopher,
>>
>> What would be super helpful here is a standalone reproduction. Ideally
>> this would be a single Scala file or set of commands that I can run in
>> `spark-shell` in order to reproduce this. Ideally, this code would generate
>> a giant file, then try to read it in a way that demonstrates the bug. If
>> you have such a reproduction, could you attach it to that JIRA ticket?
>> Thanks!
>>
>> On Mon, Jan 25, 2016 at 7:53 AM Christopher Bourez <
>> christopher.bou...@gmail.com> wrote:
>>
>>> Dears,
>>>
>>> I would like to re-open a case for a potential bug (current status is
>>> resolved but it sounds not) :
>>>
>>> https://issues.apache.org/jira/browse/SPARK-12261
>>>
>>> I believe there is something wrong about the memory management under
>>> windows
>>>
>>> It has no sense to work with files smaller than a few MB...
>>>
>>> Do not hesitate to ask me questions if you try to help and reproduce the
>>> bug,
>>>
>>> Best
>>>
>>> Christopher Bourez
>>> 06 17 17 50 60
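
For anyone reproducing this directly on the Windows machine, where the bash loop at the top of this message may not be available, here is a rough Python equivalent of the file generation. It is only a sketch: the function name write_test_csv and its parameters are made up for illustration, and it assumes that matching the shape of the data (300001 lines of 8 semicolon-separated integers in the 0..32767 range of bash's $RANDOM) is enough to trigger the problem. I have not verified that against the original test.csv.

```python
import random


def write_test_csv(path, rows=300001, fields=8, seed=None):
    """Write `rows` lines of `fields` semicolon-separated random integers.

    Hypothetical helper mirroring the bash loop: {0..300000} gives 300001
    lines, and each line has one initial value plus 7 appended values.
    """
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(rows):
            # bash $RANDOM yields integers between 0 and 32767 inclusive
            values = [str(rng.randint(0, 32767)) for _ in range(fields)]
            f.write(";".join(values) + "\n")


# Example call (writes roughly 13 MB with the defaults):
# write_test_csv("test.csv")
```

Once the file exists, the read is the same as in the thread: start bin\pyspark --master local[1] and run sc.textFile("test.csv").take(1).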