Re: bug for large textfiles on windows

Christopher Bourez Mon, 25 Jan 2016 13:07:12 -0800

The same problem occurs on my desktop at work.
What's great about AWS Workspace is that you can easily reproduce it there.


I created the test file with the following bash commands:

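# Generate test.csv: 300,001 lines, each holding 8 semicolon-separated
# random integers (bash's $RANDOM yields values from 0 to 32767)
for i in {0..300000}; do
  VALUE="$RANDOM"
  for j in {0..6}; do
    VALUE="$VALUE;$RANDOM"
  done
  echo "$VALUE" >> test.csv
done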
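The bash loop relies on `$RANDOM` and brace expansion, so it will not run in the Windows command prompt. Below is a minimal Python sketch, assuming Python 3 and not the command actually used in this thread, that writes a file of the same shape: 300,001 lines of 8 semicolon-separated integers from 0 to 32767. The function name `write_test_csv` and its `seed` parameter are hypothetical, chosen for illustration.

```python
import random


def write_test_csv(path, n_lines=300001, values_per_line=8, seed=None):
    """Write n_lines lines of semicolon-separated random integers to path.

    Hypothetical helper mirroring the bash loop: each value is drawn from
    0..32767 inclusive, the same range as bash's $RANDOM.
    """
    rng = random.Random(seed)  # pass a seed to get a reproducible file
    # newline="\n" keeps Unix line endings (as the bash version writes)
    # even on Windows; this argument requires Python 3.
    with open(path, "w", newline="\n") as out:
        for _ in range(n_lines):
            values = [str(rng.randint(0, 32767)) for _ in range(values_per_line)]
            out.write(";".join(values) + "\n")
```

For example, `write_test_csv("test.csv", seed=42)` writes a reproducible file of roughly 13 MB, comparable in size to the sample file.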

Christopher Bourez
06 17 17 50 60

On Mon, Jan 25, 2016 at 10:01 PM, Christopher Bourez <
christopher.bou...@gmail.com> wrote:

> Josh,
>
> Thanks a lot!
>
> You can download a video I made:
> https://s3-eu-west-1.amazonaws.com/christopherbourez/public/video.mov
>
> I created a 13 MB sample file as described:
> https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv
>
> Here are the steps I followed:
>
> I created an AWS Workspace running Windows 7 (which I can share with you
> if you'd like), using a Standard instance with 2 GiB of RAM.
> On this instance, I downloaded Spark (1.5 or 1.6; same problem with both)
> with Hadoop 2.6, installed the Java 8 JDK, and downloaded Python 2.7.8.
>
> I also downloaded the sample file:
> https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv
>
> Then I run the following commands:
> bin\pyspark --master local[1]
> sc.textFile("test.csv").take(1)
>
> As you can see, sc.textFile("test.csv", 2000).take(1), i.e. with
> minPartitions set to 2000, works fine.
>
> Thanks a lot!
>
>
> Christopher Bourez
>
> On Mon, Jan 25, 2016 at 8:02 PM, Josh Rosen <joshro...@databricks.com>
> wrote:
>
>> Hi Christopher,
>>
>> What would be super helpful here is a standalone reproduction. Ideally
>> this would be a single Scala file or set of commands that I can run in
>> `spark-shell` to reproduce this. The code should generate a giant file,
>> then try to read it in a way that demonstrates the bug. If you have such
>> a reproduction, could you attach it to that JIRA ticket?
>> Thanks!
>>
>> On Mon, Jan 25, 2016 at 7:53 AM Christopher Bourez <
>> christopher.bou...@gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> I would like to reopen the ticket for a potential bug (its current status
>>> is Resolved, but it does not seem to be):
>>>
>>> https://issues.apache.org/jira/browse/SPARK-12261
>>>
>>> I believe there is something wrong with memory management on Windows.
>>>
>>> Being limited to files smaller than a few MB makes no sense...
>>>
>>> If you are trying to help reproduce the bug, do not hesitate to ask me
>>> questions.
>>>
>>> Best,
>>>
>>> Christopher Bourez
>>>
>>
>
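In the spirit of the standalone reproduction requested in this thread, here is a hedged PySpark sketch (not code from the thread) that runs both reads described above in one session and reports which one fails, rather than stopping at the first error. It assumes it is pasted into the shell started with `bin\pyspark --master local[1]`, where `sc` already exists, and that `test.csv` is in the working directory. The helper names `try_take_first` and `compare_partitionings` are hypothetical. It only catches failures that surface as an exception in the driver, and it avoids Python-3-only syntax so it can also run under Python 2.7.

```python
import traceback


def try_take_first(sc, path, min_partitions=None):
    """Return (True, first_lines) on success or (False, exception) on failure.

    Hypothetical helper. SparkContext.textFile(name, minPartitions=None):
    passing None keeps Spark's default number of partitions.
    """
    try:
        rdd = sc.textFile(path, min_partitions)
        return True, rdd.take(1)
    except Exception as exc:  # keep the shell alive so the next read still runs
        traceback.print_exc()
        return False, exc


def compare_partitionings(sc, path, min_partitions=2000):
    """Run the default read, then the many-partition read, on the same file."""
    return {
        "default": try_take_first(sc, path),
        "minPartitions=%d" % min_partitions: try_take_first(sc, path, min_partitions),
    }
```

In the shell, `results = compare_partitionings(sc, "test.csv")` returns a dictionary whose entries show, for each partitioning, whether `take(1)` succeeded.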
