Alain wrote
<<<
Hi Andy,

If I understand your question, you want to remake Dan Ingalls' Lively 
Kernel?
>>>

Hi Alain,
Although I am really impressed with Dan's work, Emscripten seems to be very 
different. 

In theory - and my knowledge in this is very limited - it might allow the VM's C 
files to be transpiled into a very tight subset of JS (asm.js). On the face of it, 
this is a crazy idea, but they have achieved amazing performance with things like 
the Qt library. That made me wonder whether a JS version of Pharo is possible; a 
rough sketch of what I have in mind is below.
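
Just a toy sketch of the workflow as I understand it from the Emscripten docs - 
interp.c and everything in it is made up by me, not real VM code - but the point 
is that you compile ordinary C with emcc instead of gcc and get asm.js-style 
JavaScript plus an HTML test harness out:

    /* interp.c - a made-up stand-in for one VM translation unit.
       Build (assuming the Emscripten SDK is installed and on the PATH):
           emcc interp.c -O2 -o interp.html
       which emits interp.js (the compiled code) and interp.html (a test page). */
    #include <stdio.h>

    /* toy "dispatch loop", nothing like the real interpreter */
    static int interpret(int bytecode)
    {
        return bytecode * 2;
    }

    int main(void)
    {
        printf("interpret(21) = %d\n", interpret(21));
        return 0;
    }

Whether the real VM sources, with all their plugins and platform code, would 
survive that pipeline is of course the open question.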

Just idle conjecturing on a Friday night :-)



> On 14 Nov 2014, at 18:09, pharo-users-requ...@lists.pharo.org wrote:
> 
> Send Pharo-users mailing list submissions to
>    pharo-users@lists.pharo.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>    http://lists.pharo.org/mailman/listinfo/pharo-users_lists.pharo.org
> or, via email, send a message with subject or body 'help' to
>    pharo-users-requ...@lists.pharo.org
> 
> You can reach the person managing the list at
>    pharo-users-ow...@lists.pharo.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Pharo-users digest..."
> 
> 
> Today's Topics:
> 
>   1. Re: running out of memory while processing a 220MB csv file
>      with NeoCSVReader - tips? (Paul DeBruicker)
>   2. Re: running out of memory while processing a 220MB csv file
>      with NeoCSVReader - tips? (Sven Van Caekenberghe)
>   3. Re: Has anyone tried compiling the Pharo VM into JS using
>      Emscripten? (Alain Rastoul)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Fri, 14 Nov 2014 14:14:51 -0800 (PST)
> From: Paul DeBruicker <pdebr...@gmail.com>
> To: pharo-users@lists.pharo.org
> Subject: Re: [Pharo-users] running out of memory while processing a
>    220MB csv file with NeoCSVReader - tips?
> Message-ID: <1416003291281-4790341.p...@n4.nabble.com>
> Content-Type: text/plain; charset=us-ascii
> 
> Hi Sven
> 
> Yes, as I said earlier, after your first email: I think it's not so much a
> problem with NeoCSV as with what I'm doing, plus an out-of-memory condition.  
> 
> Have you ever seen a stack after sending kill -SIGUSR1 that looks like this:
> 
> output file stack is full.
> output file stack is full.
> output file stack is full.
> output file stack is full.
> output file stack is full.
> ....
> 
> 
> What does that mean?
> 
> Answers to your questions below.
> 
> Thanks again for helping me out
> 
> 
> 
> Sven Van Caekenberghe-2 wrote
>> OK then, you *can* read/process 300MB .csv files ;-)
>> 
>> What does your CSV file look like, can you show a couple of lines ?
>> 
>> here are 2 lines + a header:
>> 
>> "provnum","Provname","address","city","state","zip","survey_date_output","SurveyType","defpref","tag","tag_desc","scope","defstat","statdate","cycle","standard","complaint","filedate"
>> "015009","BURNS NURSING HOME, INC.","701 MONROE STREET
>> NW","RUSSELLVILLE","AL","35653","2013-09-05","Health","F","0314","Give
>> residents proper treatment to prevent new bed (pressure) sores or heal
>> existing bed sores.","D","Deficient, Provider has date of
>> correction","2013-10-10",1,"Y","N","2014-01-01"
>> "015009","BURNS NURSING HOME, INC.","701 MONROE STREET
>> NW","RUSSELLVILLE","AL","35653","2013-09-05","Health","F","0315","Ensure
>> that each resident who enters the nursing home without a catheter is not
>> given a catheter, unless medically necessary, and that incontinent
>> patients receive proper services to prevent urinary tract infections and
>> restore normal bladder functions.","D","Deficient, Provider has date of
>> correction","2013-10-10",1,"Y","N","2014-01-01"
>> 
>> 
>> You are using a custom record class of your own, what does that look like
>> or do ?
>> 
>> A custom record class.    This is all publicly available data but I'm
>> keeping track of the performance of US based health care providers during
>> their annual inspections. So the records are notes of a deficiency during
>> the inspection and I'm keeping those notes in a collection in an instance
>> of the health care provider's class.   The custom record class just
>> converts the CSV record to objects (Integers, Strings, DateAndTime) and
>> then gets stuffed in the health care provider's deficiency history
>> OrderedCollection (which has about 100 items). Again, I don't think it's
>> what I'm doing so much as that the image isn't growing when it needs to.  
>> 
>> 
>> 
>> 
>> Maybe you can try using Array again ?
>> 
>> I've attempted to do it where I parse and convert the entire CSV into
>> domain objects then add them to the image and the parsing works fine, but
>> the system runs out of resources during the update phase.  
>> 
>> 
>> What percentage of records read do you keep ? In my example it was very
>> small. Have you tried calculating your memory usage ? 
>> 
>> 
>> I'm keeping some data from every record, but it doesn't load more than
>> 500MB of the data before falling over.  I am not attempting to load the
>> 9GB of CSV files into one image.  For 95% of the records in the CSV file
>> 20 of the 22 columns of data are the same from file to file; just a
>> 'published date' and a 'time to expiration' date change.   Each file
>> covers a month, with about 500k deficiencies.  Each month some
>> deficiencies are added to the file and some are resolved. So the total
>> number of deficiencies in the image is about 500k.  Of those records that
>> don't expire in a given month I'm adding the published date to a
>> collection of published dates for the record and also adding the "time to
>> expiration" to a collection of those to record what was made public and
>> letting the rest of the data get GC'd.  I don't only load those two
>> records because the other fields of the record in the CSV could change.    
>> 
>> I have not calculated the memory usage for the collection because I
>> thought it would have no problem fitting in the 2GB of RAM I have on this
>> machine.  
>> 
>> 
>> 
>>> On 14 Nov 2014, at 22:34, Paul DeBruicker <pdebruic@> wrote:
>>> 
>>> Yes. With the image & VM I'm having trouble with, I get an array with
>>> 9,942 elements in it.  So it works as you'd expect.
>>> 
>>> While processing the CSV file the image stays at about 60MB in RAM.  
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Sven Van Caekenberghe-2 wrote
>>>> Can you successfully run my example code ?
>>>> 
>>>>> On 14 Nov 2014, at 22:03, Paul DeBruicker <pdebruic@> wrote:
>>>>> 
>>>>> Hi Sven,
>>>>> 
>>>>> Thanks for taking a look and testing the NeoCSVReader portion for me. 
>>>>> You're right of course that there's something I'm doing that's slow. 
>>>>> But. 
>>>>> There is something I can't figure out yet.  
>>>>> 
>>>>> To provide a little more detail:
>>>>> 
>>>>> When the 'csv reading' process completes successfully profiling shows
>>>>> that
>>>>> most of the time is spent in NeoCSVReader>>#peekChar and using
>>>>> NeoCSVReader>>#addField: to convert a string to a DateAndTime. 
>>>>> Dropping
>>>>> the DateAndTime conversion speeds things up but doesn't stop it from
>>>>> running
>>>>> out of memory.  
>>>>> 
>>>>> I start the image with 
>>>>> 
>>>>> ./pharo-ui --memory 1000m myimage.image   
>>>>> 
>>>>> Splitting the CSV file helps:
>>>>> ~1.5MB  5,000 lines = 1.2 seconds.
>>>>> ~15MB   50,000 lines = 8 seconds.
>>>>> ~30MB   100,000 lines = 16 seconds.
>>>>> ~60MB   200,000 lines  = 45 seconds.
>>>>> 
>>>>> 
>>>>> It seems that when the CSV file crosses ~70MB in size things start
>>>>> going
>>>>> haywire with performance, and leads to the out of memory condition. 
>>>>> The
>>>>> processing never ends.  Sending "kill -SIGUSR1" prints a stack
>>>>> primarily
>>>>> composed of:
>>>>> 
>>>>> 0xbffc5d08 M OutOfMemory class(Exception class)>signal 0x1f7ac060: a(n)
>>>>> OutOfMemory class
>>>>> 0xbffc5d20 M OutOfMemory class(Behavior)>basicNew 0x1f7ac060: a(n)
>>>>> OutOfMemory class
>>>>> 0xbffc5d38 M OutOfMemory class(Behavior)>new 0x1f7ac060: a(n)
>>>>> OutOfMemory
>>>>> class
>>>>> 0xbffc5d50 M OutOfMemory class(Exception class)>signal 0x1f7ac060: a(n)
>>>>> OutOfMemory class
>>>>> 0xbffc5d68 M OutOfMemory class(Behavior)>basicNew 0x1f7ac060: a(n)
>>>>> OutOfMemory class
>>>>> 0xbffc5d80 M OutOfMemory class(Behavior)>new 0x1f7ac060: a(n)
>>>>> OutOfMemory
>>>>> class
>>>>> 0xbffc5d98 M OutOfMemory class(Exception class)>signal 0x1f7ac060: a(n)
>>>>> OutOfMemory class
>>>>> 
>>>>> So it seems like it's trying to signal that it's out of memory after it's
>>>>> already out of memory, which triggers another OutOfMemory error.  So
>>>>> that's why progress stops.  
>>>>> 
>>>>> 
>>>>> ** Aside - OutOfMemory should probably be refactored to be able to
>>>>> signal
>>>>> itself without taking up more memory, triggering itself infinitely. 
>>>>> Maybe
>>>>> it & its signalling morph infrastructure would be good as a singleton
>>>>> **
>>>>> 
>>>>> 
>>>>> 
>>>>> I'm confused about why it runs out of memory.  According to htop the
>>>>> image
>>>>> only takes up about 520-540 MB of RAM when it reaches the 'OutOfMemory'
>>>>> condition.  This MacBook Air laptop has 4GB, and has plenty of room for
>>>>> the
>>>>> image to grow.  Also I've specified a 1,000MB image size when starting. 
>>>>> So
>>>>> it should have plenty of room.  Is there something I should check or a
>>>>> flag
>>>>> somewhere that prevents it from growing on a Mac?  This is the latest
>>>>> Pharo30 VM.  
>>>>> 
>>>>> 
>>>>> Thanks for helping me get to the bottom of this
>>>>> 
>>>>> Paul
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Sven Van Caekenberghe-2 wrote
>>>>>> Hi Paul,
>>>>>> 
>>>>>> I think you must be doing something wrong with your class, the #do: is
>>>>>> implemented as streaming over the record one by one, never holding
>>>>>> more
>>>>>> than one in memory.
>>>>>> 
>>>>>> This is what I tried:
>>>>>> 
>>>>>> 'paul.csv' asFileReference writeStreamDo: [ :file|
>>>>>> ZnBufferedWriteStream on: file do: [ :out |
>>>>>>  (NeoCSVWriter on: out) in: [ :writer |
>>>>>>    writer writeHeader: { #Number. #Color. #Integer. #Boolean}.
>>>>>>    1 to: 1e7 do: [ :each |
>>>>>>      writer nextPut: { each. #(Red Green Blue) atRandom. 1e6
>>>>>> atRandom.
>>>>>> #(true false) atRandom } ] ] ] ].
>>>>>> 
>>>>>> This results in a 300MB file:
>>>>>> 
>>>>>> $ ls -lah paul.csv 
>>>>>> -rw-r--r--@ 1 sven  staff   327M Nov 14 20:45 paul.csv
>>>>>> $ wc paul.csv 
>>>>>> 10000001 10000001 342781577 paul.csv
>>>>>> 
>>>>>> This is a selective read and collect (loads about 10K records):
>>>>>> 
>>>>>> Array streamContents: [ :out |
>>>>>> 'paul.csv' asFileReference readStreamDo: [ :in |
>>>>>>  (NeoCSVReader on: (ZnBufferedReadStream on: in)) in: [ :reader |
>>>>>>    reader skipHeader; addIntegerField; addSymbolField;
>>>>>> addIntegerField;
>>>>>> addFieldConverter: [ :x | x = #true ].
>>>>>>    reader do: [ :each | each third < 1000 ifTrue: [ out nextPut: each
>>>>>> ]
>>>>>> ] ] ] ].
>>>>>> 
>>>>>> This worked fine on my MacBook Air, no memory problems. It takes a
>>>>>> while
>>>>>> to parse that much data, of course.
>>>>>> 
>>>>>> Sven
>>>>>> 
>>>>>>> On 14 Nov 2014, at 19:08, Paul DeBruicker <pdebruic@> wrote:
>>>>>>> 
>>>>>>> Hi -
>>>>>>> 
>>>>>>> I'm processing 9 GB of CSV files (the biggest file is 220MB or
>>>>>>> so). 
>>>>>>> I'm not sure if it's because of the size of the files or the code I've
>>>>>>> written to keep track of the domain objects I'm interested in, but
>>>>>>> I'm
>>>>>>> getting out of memory errors & crashes in Pharo 3 on Mac with the
>>>>>>> latest
>>>>>>> VM.  I haven't checked other vms.  
>>>>>>> 
>>>>>>> I'm going to profile my own code and attempt to split the files
>>>>>>> manually
>>>>>>> for now to see what else it could be. 
>>>>>>> 
>>>>>>> 
>>>>>>> Right now I'm doing something similar to
>>>>>>> 
>>>>>>>    | file reader |
>>>>>>>    file := '/path/to/file/myfile.csv' asFileReference readStream.
>>>>>>>    reader := NeoCSVReader on: file.
>>>>>>> 
>>>>>>>    reader
>>>>>>>        recordClass: MyClass; 
>>>>>>>        skipHeader;
>>>>>>>        addField: #myField:;
>>>>>>>        ....
>>>>>>> 
>>>>>>>    reader do: [ :eachRecord | self seeIfRecordIsInterestingAndIfSoKeepIt: eachRecord ].
>>>>>>>    file close.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Is there a facility in NeoCSVReader to read a file in batches (e.g.
>>>>>>> 1000
>>>>>>> lines at a time) or an easy way to do that ?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>> Paul
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> View this message in context:
>>>>> http://forum.world.st/running-out-of-memory-while-processing-a-220MB-csv-file-with-NeoCSVReader-tips-tp4790264p4790319.html
>>>>> Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> View this message in context:
>>> http://forum.world.st/running-out-of-memory-while-processing-a-220MB-csv-file-with-NeoCSVReader-tips-tp4790264p4790328.html
>>> Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://forum.world.st/running-out-of-memory-while-processing-a-220MB-csv-file-with-NeoCSVReader-tips-tp4790264p4790341.html
> Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Sat, 15 Nov 2014 00:07:35 +0100
> From: Sven Van Caekenberghe <s...@stfx.eu>
> To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
> Subject: Re: [Pharo-users] running out of memory while processing a
>    220MB    csv file with NeoCSVReader - tips?
> Message-ID: <4196b7d2-63c3-4e17-89df-a0c7aca91...@stfx.eu>
> Content-Type: text/plain; charset=us-ascii
> 
> 
>> On 14 Nov 2014, at 23:14, Paul DeBruicker <pdebr...@gmail.com> wrote:
>> 
>> Hi Sven
>> 
>> Yes, as I said earlier, after your first email: I think it's not so much a
>> problem with NeoCSV as with what I'm doing, plus an out-of-memory condition.  
>> 
>> Have you ever seen a stack after sending kill -SIGUSR1 that looks like this:
>> 
>> output file stack is full.
>> output file stack is full.
>> output file stack is full.
>> output file stack is full.
>> output file stack is full.
>> ....
>> 
>> 
>> What does that mean?
> 
> I don't know, but I think that you are really out of memory.
> BTW, I think that setting no flags is better; memory will then expand as
> much as it can.
> I think the useful maximum is closer to 1GB than 2GB.
> 
>> Answers to your questions below.
> 
> It is difficult to follow what you are doing exactly, but I think that you 
> underestimate how much memory a parsed, structured/nested object uses. Taking 
> the second line of your example, the 20+ fields, with 3 DateAndTimes, easily 
> cost between 512 and 1024 bytes per record. That would limit you to between 
> 1M and 2M records.
> 
> I tried this:
> 
> Array streamContents: [ :data |
>    5e2 timesRepeat: [ 
>        data nextPut: (Array streamContents: [ :out |
>            20 timesRepeat: [ out nextPut: Character alphabet ].
>            3 timesRepeat: [ out nextPut: DateAndTime now ] ]) ] ].
> 
> it worked to 5e5, but not for 5e6 - I didn't try numbers in between as it 
> takes very long.
> 
> Good luck, if you can solve this, please tell us how you did it.
> 
>> [...]
> 
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Sat, 15 Nov 2014 00:09:48 +0100
> From: Alain Rastoul <alf.mmm....@gmail.com>
> To: pharo-users@lists.pharo.org
> Subject: Re: [Pharo-users] Has anyone tried compiling the Pharo VM
>    into JS    using Emscripten?
> Message-ID: <m4623q$btk$1...@ger.gmane.org>
> Content-Type: text/plain; charset=utf-8; format=flowed
> 
> Hi Andy,
> 
> If I understand your question, you want to remake Dan Ingalls' Lively 
> Kernel?
> :)
> http://lively-web.org/welcome.html
> 
> Cheers,
> Alain
> 
> 
> On 14/11/2014 22:31, Andy Burnett wrote:
>> I just saw this implementation of SQLite as a JS system, via Emscripten,
>> and I was curious whether something similar would be even vaguely
>> possible for the VM.
>> 
>> Cheers
>> Andy
> 
> 
> 
> 
> 
> ------------------------------
> 
> Subject: Digest Footer
> 
> _______________________________________________
> Pharo-users mailing list
> Pharo-users@lists.pharo.org
> http://lists.pharo.org/mailman/listinfo/pharo-users_lists.pharo.org
> 
> 
> ------------------------------
> 
> End of Pharo-users Digest, Vol 19, Issue 53
> *******************************************
