Alain wrote <<< Hi Andy, If I understand your question, you want to remake Dan Ingalls' Lively Kernel? >>>
Hi Alain,

Although I am really impressed with Dan's work, Emscripten seems to be very different. In theory - and my knowledge of this is very limited - it might allow the VM C files to be transpiled into a very tight subset of JS. On the face of it this is a crazy idea, but they have achieved amazing performance with things like the Qt library. This made me wonder whether a JS version of Pharo is possible. Just idle conjecturing on a Friday night :-)

> On 14 Nov 2014, at 18:09, pharo-users-requ...@lists.pharo.org wrote:
>
> Today's Topics:
>
>   1. Re: running out of memory while processing a 220MB csv file with NeoCSVReader - tips? (Paul DeBruicker)
>   2. Re: running out of memory while processing a 220MB csv file with NeoCSVReader - tips? (Sven Van Caekenberghe)
>   3. Re: Has anyone tried compiling the Pharo VM into JS using Emscripten? (Alain Rastoul)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 14 Nov 2014 14:14:51 -0800 (PST)
> From: Paul DeBruicker <pdebr...@gmail.com>
> Subject: Re: [Pharo-users] running out of memory while processing a 220MB csv file with NeoCSVReader - tips?
>
> Hi Sven
>
> Yes, as I said earlier after your first email: I think it is not a problem with NeoCSV so much as with what I'm doing, plus an out-of-memory condition.
>
> Have you ever seen a stack, after sending kill -SIGUSR1, that looks like this:
>
> output file stack is full.
> output file stack is full.
> output file stack is full.
> output file stack is full.
> output file stack is full.
> ....
>
> What does that mean?
>
> Answers to your questions below.
>
> Thanks again for helping me out
>
> Sven Van Caekenberghe-2 wrote
>> OK then, you *can* read/process 300MB .csv files ;-)
>>
>> What does your CSV file look like, can you show a couple of lines ?
>>
>> here are 2 lines + a header:
>>
>> "provnum","Provname","address","city","state","zip","survey_date_output","SurveyType","defpref","tag","tag_desc","scope","defstat","statdate","cycle","standard","complaint","filedate"
>> "015009","BURNS NURSING HOME, INC.","701 MONROE STREET NW","RUSSELLVILLE","AL","35653","2013-09-05","Health","F","0314","Give residents proper treatment to prevent new bed (pressure) sores or heal existing bed sores.","D","Deficient, Provider has date of correction","2013-10-10",1,"Y","N","2014-01-01"
>> "015009","BURNS NURSING HOME, INC.","701 MONROE STREET NW","RUSSELLVILLE","AL","35653","2013-09-05","Health","F","0315","Ensure that each resident who enters the nursing home without a catheter is not given a catheter, unless medically necessary, and that incontinent patients receive proper services to prevent urinary tract infections and restore normal bladder functions.","D","Deficient, Provider has date of correction","2013-10-10",1,"Y","N","2014-01-01"
>>
>> You are using a custom record class of your own, what does that look like or do ?
>>
>> A custom record class. This is all publicly available data, but I'm keeping track of the performance of US-based health care providers during their annual inspections. The records are notes of a deficiency found during an inspection, and I keep those notes in a collection in an instance of the health care provider's class. The custom record class just converts the CSV record to objects (Integers, Strings, DateAndTimes), which then get stuffed into the health care provider's deficiency history OrderedCollection (which has about 100 items). Again, I don't think it's so much what I'm doing as that the image isn't growing when it needs to.
>>
>> Maybe you can try using Array again ?
>>
>> I've attempted to parse and convert the entire CSV into domain objects and then add them to the image; the parsing works fine, but the system runs out of resources during the update phase.
>>
>> What percentage of records read do you keep ? In my example it was very small. Have you tried calculating your memory usage ?
>>
>> I'm keeping some data from every record, but it doesn't load more than 500MB of the data before falling over. I am not attempting to load the 9GB of CSV files into one image. For 95% of the records in the CSV file, 20 of the 22 columns are the same from file to file; just a 'published date' and a 'time to expiration' date change. Each file covers a month, with about 500k deficiencies. Each month some deficiencies are added to the file and some are resolved, so the total number of deficiencies in the image is about 500k. For those records that don't expire in a given month I add the published date to a collection of published dates for the record, add the "time to expiration" to another collection to record what was made public, and let the rest of the data get GC'd. I don't load only those two fields because the other fields of the record in the CSV could change.
>>
>> I have not calculated the memory usage for the collection because I thought it would have no problem fitting in the 2GB of RAM I have on this machine.
>>
>>> On 14 Nov 2014, at 22:34, Paul DeBruicker <pdebruic@...> wrote:
>>>
>>> Yes. With the image & vm I'm having trouble with, I get an array with 9,942 elements in it. So it works as you'd expect.
>>>
>>> While processing the CSV file the image stays at about 60MB in RAM.
>>>
>>> Sven Van Caekenberghe-2 wrote
>>>> Can you successfully run my example code ?
>>>>
>>>>> On 14 Nov 2014, at 22:03, Paul DeBruicker <pdebruic@...> wrote:
>>>>>
>>>>> Hi Sven,
>>>>>
>>>>> Thanks for taking a look and testing the NeoCSVReader portion for me. You're right, of course, that there's something I'm doing that's slow. But there is something I can't figure out yet.
>>>>>
>>>>> To provide a little more detail:
>>>>>
>>>>> When the 'csv reading' process completes successfully, profiling shows that most of the time is spent in NeoCSVReader>>#peekChar and in using NeoCSVReader>>#addField: to convert a string to a DateAndTime. Dropping the DateAndTime conversion speeds things up but doesn't stop it from running out of memory.
>>>>>
>>>>> I start the image with
>>>>>
>>>>> ./pharo-ui --memory 1000m myimage.image
>>>>>
>>>>> Splitting the CSV file helps:
>>>>> ~1.5MB    5,000 lines =  1.2 seconds
>>>>> ~15MB    50,000 lines =    8 seconds
>>>>> ~30MB   100,000 lines =   16 seconds
>>>>> ~60MB   200,000 lines =   45 seconds
>>>>>
>>>>> It seems that when the CSV file crosses ~70MB in size, performance goes haywire and leads to the out-of-memory condition. The processing never ends. Sending "kill -SIGUSR1" prints a stack primarily composed of:
>>>>>
>>>>> 0xbffc5d08 M OutOfMemory class(Exception class)>signal 0x1f7ac060: a(n) OutOfMemory class
>>>>> 0xbffc5d20 M OutOfMemory class(Behavior)>basicNew 0x1f7ac060: a(n) OutOfMemory class
>>>>> 0xbffc5d38 M OutOfMemory class(Behavior)>new 0x1f7ac060: a(n) OutOfMemory class
>>>>> 0xbffc5d50 M OutOfMemory class(Exception class)>signal 0x1f7ac060: a(n) OutOfMemory class
>>>>> 0xbffc5d68 M OutOfMemory class(Behavior)>basicNew 0x1f7ac060: a(n) OutOfMemory class
>>>>> 0xbffc5d80 M OutOfMemory class(Behavior)>new 0x1f7ac060: a(n) OutOfMemory class
>>>>> 0xbffc5d98 M OutOfMemory class(Exception class)>signal 0x1f7ac060: a(n) OutOfMemory class
>>>>>
>>>>> So it seems like it's trying to signal that it's out of memory after it's already out of memory, which triggers another OutOfMemory error. That's why progress stops.
>>>>>
>>>>> ** Aside - OutOfMemory should probably be refactored so it can signal itself without taking up more memory and triggering itself infinitely. Maybe it & its signalling morph infrastructure would be good as a singleton **
>>>>>
>>>>> I'm confused about why it runs out of memory. According to htop the image only takes up about 520-540 MB of RAM when it reaches the 'OutOfMemory' condition. This MacBook Air laptop has 4GB, so there is plenty of room for the image to grow. Also, I've specified a 1,000MB image size when starting, so it should have plenty of room. Is there something I should check, or a flag somewhere that prevents it from growing on a Mac? This is the latest Pharo30 VM.
>>>>>
>>>>> Thanks for helping me get to the bottom of this
>>>>>
>>>>> Paul
>>>>>
>>>>> Sven Van Caekenberghe-2 wrote
>>>>>> Hi Paul,
>>>>>>
>>>>>> I think you must be doing something wrong with your class; #do: is implemented as streaming over the records one by one, never holding more than one in memory.
>>>>>>
>>>>>> This is what I tried:
>>>>>>
>>>>>> 'paul.csv' asFileReference writeStreamDo: [ :file |
>>>>>>   ZnBufferedWriteStream on: file do: [ :out |
>>>>>>     (NeoCSVWriter on: out) in: [ :writer |
>>>>>>       writer writeHeader: { #Number. #Color. #Integer. #Boolean }.
>>>>>>       1 to: 1e7 do: [ :each |
>>>>>>         writer nextPut: { each. #(Red Green Blue) atRandom. 1e6 atRandom. #(true false) atRandom } ] ] ] ].
>>>>>>
>>>>>> This results in a 300MB file:
>>>>>>
>>>>>> $ ls -lah paul.csv
>>>>>> -rw-r--r--@ 1 sven staff 327M Nov 14 20:45 paul.csv
>>>>>> $ wc paul.csv
>>>>>> 10000001 10000001 342781577 paul.csv
>>>>>>
>>>>>> This is a selective read and collect (loads about 10K records):
>>>>>>
>>>>>> Array streamContents: [ :out |
>>>>>>   'paul.csv' asFileReference readStreamDo: [ :in |
>>>>>>     (NeoCSVReader on: (ZnBufferedReadStream on: in)) in: [ :reader |
>>>>>>       reader skipHeader; addIntegerField; addSymbolField; addIntegerField;
>>>>>>         addFieldConverter: [ :x | x = #true ].
>>>>>>       reader do: [ :each | each third < 1000 ifTrue: [ out nextPut: each ] ] ] ] ].
>>>>>>
>>>>>> This worked fine on my MacBook Air, no memory problems. It takes a while to parse that much data, of course.
>>>>>>
>>>>>> Sven
>>>>>>
>>>>>>> On 14 Nov 2014, at 19:08, Paul DeBruicker <pdebruic@...> wrote:
>>>>>>>
>>>>>>> Hi -
>>>>>>>
>>>>>>> I'm processing 9 GB of CSV files (the biggest file is 220MB or so). I'm not sure if it's because of the size of the files or the code I've written to keep track of the domain objects I'm interested in, but I'm getting out-of-memory errors & crashes in Pharo 3 on Mac with the latest VM. I haven't checked other VMs.
>>>>>>>
>>>>>>> I'm going to profile my own code and attempt to split the files manually for now, to see what else it could be.
>>>>>>>
>>>>>>> Right now I'm doing something similar to
>>>>>>>
>>>>>>> | file reader |
>>>>>>> file := '/path/to/file/myfile.csv' asFileReference readStream.
>>>>>>> reader := NeoCSVReader on: file.
>>>>>>>
>>>>>>> reader
>>>>>>>   recordClass: MyClass;
>>>>>>>   skipHeader;
>>>>>>>   addField: #myField:;
>>>>>>>   ....
>>>>>>>
>>>>>>> reader do: [ :eachRecord | self seeIfRecordIsInterestingAndIfSoKeepIt: eachRecord ].
>>>>>>> file close.
>>>>>>>
>>>>>>> Is there a facility in NeoCSVReader to read a file in batches (e.g. 1000 lines at a time), or an easy way to do that ?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Paul
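On Paul's batching question above: nothing in the thread shows a built-in batch mode, but since the reader hands records out one at a time, a batch loop can be layered on top of it. A rough sketch only, assuming NeoCSVReader answers the usual #atEnd / #next stream protocol (as Sven's #do: example suggests), reusing MyClass and #myField: from Paul's snippet, and with #processBatch: standing in for whatever you do with each batch:

'/path/to/file/myfile.csv' asFileReference readStreamDo: [ :in |
    | reader batch |
    reader := NeoCSVReader on: (ZnBufferedReadStream on: in).
    reader recordClass: MyClass; skipHeader; addField: #myField:.
    [ reader atEnd ] whileFalse: [
        "collect up to 1000 records, hand them off, then drop the reference"
        batch := OrderedCollection new: 1000.
        [ batch size < 1000 and: [ reader atEnd not ] ] whileTrue: [ batch add: reader next ].
        self processBatch: batch.
        "optional: force a full GC between batches to keep the working set small"
        Smalltalk garbageCollect ] ]

Whether this actually helps depends on what #processBatch: retains; if it keeps every record, memory use ends up the same as with #do:.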
>
> ------------------------------
>
> Message: 2
> Date: Sat, 15 Nov 2014 00:07:35 +0100
> From: Sven Van Caekenberghe <s...@stfx.eu>
> Subject: Re: [Pharo-users] running out of memory while processing a 220MB csv file with NeoCSVReader - tips?
>
>> On 14 Nov 2014, at 23:14, Paul DeBruicker <pdebr...@gmail.com> wrote:
>>
>> Hi Sven
>>
>> Yes, as I said earlier after your first email: I think it is not a problem with NeoCSV so much as with what I'm doing, plus an out-of-memory condition.
>>
>> Have you ever seen a stack, after sending kill -SIGUSR1, that looks like this:
>>
>> output file stack is full.
>> output file stack is full.
>> output file stack is full.
>> output file stack is full.
>> output file stack is full.
>> ....
>>
>> What does that mean?
>
> I don't know, but I think that you are really out of memory.
> BTW, I think that setting no flags is better; memory will then expand maximally.
> I think the useful maximum is closer to 1GB than 2GB.
>
>> Answers to your questions below.
>
> It is difficult to follow what you are doing exactly, but I think that you underestimate how much memory a parsed, structured/nested object uses. Taking the second line of your example, the 20+ fields, with 3 DateAndTimes, easily cost between 512 and 1024 bytes per record. That would limit you to between 1M and 2M records.
>
> I tried this:
>
> Array streamContents: [ :data |
>   5e2 timesRepeat: [
>     data nextPut: (Array streamContents: [ :out |
>       20 timesRepeat: [ out nextPut: Character alphabet ].
>       3 timesRepeat: [ out nextPut: DateAndTime now ] ]) ] ].
>
> It worked with the repeat count at 5e5, but not at 5e6 - I didn't try numbers in between, as it takes very long.
>
> Good luck, if you can solve this, please tell us how you did it.
>
>> Thanks again for helping me out
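A quick sanity check of Sven's estimate, evaluated as a Smalltalk expression. The byte figures are his; treating roughly 1 GB as the usable object memory is an assumption:

| low high oneGB |
low := 512.                      "bytes per parsed record, Sven's low estimate"
high := 1024.                    "bytes per parsed record, Sven's high estimate"
oneGB := 1024 * 1024 * 1024.     "assumed usable object memory, roughly 1 GB"
{ oneGB // high. oneGB // low }  "==> #(1048576 2097152), i.e. about 1M to 2M records"

At roughly 1KB per record, Paul's ~500k deficiency records would land around 500MB, which is in the same ballpark as the ~520-540MB that htop reports when the image falls over.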
>
> ------------------------------
>
> Message: 3
> Date: Sat, 15 Nov 2014 00:09:48 +0100
> From: Alain Rastoul <alf.mmm....@gmail.com>
> Subject: Re: [Pharo-users] Has anyone tried compiling the Pharo VM into JS using Emscripten?
>
> Hi Andy,
>
> If I understand your question, you want to remake Dan Ingalls' Lively Kernel?
> :)
> http://lively-web.org/welcome.html
>
> Cheers,
> Alain
>
> On 14/11/2014 22:31, Andy Burnett wrote:
>> I just saw this implementation of SQLite as a JS system, via Emscripten, and I was curious whether something similar would be even vaguely possible for the VM.
>>
>> Cheers
>> Andy
>
> ------------------------------
>
> End of Pharo-users Digest, Vol 19, Issue 53