Sven Van Caekenberghe-2 wrote
> Well, it is quite a bit of data (I didn't look too deeply): 50,000 records
> of structured/nested data with quite a lot of strings. If each record is
> 1 KB, that makes 50 MB.
>
> How do you measure your memory consumption? What did you expect?
I only started thinking about memory when my first attempts to parse the file
reached the VM's memory limit, which seemed to be at ~500 MB on OS X out of
the box. Then I only watched the memory from the outside, using OS X's
Activity Monitor, and after I gave the VM more memory, the image grew to
1.2 GB while parsing and inspecting the 80 MB file. But I have not yet
investigated where the memory went - perhaps it is all in the Inspector that
I opened to view the result :)
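In case it helps anyone reading the archive later, a rough way to check this
from inside the image is to compare the free space before and after parsing.
This is only a sketch; I assume #garbageCollect answers the free space after a
full collection, which may differ between Pharo versions:

| before data after |
Smalltalk garbageCollect.
before := Smalltalk garbageCollect.    "free space after a full GC"
data := (NeoJSONReader on: '{"smalltalk": "cool"}' readStream) next.
after := Smalltalk garbageCollect.     "data is still referenced, so its storage stays allocated"
before - after                         "rough number of bytes retained by the parse result"
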
Sven Van Caekenberghe-2 wrote
> Right now, your JSON is parsed and the result is a combination of lists
> (Array) and maps (Dictionary). If you know/understand well what is inside
> it, and it is regular enough, you could try to build your own
> specialised/optimised data/domain model for it. NeoJSON can also parse
> directly to your objects, instead of the general ones (a process called
> mapping). This is some work, of course, and it might not be worth it,
> YMMV.
Yes, I have used mappings in the past. Here I was just toying with the New
York Public Library's Open Source data for a second...
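For reference, the minimal kind of mapping I mean (a sketch along the lines of
the NeoJSON documentation, using Point only because its instance variables
happen to match the JSON keys; a real use would define a small domain class
per record type):

| reader |
reader := NeoJSONReader on: '{"x":1,"y":2}' readStream.
"Map the JSON keys onto the instance variables of Point."
reader mapInstVarsFor: Point.
reader nextAs: Point.    "answers a Point instead of a Dictionary"
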
Sven Van Caekenberghe-2 wrote
> Sven
>
>> I tried to parse with
>> PetitParser but the results were similar. I guess I have to learn to find
>> out where all the memory goes.
>>
>> Best regards,
>> Martin.
>>
>>
>>
>> Sven Van Caekenberghe-2 wrote
>>> (I don't do StackOverflow)
>>>
>>> Reading the 'format' is easy, just keep on doing #next for each JSON
>>> expression (whitespace is ignored).
>>>
>>> | data reader |
>>> data := '{"smalltalk": "cool"}
>>> {"pharo": "cooler"}'.
>>> reader := NeoJSONReader on: data readStream.
>>> Array streamContents: [ :out |
>>> [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ].
>>>
>>> Preventing intermediary data structures is easy too, use streaming.
>>>
>>> | client reader data networkStream |
>>> (client := ZnClient new)
>>> streaming: true;
>>> url:
>>> 'https://github.com/NYPL-publicdomain/data-and-utilities/blob/master/items/pd_items_1.ndjson?raw=true';
>>> get.
>>> networkStream := ZnCharacterReadStream on: client contents.
>>> reader := NeoJSONReader on: networkStream.
>>> data := Array streamContents: [ :out |
>>> [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ].
>>> client close.
>>> data.
>>>
>>> It took a couple of seconds, it is 80MB+ over the network for 50K items
>>> after all.
>>>
>>>
>>>
>>> HTH,
>>>
>>> Sven
>>>
>>>
>>>> On 21 Jan 2016, at 12:02, Esteban Lorenzano <estebanlm@...> wrote:
>>>>
>>>> Hi,
>>>>
>>>> there is a question I don’t know how to answer.
>>>>
>>>> http://stackoverflow.com/questions/34904337/how-to-parse-ndjson-in-pharo-with-neojson
>>>>
>>>> Transcript:
>>>>
>>>> I want to parse ndjson (newline delimited json) data with NeoJSON on
>>>> Pharo Smalltalk.
>>>>
>>>> ndjson data looks like this:
>>>>
>>>> {"smalltalk": "cool"}
>>>> {"pharo": "cooler"}
>>>> At the moment I convert my file stream to a string, split it on newline
>>>> and then parse the single parts using NeoJSON. This seems to use an
>>>> unnecessary (and extremely huge) amount of memory and time, probably
>>>> because of converting streams to strings and vice-versa all the time.
>>>> What would be an efficient way to do this task?
>>>>
>>>>
>>>> Takers?
>>>> Esteban
>>>
>>>
>>>
>>> Screen Shot 2016-01-21 at 13.33.57.png (480K)
>>> <http://forum.world.st/attachment/4873112/0/Screen%20Shot%202016-01-21%20at%2013.33.57.png>