Re: [Pharo-users] running out of memory while processing a 220MB csv file with NeoCSVReader - tips?

Stephan Eggermont Tue, 18 Nov 2014 02:36:13 -0800

Alain wrote:
>you are saying that zip ratio is somewhat related to normalized data, 
>interesting view, and certainly true :)


I find it a nice heuristic to help me get started.
Just sort the tables by size, compress each one, and 
start with the ones that compress best.
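
A minimal sketch of that heuristic in Python, assuming each table is available as raw bytes (for example a CSV dump). The names `compression_ratio`, `rank_tables` and the sample tables are made up for illustration, and zlib stands in for whatever zip tool you actually use.

```python
import hashlib
import zlib
from typing import Dict, List, Tuple


def compression_ratio(raw: bytes) -> float:
    """Original size divided by compressed size; higher means more redundancy."""
    if not raw:
        return 1.0
    return len(raw) / len(zlib.compress(raw, 9))


def rank_tables(tables: Dict[str, bytes]) -> List[Tuple[str, int, float]]:
    """Return (name, size, ratio) rows, best-compressing tables first.

    Ties on ratio are broken by size, largest first.
    """
    rows = [(name, len(data), compression_ratio(data))
            for name, data in tables.items()]
    return sorted(rows, key=lambda row: (row[2], row[1]), reverse=True)


# Hypothetical sample data: one highly repetitive table, one with
# hash-like content that has almost no redundancy.
tables = {
    "orders": b"2014-11-18,OPEN,EUR\n" * 5000,
    "tokens": b"".join(hashlib.sha256(str(i).encode()).digest()
                       for i in range(1000)),
}

for name, size, ratio in rank_tables(tables):
    print(f"{name:8} {size:8d} bytes  ratio {ratio:.1f}")
```

The tables at the top of the list are the ones where normalizing or interning values is likely to pay off first.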

>About DateTimes, I think this is not different than with other values, 
>using a pointer to an interned value should be equivalent to using an 
>int, as it would be a 32 bits pointer, and with this approach, using 
>compact records should not make a big difference too if there is not a 
>lot of different values. 

Combining multiple booleans in one word still helps a lot, as does 
introducing extra objects for highly correlated fields.

>The key I mentioned here is that in real life, this "normalizing ratio" 
>is very high for almost every kind of data and that's what puzzles me 
>(not the technique). 

My impression is that a lot of the design decisions for relational databases
are cargo cult, dating from the time when most databases did not fit
into RAM and the query optimizers were not good at dealing with 
lots of joins.

Stephan
