Hi

I've recently been playing with medical provider data sets which are quite 
large, also around 270K records. I'm using a Moose image, Pharo 5.0 latest 
update #50643, on Mac OS X.

The initial issue I had was with the memory settings for the VM. These have 
been increased, and the image now ranges from 800MB to 1.3GB and has been fine. 
There have been occasional crashes/hangs, but these are down to memory limits 
and GC. Typically this occurs when making class format changes to existing 
instances, e.g. new variables introduced into a working image holding a large 
data set. To counter this I keep a base image in which I update the code and 
then re-import the data (CSV for now) using NeoCSV. This process takes about 
30 seconds, so it's not too painful.
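For those curious, the import boils down to something like this (a sketch only; 
MedicalProvider and its field selectors are illustrative placeholders for the 
real domain class):

    "Read the CSV into domain objects, one per record.
     Class and field names here are made up for the example."
    | providers |
    providers := (NeoCSVReader on: 'providers.csv' asFileReference readStream)
        skipHeader;
        recordClass: MedicalProvider;
        addField: #providerId:;
        addField: #name:;
        addField: #disciplineCode:;
        upToEnd.

NeoCSVReader maps each column to a setter in order, so adding a new instance 
variable just means adding another addField: line and re-importing.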

The other issue I've come across is a slowdown when querying the data sets in 
the Playground. I profiled the code and found the culprit to be 
GLMTreeMorphModel>>explicitlySelectMultipleItems:, which is terribly slow as it 
iterates over the entire data set. I've made a modification to skip the 
expensive iteration when there are more than 50000 records to be displayed, 
e.g.

    self roots size > 50000
        ifTrue: [ ^ self ].

I'm also using:

- Teapot to perform easy querying of the 2 data sets and to build an HTML 
  comparison view of the records. This uses the in-memory OO model to 
  populate the HTML.
- Teapot to pull a STON representation into a different image, then build 
  instances for querying or simple reporting.
- The Playground to query and analyse the data cheaply, e.g. 
  self collect: [ :each | each disciplineCode ] as: Bag, then using the 
  customised view to quickly see a distribution of values.
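The Bag trick gives you the frequency table almost for free; in a Playground 
it looks roughly like this (providers stands in for whatever the root 
collection is in your image):

    "Count occurrences of each discipline code, then list them
     with the most frequent first."
    | distribution |
    distribution := providers
        collect: [ :each | each disciplineCode ]
        as: Bag.
    distribution sortedCounts.

sortedCounts answers count -> element associations in descending order, which 
is usually all you need for a quick look at the distribution.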

Anyway, the reason for this long-winded email is hopefully to provide some 
useful feedback, but more to thank everyone involved in building a powerful 
environment. I'd hate to name people, because I'm sure to miss most, but the 
efforts of people like Sven (Neo*, STON), Doru (Moose*), Avi (ROE, BTREE) are 
appreciated. I know there are a lot of hands behind the scenes to make Pharo, 
from the fast VM to the UI, so thanks to all.

Regards
Carlo


On 13 Apr 2016, at 2:49 AM, Offray Vladimir Luna Cárdenas 
<offray.l...@mutabit.com> wrote:

Hi,

On 12/04/16 16:51, Stephan Eggermont wrote:
> On 12/04/16 22:44, Offray Vladimir Luna Cárdenas wrote:
>> I'm working with visualizations of an external dataset which contains
>> 270k records. So the best strategy seems to be bridging Pharo with SQLite
>> to keep requirements low, while using Roassal to visualize aggregated
>> information obtained by querying the database.
> 
> It won't fit in image?
> 

I tried with RTTabTable and NeoCSV but they could not load the data. I made a 
test drawing 150k points and the image starts to lag, and querying the data 
becomes inefficient compared to querying it in SQLite. For the moment I'll 
export the query results to CSV, but I hope to have the SQLite bridge working 
soon.

Offray
