Hi, I've recently been playing with medical provider data sets which are quite large, also around 270K records. I'm using a Moose image on Pharo 5.0 (latest update: #50643) on Mac OS X.
The initial issue I had was with the memory settings for the VM. Once these were increased, the image has ranged from 800MB to 1.3GB and has been fine. There have been occasional crashes/hangs, but these are down to memory limits and GC. Typically this occurs when making class format changes to existing instances, e.g. introducing new instance variables into a working image holding a large data set. To counter this I keep a base image: I update the code there and then re-import the data (CSV for now) using NeoCSV. This process takes about 30 seconds, so it's not too painful.

The other issue I've come across is a slowdown when querying the data sets in the Playground. I profiled the code and found the culprit to be GLMTreeMorphModel>>explicitlySelectMultipleItems:, which is terribly slow as it iterates over the entire data set. I've made a modification to skip the expensive iteration when there are more than 50000 records to display, e.g. self roots size > 50000 ifTrue: [ ^ self ].

I'm also using Teapot for easy querying of the two data sets and to build an HTML comparison view of the records; this uses the in-memory OO model to populate the HTML. Teapot also lets me pull a STON representation into a different image, where I rebuild the instances for querying or simple reporting.

The Playground workspace lets me query and analyse the data cheaply, e.g. self collect: [ :each | each disciplineCode ] as: Bag, and then use the customised view to quickly see a distribution of values.

Anyway, the reason for this long-winded email is partly to provide some useful feedback, but mostly to thank everyone involved in building a powerful environment. I'd hate to name people, because I'm sure to miss most, but the efforts of people like Sven (Neo*, STON), Doru (Moose*) and Avi (ROE, BTree) are appreciated. I know there are a lot of hands behind the scenes making Pharo, from the fast VM to the UI, so thanks to all.
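For anyone curious about the import step, here is a minimal sketch of loading a CSV into domain objects with NeoCSV. The Provider class and its field names are made up for illustration; the actual model obviously differs:

```smalltalk
"Read providers.csv into Provider instances, one per record.
 Provider is a hypothetical class with #name: and #disciplineCode: setters."
| providers |
providers := 'providers.csv' asFileReference readStreamDo: [ :stream |
	(NeoCSVReader on: stream)
		skipHeader;                  "first line holds column names"
		recordClass: Provider;       "instantiate one Provider per row"
		addField: #name:;            "column 1 -> name"
		addField: #disciplineCode:;  "column 2 -> disciplineCode"
		upToEnd ].
providers size.
```

Mapping columns explicitly with addField: (rather than reading rows as arrays) is what makes the in-image querying afterwards so pleasant.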
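The Teapot/STON round trip described above can be sketched roughly as follows. The route path and the providers variable are assumptions; the point is just that one image serves its model as STON and the other materialises it:

```smalltalk
"In the image holding the data: expose the collection as STON."
| teapot |
teapot := Teapot configure: { #port -> 8080 }.
teapot
	GET: '/providers' -> [ :request | STON toString: providers ];
	start.
```

```smalltalk
"In the other image: pull the STON text over HTTP and rebuild the instances."
| providers |
providers := STON fromString:
	(ZnEasy get: 'http://localhost:8080/providers') contents.
```

This works because STON serialises plain objects by class and instance variables, so both images only need to share the model classes, not a database.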
Regards,
Carlo

On 13 Apr 2016, at 2:49 AM, Offray Vladimir Luna Cárdenas <offray.l...@mutabit.com> wrote:

Hi,

On 12/04/16 16:51, Stephan Eggermont wrote:
> On 12/04/16 22:44, Offray Vladimir Luna Cárdenas wrote:
>> I'm working on visualizations for an external dataset which contains 270k
>> records. So the best strategy seems to be bridging Pharo with SQLite to keep
>> requirements low, while using Roassal to visualize aggregated information
>> obtained from querying the database.
>
> It won't fit in image?
>

I tried with RTTabTable and NeoCSV but they cannot load the data. I made a test drawing 150k points and the image starts to lag, and trying to query the data becomes inefficient compared to querying it in SQLite. For the moment I'll export the query results to CSV, but I hope to have the SQLite bridge working soon.

Offray