Hi sunburned,

I think that a light-weight feature class or FeatureOnDemand is a good 
solution, as is a FeatureCache.
I have already tested Agile's scalable shapefile driver, and I'm currently 
implementing something similar for the GeoConcept format (a commercial GIS). 
It can save a lot of memory (but, as you can guess, it is not very good for 
performance unless we find very well-designed solutions).
I have not yet looked at how Kosmo implemented their scalable shapefile 
driver, but I'll have to, because it is not only scalable, it is also writable!
Some questions are:
- What must the in-memory representation of the light-weight feature 
include?
The minimum is an identifier and a file address for disk access (unless 
you store the data in a database),
but in my opinion the bounding box also has to be in memory for performance 
reasons (I just wonder whether it is worth trying to store the bounding box 
in a structure smaller than 4 doubles).
- Another question you ask is about the data format. The Sigle project is 
exploring GML storage for direct access. I think you can also 
keep the data in the original file format (this is the way the scalable 
shapefile driver works, and the way I am exploring with the GeoConcept 
format). But storing data in JUMP's own format may be useful to solve 
performance issues, or to solve the data access problem in a more 
independent way. 
For this issue, I ran some tests to compare WKB and WKT reading (and 
also writing). Sorry, I did not test Java serialization which, I think, does 
not perform very well. Here are my results with JTS 1.8 (every test was run 
on my personal laptop):

Test                                                         Bytes       Time (sec)
Reading 100 complex WKT polygons (about 7000 points each)    26590267    15.073
Reading 1 000 000 WKT points sequentially                    64489511    47.874

Reading 100 complex WKB polygons (about 7000 points each)    26590267    1.313
Reading 1 000 000 WKB points sequentially                    64489511    2.542
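
A likely reason for the gap: a WKT reader has to tokenize text and parse 
decimal numbers, while a WKB reader only copies fixed-size binary values. 
Below is a minimal sketch in plain Java for a single point. It is not the 
JTS code used for the timings above, and the names (WkbPointSketch, 
writePoint, readPoint, readWktPoint) are made up for illustration. The layout 
is the standard WKB point: 1 byte-order byte, a 4-byte geometry type 
(1 = Point), then x and y as 8-byte doubles.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only: hand-written WKB/WKT point codecs, not the JTS readers.
final class WkbPointSketch {
    // 1 byte-order byte + 4-byte type + two 8-byte doubles = 21 bytes
    static final int WKB_POINT_SIZE = 1 + 4 + 8 + 8;

    static byte[] writePoint(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(WKB_POINT_SIZE).order(ByteOrder.BIG_ENDIAN);
        buf.put((byte) 0);   // 0 = big-endian (XDR), 1 = little-endian (NDR)
        buf.putInt(1);       // geometry type 1 = Point
        buf.putDouble(x);
        buf.putDouble(y);
        return buf.array();
    }

    static double[] readPoint(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        // The first byte tells us how to decode everything that follows.
        buf.order(buf.get() == 0 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        int type = buf.getInt();
        if (type != 1) {
            throw new IllegalArgumentException("not a WKB point, type = " + type);
        }
        double x = buf.getDouble();
        double y = buf.getDouble();
        return new double[] { x, y };
    }

    // Text equivalent for "POINT (x y)": substring, split, then decimal parsing.
    static double[] readWktPoint(String wkt) {
        String body = wkt.substring(wkt.indexOf('(') + 1, wkt.indexOf(')')).trim();
        String[] parts = body.split("\\s+");
        return new double[] { Double.parseDouble(parts[0]), Double.parseDouble(parts[1]) };
    }
}
```

The binary path is two fixed-width reads per point, whereas the text path 
allocates strings and parses each coordinate digit by digit.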

Some more tests for database access (binary geometry):
PostgreSQL, sequential access:      10 000 points    0.3 sec
PostgreSQL, random access:          10 000 points    7 sec
H2, sequential or random access:    10 000 points    0.4 sec
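
Coming back to the first question (what to keep in memory), here is a rough 
sketch of what such a record could look like, assuming an int identifier, a 
long file offset and a bounding box stored as 4 floats instead of 4 doubles. 
The class name LightFeatureRef and its methods are hypothetical, not existing 
OpenJUMP classes. The floats are rounded outward so that the stored box 
always contains the exact one; a bounding-box test may then return a few 
false positives but never misses a feature.

```java
// Hypothetical light-weight in-memory record: id + file offset + float bbox.
final class LightFeatureRef {
    final int id;
    final long fileOffset;              // where the full feature lives on disk
    final float minX, minY, maxX, maxY; // 16 bytes instead of 32 with doubles

    LightFeatureRef(int id, long fileOffset,
                    double minX, double minY, double maxX, double maxY) {
        this.id = id;
        this.fileOffset = fileOffset;
        // Round outward so the float box always encloses the double box.
        this.minX = floorToFloat(minX);
        this.minY = floorToFloat(minY);
        this.maxX = ceilToFloat(maxX);
        this.maxY = ceilToFloat(maxY);
    }

    // Largest float that is <= v (for finite v within float range).
    static float floorToFloat(double v) {
        float f = (float) v;
        return f > v ? Math.nextDown(f) : f;
    }

    // Smallest float that is >= v (for finite v within float range).
    static float ceilToFloat(double v) {
        float f = (float) v;
        return f < v ? Math.nextUp(f) : f;
    }

    // Conservative test: true whenever the exact boxes could intersect.
    boolean mayIntersect(double qMinX, double qMinY, double qMaxX, double qMaxY) {
        return minX <= qMaxX && maxX >= qMinX && minY <= qMaxY && maxY >= qMinY;
    }
}
```

Only features for which mayIntersect returns true need to be read from disk, 
and the exact geometry test is done after loading.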

Michaël

Sunburned Surveyor wrote:

> I've been working on a solution to the problem of working with very 
> large datasets in OpenJUMP at home the past couple of weeks. (For 
> those of you that don't know, OpenJUMP reads all features in from a 
> data source into memory. This isn't a problem until you start working 
> with some very large datasets. For example, OpenJUMP runs out of 
> memory before it can open the shapefile with all of the parcels in my 
> county. The size limit of the data source OpenJUMP can work with is 
> limited by the RAM of the computer OpenJUMP is running on.) I'd like 
> to give a brief explanation of how this system will work, and then ask 
> for some suggestions on an aspect of the design.
>
>  
>
> This system uses a very light-weight in-memory representation of the 
> Feature class. (This is required because portions of OpenJUMP's code 
> require the ability to manipulate individual features or all the 
> features in a feature collection "in-memory".) Objects of this 
> light-weight Feature class are really a façade and forward all method 
> calls to a FeatureCache object. A FeatureCache is an implementation of 
> the FeatureCollection interface that actually manages the data behind the 
> light-weight Feature objects.
>
>  
>
> The FeatureCache maintains a "buffer". In this buffer it stores 
> in-memory representations of regular OpenJUMP Feature objects. This 
> buffer will only grow to a maximum size that can be set by the user, 
> based on the desired balance between speed and memory usage. 
> When a method call is made to the light-weight Feature object it is 
> forwarded to the FeatureCache. The FeatureCache passes this call to 
> the regular Feature object if it is in the buffer. If it is not in the 
> buffer the Feature object is created in memory from information in 
> permanent storage or "on-disk". The method call is then processed and 
> the newly created Feature is placed in the buffer. If the buffer is 
> already at its limit the oldest Feature in the Buffer is stored back 
> in permanent memory and removed from the buffer.
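
The buffer described in the paragraph above could be sketched with a 
LinkedHashMap kept in insertion order, so that the eldest entry is the oldest 
feature loaded. This is only an illustration under assumptions: the class 
BoundedFeatureBuffer, the loader and writer callbacks and the loads counter 
are hypothetical names, and features are keyed by an int identifier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.IntFunction;

// Hypothetical bounded buffer: loads on a miss, writes back the oldest on overflow.
final class BoundedFeatureBuffer<F> {
    private final IntFunction<F> loader;  // assumed: reads a feature from disk by id
    private final Map<Integer, F> buffer;
    int loads = 0;                        // number of disk reads, for illustration

    BoundedFeatureBuffer(final int maxSize, IntFunction<F> loader,
                         final BiConsumer<Integer, F> writer) {
        this.loader = loader;
        // accessOrder = false: iteration order is insertion order (oldest first).
        this.buffer = new LinkedHashMap<Integer, F>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, F> eldest) {
                if (size() > maxSize) {
                    // Store the oldest feature back before dropping it.
                    writer.accept(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    F get(int id) {
        F feature = buffer.get(id);
        if (feature == null) {
            feature = loader.apply(id);   // not in the buffer: create it from storage
            loads++;
            buffer.put(id, feature);      // may evict the oldest entry
        }
        return feature;
    }

    int size() {
        return buffer.size();
    }
}
```

Passing true as the third LinkedHashMap argument would evict the least 
recently used feature instead of the oldest one, which may suit rendering 
better.
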
>
>  
>
> There should be no major distinction between Features and a 
> FeatureCollection implemented by a FeatureCache and normal Features 
> and FeatureCollections that are stored entirely in memory. The only 
> significant difference will be the speed of operations and rendering. 
> This will be slower with this system than it is with Features and 
> FeatureCollections stored entirely in memory. However, it will make it 
> possible to work with very large datasets.
>
>  
>
> Here is the part of the system that I would like to get some 
> suggestions on. I need to decide on a storage format for the features 
> placed in permanent storage, that is, on disk. I think I have 3 choices.
>
>  
>
> [1] Java's Standard Object Serialization Format
>
> [2] A custom binary storage format.
>
> [3] A text based format.
>
>  
>
> I believe the first two formats will be much quicker than the third. I 
> don't really think the second format is something I want to do, 
> because I think cooking up a custom binary format will be a real pain 
> in the neck. So I need to decide between the first format listed and 
> the third format listed.
>
>  
>
> If I use a text-based format external tools will be able to easily 
> work with the FeatureCache, and I won't have to worry about versioning 
> issues. It will also be slower. If I use Java's standard object 
> serialization format I'll have better performance, but I'll have to 
> worry about versioning issues that might come up if we change the 
> interface definition for the Feature interface. It will also make it 
> difficult for external tools, especially those that aren't written in 
> Java, to work with the data in the FeatureCache.
>
>  
>
> I'd like to know what storage format the other developers would 
> recommend and why.
>  
> Thanks,
>  
> The Sunburned Surveyor
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Jump-pilot-devel mailing list
>Jump-pilot-devel@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel