Hi Sunburned,

I think a light-weight feature class (or FeatureOnDemand) is a good solution, together with a FeatureCache. I have already tested Agile's scalable shapefile driver, and I'm currently implementing something similar for the GeoConcept format (a commercial GIS). It can save a lot of memory, but as you can guess, it is not very good for performance unless we find very well-designed solutions. I have not yet looked at how Kosmo implemented their scalable shapefile driver, but I will have to, because it is not only scalable, it is also writable!

Some questions are:

- What must the in-memory representation of the light-weight feature include? The minimum is an identifier and a file address for disk access (unless you store the data in a database). IMO the bounding box also has to be kept in memory for performance reasons. (I just wonder whether it is worth trying to store the bounding box in a structure smaller than 4 doubles.)

- Another question you ask is about the data format. The Sigle project is exploring GML storage for direct access. I think you can also keep the data in the original file format. This is how the scalable shapefile driver works, and it is the approach I am exploring with the GeoConcept format. But storing the data in JUMP's own format may be useful to solve performance issues, or to solve the data access problem in a more independent way.

For this issue, I ran some tests to compare WKB and WKT reading (and also writing). Sorry, I did not test serialization, which I think is not very efficient. Here are my results with JTS 1.8 (every test was run on my personal laptop):
  Test                                                    Size (bytes)   Time (s)
  Reading 100 complex WKT polygons (about 7000 pts each)      26590267     15.073
  Reading 1 000 000 WKT points sequentially                  64489511     47.874
  Reading 100 complex WKB polygons (about 7000 pts each)      26590267      1.313
  Reading 1 000 000 WKB points sequentially                  64489511      2.542

Some more tests for database access (binary geometry):

  Database     Access                 Points   Time (s)
  PostgreSQL   sequential             10 000   0.3
  PostgreSQL   random                 10 000   7
  H2           sequential or random   10 000   0.4

Michaël

Sunburned Surveyor wrote:
> I've been working on a solution to the problem of working with very large datasets in OpenJUMP at home the past couple of weeks. For those of you that don't know, OpenJUMP reads all features from a data source into memory. This isn't a problem until you start working with some very large datasets. For example, OpenJUMP runs out of memory before it can open the shapefile with all of the parcels in my county. The size of the data source OpenJUMP can work with is limited by the RAM of the computer OpenJUMP is running on. I'd like to give a brief explanation of how this system will work, and then ask for some suggestions on an aspect of the design.
>
> This system uses a very light-weight in-memory representation of the Feature class. This is required because portions of OpenJUMP's code require the ability to manipulate individual features, or all the features in a feature collection, "in memory". Objects of this light-weight Feature class are really a façade and forward all method calls to a FeatureCache object. A FeatureCache is an implementation of the FeatureCollection interface that actually manages the data behind the light-weight Feature objects.
>
> The FeatureCache maintains a "buffer". In this buffer it stores in-memory representations of regular OpenJUMP Feature objects.
> This buffer will only grow to a maximum size. The user can set that size based on the balance between speed and memory usage. When a method call is made to the light-weight Feature object, it is forwarded to the FeatureCache. The FeatureCache passes this call to the regular Feature object if it is in the buffer. If it is not in the buffer, the Feature object is created in memory from the information in permanent storage, or "on disk". The method call is then processed and the newly created Feature is placed in the buffer. If the buffer is already at its limit, the oldest Feature in the buffer is written back to permanent storage and removed from the buffer.
>
> There should be no major distinction between Features and FeatureCollections implemented by a FeatureCache and normal Features and FeatureCollections stored entirely in memory. The only significant difference will be the speed of operations and rendering. These will be slower with this system than with Features and FeatureCollections stored entirely in memory. However, it will make it possible to work with very large datasets.
>
> Here is the part of the system I would like some suggestions on. I need to decide on a storage format for the features placed in permanent storage, or on disk. I think I have 3 choices:
>
> [1] Java's standard object serialization format
>
> [2] A custom binary storage format
>
> [3] A text-based format
>
> I believe the first two formats will be much quicker than the third. I don't really want to use the second one, because I think cooking up a custom binary format will be a real pain in the neck. So I need to decide between the first and the third.
>
> If I use a text-based format, external tools will be able to work easily with the data in the FeatureCache, and I won't have to worry about versioning issues. It will also be slower.
> If I use Java's standard object serialization format, I'll get better performance. However, I'll have to worry about versioning issues that might come up if we change the definition of the Feature interface. It will also make it difficult for external tools, especially those not written in Java, to work with the data in the FeatureCache.
>
> I'd like to know which storage format the other developers would recommend and why.
>
> Thanks,
>
> The Sunburned Surveyor
>
>------------------------------------------------------------------------
>
>-------------------------------------------------------------------------
>Take Surveys. Earn Cash. Influence the Future of IT
>Join SourceForge.net's Techsay panel and you'll get the chance to share your
>opinions on IT & business topics through brief surveys-and earn cash
>http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Jump-pilot-devel mailing list
>Jump-pilot-devel@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel
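Michaël's first question asks what the light-weight feature must keep in memory and whether the bounding box can be smaller than 4 doubles. A minimal Java sketch of one possible answer follows. It is not OpenJUMP or JTS code: the class name `FeatureStub`, its fields, and the use of `float` for the box are assumptions for illustration only. Four floats take 16 bytes instead of the 32 bytes of four doubles. The coordinates are rounded outward so that the float box always contains the exact box. As a result, a spatial query may return a few extra candidates, which then have to be checked against the real geometry on disk, but it never misses one.

```java
/**
 * Hypothetical light-weight feature handle (illustrative only): an identifier,
 * the byte offset of the full record in the file, and a compact bounding box.
 */
final class FeatureStub {
    private final int id;
    private final long fileOffset;           // where the full record starts on disk
    private final float minX, minY, maxX, maxY; // 16 bytes instead of 4 doubles (32 bytes)

    FeatureStub(int id, long fileOffset,
                double minX, double minY, double maxX, double maxY) {
        this.id = id;
        this.fileOffset = fileOffset;
        // Round minima down and maxima up so the float box contains the exact box.
        this.minX = floorToFloat(minX);
        this.minY = floorToFloat(minY);
        this.maxX = ceilToFloat(maxX);
        this.maxY = ceilToFloat(maxY);
    }

    /** Largest float that is <= v (the cast rounds to nearest, so step down if needed). */
    static float floorToFloat(double v) {
        float f = (float) v;
        return f > v ? Math.nextDown(f) : f;
    }

    /** Smallest float that is >= v. */
    static float ceilToFloat(double v) {
        float f = (float) v;
        return f < v ? Math.nextUp(f) : f;
    }

    /** Bounding-box test used to pick candidates before reading anything from disk. */
    boolean intersects(double qMinX, double qMinY, double qMaxX, double qMaxY) {
        return minX <= qMaxX && maxX >= qMinX && minY <= qMaxY && maxY >= qMinY;
    }

    int getId() { return id; }

    long getFileOffset() { return fileOffset; }
}
```

Whether the saving is worth it depends on the coordinates. Around 10 000 000 (for example metres in some projected grids), adjacent float values are 1 unit apart, so the stored box can be noticeably larger than the real one.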
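The buffer described in the quoted message has three behaviours: it is bounded by a user-set size, it loads a feature on a miss, and it writes the oldest feature back when full. That design maps naturally onto `java.util.LinkedHashMap` and its `removeEldestEntry` hook. A minimal sketch follows, assuming two hypothetical callbacks, `loader` and `writer`, that stand in for whatever permanent storage format is finally chosen. `FeatureBuffer` is not an existing OpenJUMP class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.IntFunction;

/**
 * Hypothetical bounded buffer for a FeatureCache: features are loaded from
 * permanent storage on a miss; once the buffer exceeds maxSize, the eldest
 * entry is handed to the writer and dropped.
 */
final class FeatureBuffer<F> {
    private final IntFunction<F> loader;          // reads one feature from permanent storage
    private final LinkedHashMap<Integer, F> buffer;

    FeatureBuffer(int maxSize, boolean leastRecentlyUsed,
                  IntFunction<F> loader, BiConsumer<Integer, F> writer) {
        this.loader = loader;
        // accessOrder = false: the eldest entry is the oldest loaded feature.
        // accessOrder = true: a hit moves the entry to the end, so the eldest
        // entry is the least recently used feature.
        this.buffer = new LinkedHashMap<Integer, F>(16, 0.75f, leastRecentlyUsed) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, F> eldest) {
                if (size() > maxSize) {
                    // Writes back every evicted feature; a real cache would only
                    // write features that were modified ("dirty").
                    writer.accept(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the buffered feature, loading it from permanent storage on a miss. */
    F get(int id) {
        F feature = buffer.get(id);
        if (feature == null) {
            feature = loader.apply(id);
            buffer.put(id, feature);
        }
        return feature;
    }

    /** Number of features currently held in memory. */
    int bufferedCount() {
        return buffer.size();
    }
}
```

Passing `true` for `leastRecentlyUsed` switches from evicting the oldest loaded feature to evicting the least recently used one. That may suit repeated access to the same area (for example panning back and forth) better than strict load order.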
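Part of the large gap between the WKT and WKB reading times in Michaël's results probably comes from the formats themselves. WKB is a fixed binary layout, so reading a coordinate means copying eight bytes into a `double`. WKT, by contrast, has to be tokenized, and every number converted from decimal text. The sketch below encodes and decodes a single 2D WKB Point with `java.nio.ByteBuffer`. The layout is a 1-byte byte-order flag, a 4-byte geometry type where 1 means Point, then X and Y as 8-byte doubles, 21 bytes in total. It only illustrates the layout: it is not how JTS's `WKBReader` is implemented, and the class `WkbPoint` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative encoder/decoder for a 2D WKB Point (hypothetical helper class). */
final class WkbPoint {
    /** Decodes x and y from a 2D WKB Point in either byte order. */
    static double[] readPoint(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        // Byte 0 is the byte-order flag: 0 = big-endian (XDR), 1 = little-endian (NDR).
        buf.order(buf.get() == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int type = buf.getInt(); // geometry type code; 1 = Point
        if (type != 1) {
            throw new IllegalArgumentException("not a 2D Point, type " + type);
        }
        // Coordinates are raw IEEE 754 doubles: no text parsing involved.
        return new double[] { buf.getDouble(), buf.getDouble() };
    }

    /** Encodes a 2D point as little-endian WKB: 1 + 4 + 8 + 8 = 21 bytes. */
    static byte[] writePoint(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 1).putInt(1).putDouble(x).putDouble(y);
        return buf.array();
    }
}
```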