Re: [JPP-Devel] FeatureCache - Request for suggestions...

Michaël Michaud Thu, 29 Mar 2007 23:12:16 -0800

Sunburned Surveyor,

> I'm glad you think so. I like the term "FeatureOnDemand". Do you mind 
> if I use it as the name of the light-weight feature class?


I think I picked this term up from Agile's code. I don't mind who uses it, and 
I hope Alvaro Zabala, the original developer of Agile, doesn't mind either.

>  The FeatureCache will be writable as well. The advantage over the 
> scalable shapefile driver that is used by Agile, UDig and (maybe) 
> Kosmo is that we'll be able to use the FeatureCache with any data 
> source that can provide Features. For example, after I get the 
> FeatureCache working with GeoTools Shapefile drivers I want to get it 
> working for AutoDesk's DXF format as well. The other benefit is that 
> we can support storage of data not currently supported in the ESRI 
> Shapefile format if we choose to do so in the future.

I don't know how you want to use the FeatureCache. As I imagined it 
until now, it was only used to keep part of your features in memory, 
with most of the data staying in the original file. But another solution 
(yours?) is to copy the original data file entirely into your own 
independent format and to access that copy directly. I'll 
consider it carefully.

> Almost, but not quite. I was only going to store a numeric identifier 
> for the Feature, like a serial number, which I would probably store in 
> an integer or a long. The only other item I would store is perhaps a 
> string with the name of the FeatureCache containing the Feature. I 
> think this is about as light weight as you can get.

At the moment, I have only one FeatureOnDemand class, so I need an id 
and an address. But if you keep two objects in memory, FeatureOnDemand 
and FeatureCache, I suppose it comes to about the same thing: you will 
need an identifier, a reference to the FeatureCache, and a file address 
in your FeatureCache to access your data directly.
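
To make that comparison concrete, here is a minimal Java sketch of the 
two-object variant. The class names SketchFeatureCache and 
FeatureOnDemandSketch, and the map from id to file offset, are assumptions 
made up for illustration; they are not taken from OpenJUMP or Agile.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; none of this is OpenJUMP or Agile code.

/** Stand-in for the cache: maps a feature id to the file offset of its record. */
class SketchFeatureCache {
    private final String name;
    private final Map<Long, Long> offsetsById = new HashMap<>();

    SketchFeatureCache(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    /** Records where the feature with this id starts in the backing file. */
    void register(long id, long fileOffset) {
        offsetsById.put(id, fileOffset);
    }

    /** Returns the file offset for an id, or -1 if the id is unknown. */
    long offsetOf(long id) {
        Long offset = offsetsById.get(id);
        return offset == null ? -1L : offset;
    }
}

/** The in-memory handle: only an id and a reference to its cache. */
final class FeatureOnDemandSketch {
    private final long id;
    private final SketchFeatureCache cache;

    FeatureOnDemandSketch(long id, SketchFeatureCache cache) {
        this.id = id;
        this.cache = cache;
    }

    long getId() {
        return id;
    }

    /** The address is looked up in the cache instead of being stored in the handle. */
    long fileOffset() {
        return cache.offsetOf(id);
    }
}
```

The address has not disappeared in this layout; it has only moved from the 
handle into the cache, which is why the memory cost should be about the same.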

> You wrote: "but imo the bounding box has also to be in-memory for 
> performance
> reasons (just wonder if it is worth trying to store the bb in a
> structure smaller than 4 doubles)"
> I didn't think about this. Could you please tell me why you think it 
> will be important to keep the bounding box of the feature in memory? 
> Is this for rendering purposes? Remember we will need to put every 
> feature into memory for rendering anyways, so I don't know if this 
> will save us anything. Unless the bounding box is used for another 
> frequent operation.

Why do you think we need to put every feature into memory? If so, we'll 
never be able to load huge data files. Until someone explains to me why we 
have to keep all the features in memory, I want to think we don't :-)
If we keep only references and bounding boxes in memory, we can save 
disk access operations for all the features whose bb does not intersect 
the OJ window (which is very, very interesting when you read a large 
dataset but only need to zoom in on a small part of it).
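
As a rough illustration of this idea, the following Java sketch keeps one 
bounding box per feature id in memory and returns only the ids whose box 
intersects the view window, so that only those features need a disk read. 
It also shows one possible way to store the bb in 4 floats instead of 
4 doubles by rounding outward. BBox and BBoxIndexSketch are hypothetical 
names, and the linear scan is an assumption chosen for brevity, not a 
description of any existing driver.

```java
import java.util.ArrayList;
import java.util.List;

// All names here are hypothetical; this is not OpenJUMP or JTS code.

/** Bounding box kept in memory for every feature: 4 doubles. */
final class BBox {
    final double minX, minY, maxX, maxY;

    BBox(double minX, double minY, double maxX, double maxY) {
        this.minX = minX; this.minY = minY;
        this.maxX = maxX; this.maxY = maxY;
    }

    /** True if the two boxes overlap or touch. */
    boolean intersects(BBox other) {
        return minX <= other.maxX && maxX >= other.minX
            && minY <= other.maxY && maxY >= other.minY;
    }

    /**
     * Packs the box into 4 floats, rounding outward so the float box
     * always contains the double box (it may be very slightly larger).
     */
    float[] toFloatsOutward() {
        return new float[] { down(minX), down(minY), up(maxX), up(maxY) };
    }

    private static float down(double v) {
        float f = (float) v;
        return f > v ? Math.nextDown(f) : f;
    }

    private static float up(double v) {
        float f = (float) v;
        return f < v ? Math.nextUp(f) : f;
    }
}

/** In-memory list of (id, box) pairs, scanned linearly for simplicity. */
final class BBoxIndexSketch {
    private final List<Long> ids = new ArrayList<>();
    private final List<BBox> boxes = new ArrayList<>();

    void add(long id, BBox box) {
        ids.add(id);
        boxes.add(box);
    }

    /** Ids of features whose box intersects the window; only these need a disk read. */
    List<Long> candidates(BBox window) {
        List<Long> hits = new ArrayList<>();
        for (int i = 0; i < boxes.size(); i++) {
            if (boxes.get(i).intersects(window)) {
                hits.add(ids.get(i));
            }
        }
        return hits;
    }
}
```

With outward rounding, the float box can only produce a few extra 
candidates near the window edge; it never hides a feature that should be 
read. Whether halving the bb size is worth that loss of precision is still 
an open question.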

> After looking at your tests of JTS reading in WKT and WKB formats I 
> can see that using text as the storage format really isn't a good 
> option. The binary storage format is so much faster! 
> I'll have to give this problem a lot more thought. Perhaps I can get a 
> temporary FeatureCache system running with Java's standard object 
> serialization, and work on the custom binary format after that.
> I'll have to take a look at WKB format. Maybe we can base a binary 
> format for Feature attribute values on a similar system.

I also need to give data access more thought. WKT is interesting 
because it is human readable, but as soon as performance matters, 
WKB offers a big advantage. As I said in my previous mail, I don't know 
whether serialization is a good solution from a performance point of view, 
and I'm not sure it will save you much work, since JTS has a WKB 
reader/writer which is simple to use.
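
For a feel of why the binary form is cheap to read, here is a small Java 
sketch that encodes and decodes a single WKB Point by hand: one byte-order 
flag, a 4-byte geometry type (1 for Point), then x and y as 8-byte doubles, 
21 bytes in total. In practice you would use the WKBWriter and WKBReader 
classes from JTS; the sketch uses only java.nio so that it runs without 
JTS, and the class name WkbPointSketch is made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hand-rolled WKB for a 2D Point only; a sketch, not a replacement for JTS. */
final class WkbPointSketch {
    private static final byte BIG_ENDIAN_FLAG = 0; // called XDR in the WKB spec
    private static final int POINT_TYPE = 1;
    private static final int POINT_SIZE = 1 + 4 + 8 + 8; // 21 bytes

    /** Encodes a point as big-endian WKB. */
    static byte[] write(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(POINT_SIZE).order(ByteOrder.BIG_ENDIAN);
        buf.put(BIG_ENDIAN_FLAG).putInt(POINT_TYPE).putDouble(x).putDouble(y);
        return buf.array();
    }

    /** Decodes a WKB point in either byte order and returns {x, y}. */
    static double[] read(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        byte flag = buf.get();
        buf.order(flag == BIG_ENDIAN_FLAG ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        int type = buf.getInt();
        if (type != POINT_TYPE) {
            throw new IllegalArgumentException("not a WKB Point: type " + type);
        }
        double x = buf.getDouble();
        double y = buf.getDouble();
        return new double[] { x, y };
    }
}
```

Reading is a fixed sequence of primitive reads with no number parsing, 
which is consistent with the WKT/WKB timings I posted earlier.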

> Thanks again for your comments. They were very helpful.
> Thanks to Erwan as well.

Thanks

Michaël

>  
> The Sunburned Surveyor
>  
>
>
>  
> On 3/29/07, *Michaël Michaud* <[EMAIL PROTECTED] 
> <mailto:[EMAIL PROTECTED]>> wrote:
>
>     Hi sunburned,
>
>     I think that a light-weight feature class or FeatureOnDemand is a good
>     solution, as well as a FeatureCache.
>     I already tested Agile's scalable shapefile driver, and I'm currently
>     implementing something similar for the GeoConcept format (a commercial
>     GIS).
>     It can save a lot of memory (but, as you guess, it is not very good for
>     performance unless we find very well designed solutions).
>     I've not yet seen how Kosmo implemented their scalable shapefile
>     driver, but I'll have to, because it is not only scalable, it is also
>     writable!
>     Some questions are:
>     - What must the in-memory representation of the light-weight feature
>     include? The minimum is an identifier and a file address for disk
>     access (unless you store data in a database),
>     but imo the bounding box has also to be in-memory for performance
>     reasons (just wonder if it is worth trying to store the bb in a
>     structure smaller than 4 doubles)
>     - Another question you ask is about data format. The Sigle project is
>     exploring GML format storage for direct access. I think you can also
>     keep the data in the original file format (this is the way the scalable
>     shapefile driver works, and the way I am exploring with the GeoConcept
>     format). But storing data in JUMP's own format may be useful to solve
>     performance issues, or to solve the data access problem in a more
>     independent way.
>     For this issue, I made some tests to compare WKB and WKT reading (and
>     also writing). Sorry, I did not test serialization, which I think is
>     not very performant. Here are my results with JTS 1.8 (every test was
>     run on my personal laptop):
>
>     Test                                                       Size (bytes)   Time (sec)
>     Reading 100 complex WKT Polygons (about 7000 points each)  26590267       15.073
>     Reading 1 000 000 WKT Points sequentially                  64489511       47.874
>
>     Reading 100 complex WKB Polygons (about 7000 points each)  26590267       1.313
>     Reading 1 000 000 WKB Points sequentially                  64489511       2.542
>
>     Some more tests for database access (binary geometry)
>     postgreSQL, sequential access :    10 000 pts 0.3 sec
>     postgreSQL, random access :       10 000 pts 7 sec
>     H2, sequential or random access : 10 000 pts 0.4 sec
>
>     Michaël
>
>     Sunburned Surveyor a écrit :
>
>     > I've been working on a solution to the problem of working with very
>     > large datasets in OpenJUMP at home the past couple of weeks. (For
>     > those of you that don't know, OpenJUMP reads all features in from a
>     > data source into memory. This isn't a problem until you start working
>     > with some very large datasets. For example, OpenJUMP runs out of
>     > memory before it can open the shapefile with all of the parcels in my
>     > county. The size of the data source OpenJUMP can work with is
>     > limited by the RAM of the computer OpenJUMP is running on.) I'd like
>     > to give a brief explanation of how this system will work, and then ask
>     > for some suggestions on an aspect of the design.
>     >
>     >
>     >
>     > This system uses a very light-weight in-memory representation of the
>     > Feature class. (This is required because portions of OpenJUMP's code
>     > require the ability to manipulate individual features or all the
>     > features in a feature collection "in-memory".) Objects of this
>     > light-weight Feature class are really a façade and forward all method
>     > calls to a FeatureCache object. A FeatureCache is an implementation of
>     > the FeatureCollection interface that actually manages data behind the
>     > light-weight Feature objects.
>     >
>     >
>     >
>     > The FeatureCache maintains a "buffer". In this buffer it stores
>     > in-memory representations of regular OpenJUMP Feature objects. This
>     > buffer will only grow to a maximum size that can be set by the user,
>     > based on the balance between speed/performance and memory usage.
>     > When a method call is made to the light-weight Feature object it is
>     > forwarded to the FeatureCache. The FeatureCache passes this call to
>     > the regular Feature object if it is in the buffer. If it is not in the
>     > buffer, the Feature object is created in memory from information in
>     > permanent storage or "on-disk". The method call is then processed and
>     > the newly created Feature is placed in the buffer. If the buffer is
>     > already at its limit, the oldest Feature in the buffer is stored back
>     > in permanent storage and removed from the buffer.
>     >
>     >
>     >
>     > There should be no major distinction between Features and a
>     > FeatureCollection implemented by a FeatureCache and normal Features
>     > and FeatureCollections that are stored entirely in memory. The only
>     > significant difference will be the speed of operations and rendering.
>     > This will be slower with this system than it is with Features and
>     > FeatureCollections stored entirely in memory. However, it will make it
>     > possible to work with very large datasets.
>     >
>     >
>     >
>     > Here is the part of the system that I would like to get some
>     > suggestions on. I need to decide on a storage format for the features
>     > placed in permanent storage, or on disk. I think I have 3 choices.
>     >
>     >
>     >
>     > [1] Java's Standard Object Serialization Format
>     >
>     > [2] A custom binary storage format.
>     >
>     > [3] A text based format.
>     >
>     >
>     >
>     > I believe the first two formats will be much quicker than the third. I
>     > don't really think the second format is something I want to do,
>     > because I think cooking up a custom binary format will be a real pain
>     > in the neck. So I need to decide between the first format listed and
>     > the third format listed.
>     >
>     >
>     >
>     > If I use a text-based format external tools will be able to easily
>     > work with the FeatureCache, and I won't have to worry about versioning
>     > issues. It will also be slower. If I use Java's standard object
>     > serialization format I'll have better performance, but I'll have to
>     > worry about versioning issues that might come up if we change the
>     > interface definition for the Feature interface. It will also make it
>     > difficult for external tools, especially those that aren't written in
>     > Java, to work with the data in the FeatureCache.
>     >
>     >
>     >
>     > I'd like to know what storage format the other developers would
>     > recommend and why.
>     >
>     > Thanks,
>     >
>     > The Sunburned Surveyor
>     >
>     >------------------------------------------------------------------------
>
>     >
>     >-------------------------------------------------------------------------
>     >Take Surveys. Earn Cash. Influence the Future of IT
>     >Join SourceForge.net's Techsay panel and you'll get the chance to
>     share your
>     >opinions on IT & business topics through brief surveys-and earn cash
>     >http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
>     
> <http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV>
>     >
>     >------------------------------------------------------------------------
>     >
>     >_______________________________________________
>     >Jump-pilot-devel mailing list
>     > Jump-pilot-devel@lists.sourceforge.net
>     <mailto:Jump-pilot-devel@lists.sourceforge.net>
>     >https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel
>     >
>     >
>
>
>
>


