Sunburned Surveyor wrote:

>Michael,
>
>I am glad I am not the only one who got a headache after trying to
>understand some of GeoTools code. Overall it seems like a great
>project but I've always found the lack of developer documentation a
>frustration. :]
>
>Before I talk about the memory problem, I had a couple of questions for you:
>
>[1] How do you pronounce your first name? (This is probably a funny
>question from a guy on a mailing list, but I am just curious.) :]
>  
>
The pronunciation is MI(lk) CA(ffe) EL(se). But some friends of mine say 
Michael the way you - English-speaking guys - do. So you can call me 
Michael, I'll understand ;-)

>[2] Which JUMP project are you involved in, if any? Is it SIGLE?
>  
>
I did most of my development before JPP and SIGLE started. Now I try to 
collaborate with Erwan from the SIGLE team, but I don't have much time to 
offer.

>[3] Are you still actively using or developing with OpenJUMP?
>  
>
Not much. I try to help with questions or with development I have already 
done in the past, and sometimes I test a new idea. I don't use OpenJUMP in 
my daily work, but I am very enthusiastic about its development.

>I was very pleased to hear from you again, and to learn that you had
>also given some serious thought to OpenJUMP's memory problem. It looks
>like you even implemented some solutions. I am very eager to hear
>about what you learned, as this will be a great aid in my own efforts.
>  
>
For some reason, I never managed to implement a solution based on hsqldb. 
I implemented a kind of "feature-on-demand" class, but memory usage was 
worse than with the shapefile reader (maybe a bad use of the hsqldb 
cache: most of the memory was used by hsqldb strings).

>You mentioned that you "successfully implemented the feature interface
>to read features on disk". Did this not solve the problem? I am
>curious because it is a solution that I considered. My only hesitation
>was that a Feature interface implemented in this way would still
>require some level of representation in RAM. If this representation
>is significant, and the spatial dataset large enough, you will still
>hit the RAM restriction, though not as soon. How did this solution
>work out, and should I consider it?
>  
>
This implementation is quite efficient, but it is read-only. Keeping a 
feature in memory is not a problem as long as the memory is freed each 
time a feature has been read.
You and David already mentioned the main problems I encountered:
- speed penalty: the geometry (in a text format) is parsed each time it 
is used;
- it does not solve the memory problem when all the features have to be 
displayed (full extent), only when you zoom in.
The reader uses just a little memory for invisible features (see below), 
but still uses memory for the features visible on screen. I suppose 
there are some good reasons for that, but I don't understand the whole 
framework.
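To illustrate the first point, here is a hedged sketch (hypothetical names; a trivial "POINT (x y)" parser stands in for a real WKT reader): because the geometry is kept only as text, the coordinates are rebuilt on every access, so nothing stays resident between reads - low steady-state memory, repeated CPU cost.

```java
// Sketch of a feature whose geometry is parsed on every access
// (no caching). All names are hypothetical, not the actual code.
class OnDemandFeature {

    private final String wkt; // geometry kept only as text

    OnDemandFeature(String wkt) {
        this.wkt = wkt;
    }

    // The parsed coordinates are rebuilt on every call and become
    // garbage right after use: this is the speed penalty described.
    double[] getCoordinate() {
        // tiny stand-in parser, handles "POINT (x y)" only
        String body = wkt.substring(wkt.indexOf('(') + 1, wkt.indexOf(')')).trim();
        String[] parts = body.split("\\s+");
        return new double[] {
            Double.parseDouble(parts[0]),
            Double.parseDouble(parts[1])
        };
    }
}
```

Note that nothing is memoized on purpose: caching the parsed geometry would bring the memory problem back for every visible feature.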

I had to decide what to keep in memory after the file is scanned for 
the first time. I tried the following:
- just the feature address in the file: less memory, but an important 
speed penalty (the reader has to parse every feature geometry in the 
file just to know whether it has to be displayed);
- the feature address + the bounding box: more memory usage, but much 
more efficient when you have to pan or zoom;
- the feature address + a spatial index (quadtree or rtree): more 
memory usage for only a small speed improvement.

I kept the second solution (a spatial index can still be used on top 
of it).
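As a rough sketch of that second option (hypothetical names, not the original code), the retained index entry is just a file offset plus a cached bounding box, and a cheap envelope test decides whether a feature must be read and parsed from disk at all:

```java
// Sketch of the per-feature index entry kept in memory after the
// first scan: file address plus bounding box (names hypothetical).
class FeatureEntry {

    final long offset;                   // address of the record in the file
    final double minX, minY, maxX, maxY; // cached bounding box

    FeatureEntry(long offset, double minX, double minY, double maxX, double maxY) {
        this.offset = offset;
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    // Cheap rectangle-intersection test against the current view:
    // only entries that pass need to be read and parsed from disk.
    boolean intersectsView(double vMinX, double vMinY, double vMaxX, double vMaxY) {
        return maxX >= vMinX && minX <= vMaxX
            && maxY >= vMinY && minY <= vMaxY;
    }
}
```

With only an offset and four doubles per feature, panning and zooming can skip invisible features without touching the file.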

>If this is not a workable solution I think my only option will be to
>follow Alvaro's suggestion and change the classes in OpenJUMP's source
>code that depend on the getFeatures() and query() methods. (Although
>this looks like it will be a bear of a task. The Agile code won't be
>of much use to me, as Alvaro writes his source code comments in
>Spanish...) :]
>  
>
I don't understand everything. I thought that the Agile plugin used to 
work with a standard JUMP distribution (in fact with an old 
distribution, because a compatibility problem appeared in newer versions).
It seems that SAIG used the Agile code for their own JUMP version, and 
they claim they can read millions of objects. The bad thing is that 
their source code is not available on their site, but I think they will 
have to make it available to respect the license terms.

>Are you interested in working with me on this solution? If you are
>working with the SIGLE project and would like to incorporate my
>changes/plug-in I'd like to stay in touch with you about my design for
>the solution to this memory problem and how I can make sure it is
>compatible with SIGLE.
>  
>
I don't know the changes Vivid Solutions made in the core API, but I 
think it would be a good start to understand the new API and to try to 
write new readers that use less memory with it.
If you have a good solution which solves more problems (no memory growth 
for all use cases!), maybe Vivid Solutions will be interested and will 
add it to the core...

I don't like to make promises when I know I have no time to keep them. 
But if I have ideas or code to share about this problem, I'll contact 
you through the list.

I am not sending you the code I wrote, because it handles a specific 
format and I think it is not easy to understand all the implementation 
details, but here are the main classes and methods.
If X is my format:

class XMap {

    RandomAccessFile raf;
    Map datasets;

    public static XMap open(File file);
    public FeatureCollection getFeatureCollection(String className);
    public Feature getFeature(int id);
}

class XFeature extends AbstractBasicFeature {

    XMap xmap;
    long pos;
    int length;
    double minX, minY, maxX, maxY;

    public XFeature(XMap map, long pos, int length, int id);
    public Object getAttribute(int i) { /* reads the file through the raf object */ }
    public Object[] getAttributes() { /* reads the file through the raf object */ }
}
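The raf-based reads in those method bodies come down to a seek to the stored position followed by a fixed-length read. Here is a self-contained illustration (the record layout and the round-trip helper are invented for the example; they are not the XMap code):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Minimal illustration of the seek-and-read pattern: a record is
// fetched from a stored (position, length) pair through a
// RandomAccessFile, the way an XFeature would use its XMap's raf.
class RecordReader {

    // Read 'length' bytes starting at the stored feature address.
    static byte[] readRecord(RandomAccessFile raf, long pos, int length) throws IOException {
        byte[] buf = new byte[length];
        raf.seek(pos);       // jump to the feature's address in the file
        raf.readFully(buf);  // read exactly 'length' bytes
        return buf;
    }

    // Demo helper: write 'payload' to a temp file, then read back the
    // (pos, length) slice on demand.
    static String roundTrip(String payload, long pos, int length) throws IOException {
        File tmp = File.createTempFile("xmap-demo", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.write(payload.getBytes(StandardCharsets.US_ASCII));
            return new String(readRecord(raf, pos, length), StandardCharsets.US_ASCII);
        }
    }
}
```

Only the offsets and lengths live in RAM; the record bytes themselves are fetched on each access and can be garbage-collected immediately.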

Sincerely,

Michael

>Let me know.
>
>The Sunburned Surveyor
>
>
>On 8/22/06, David Zwiers <[EMAIL PROTECTED]> wrote:
>  
>
>>Michaël,
>>
>>We are planning to include an API for writing to DataStores at some point in 
>>the future, but we do not have a target date yet.
>>
>>I should also remind you, this API is not intended to feed a streaming 
>>renderer (like Geotools), so there will still be cases where the size of the 
>>data will preclude the user from viewing all the data at one time.
>>
>>David Zwiers
>>Vivid Solutions
>>Telephone : (250) 385-6040
>>
>>
>>-----Original Message-----
>>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michaël Michaud
>>Sent: August 22, 2006 4:03 PM
>>To: List for discussion of JPP development and use.
>>Subject: Re: [JPP-Devel] Refactoring JUMP for support of large spatial datasets...
>>
>>
>>    
>>
>>>P.S. - If  I can't get this figured out I will give you a call, but I
>>>wanted to post this to the list for the benefit of the other
>>>developers.
>>>
>>>
>>>      
>>>
>>I am very pleased to follow this discussion.
>>I already tested shapefile-scalable from Agile, unsuccessfully tried to
>>implement a hsqldb database connection to avoid the memory problem,
>>successfully implemented the feature interface to read features on disk,
>>and caught headaches trying to understand the GeoTools datastore API.
>>So I am pleased to follow this interesting discussion on how JUMP will
>>soon allow large dataset reading.
>>
>>The next question is... is the new datastore api planned for scalable
>>readers only or does it include data modification capabilities for
>>scalable disk-based readers/writers ?
>>
>>Michaël
>>
>>    
>>
>>>On 8/22/06, David Zwiers <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>>      
>>>
>>>>Sunburned,
>>>>
>>>>Most of the required components are readily available. I would look into
>>>>leveraging the DataStore framework (Jump CVS) in conjunction with an
>>>>indexed SHP reader (GeoTools port / interface?). You need only write the
>>>>'glue code' between the DataStore framework (already does caching based
>>>>on the viewing area) and your SHP reader of choice.
>>>>
>>>>If you need a few more directed hints to get started, feel free to drop
>>>>me a line.
>>>>
>>>>David Zwiers
>>>>Vivid Solutions
>>>>Telephone : (250) 385-6040
>>>>
>>>>
>>>>-----Original Message-----
>>>>From: [EMAIL PROTECTED]
>>>>[mailto:[EMAIL PROTECTED] On Behalf Of
>>>>Sunburned Surveyor
>>>>Sent: August 22, 2006 8:26 AM
>>>>To: List for discussion of JPP development and use.
>>>>Subject: [JPP-Devel] Refactoring JUMP for support of large spatial datasets...
>>>>
>>>>I've been taking a look at the OpenJUMP source code, as well as the
>>>>work done by Alvaro on Agile's version of OpenJUMP, in an effort to
>>>>come up with a relatively simple solution to the RAM restriction
>>>>OpenJUMP has when reading large ESRI Shapefiles or other spatial
>>>>datasets. I've hit an obstacle of sorts, and I was hoping some of the
>>>>more experienced developers would have a solution.
>>>>
>>>>Alvaro overcame the RAM restriction by creating an object that
>>>>implemented the FeatureCollection interface, but stored its features
>>>>on disk, and not in RAM.  I had the same idea, and was initially
>>>>pleased when I saw Alvaro had pioneered this technique. I didn't
>>>>understand why Alvaro had created his own version of the OpenJUMP
>>>>core, instead of encapsulating his support for large spatial datasets
>>>>in a plug-in.
>>>>
>>>>Then I read Alvaro's note:
>>>>
>>>>" In package org.agil.dao, you could see a FeatureCollection
>>>>implementation
>>>>(FeatureCollectionOnDemand) which read features directly from a
>>>>database.
>>>>After use it, you must change all JUMP code where get a List from a
>>>>FeatureCollection, and change it for an Iterator."
>>>>
>>>>I took a look at the source code and found that Alvaro was indeed
>>>>correct. The FeatureCollection interface has two methods that return a
>>>>generic list of all the Feature objects in the FeatureCollection. This
>>>>creates a problem, as all of the Feature objects then need to exist in
>>>>RAM, which is what we are trying to avoid in the first place.
>>>>
>>>>One of these two methods, the getFeatures() method, is called 26 times
>>>>in other parts of the source code.
>>>>
>>>>What does this mean? It means both JUMP and OpenJUMP are currently
>>>>wired in a way that prevents RAM independent FeatureCollection
>>>>implementations without some significant changes to the core source
>>>>code. (It would require that all of the methods that expect a list of
>>>>all the Features in a FeatureCollection be changed to work with an
>>>>Iterator object, as Alvaro suggested.)
>>>>
>>>>There are a couple of ways to fix this. I want to know which way will
>>>>be the simplest and most effective.
>>>>
>>>>[1] Refactor the OpenJUMP source code to use Iterators instead of
>>>>lists when calling the two problem methods of the FeatureCollection
>>>>interface.
>>>>
>>>>[2] Create a "RAM" independent object that implements the Feature
>>>>interface instead of a RAM independent implementation of an object the
>>>>Feature Collection interface. We could then return a list of these RAM
>>>>independent Features. (I'm not sure how this would reduce memory usage
>>>>though. Probably not as well as a RAM independent FeatureCollection
>>>>implementation. There might also be a pretty severe performance
>>>>penalty. ???)
>>>>
>>>>[3] Create a custom implementation of the java.util.List interface
>>>>that doesn't require its member objects to be in RAM. The only real
>>>>problem I saw with this technique initially is the toArray() method. I
>>>>suppose we could return an empty array, because I don't know another
>>>>way to implement this method without placing all the Feature Objects
>>>>into the array, and thus into RAM. This technique will also involve a
>>>>bit of work, as the List interface contains quite a few methods.
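[A RAM-independent List along the lines of this option [3] could be sketched as follows - hypothetical names and loader interface, not working code from any project. Extending AbstractList means only get() and size() must be implemented; note that the inherited toArray() still materializes every element, which is exactly the problem raised above.]

```java
import java.util.AbstractList;

// Sketch of option [3]: a List whose elements are materialized from
// disk on each get() call instead of being held in RAM.
class OnDiskList<E> extends AbstractList<E> {

    // Stand-in for a seek-and-parse routine (e.g. reading a Feature
    // from a file offset); hypothetical interface.
    interface Loader<E> {
        E load(int index);
    }

    private final Loader<E> loader;
    private final int size;

    OnDiskList(Loader<E> loader, int size) {
        this.loader = loader;
        this.size = size;
    }

    @Override
    public E get(int index) {
        return loader.load(index); // re-read from disk on every access
    }

    @Override
    public int size() {
        return size;
    }

    // Caveat: the inherited toArray() loops over get() and puts every
    // element into one array, pulling the whole dataset into RAM.
}
```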
>>>>
>>>>I'm really looking for the simplest solution. I have planned a
>>>>"spatial database" for OpenJUMP-Ex that will solve this problem in
>>>>OpenJUMP's design. So I don't really want to invest a lot of energy
>>>>into what I hope will be a temporary "band-aid" solution.
>>>>
>>>>Thank you for your thoughts.
>>>>
>>>>The Sunburned Surveyor
>>>>
>>>>-------------------------------------------------------------------------
>>>>Using Tomcat but need to do more? Need to support web services,
>>>>security?
>>>>Get stuff done quickly with pre-integrated technology to make your job
>>>>easier
>>>>Download IBM WebSphere Application Server v.1.0.1 based on Apache
>>>>Geronimo
>>>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
>>>>_______________________________________________
>>>>Jump-pilot-devel mailing list
>>>>Jump-pilot-devel@lists.sourceforge.net
>>>>https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>        
>>>>
>>>
>>>
>>>
>>>
>>>      
>>>
>>
>>
>>
>>
>>    
>>
>
>
>
>  
>


