Hi!
Sorry, I started to write some comments yesterday, but I'm a volunteer with
the local Red Cross SAR team and we were called out to search for a missing
person before I was able to complete them.
Anyway, you'll find my comments below :-)
2011/6/26 Jody Garnett <[email protected]>
> Hi Kenneth - epic email!
>
> To answer - You should really be able to have your feature be a lazy
> wrapper around a real object (or eee-object in this case).
>
Yes, that is what I do today, if by "lazy wrapper" you mean "wrapper created
when client code asks for it". I've created an internal object
(EFeatureInternal) which handles this. Both first-class EFeature
implementations (? extends EFeatureImpl) and delegates (any EObject
containing Geometry data) use it to implement the EFeature interface. When the
EFeatureReader requests SimpleFeature instances, it calls EFeature.getData(),
which constructs the wrapper lazily. This wrapper is an inner class, allowing
it to access all members of EFeatureInternal, including the EObject which
holds all the data. The SimpleFeatureType implementation is also a lazy wrapper,
around the EClass that maps EObjects to EFeature. I've implemented this very
carefully, ensuring that only one mapping exists for each unique
SimpleFeatureType instance (EFeatureInfo).
EFeatureInfo.getFeatureType() constructs the wrapper lazily. Every EFeature
instance holding a SimpleFeature instance of a given feature type references
the same SimpleFeatureType.
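To illustrate the lazy wrapper idea described above, here is a minimal plain-Java sketch. It does not use the real EMF or GeoTools types: `Row`, `FeatureView`, `Internal` and `CoupledView` are hypothetical stand-ins for the EObject, SimpleFeature, EFeatureInternal and the inner wrapper class, and the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: plain-Java stand-ins, not the real EFeature, EMF or GeoTools classes. */
final class LazyWrapperSketch {

    /** Hypothetical stand-in for an EObject: the single place where values live. */
    static final class Row {
        final Map<String, Object> values = new HashMap<>();
    }

    /** Hypothetical stand-in for the attribute access a SimpleFeature offers. */
    interface FeatureView {
        Object getAttribute(String name);
        void setAttribute(String name, Object value);
    }

    /** Plays the role described for EFeatureInternal: owns the row, builds the view on demand. */
    static final class Internal {
        private final Row row;
        private FeatureView view; // stays null until a client asks for it

        Internal(Row row) {
            this.row = row;
        }

        /** Creates the wrapper on first request and returns the same instance afterwards. */
        FeatureView getData() {
            if (view == null) {
                view = new CoupledView();
            }
            return view;
        }

        /** Inner (non-static) class, so it can reach the row of the enclosing instance. */
        private final class CoupledView implements FeatureView {
            @Override
            public Object getAttribute(String name) {
                return row.values.get(name);
            }

            @Override
            public void setAttribute(String name, Object value) {
                row.values.put(name, value); // no copy: the write lands in the row
            }
        }
    }

    public static void main(String[] args) {
        Row row = new Row();
        row.values.put("name", "Oslo");
        FeatureView feature = new Internal(row).getData();
        feature.setAttribute("name", "Bergen");
        System.out.println(row.values.get("name")); // prints "Bergen"
    }
}
```

Because the view holds no values of its own, each property is stored once, which is the memory saving the wrapper approach is after.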
> You may wish to consider some decoupling in order to handle
> transaction.commit()?
>
Yes, I think this is a good solution. In this case, I think EFeature should
override any client request for "coupled" SimpleFeature instances (lazy
wrappers). I have added a convenience interface called ESimpleFeature which
returns a reference to the EObject holding its data. I'm going to add a
boolean method indicating whether the data it holds is "coupled" or not. Here,
"coupled" implies that the SimpleFeature instance is a lazily created
wrapper around an EObject, and "uncoupled" implies that the instance extends
SimpleFeatureImpl (and implements ESimpleFeature) and contains data
copied from the referenced EObject. This spurs another thought: when
decoupled, but still (weakly) referencing the EObject it was created from, a
read (from EObject) and a write (to EObject) method could be added.
Furthermore, if a transaction is passed to the SimpleFeature
implementations, I should be able to allow transaction.commit() even in the
coupled case?
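The coupled/uncoupled distinction, with the read and write methods mentioned above, could look roughly like the following plain-Java sketch. All names here (`Row`, `FeatureView`, `isCoupled()`, `read()`, `write()`) are hypothetical illustrations, not the actual ESimpleFeature API, and the real classes would build on EObject and SimpleFeatureImpl rather than on maps.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Sketch only: hypothetical names, no EMF or GeoTools types involved. */
final class CouplingSketch {

    /** Hypothetical stand-in for the EObject holding the data. */
    static final class Row {
        final Map<String, Object> values = new HashMap<>();
    }

    interface FeatureView {
        Object getAttribute(String name);
        void setAttribute(String name, Object value);
        boolean isCoupled();
    }

    /** Coupled: every access goes straight to the row. */
    static final class CoupledView implements FeatureView {
        private final Row row;

        CoupledView(Row row) { this.row = row; }

        @Override public Object getAttribute(String name) { return row.values.get(name); }
        @Override public void setAttribute(String name, Object value) { row.values.put(name, value); }
        @Override public boolean isCoupled() { return true; }
    }

    /** Uncoupled: works on a copy, and only weakly remembers where the copy came from. */
    static final class DetachedView implements FeatureView {
        private final WeakReference<Row> source;
        private final Map<String, Object> copy;

        DetachedView(Row row) {
            this.source = new WeakReference<>(row);
            this.copy = new HashMap<>(row.values);
        }

        @Override public Object getAttribute(String name) { return copy.get(name); }
        @Override public void setAttribute(String name, Object value) { copy.put(name, value); }
        @Override public boolean isCoupled() { return false; }

        /** Refreshes the copy from the row; returns false if the row has been collected. */
        boolean read() {
            Row row = source.get();
            if (row == null) {
                return false;
            }
            copy.clear();
            copy.putAll(row.values);
            return true;
        }

        /** Pushes the copied values back to the row; returns false if the row has been collected. */
        boolean write() {
            Row row = source.get();
            if (row == null) {
                return false;
            }
            row.values.putAll(copy);
            return true;
        }
    }
}
```

The weak reference means a detached view never keeps its source alive on its own, so `read()` and `write()` have to report when the source is gone.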
>
> Many of the examples involve making a copy (and making too many copies to
> be thread-safe); there are some Query hints where client code can
> communicate how independent they need you to be. For something like the
> renderer they don't need you to be that independent as they are mostly just
> accessing the data in read-only mode.
>
Thanks, this helped a lot. I believe
Hints.FEATURE_DETACHED<http://docs.geotools.org/stable/javadocs/org/geotools/factory/Hints.html#FEATURE_DETACHED>
fits like a glove :-)
>
> Suggestions:
> - this could be a lazy copy; and may be considered only useful for things
> that can be modified?
>
Yes, and FEATURE_DETACHED==true should be enforced by EFeature whenever a
transaction requires it, or when a client explicitly asks for it by passing
FEATURE_DETACHED==true.
> - trap setting of an attribute and store the result in your transaction
> state (you may or may not find transaction state diff useful?)
>
I am not that familiar with how GeoTools handles transactions; I'll look
into that.
> - transaction.rollback is an invitation to throw out any modifications
>
Yes, and this is trivial if FEATURE_DETACHED==true (just remove any
references to detached ESimpleFeatures). If FEATURE_DETACHED==false, I
believe I could use an EMF change
recorder<http://download.eclipse.org/modeling/emf/emf/javadoc/2.5.0/org/eclipse/emf/ecore/change/util/ChangeRecorder.html>
to track changes made to attributes and just forward Transaction.rollback() to
that.
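The rollback idea for the coupled case can be sketched in plain Java with a hand-rolled undo log. This is deliberately not the EMF ChangeRecorder, only an illustration of the same principle: remember the old value before each write, and replay those in reverse on rollback. `Row` and `RecordingView` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Sketch only: a hand-rolled undo log, not the EMF ChangeRecorder. */
final class UndoLogSketch {

    /** Hypothetical stand-in for the EObject holding the data. */
    static final class Row {
        final Map<String, Object> values = new HashMap<>();
    }

    /** Coupled view that writes through to the row but remembers how to undo each write. */
    static final class RecordingView {
        private final Row row;
        private final Deque<Runnable> undo = new ArrayDeque<>();

        RecordingView(Row row) { this.row = row; }

        void setAttribute(String name, Object value) {
            final boolean existed = row.values.containsKey(name);
            final Object old = row.values.get(name);
            // Record the inverse operation before changing anything.
            undo.push(() -> {
                if (existed) {
                    row.values.put(name, old);
                } else {
                    row.values.remove(name);
                }
            });
            row.values.put(name, value);
        }

        /** Undoes all recorded writes, most recent first. */
        void rollback() {
            while (!undo.isEmpty()) {
                undo.pop().run();
            }
        }

        /** The row already holds the new values, so committing just forgets the log. */
        void commit() {
            undo.clear();
        }
    }
}
```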
> - transaction.commit is when you get to push the changes to your EObject
>
That's right. However, EMF does not supply native support for partial updates;
it only allows all changes to be applied together. There are some tricks that
might allow partial commits, but in the general case this would be difficult
to achieve.
Hmm...
>
> You could however optimise:
> - transaction AUTO_COMMIT you could update the EObject directly when
> write() is called? (this is close to your current solution and I bet you
> could rescue your current work to use in this case??)
>
Good point. I'll try that.
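As a rough plain-Java sketch of the two modes discussed here: in auto-commit mode a write goes straight to the backing object, otherwise the new values are held as pending transaction state until commit or rollback. `Row`, `Writer` and the `autoCommit` flag are hypothetical simplifications, not the GeoTools FeatureWriter or Transaction API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: hypothetical names, not the GeoTools FeatureWriter/Transaction API. */
final class TransactionStateSketch {

    /** Hypothetical stand-in for the EObject holding the data. */
    static final class Row {
        final Map<String, Object> values = new HashMap<>();
    }

    static final class Writer {
        private final Row row;
        private final boolean autoCommit;
        // Pending changes, kept in write order; only used when autoCommit is false.
        private final Map<String, Object> pending = new LinkedHashMap<>();

        Writer(Row row, boolean autoCommit) {
            this.row = row;
            this.autoCommit = autoCommit;
        }

        void write(String name, Object value) {
            if (autoCommit) {
                row.values.put(name, value); // write-through, nothing to commit later
            } else {
                pending.put(name, value);    // trapped until commit() or rollback()
            }
        }

        /** Applies all pending changes in one go. */
        void commit() {
            row.values.putAll(pending);
            pending.clear();
        }

        /** Throws the pending changes away; the row was never touched. */
        void rollback() {
            pending.clear();
        }
    }
}
```

Applying the pending map in a single step matches the all-or-nothing commit mentioned above; partial commits are not attempted.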
>
> I have also experimented with different options:
> - just have a single feature; and change what EObject it is pointing to
> behind the scenes; not really a popular option but it worked well for
> database results when a feature could be backed on to a row
>
I hadn't thought of that, thanks. This would partly solve the memory
problem, but probably only for read operations, which is fine since this
is probably the most common occurrence.
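A minimal plain-Java sketch of the single reusable feature idea follows. The reader hands out the same view object every time and only swaps the backing object behind it. `Row`, `ReusableView` and `Reader` are hypothetical names, and the obvious catch is that client code must not hold on to the view across calls to `next()`.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Sketch only: hypothetical names, not the EFeatureReader or GeoTools API. */
final class ReusableFeatureSketch {

    /** Hypothetical stand-in for the EObject holding the data. */
    static final class Row {
        final Map<String, Object> values = new HashMap<>();
    }

    /** One view instance whose backing row is swapped behind the scenes. */
    static final class ReusableView {
        private Row current;

        void pointTo(Row row) { this.current = row; }

        Object getAttribute(String name) { return current.values.get(name); }
    }

    /** Read-only reader that allocates a single view for the whole iteration. */
    static final class Reader {
        private final Iterator<Row> rows;
        private final ReusableView view = new ReusableView();

        Reader(List<Row> rows) { this.rows = rows.iterator(); }

        boolean hasNext() { return rows.hasNext(); }

        /** Returns the same view every time, now pointing at the next row. */
        ReusableView next() {
            view.pointTo(rows.next());
            return view;
        }
    }
}
```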
>
> Sorry I was unable to get back to you until the weekend; I would like to
> ask if you had a look at the abstract data store tutorial and if it could have
> been more clear about this transaction stuff?
>
I've read the tutorial many times. It has been a source of inspiration and
it has helped me to understand many aspects of building GT data stores. The
tutorial is built around the API, which is fine. However, when trying to
understand how transactions fit into the overall picture, the documentation
about transactions becomes a bit fragmented. The whole feature
coupled/decoupled design challenge is not covered either. I think this is a bit
"off topic" with regard to the purpose of the tutorial (it is enough to process
in one go already). Maybe the topic of modifications, and how
transactions allow access to the contents of a data store during
modifications, is worthy of its own chapter in the user guide? This could also
make the abstract data store tutorial easier to read and understand by
linking to appropriate parts in the chapter on modification. This would also
be a great place to discuss implementation-specific challenges like coupling
and decoupling feature values from the backing store.
> --
> Jody Garnett
>
> On Friday, 24 June 2011 at 9:42 AM, Kenneth Gulbrandsoy wrote:
>
> Hi!
>
> I have now implemented and successfully run unit tests for all major read
> capabilities of the EFeature datastore. However, when I started on
> implementing write capabilities, I hit a design choice challenge. In order
> to conserve memory, I've implemented SimpleFeature instances using wrapper
> classes around EObjects (="rows"), which hold the actual feature values.
> Hence, SimpleFeature instance values (attributes and geometries) are not
> decoupled from the backing resource.
>
> (warning, long email)
>
> Looking at other FeatureReader implementations, I see that this is
> fundamentally different from "the geotools implementation norm", which
> I believe is to copy the feature values from the data source (RDBMS, file
> etc.) and then build a new feature instance using a SimpleFeatureBuilder,
> each time the client code reads features from the data source. This makes
> sense when data is located in an RDBMS or files; when the JDBC resultset or
> file resource handle is closed, any allocated memory is easily reclaimed by
> the system.
>
> For EMF however, things are a bit different. The default EMF model
> implementation strategy (which many EMF models therefore implement) adds
> strong references between all associated objects in the model when it is
> de-serialized from the backing resource (typically an XML file). This
> prevents EObjects from being garbage collected after the client code is
> finished reading features from it (EFeatureReader is closed).
>
> There are standard solutions to this problem, like forcing EMF to unload
> the EMF model. However, for an EMF-savvy client, this would in many use cases
> be non-standard behavior, and would defeat the purpose of EFeature, which
> is to extract features from existing models and coexist with other EMF
> consumers, not take control over the backing resource at the expense of
> other consumers.
>
> Although EFeature does not assume anything about how EMF model instances
> handle references, the default case strongly suggests that I put some effort
> into minimizing feature value duplication (EObject+SimpleFeature=2 values per
> property). I could take the easy road by assuming that SimpleFeature
> instances are just strongly referenced by client code for a limited amount
> of time (e.g. per method/analysis), or that EMF models should employ an
> implementation strategy which involves resource unloading or weak references
> instead of strong (e.g. CDO). This, however, restricts the applicability
> of EFeature datastores, making them less useful.
>
> So, I'm left with the wrapper solution described above. The question then
> is, does this violate contracts between client code and geotools? I think
> so, because values written to SimpleFeatures acquired using EFeatureReader
> are written directly to the backing resource, which is analogous to writing
> directly to a JDBC connection or file buffer. This effectively "shortcuts"
> the purpose of FeatureWriters. It also circumvents any transactions and
> locking mechanisms. So, I'm forced to resort back to a scheme resulting in
> feature value duplication (EObject+SimpleFeature).
>
> Does anyone have some suggestions as to what I should do? Should I keep my
> current non-standard implementation, or revert back to the standard way of
> building SimpleFeatures with values decoupled from the EObjects they were
> constructed from? Does client code normally keep or discard strong
> references to SimpleFeature instances returned by FeatureReaders and
> FeatureCollections?
>
> There are some other memory issues which I'm tinkering with, but I think
> this one is the most important to address right now.
>
> Cheers,
> Kenneth
>
>
>
> _______________________________________________
> Geotools-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/geotools-devel
>
>
Cheers,
Kenneth