Hi!
I have now implemented and successfully run unit tests for all major read
capabilities of the EFeature datastore. However, when I started
implementing write capabilities, I hit a design challenge. In order to
conserve memory, I've implemented SimpleFeature instances as wrapper
classes around EObjects (="rows"), which hold the actual feature values.
Hence, SimpleFeature instance values (attributes and geometries) are not
decoupled from the backing resource.
(warning, long email)
Looking at other FeatureReader implementations, I see that this is
fundamentally different from "the GeoTools implementation norm", which
I believe is to copy the feature values from the data source (RDBMS, file
etc.) and then build a new feature instance using a SimpleFeatureBuilder,
each time the client code reads features from the data source. This makes
sense when data is located in an RDBMS or in files; when the JDBC result set
or file resource handle is closed, any allocated memory is easily reclaimed
by the system.
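To make the difference between the two strategies concrete, here is a
minimal sketch in plain Java. Row, CopiedFeature and WrappedFeature are
hypothetical stand-ins, not EMF or GeoTools types: Row plays the role of an
EObject, CopiedFeature the role of a feature built from copied values, and
WrappedFeature the role of my current wrapper implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an EObject ("row") held by the backing resource.
class Row {
    private final Map<String, Object> values = new LinkedHashMap<>();

    Object get(String name) { return values.get(name); }

    void set(String name, Object value) { values.put(name, value); }

    // Returns an independent copy of the current values.
    Map<String, Object> snapshot() { return new LinkedHashMap<>(values); }
}

// "Copy" strategy: values are duplicated when the feature is built,
// so the feature is decoupled from the row afterwards.
class CopiedFeature {
    private final Map<String, Object> values;

    CopiedFeature(Row row) { this.values = row.snapshot(); }

    Object getAttribute(String name) { return values.get(name); }
}

// "Wrapper" strategy: no values are duplicated; every read goes to the row.
class WrappedFeature {
    private final Row row;

    WrappedFeature(Row row) { this.row = row; }

    Object getAttribute(String name) { return row.get(name); }
}

class ReadDemo {
    public static void main(String[] args) {
        Row row = new Row();
        row.set("name", "harbour");

        CopiedFeature copied = new CopiedFeature(row);
        WrappedFeature wrapped = new WrappedFeature(row);

        // Another consumer changes the model after the features were read.
        row.set("name", "airport");

        System.out.println(copied.getAttribute("name"));  // harbour
        System.out.println(wrapped.getAttribute("name")); // airport
    }
}
```

The copied feature costs a second set of values per row but stays stable;
the wrapper costs only one reference per row but always reflects the live
model.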
For EMF, however, things are a bit different. The default EMF model
implementation strategy (which many EMF models therefore implement) adds
strong references between all associated objects in the model when it is
de-serialized from the backing resource (typically an XML file). This
prevents EObjects from being garbage collected after the client code has
finished reading features from it (i.e. the EFeatureReader is closed).
There are standard solutions to this problem, like forcing EMF to unload the
EMF model. However, for an EMF-savvy client, this would in many use cases be
non-standard behavior, and would defeat the purpose of EFeature, which is
to extract features from existing models and coexist with other EMF
consumers, not to take control over the backing resource at the expense of
other consumers.
Although EFeature does not assume anything about how EMF model instances
handle references, the default case strongly suggests that I put some effort
into minimizing feature value duplication (EObject+SimpleFeature=2 values
per property). I could take the easy road by assuming that SimpleFeature
instances are only strongly referenced by client code for a limited amount
of time (e.g. per method/analysis), or that EMF models should employ an
implementation strategy which involves resource unloading or weak references
instead of strong ones (e.g. CDO). This, however, restricts the
applicability of EFeature datastores, making them less useful.
So, I'm left with the wrapper solution described above. The question then
is, does this violate contracts between client code and GeoTools? I think
so, because values written to SimpleFeatures acquired using EFeatureReader
are written directly to the backing resource, which is analogous to writing
directly to a JDBC connection or file buffer. This effectively bypasses
the purpose of FeatureWriters. It also circumvents any transaction and
locking mechanisms. So, I'm forced to fall back to a scheme that results in
feature value duplication (EObject+SimpleFeature).
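The write-through problem can be sketched in plain Java as follows. All
class names here (BackingRow, LiveFeature, DetachedFeature, SketchWriter)
are hypothetical stand-ins and not EMF or GeoTools types; SketchWriter only
mimics the role a FeatureWriter plays as the single place where changes
reach the backing resource.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a row (EObject) in the backing resource.
class BackingRow {
    final Map<String, Object> values = new HashMap<>();
}

// Wrapper feature: a write goes straight to the backing row.
class LiveFeature {
    private final BackingRow row;

    LiveFeature(BackingRow row) { this.row = row; }

    void setAttribute(String name, Object value) { row.values.put(name, value); }
}

// Decoupled feature: a write only touches the feature's own copy.
class DetachedFeature {
    final Map<String, Object> values;

    DetachedFeature(BackingRow row) { this.values = new HashMap<>(row.values); }

    void setAttribute(String name, Object value) { values.put(name, value); }
}

// Hypothetical writer: the only path from a detached feature to the row,
// and therefore the place where locks and transactions could be checked.
class SketchWriter {
    private final BackingRow row;

    SketchWriter(BackingRow row) { this.row = row; }

    void write(DetachedFeature feature) { row.values.putAll(feature.values); }
}

class WriteDemo {
    public static void main(String[] args) {
        BackingRow row = new BackingRow();
        row.values.put("name", "old");

        // Decoupled: the row is untouched until the writer is used.
        DetachedFeature detached = new DetachedFeature(row);
        detached.setAttribute("name", "detached");
        System.out.println(row.values.get("name")); // old
        new SketchWriter(row).write(detached);
        System.out.println(row.values.get("name")); // detached

        // Wrapper: the row changes immediately, with no writer involved.
        new LiveFeature(row).setAttribute("name", "live");
        System.out.println(row.values.get("name")); // live
    }
}
```

With the wrapper, nothing stands between the client's setAttribute call and
the backing resource, which is exactly the shortcut I'm worried about.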
Does anyone have suggestions as to what I should do? Should I keep my
current non-standard implementation, or revert to the standard way of
building SimpleFeatures with values decoupled from the EObjects they were
constructed from? Does client code normally keep or discard strong
references to SimpleFeature instances returned by FeatureReaders and
FeatureCollections?
There are some other memory issues which I'm tinkering with, but I think
this one is the most important to address right now.
Cheers,
Kenneth
_______________________________________________
Geotools-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geotools-devel