Does this mean the geotools DAL will support joins in the future?
Over the past few months I have been doing some serious hacking of the
jdbc classes in app-schema to provide faster data retrieval with
feature chaining through joining.
It would of course be even better if geotools actually supported joining.
Regards
Niels
On 14/06/11 03:16, Justin Deoliveira wrote:
Hi all,
The last major bit of the wfs 2 work is joins. I wanted to start some
discussion here and post some questions with regard to the work.
So with the wfs protocol you can do queries now that look like this:
<wfs:Query typeNames="myns:Person myns:Person" aliases="a b">
<fes:Filter>
<fes:And>
<fes:PropertyIsEqualTo>
<fes:ValueReference>a/Identifier</fes:ValueReference>
<fes:Literal>12345</fes:Literal>
</fes:PropertyIsEqualTo>
<fes:PropertyIsEqualTo>
<fes:ValueReference>a/spouse</fes:ValueReference>
<fes:ValueReference>b/Identifier</fes:ValueReference>
</fes:PropertyIsEqualTo>
</fes:And>
</fes:Filter>
</wfs:Query>
The result is a feature collection that does not contain only feature
members, but tuples of feature members. Something like:
<wfs:member>
<wfs:Tuple>
<wfs:member>
<ns1:FeatureTypeOne>...</ns1:FeatureTypeOne>
</wfs:member>
<wfs:member>
<ns1:FeatureTypeTwo>...</ns1:FeatureTypeTwo>
</wfs:member>
</wfs:Tuple>
</wfs:member>
With that providing a bit of context I would like to bring up some
points of discussion.
* app-schema vs simple features
I currently know next to nothing about app-schema, but I believe it
has the ability to do joins via feature chaining. However, my
impression is that these relationships are configured beforehand and
not really created on the fly? Correct me if I am wrong.
So perhaps we could just say that we support joins with app-schema and
call it a day. That said, I do think there is a case for supporting
joins with simple features as well. And to be honest, working with
app-schema would, because of the learning curve, be out of scope for
this project.
* cross datastore joins
When talking about doing joins there are varying levels of complexity.
For instance, supporting joins of feature types within a jdbc
datastore is one thing. Supporting joining, say, a shapefile feature
type to a jdbc feature type is a totally different ball of wax.
Doing cross datastore joins is something I think would be neat... but
it is far from trivial to do in a way that scales. A much simpler
problem would be joining two feature types within the same datastore.
Still, unless the datastore is one that can do joins natively (jdbc is
really the only one here), it remains a hard problem. For instance,
consider attempting to join two Shapefile feature types from the same
datastore... doable, but again difficult to do in a non-naive way.
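To make the naive versus non-naive distinction concrete, here is a
minimal in-memory sketch. It is not GeoTools code: features are stood
in for by plain maps, and the class and method names are hypothetical.
The first method compares every pair of features; the second indexes
one side first, which is faster but assumes that side fits in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only: features are plain maps, not SimpleFeature. */
final class InMemoryJoinSketch {

    /** Naive inner equi-join: every left feature is compared with every right one. */
    static List<List<Map<String, Object>>> nestedLoopJoin(
            List<Map<String, Object>> left, String leftAttr,
            List<Map<String, Object>> right, String rightAttr) {
        List<List<Map<String, Object>>> tuples = new ArrayList<>();
        for (Map<String, Object> l : left) {
            Object key = l.get(leftAttr);
            if (key == null) {
                continue; // nothing to match on
            }
            for (Map<String, Object> r : right) {
                if (key.equals(r.get(rightAttr))) {
                    tuples.add(List.of(l, r));
                }
            }
        }
        return tuples;
    }

    /** Hash join: index the right side once, then probe it for each left feature. */
    static List<List<Map<String, Object>>> hashJoin(
            List<Map<String, Object>> left, String leftAttr,
            List<Map<String, Object>> right, String rightAttr) {
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> r : right) {
            Object key = r.get(rightAttr);
            if (key != null) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
            }
        }
        List<List<Map<String, Object>>> tuples = new ArrayList<>();
        for (Map<String, Object> l : left) {
            Object key = l.get(leftAttr);
            if (key == null) {
                continue;
            }
            for (Map<String, Object> r : index.getOrDefault(key, List.of())) {
                tuples.add(List.of(l, r));
            }
        }
        return tuples;
    }
}
```

Both methods return the same tuples in the same order. Neither deals
with streaming from disk or with spatial join predicates, which is
where the real difficulty for something like shapefiles would lie.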
* query interface
That only some datastores can do joins efficiently makes this a good
candidate for QueryCapabilities, with the addition of a method
"isJoiningSupported". That interface change is relatively
straightforward. What is less straightforward is how to modify Query
(if that is the way to go) to support joins. I can think of a few
different strategies:
1. Not modify it at all and come up with a new interface called
"JoinSupportingDataStore" or something that adds some new methods for
joins.
2. Subclass Query and add some new join methods. Looking around, I
notice that there is some code in app-schema, called JoiningQuery,
that does just this.
3. Modify Query directly to add support for joins
Thoughts? When I thought about the alternatives I thought (3) made the
most sense. Especially given how we support other concepts that are
not supported in all datastores like sorting.
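As a rough sketch of how a caller might use the proposed capability
check, assuming it behaves like the existing checks for things such as
sorting: the classes below are hypothetical stand-ins, not the real
QueryCapabilities, and only the proposed "isJoiningSupported" method is
modelled.

```java
/** Hypothetical stand-in for QueryCapabilities; only the proposed method is modelled. */
class JoinCapabilitiesSketch {
    /** Proposed addition: stores that cannot join natively keep this default. */
    boolean isJoiningSupported() {
        return false;
    }
}

/** Capabilities of a store that can join natively, for example one backed by jdbc. */
class NativeJoinCapabilitiesSketch extends JoinCapabilitiesSketch {
    @Override
    boolean isJoiningSupported() {
        return true;
    }
}

/** Hypothetical caller that decides what to do with a query before running it. */
final class JoinPlannerSketch {
    enum Strategy { PLAIN_QUERY, NATIVE_JOIN, REJECT }

    /** joinCount is the number of joins attached to the query. */
    static Strategy plan(JoinCapabilitiesSketch caps, int joinCount) {
        if (joinCount == 0) {
            return Strategy.PLAIN_QUERY; // nothing to join, any store will do
        }
        return caps.isJoiningSupported() ? Strategy.NATIVE_JOIN : Strategy.REJECT;
    }
}
```

Whether an unsupported join should be rejected or handled by some
slower fallback is left open here; the sketch simply rejects it.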
So I decided to go further with (3), and added a class called "Join",
that looks something like the following:
class Join {
/** the feature type being joined to */
String getTypeName();
/** the attributes from the joined feature type to select */
List<PropertyName> getProperties();
/** the join filter */
Filter getJoinFilter();
/** additional filter to apply to the feature type being joined to */
Filter getFilter();
}
And then it was a matter of modifying Query adding a new property.
class Query {
List<Join> getJoins();
}
So with this api the above query would look something like this:
Query q = new Query("Persons");
q.setFilter(PropertyIsEqualTo(PropertyName("Identifier"), Literal(12345)));
Join j = new Join("Persons");
j.setJoinFilter(PropertyIsEqualTo(PropertyName("spouse"),
PropertyName("Identifier")));
q.getJoins().add(j);
That is obviously simplified quite a bit... there are still a few
things to iron out, like handling name clashes, etc... but that would
be the general idea. Thoughts?
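One possible way to handle the name clash in a self join like the one
above is to give each participant an alias, mirroring the "aliases"
attribute of the wfs query. The sketch below is only an assumption
about how that could look: the alias property and both classes are
hypothetical and are not part of the prototype described here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the proposed Join class, with an assumed alias property. */
final class AliasedJoinSketch {
    final String typeName;
    final String alias; // may be null, in which case the type name is used

    AliasedJoinSketch(String typeName, String alias) {
        this.typeName = typeName;
        this.alias = alias;
    }

    String effectiveName() {
        return alias != null ? alias : typeName;
    }
}

/** Hypothetical stand-in for Query; only what the clash check needs is modelled. */
final class AliasedQuerySketch {
    final String typeName;
    final String alias; // may be null
    final List<AliasedJoinSketch> joins = new ArrayList<>();

    AliasedQuerySketch(String typeName, String alias) {
        this.typeName = typeName;
        this.alias = alias;
    }

    /** True if two participants in the join would be referred to by the same name. */
    boolean hasNameClash() {
        Set<String> seen = new HashSet<>();
        seen.add(alias != null ? alias : typeName);
        for (AliasedJoinSketch join : joins) {
            if (!seen.add(join.effectiveName())) {
                return true;
            }
        }
        return false;
    }
}
```

With this, joining "Persons" to "Persons" without aliases is reported
as a clash, while using the aliases "a" and "b" is not.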
* joined features
Another major question is what the result of a join should look like.
Given that the current return from a query is features, I thought it
best to stick with that and not come up with some new class or
something to represent a tuple (although maybe that is something worth
considering). I thought of a few different alternatives. To illustrate,
consider two feature types:
f1 (name, geometry)
f2 (name, foo, geom)
1. Return a single feature with attributes from joined feature types
"rolled into it". So the resulting joined feature would look like:
f'(name, geometry, name, foo, geom)
2. Return a single feature that contains attributes for joined features:
f'(name, geometry, f2)
3. Return a single feature that contains attributes for all features
in the join
f'(f1,f2)
All methods have their various issues. (1) for instance requires that
we break simple feature rules since we have two attributes with the
same local name.
(2) requires us to have attribute types that are SimpleFeatureType.
Which I don't think technically violates simple feature rules although
admittedly not something that happens often.
(3) is more or less the same as (2) but better represents the notion
of the "tuple". The question is what id to give to the feature, if
any.
Pretty open to suggestions on this one... I imagine there is probably
a better solution than any of those three. In the end, with the
prototype I decided to go with (2). It seemed the least invasive.
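To illustrate the shape that option (2) produces, here is a small
sketch using plain maps in place of features. It is not the prototype
code; the class and method names are hypothetical. The joined feature
is stored whole under one attribute, so its "name" does not collide
with the "name" of the primary feature.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of option (2): f'(name, geometry, f2). */
final class NestedJoinResultSketch {

    /**
     * Copies the primary feature's attributes and adds the joined feature
     * as a single attribute named after the joined type.
     */
    static Map<String, Object> nest(
            Map<String, Object> primary, String joinedName, Map<String, Object> joined) {
        if (primary.containsKey(joinedName)) {
            throw new IllegalArgumentException(
                    "attribute '" + joinedName + "' already exists on the primary feature");
        }
        Map<String, Object> result = new LinkedHashMap<>(primary); // keeps attribute order
        result.put(joinedName, joined);
        return result;
    }
}
```

The clash has not gone away entirely: it moves to the case where the
primary feature already has an attribute with the joined type's name,
which the sketch simply rejects.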
* join types
Joins come in many flavors... inner vs outer, etc... The wfs spec
specifies that the semantics are that of an inner join. But I guess we
could add some notion of join type to the join class so that a user
could specify which type of join they want? Or maybe just stick with
inner join since that is the requirement and the most common case?
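For reference, the difference a join type would make can be sketched
in memory as follows. This is an illustration only: features are plain
maps, the enum and class names are hypothetical, and only inner and
left outer joins are shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a join type setting; not the prototype code. */
final class JoinTypeSketch {
    enum Type { INNER, LEFT_OUTER }

    static List<List<Map<String, Object>>> join(
            Type type,
            List<Map<String, Object>> left, String leftAttr,
            List<Map<String, Object>> right, String rightAttr) {
        List<List<Map<String, Object>>> tuples = new ArrayList<>();
        for (Map<String, Object> l : left) {
            Object key = l.get(leftAttr);
            boolean matched = false;
            for (Map<String, Object> r : right) {
                if (key != null && key.equals(r.get(rightAttr))) {
                    tuples.add(Arrays.asList(l, r));
                    matched = true;
                }
            }
            // An outer join keeps unmatched left features, paired with null.
            if (!matched && type == Type.LEFT_OUTER) {
                tuples.add(Arrays.asList(l, null));
            }
        }
        return tuples;
    }
}
```

With an inner join, a person without a spouse drops out of the result;
with a left outer join that person is kept, with nothing in the second
slot of the tuple.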
That is about it for now... sorry, it's a lot of random thoughts, I know. I
currently have a basic implementation working in the jdbc module. It
needs testing and to handle some more special cases but with it I have
been able to do a variety of joins, both "standard" and spatial.
Thoughts and feedback welcome. Thanks folks.
-Justin
--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.
--
*Niels Charlier*
Software Engineer
CSIRO Earth Science and Resource Engineering
Phone: +61 8 6436 8914
Australian Resources Research Centre
26 Dick Perry Avenue, Kensington WA 6151