On 17/01/15 12:00, Bruno P. Kinoshita wrote:
Hi Andy!

Jena can (and does) support multiple APIs over a common core.

A commons-rdf API can be added along side the existing APIs; that means
it is not a "big bang" to have commons-rdf interfaces supported.

That's great! Would the commons-rdf dependency go in jena-core/pom.xml? Is it 
going to be necessary to change some classes in the core? I think it will be 
transparent for other modules like ARQ, Fuseki, Text. Is that right?

I don't think so - Jena's core is "generalized" RDF and this is important.

Just adding new interfaces to the core Node (etc.) objects isn't ideal: you get multiple method names for the same thing. And making the hashCode/equals contract work across implementations (hashCode() of implementation A must equal hashCode() of implementation B whenever the two terms are equal) is really quite tricky.
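To illustrate the contract being discussed, here is a minimal sketch (the class name is invented, this is not Jena or Commons RDF code): the only way two independent implementations can agree on hashCode() is if both derive it from the same value that equals() compares.

```java
// Hypothetical value-based RDF IRI term. Two independent implementations
// of the same interface can only interoperate in hash-based collections
// if they hash the same underlying value in the same way.
final class SimpleIri {
    private final String iriString;

    SimpleIri(String iriString) {
        this.iriString = iriString;
    }

    String getIriString() {
        return iriString;
    }

    // Equality is defined purely by the lexical IRI value.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SimpleIri)) return false;
        return iriString.equals(((SimpleIri) other).getIriString());
    }

    // The contract: hashCode() must be computed from exactly the value
    // that equals() compares -- here, String.hashCode() of the IRI.
    @Override
    public int hashCode() {
        return iriString.hashCode();
    }
}
```

The difficulty Andy points at is that this rule has to be specified in the shared API and then honoured by every implementation; if one implementation mixes in, say, its interning table identity, cross-implementation Set and Map behaviour silently breaks.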

See also my comments about using classes not interfaces.

I personally do not see the worry about wrappers. For me the important thing is the architectural difference between a presentation API, designed for applications to write code against, and a systems API, designed to support the machinery. Java is really rather good at optimizing away the cost of wrappers, including multi-site method dispatch optimizations and coping with dynamically loaded code that changes assumptions at a later time.
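The wrapper approach can be sketched like this (all names here are invented for illustration, not actual Jena or Commons RDF types): a thin presentation-API object delegates to the systems-level node without copying any state, which is exactly the pattern the JIT is good at inlining.

```java
// Hypothetical presentation-API interface, of the kind an application
// would write code against.
interface CommonTerm {
    String ntriplesString();
}

// Hypothetical systems-level node, of the kind the engine machinery uses.
final class InternalNode {
    private final String uri;

    InternalNode(String uri) {
        this.uri = uri;
    }

    String getURI() {
        return uri;
    }
}

// The wrapper: holds a reference to the internal node and delegates.
// No data is duplicated; the cost is one extra object and one call,
// which the JIT can typically inline away at hot call sites.
final class TermWrapper implements CommonTerm {
    private final InternalNode node;

    TermWrapper(InternalNode node) {
        this.node = node;
    }

    @Override
    public String ntriplesString() {
        return "<" + node.getURI() + ">";
    }
}
```

Usage: `new TermWrapper(new InternalNode("http://example.org/x")).ntriplesString()` yields the N-Triples form of the IRI, while the engine keeps working directly on InternalNode.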

So a new module, "jena-commons-rdf", that provides an application presentation API would be the obvious route to me. Fuseki etc.

And this is only RDF, not Datasets or SPARQL. We discussed that and fairly easily came to the conclusion that getting something common sooner was better than waiting for a complete set of APIs. Some of the natural other ones are a lot more complicated - they would build on the terms provided by commons-rdf.

There is a lot more to working with RDF than the RDF API part - SPARQL
engines don't use that API if they want performance and/or scale. (1)
SPARQL queries collections of graphs, and (2) for scale+persistence, you
need to work in parts at a level somewhat lower than Java objects,
closer to the binary of the persistence structures.

Good point. I'm enjoying learning about the Jena code for JENA-632. Even though
datasets, streaming query collections, and all that part about journaling and
graph persistence can be a bit scary.

:-)

Luckily, journalling and persistence are orthogonal to implementing JENA-632, though as an application feature mapped over the whole system, it's a good way of seeing across several components.

Probably that won't be covered in the commons-rdf, but I think that's correct.

I agree - there is a new world out here - a world of large memory machines, and quite likely, large scale persistent RAM in the not too distant future. Given the longevity of shared APIs, it's very hard to find a balance across requirements and expectations. The graph level is naturally driven by the specs but as soon as systems issues get thrown into the mix, the choice space is much larger.

        Andy


Thanks!
Bruno


----- Original Message -----
From: Andy Seaborne <a...@apache.org>
To: dev@commons.apache.org
Cc:
Sent: Saturday, January 17, 2015 7:40 AM
Subject: Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

On 15/01/15 11:52, Bruno P. Kinoshita wrote:

  Hello!

  I feel like I can't help much in the current discussion. But just
wanted to chime in and tell that I'm +1 for a [rdf] component in Apache
Commons. As a commons committer I'd like to help.

  I started watching the GitHub repository and have subscribed to the
ongoing discussion. I'll try to contribute in some way; maybe testing
and with small patches.

  My go-to Maven dependency for RDF, Turtle, N3, working with
ontologies, reasoners, etc, is Apache Jena. I think it would be very
positive to have a common interface that I could use in my code (mainly
crawlers and data munging for Hadoop jobs) and that would work with
different implementations.

  Thanks!

  Bruno

Since you mention Jena ... :-)

Jena can (and does) support multiple APIs over a common core.

A commons-rdf API can be added along side the existing APIs; that means
it is not a "big bang" to have commons-rdf interfaces supported.

There is a lot more to working with RDF than the RDF API part - SPARQL
engines don't use that API if they want performance and/or scale. (1)
SPARQL queries collections of graphs, and (2) for scale+persistence, you
need to work in parts at a level somewhat lower than Java objects,
closer to the binary of the persistence structures.

     Andy


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
