[RDF] Fluent parser API?

Stian Soiland-Reyes Tue, 20 Feb 2018 08:49:50 -0800

ajs6f and others who like talking about immutability and fluent APIs:

Stuck on a train with no WiFi I plundered a bit further with the idea of 
a fluent Parser (& writer) API for Commons RDF. I have committed this
idea to the fluent-parser branch for now:


https://github.com/apache/commons-rdf/compare/fluent-parser

(Note that I have not wired up any of the implementations)


Before we talked about mutability/immutability. I think I might have
found some middle ground:


RDF.parser(RDFSyntax) returns a Parser instance (perhaps we should call
it RDFParser?)

https://s.apache.org/RDFparser

If the syntax is unsupported, it returns Optional.empty(). A null
argument can be used for syntax guessing or content negotiation, e.g. as
supported by Jena RIOT (but not by JSONLD-Java).


The parser
http://stain.github.io/commons-rdf/fluent-parser/org/apache/commons/rdf/api/io/Parser.html
takes a single ParserConfig argument
http://stain.github.io/commons-rdf/fluent-parser/org/apache/commons/rdf/api/io/ParserConfig.html

which is the (hopefully serializable) bean of the job to parse. Most
importantly it holds a ParserSource (e.g. an IRI or Path) and a
ParserTarget (e.g. a Dataset).

You can start building these from either
ParserConfig.immutable() or ParserConfig.mutable()
and then use the fluent interface with .withBase(), .withSource() etc.

The interfaces these expect have similar helpers.

E.g.

    RDF rdf = new JenaRDF();
    ParserConfig c = ParserConfig.mutable()
      .withSyntax(RDFSyntax.TURTLE)
      .withSource(ParserSource.fromPath(
         Paths.get("/tmp/file.ttl")))
      .withTarget(ParserTarget.toDataset(
        rdf.createDataset()))
      .asImmutableConfig();

The .withOption() can be used to set a key-value pair for
vendor-specific Options. I think further work is needed there to ensure
those are serializable.


This can then be used like this:

    Parsed parsed = rdf.parser(null).parse(c);  

(syntax will be picked up from c in this case)

http://stain.github.io/commons-rdf/fluent-parser/org/apache/commons/rdf/api/io/Parsed.html
 
is just the source and target of the completed job, as well as the count
of how many were parsed. This is obviously more interesting in an async
parser case where you have many parser sessions. (I'll come to that in
another email)


The mutable ParserConfig bean modifies the config values in place, so
you can set them many times, each call returning the same bean:
https://s.apache.org/MutableParserConfig

This is naive, simple, and definitely not thread-safe, but it should
have minimal performance hits.
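
To make the in-place behaviour concrete, here is a minimal sketch. It
assumes a much-simplified config with only a syntax and a base;
SketchMutableConfig and its String-typed fields are hypothetical
stand-ins, not the classes on the branch:

```java
import java.util.Optional;

// Hypothetical, simplified stand-in for the mutable ParserConfig bean.
final class SketchMutableConfig {
    private String syntax; // null means "not set"
    private String base;   // null means "not set"

    // Each with* method overwrites the field in place and returns this,
    // so calls can be chained without allocating new objects.
    SketchMutableConfig withSyntax(String syntax) {
        this.syntax = syntax;
        return this;
    }

    SketchMutableConfig withBase(String base) {
        this.base = base;
        return this;
    }

    Optional<String> syntax() { return Optional.ofNullable(syntax); }
    Optional<String> base() { return Optional.ofNullable(base); }
}

class SketchMutableConfigDemo {
    public static void main(String[] args) {
        SketchMutableConfig c = new SketchMutableConfig();
        SketchMutableConfig same = c.withSyntax("text/turtle")
                                    .withSyntax("application/n-triples");
        System.out.println(same == c);        // true: the same bean is returned
        System.out.println(c.syntax().get()); // application/n-triples: last value wins
    }
}
```

Nothing here is synchronized, which is why a bean like this should not
be shared between threads.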


The .asImmutableConfig() method makes a snapshot copy to a similar bean
that is immutable (its with* methods return new instances). That would
not really be needed if you don't keep the ParserConfig instance around
and don't do async parser jobs, as both the mutable and immutable
versions comply with the ParserConfig interface. (Discuss!)


ParserConfig.immutable() works just the same, but
keeps every step immutable, starting with a "null" config. Thus every
step can be used both as an argument for multiple Parser sessions and
for thread-safe re-use of ParserConfig (e.g. to vary the source).

Here instead of copying lots of fields we start with an empty config
with no fields and all methods returning Optional.empty()
https://s.apache.org/ImmutableParserConfigImpl

Any call to a .withSomething() will then create a "child" config that
delegates to unchanged properties and keeps the new value. For instance
https://s.apache.org/WithTarget only overrides .target() while .source()
will go to the (immutable) parent config. Unset properties will
therefore fall back to the underlying Optional.empty() in the initial
bean.
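
The delegation idea can be sketched roughly as follows. This is a
simplified illustration, not the code on the branch: SketchConfig has
only two String-typed properties, and all Sketch* names are
hypothetical.

```java
import java.util.Optional;

// Hypothetical, simplified config interface with two properties only.
interface SketchConfig {
    Optional<String> source();
    Optional<String> target();

    // Each with* call wraps this config in a child overriding one property.
    default SketchConfig withSource(String source) {
        return new SketchWithSource(this, source);
    }
    default SketchConfig withTarget(String target) {
        return new SketchWithTarget(this, target);
    }
}

// The initial "null" config: nothing is set.
final class SketchEmptyConfig implements SketchConfig {
    public Optional<String> source() { return Optional.empty(); }
    public Optional<String> target() { return Optional.empty(); }
}

// Delegates every property to the parent; subclasses override exactly one.
abstract class SketchWithParent implements SketchConfig {
    private final SketchConfig parent;
    SketchWithParent(SketchConfig parent) { this.parent = parent; }
    public Optional<String> source() { return parent.source(); }
    public Optional<String> target() { return parent.target(); }
}

final class SketchWithSource extends SketchWithParent {
    private final String source;
    SketchWithSource(SketchConfig parent, String source) {
        super(parent);
        this.source = source;
    }
    @Override public Optional<String> source() { return Optional.of(source); }
}

final class SketchWithTarget extends SketchWithParent {
    private final String target;
    SketchWithTarget(SketchConfig parent, String target) {
        super(parent);
        this.target = target;
    }
    @Override public Optional<String> target() { return Optional.of(target); }
}

class SketchDelegationDemo {
    public static void main(String[] args) {
        SketchConfig base = new SketchEmptyConfig().withTarget("dataset-1");
        SketchConfig a = base.withSource("a.ttl"); // base itself is unchanged
        SketchConfig b = base.withSource("b.ttl");
        System.out.println(a.source().get() + " -> " + a.target().get()); // a.ttl -> dataset-1
        System.out.println(b.source().get() + " -> " + b.target().get()); // b.ttl -> dataset-1
        System.out.println(base.source().isPresent());                    // false
    }
}
```

Because no object is ever modified after construction, base can be
shared freely while a and b vary only the source.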


Note that "parent" here does not mean subclass, but delegation. (To
confuse matters and avoid code smell all parent delegation is done by
the WithParent superclass)

As such the immutable parser configs are thread-safe and can be re-used
many times. They are also Serializable (serializing makes a single
snapshot, see https://s.apache.org/SnapshotParserConfig), while the
MutableParserConfig is on purpose not serializable.


BTW: The With* classes are quite light-weight and so would hopefully not
 come with a much larger memory footprint than the mutable bean.
 There is some potential memory waste here if a property is overridden
 many times, e.g. c.withSource(a).withSource(b).withSource(c),
 as the intermediate immutable configs in the "parent tree" are still
 kept. In extreme cases this could cause a stack overflow when
 looking up properties. (Someone optimizing that can use
 c.asMutableConfig().asImmutableConfig() to create a new base snapshot.)
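
A rough sketch of why the chain grows and how a snapshot flattens it.
All Chain* names are hypothetical, the depth() method exists only to
make the nesting visible, and snapshot() stands in for the
asMutableConfig().asImmutableConfig() round trip:

```java
import java.util.Optional;

// Hypothetical single-property config, just enough to show chain depth.
interface ChainConfig {
    Optional<String> source();

    // Number of wrapper objects between this config and a flat base
    // (illustration only, not part of the proposed API).
    int depth();

    default ChainConfig withSource(String source) {
        return new ChainWithSource(this, source);
    }

    // Copies the current value into a flat object, dropping the parents.
    default ChainConfig snapshot() {
        return new ChainSnapshot(source().orElse(null));
    }
}

// Flat config with no parent.
final class ChainSnapshot implements ChainConfig {
    private final String source; // null means "not set"
    ChainSnapshot(String source) { this.source = source; }
    public Optional<String> source() { return Optional.ofNullable(source); }
    public int depth() { return 0; }
}

// Child config: keeps its parent reachable even though source is overridden.
final class ChainWithSource implements ChainConfig {
    private final ChainConfig parent;
    private final String source;
    ChainWithSource(ChainConfig parent, String source) {
        this.parent = parent;
        this.source = source;
    }
    public Optional<String> source() { return Optional.of(source); }
    // Recurses through every parent, like a lookup of a non-overridden property.
    public int depth() { return parent.depth() + 1; }
}

class ChainDemo {
    public static void main(String[] args) {
        ChainConfig c = new ChainSnapshot(null);
        for (int i = 0; i < 1000; i++) {
            c = c.withSource("file-" + i + ".ttl");
        }
        System.out.println(c.depth());            // 1000 nested configs retained
        ChainConfig flat = c.snapshot();
        System.out.println(flat.depth());         // 0 after flattening
        System.out.println(flat.source().get());  // file-999.ttl
    }
}
```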



This is the config interface, which happens to be fluent, but it is
not opinionated, e.g. you are allowed to "forget" a parser source, which
generally won't make sense. There are no overloaded shortcuts such as
.withSource(path). Also you have to fetch the Parser and pass the
config along yourself.

The advantage I see here is that the JenaParser implementation gets a
very clean Parser interface to implement, no abstract class needed.  It
might however need to check itself that the config is complete enough
for its needs as in theory it could be a null-config with nothing set.
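
Such a completeness check could look roughly like this. It is only a
sketch: CheckedConfig, SketchParser and requireComplete are hypothetical
names, and which properties a real implementation requires is up to
that implementation.

```java
import java.util.Optional;

// Hypothetical minimal config: only the two properties the check needs.
interface CheckedConfig {
    Optional<String> source();
    Optional<String> target();

    // Convenience factory for the sketch; null means "not set".
    static CheckedConfig of(String source, String target) {
        return new CheckedConfig() {
            public Optional<String> source() { return Optional.ofNullable(source); }
            public Optional<String> target() { return Optional.ofNullable(target); }
        };
    }
}

final class SketchParser {
    // Rejects incomplete configs before any parsing work starts.
    static void requireComplete(CheckedConfig config) {
        config.source().orElseThrow(
            () -> new IllegalStateException("No parser source set"));
        config.target().orElseThrow(
            () -> new IllegalStateException("No parser target set"));
    }
}

class SketchParserDemo {
    public static void main(String[] args) {
        SketchParser.requireComplete(CheckedConfig.of("a.ttl", "dataset-1")); // passes
        try {
            SketchParser.requireComplete(CheckedConfig.of(null, "dataset-1"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // No parser source set
        }
    }
}
```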

I'll come back to the alternative ParserBuilder interface
which guides the client caller step by step straight into a parsed file.


-- 
Stian Soiland-Reyes
The University of Manchester
http://www.esciencelab.org.uk/
http://orcid.org/0000-0001-9842-9718

