On Wed, Feb 8, 2017 at 7:27 PM, Matt Sicker <boa...@gmail.com> wrote:
> I've always considered builders to be mutable, thread unsafe objects used > to create immutable, thread safe objects. +1 Gary > If these builders cause GC > pressure to go too high, then I'd turn to object pooling or per-thread > reusable objects depending on how the code is used. > > On 8 February 2017 at 20:38, Stian Soiland-Reyes <st...@apache.org> wrote: > > > Peter Ansell raised a valid question about why in Commons RDF I made the > > RDFParser interface as an immutable factory-like builder pattern. > > > > https://commons.apache.org/proper/commons-rdf/apidocs/ > > org/apache/commons/rdf/experimental/RDFParser.html > > > > > > Here is how it can be used today: > > > > RDF rdf = new JenaRDF(); > > Graph g1 = rdf.createGraph(); > > Future future = new JenaRDFParser() > > .source(Paths.get("/tmp/graph.ttl")) > > .contentType(RDFSyntax.TURTLE) > > .target(g1).parse(); > > future.get(30, TimeUnit.Seconds); // block > > > > > > You may notice it is not a factory as-such, as you only get a Future > > of the parser result, rather than the actual "parser". Also it is > > currently an interface, with multiple implementations (one per RDF > > implementation). > > > > > > > > Although the javadoc only suggests for it to be immutable and > > threadsafe, the abstract class AbstractRDFParser > > https://commons.apache.org/proper/commons-rdf/apidocs/ > > org/apache/commons/rdf/simple/experimental/AbstractRDFParser.html > > https://github.com/apache/commons-rdf/blob/master/ > > simple/src/main/java/org/apache/commons/rdf/simple/experimental/ > > AbstractRDFParser.java > > is both, and thus every "option setter" method therefore return a cloned > > copy of the factory with the mutated fields set there. This simply uses > > .clone() at the moment. > > > > This abstract class and immutability pattern is used by each of the > > three RDF implementations of the interface. > > > > > > This means it's safe to parse many files like this: > > > > JenaRDFParser parser = new JenaRDFParser() > > .contentType(RDFSyntax.TURTLE) > > .target(g1); > > > > // Parse multiple files in parallell > > Future future1 = parser.source(Paths.get("/tmp/graph1.ttl")).parse(); > > Future future2 = parser.source(Paths.get("/tmp/graph2.ttl")).parse(); > > > > // Wait for both to finish > > future1.get(30, TimeUnit.Seconds); > > future2.get(30, TimeUnit.Seconds); > > > > > > Above the second call to parser.source(..) is not a problem (even if > > it happens in another thread) as both .source() calls return separate > > cloned JenaRDFParser instances and don't mutate the "parser" instance. > > > > > > Now a valid critique of such a pattern is that it's wasteful to make > > clones that are immediately thrown away, for instance the above > > future1/future2 example would cause 7 instantiations of JenaRDFParser() > > (.parse() does some finalization in a final clone). > > > > I have not done any benchmarking, but envision that as parsing RDF files > > generally creates many thousands of RDFTerm instances, a couple of > > extra parser instantiations shouldn't make a big dent in the bill. > > > > > > The typical builder pattern is rather to return "this" and mutate fields > > directly. This is obviously not thread-safe but generally achives the > > same. Multiple threads would however instead have to create a brand new > > parser from start and set all the options themselves. (Or they could a > > similar .clone() method from a carefully locked down "proto" instance > > that everyone has to be careful not to call any mutating methods on) > > > > > > Now multiple threads is now very common because of Java 8 streams, and > > for instance it would not be unthinkable that a parallelStream() is > > mapped as a .source() or .target() on such a parser builder. The Futures > > fit somewhat into this picture as well. I think this would get > > complicated with having to set everything again rather than just pass > > "parser::source" as a method reference. > > > > > > > > > > As this was just thrown together as an experiment (and we've not yet > > tried writing the corresponding RDFWriter interface), I think we could > > revisit how to do this pattern, hopefully drawing on the wider > > experience of the Apache Commons developers on this list. > > > > > > > > Roughly here are my requirements: > > > > * Ability to set various options (e.g. path of file to parse, format, > > destination) in any order > > * Avoid mega-constructors/multi-arg methods > > * Some options take multiple types, e.g a source can be given as a Path, > > InputStream or IRI (URL) -- but would override each other > > > > Nice to have: > > > > * Reusable (e.g. set options that are common, but modify > > source() to parse multiple files) > > * Thread-safe (e.g. immediate reuse of builder before a parsing session > > has started or finished) > > * Easy to serialize / keep around (should not keep references to big > > objects in memory uneccessarily) > > * Parsing in the background, e.g. return Future. (implementation manages > > threads/queues as it pleases) > > > > Unclear: > > > > * How to select between the multiple RDFParser instances? > > I was thinking the RDF interface could have a .makeParser() method, > > but not all parsers can take all formats, so possibly some kind of > > registry or "what can you handle" mechanism might be needed. > > > > > > How do you think we should proceed? Mutable or immutable? How should the > > settings be kept? Fields, map, or what? Does it make sense with an > > interface, abstract class (keeps settings) and implementations > > (processess settings), or should we have a single ParserFactory class > > and have a new internal interface below? > > > > > > -- > > Stian Soiland-Reyes > > University of Manchester > > http://www.esciencelab.org.uk/ > > http://orcid.org/0000-0001-9842-9718 > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > > -- > Matt Sicker <boa...@gmail.com> > -- E-Mail: garydgreg...@gmail.com | ggreg...@apache.org Java Persistence with Hibernate, Second Edition <https://www.amazon.com/gp/product/1617290459/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1617290459&linkCode=as2&tag=garygregory-20&linkId=cadb800f39946ec62ea2b1af9fe6a2b8> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=am2&o=1&a=1617290459> JUnit in Action, Second Edition <https://www.amazon.com/gp/product/1935182021/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182021&linkCode=as2&tag=garygregory-20&linkId=31ecd1f6b6d1eaf8886ac902a24de418%22> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=am2&o=1&a=1935182021> Spring Batch in Action <https://www.amazon.com/gp/product/1935182951/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182951&linkCode=%7B%7BlinkCode%7D%7D&tag=garygregory-20&linkId=%7B%7Blink_id%7D%7D%22%3ESpring+Batch+in+Action> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=am2&o=1&a=1935182951> Blog: http://garygregory.wordpress.com Home: http://garygregory.com/ Tweet! http://twitter.com/GaryGregory