Hi Ritesh,

Jena offers an NQuadsInputFormat, which is a Hadoop InputFormat, so you can read N-Quads data efficiently with Flink. Unfortunately, I remember having some problems using it, so I simply export my Jena model as N-Quads and parse it efficiently with Flink as a text file. Also, in my experience, parsing the lines with Sesame 4 is more efficient than with Jena in terms of both speed and garbage collection.
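A minimal plain-Java sketch of this text-file approach. It parses each N-Quads line into subject, predicate, object and optional graph, then groups the quads by subject, which is the grouping step described in the next paragraph. It uses only the JDK instead of Flink, Sesame or Jena, and it assumes a simplified line syntax (no escaped quotes inside literals, whitespace before the final `.`). The class and method names are hypothetical. The email mentions a Tuple5 per quad but does not say what the fifth field holds, so this sketch keeps only the four quad components. In a Flink DataSet job the same parsing would presumably sit in a `map` over `env.readTextFile(...)`, followed by `groupBy` on the subject field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical plain-JDK sketch: parse N-Quads lines and group them by subject.
 * Assumes simplified syntax (no escaped quotes in literals, whitespace before
 * the final '.'); real data should go through a proper parser such as Sesame Rio.
 */
public class NQuadsGrouping {

    /** One statement; graph is null when the line has no graph label. */
    record Quad(String subject, String predicate, String object, String graph) {}

    /** Index just past the next occurrence of ch at or after from. */
    private static int after(String line, char ch, int from) {
        int pos = line.indexOf(ch, from);
        if (pos < 0) throw new IllegalArgumentException("Malformed line: " + line);
        return pos + 1;
    }

    /** Index of the next whitespace character at or after from (or the line length). */
    private static int skipToken(String line, int from) {
        int i = from;
        while (i < line.length() && !Character.isWhitespace(line.charAt(i))) i++;
        return i;
    }

    /** Splits one line into its 3 or 4 RDF terms. */
    static Quad parseLine(String line) {
        List<String> terms = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '.') break;                        // end of statement
            int start = i;
            if (c == '<') {                             // IRI
                i = after(line, '>', i);
            } else if (c == '"') {                      // literal (no escaped quotes assumed)
                i = after(line, '"', i + 1);
                if (i < line.length() && line.charAt(i) == '@') {
                    i = skipToken(line, i);             // language tag
                } else if (line.startsWith("^^<", i)) {
                    i = after(line, '>', i);            // datatype IRI
                }
            } else {                                    // blank node, e.g. _:b0
                i = skipToken(line, i);
            }
            terms.add(line.substring(start, i));
        }
        if (terms.size() < 3 || terms.size() > 4) {
            throw new IllegalArgumentException("Expected 3 or 4 terms: " + line);
        }
        return new Quad(terms.get(0), terms.get(1), terms.get(2),
                        terms.size() == 4 ? terms.get(3) : null);
    }

    /** Groups parsed quads by subject, keeping first-seen subject order. */
    static Map<String, List<Quad>> groupBySubject(List<String> lines) {
        Map<String, List<Quad>> atoms = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) continue;  // skip blanks and comments
            Quad q = parseLine(line);
            atoms.computeIfAbsent(q.subject(), k -> new ArrayList<>()).add(q);
        }
        return atoms;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "<http://ex.org/a> <http://ex.org/name> \"Alice\"@en <http://ex.org/g1> .",
            "<http://ex.org/a> <http://ex.org/knows> <http://ex.org/b> <http://ex.org/g1> .",
            "<http://ex.org/b> <http://ex.org/age> \"42\"^^<http://www.w3.org/2001/XMLSchema#integer> .");
        groupBySubject(lines).forEach((subject, quads) ->
            System.out.println(subject + " -> " + quads.size() + " quad(s)"));
    }
}
```

Running `main` on the three sample lines prints one line per subject, `<http://ex.org/a> -> 2 quad(s)` followed by `<http://ex.org/b> -> 1 quad(s)`. The hand-rolled tokenizer only keeps the sketch dependency-free; for real exports a proper parser (e.g. Sesame's Rio, as suggested above) is the safer choice.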
What I do is convert every quad into a Tuple5, group the triples/quads by subject, and then apply some logic. The quads grouped by subject are what we call an "entiton atom", and combining atoms yields an "entiton molecule" (i.e., a graph rooted in some entiton atom). We presented our work at Flink Forward 2015 in Berlin: http://www.slideshare.net/FlinkForward/s-bartoli-f-popmermaier-a-semantic-big-data-companion

If you need some code that reads N-Quads with Flink, just write to me privately!

Best,
Flavio

On Wed, Apr 6, 2016 at 3:57 PM, Ritesh Kumar Singh <riteshoneinamill...@gmail.com> wrote:
> Hi Flavio,
>
> 1. How do you access your rdf dataset via flink? Are you reading it as a normal input file and splitting the records or you have some wrappers in place to convert the rdf data into triples? Can you please share some code samples if possible?
> 2. I am using Jena TDB command line utilities to make queries against the dataset in order to avoid java garbage collection issues. I am also using Jena java APIs as a dependency but command line utils are way faster (Though it comes with an extra requirement to have Jena command line utils installed in the system). Main reason for this approach being able to pass the string output from the command line to Flink as part of my pipeline. Can you tell me your approach to this?
> 3. Should I dump my query output to a file and then consume it as a normal input source for Flink?
>
> Basically, any help regarding this will be helpful.
>
> Regards,
> Ritesh
>
> Ritesh Kumar Singh
> about.me/riteshoneinamillion
>
> On Wed, Apr 6, 2016 at 2:45 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
>> Hi Ritesh,
>> I have some experience with RDF and Flink. What do you mean by accessing
>> a Jena model? How do you create it?
>>
>> From my experience reading triples from jena models is evil because it has some problems with garbage collection.
>> On 6 Apr 2016 00:51, "Ritesh Kumar Singh" <riteshoneinamill...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I need some suggestions regarding accessing RDF triples from flink. I'm trying to integrate flink in a pipeline where the input for flink comes from SPARQL query on a Jena model. And after modification of triples using flink, I will be performing SPARQL update using Jena to save my changes.
>>>
>>> - Are there any recommended input format for loading the triples to flink?
>>> - Will this use case be classified as a flink streaming job or a batch processing job?
>>> - How will loading of the dataset vary with the input size?
>>> - Are there any recommended packages/ projects for these type of projects?
>>>
>>> Any suggestion will be of great help.
>>>
>>> Regards,
>>> Ritesh
>>> https://riteshtoday.wordpress.com/