Hello Mark and Tom, in the past some big data backends were implemented but
never distributed in an official release, such as Bigdata (now Blazegraph) [1],
Titan [2], and AccumuloGraph [3]. I personally tested the last of these
integration options and it worked fine, although I didn't run any benchmarks.
To build Marmotta with a different backend, you need to build it with the
corresponding Maven profile and set the backend-specific properties.
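For example, something along these lines (the profile and property names below
are only placeholders; check the backend module's pom.xml for the real ones):

    mvn clean install -P<backend-profile> -D<backend.property>=<value>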
I hope this information is useful.
Regards,
Raffaele.

[1] https://www.blazegraph.com/
[2] http://thinkaurelius.github.io/titan/
[3] https://github.com/JHUAPL/AccumuloGraph

2015-09-18 21:37 GMT+02:00 Mark Breedlove <m...@dp.la>:

>
> Hello, Marmotta Users,
>
> At the Digital Public Library of America, we have a large Marmotta
> triplestore, with which we interact entirely over LDP.
>
> We're looking for some advice about scaling Marmotta's LDP interface past
> our current size. In the short term, we are hoping that we can find ways to
> tune PostgreSQL to mitigate some problems we have seen; in the long term,
> we are open to advice about alternate backends.
>
> A high-level overview of how we interact with our LDP Resources is
> documented in [1].  While we have had to do some LDP-specific tuning
> (especially introducing a partial index on `triples.context`) for all
> processes, we have seen particular trouble in cases where we GET,
> transform, then PUT an LDP RDFSource (see *Enrichment* in the overview
> link).
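>
> Roughly, that partial index looks something like the following (the index
> name and WHERE clause here are illustrative; it filters on KiWi's `deleted`
> flag, and [4] below covers our index work in detail):
>
>     CREATE INDEX triples_context_idx ON triples (context) WHERE deleted = false;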
>
> That overview is part of a greater wiki that we've put together to
> document our installation and performance-tuning activities [2].
>
> Our biggest problem at the moment is addressing slow updates and inserts
> [3], observed when we GET and PUT those RDFSources with two concurrent
> mapping or enrichment activities. If we run one of these activities,
> GETting, transforming, and PUTting in serial, performance appears to be
> network- and CPU-bound and is acceptable. But as soon as we run a second
> mapping or enrichment, throughput grinds practically to a halt, as described in
> [3].
>
> To give you a sense of the scale at which we're operating, we have about
> two million LDP-RSs, typically including about 50 triples and a handful of
> blank nodes (around 5 to 15). Our `triples` table has about 294M rows now
> and takes up 32GB for the table, and 13GB each for its two largest indices.
> Our entire Marmotta database takes up about 140GB. We've had some successes
> with improving index performance with low cardinality in `triples.context`
> [4] and tuning the Amazon EC2 instances that we run on [5][6]. The I/O wait
> problem with concurrent LDP operations, however, is the new blocker.
>
> Some supplemental information:
>
> * An overview of the project for which Marmotta is being
>   used:
> https://digitalpubliclibraryofamerica.atlassian.net/wiki/display/TECH/Heidrun
>
> * The application (a Rails engine) that makes all of these LDP requests:
>   https://github.com/dpla/KriKri
>
> * Our configuration-management project, with details on how some of our
>   stack is configured: https://github.com/dpla/automation
>
> We'd be grateful for any feedback that you might have that would assist us
> with handling large volumes of data over LDP. Thanks for your help!
>
> - Mark Breedlove and Tom Johnson,
>   Digital Public Library of America (http://dp.la/)
>   t...@dp.la
>
>
> [1]
> https://digitalpubliclibraryofamerica.atlassian.net/wiki/display/TECH/LDP+Interactions+Overview
> [2]
> https://digitalpubliclibraryofamerica.atlassian.net/wiki/display/TECH/Marmotta
> [3]
> https://digitalpubliclibraryofamerica.atlassian.net/wiki/display/TECH/Addressing+slow+updates+and+inserts
> [4]
> https://digitalpubliclibraryofamerica.atlassian.net/wiki/display/TECH/Index+performance+with+high+context+counts
> [5]
> https://digitalpubliclibraryofamerica.atlassian.net/wiki/display/TECH/Amazon+EC2+adjustments
> [6]
> https://digitalpubliclibraryofamerica.atlassian.net/wiki/display/TECH/Using+irqbalance+and+SMP+IRQ+affinity
>
