Hey Manos, 2011/3/16 Manos Karpathiotakis <m...@di.uoa.gr>
> Let me explain a few things about our dataset. We are using a system named > Sesame [1] that stores and queries RDF data. In our case, it uses Postgres > as a relational backend. In RDF, data are triples. Here is an example of an > RDF triple: > > ex:Postgres rdf:type ex:RDBMS > > Triples are stored in Postgres. We can use two storing schemes. > > A monolithic scheme where every triple is stored in 1 huge table: > Triple(int subject, int predicate, int object) and some additional > information is stored in other tables of the form dictionary(int id, string > original_value). This results in a schema with approximately 10 tables. > Have you considered to use hstore in this case? http://www.postgresql.org/docs/9.0/static/hstore.html > > A per-predicate scheme can also be used. This storing scheme creates a > table for every distinct predicate. For example, to store the aforementioned > triple, we would create a table type(int subject, int object) and we would > insert a tuple with the encoded values for ex:Postgres and ex:RDBMS. > > Queries for RDF data can be expressed in the SPARQL query language. Sesame > translates SPARQL queries to SQL queries depending on the storing scheme > being used. So, you can imagine that when we use the monolithic storing > scheme, queries would be translated to an SQL query with many self-joins on > a huge triple table. On the other hand, if we use the predicate schema, > SPARQL queries are translated to many joins between smaller tables. > > In our case, we want to store 111M triples (and we would like to experiment > with even datasets) that consists of approximately 10.000 distinct > predicates. This means that when we would strongly prefer to use the > per-predicate storing scheme to have faster query execution. However we are > not experienced in tuning Postgres for this kind of data. > > Until now, we disabled WAL, disabled autocommit, increased shared buffers > to 512mb, temp buffers to 64mb but we haven't tried disabling indices and > foreign key constraints because it would require some code restructuring (I > understand however the performance increase that we would get if we disabled > them). > > We are currently loading the dataset in chunks that perform 24.000.000 > insertions to the database. > > Could you suggest some values for shared buffers, temp > buffers, maintenance_work_mem, checkpoint_segments or other relevant > parameters that we could use as a starting point? > > Best Regards, > Manos Karpathiotakis > > [1] http://www.openrdf.org/ > > > On Tue, Mar 15, 2011 at 7:15 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > >> Manos Karpathiotakis <m...@di.uoa.gr> writes: >> > Hi all, >> > I am using postgres 9.0 under CentOS 5.5 (Dual Xeon Quad Core @ 2,44 >> GHz, >> > 64GB RAM, 2TB RAID 5). In my case, postgres is used as a backend for the >> RDF >> > store Sesame. >> > I am trying to store a dataset that consists of approximately >> 900.000.000 >> > insertions (organized in either 10 tables, or in an arbitrary number of >> > tables exceeding 2000). Loading the data in 10 tables takes about two >> days, >> > while in the 2nd case, an org.postgresql.util.PSQLException: ERROR: out >> of >> > shared memory error is thrown. >> >> I'm guessing that you're seeing this through some client-side code that >> helpfully suppresses the HINT about raising max_locks_per_transaction >> :-( >> >> If your application tries to touch all 2000 tables in one transaction, >> you will need to raise that parameter to avoid running out of locktable >> space for the AccessShareLocks it needs on all those tables. >> >> However ... generally speaking, I'd guess that whatever you did to >> refactor 10 tables into 2000 was a bad idea --- one table with an extra >> key column is almost always a better design than N basically-identical >> tables. The latter will be a *lot* harder to use, and probably won't >> fix your performance problem anyway. You should attack the performance >> problem in a different way. Have you read >> http://www.postgresql.org/docs/9.0/static/populate.html ? >> >> regards, tom lane >> > > > > -- > =================================================== > Manos Karpathiotakis > National & Kapodistrian University of Athens > Department of Informatics & Telecommunications, Office B25 > Management of Data & Information Knowledge Group > Panepistimioupolis, Ilissia > GR-15784 Athens, Greece > Tel: +30.210.727.5159 > Fax: +30.210.727.5214 > e-mail: m...@di.uoa.gr > =================================================== > > -- // Dmitriy.