Hi Marti, Thanks, this is exactly the kind of feedback I was looking for. I am already storing the whole XML in a payload table actually. My problem is, the queries are actually created in a domain specific langauge, and then they are transformed to SQL. There is a no way of knowing what kind of queries would be run over the XML docs, so I'd be creating indices over and over with each incoming new query. Still, I may be able to find a path in the middle, maybe using a combination of the XML path based index and row based representation. Thaks for the pointers to relevant index types.
Finally, ltree may be an alternative to xpath based indices, but I don't know if that would be faster. The database may need to go beyond 100 milion rows, and I'm not sure what would happen then with a row based representation. I'll probably generate dummy data and compare performance of the options. Kind regards Seref On Fri, May 18, 2012 at 10:55 AM, Marti Raudsepp <ma...@juffo.org> wrote: > On Thu, May 17, 2012 at 8:40 PM, Seref Arikan <serefari...@gmail.com> > wrote: > > Is there a glaring error in my approach? Should I be better off with > another > > SQL query, or Ltree/XPATH queries? > > For the particular query you posted, I would suggest the following indexes: > > (rm_type_name, payload_id, parent_feature_mapping_id) > And maybe: > (rm_type_name, feature_name, payload_id, parent_feature_mapping_id) > > But overall, storing a hierarchical XML structure as rows in a table > might not be the best approach. If performance is problematic, you > might consider storing whole XML documents -- or fragments -- in an > xml field and create expression indexes for the queries that you need, > possibly with GIN/GiST. > > Now I haven't needed to do this myself, so what follows is just me > trying out stuff to give you some ideas and certainly not "best > practice" -- there are lots of different indexing strategies and > different ways to do this. > > For example: > > CREATE TABLE foo (doc_id serial primary key, doc xml not null); > CREATE INDEX foo_doc_id_exists_root_element_test ON foo (doc_id) WHERE > xpath_exists('/root/element[text()="test"]', doc); > CREATE INDEX foo_root_element_text_gin ON foo USING > gin((xpath('/root/element/text()', doc)::text[])); > > To find documents which have <element>test</element>, using the above > indexes: > > # explain analyze select * from foo where > xpath_exists('/root/element[text()="test"]', doc); > Bitmap Heap Scan on foo (cost=3.33..450.22 rows=4311 width=36) > (actual time=0.025..0.026 rows=1 loops=1) > Recheck Cond: xpath_exists('/root/element[text()="test"]'::text, > doc, '{}'::text[]) > -> Bitmap Index Scan on foo_doc_id_exists_root_element_test > (cost=0.00..2.26 rows=4311 width=0) (actual time=0.014..0.014 rows=1 > loops=1) > Total runtime: 0.067 ms > > # explain analyze select * from foo where > (xpath('/root/element/text()', doc)::text[]) @> array['test']; > Bitmap Heap Scan on foo (cost=8.50..105.51 rows=65 width=32) (actual > time=0.025..0.025 rows=1 loops=1) > Recheck Cond: ((xpath('/root/element/text()'::text, doc, > '{}'::text[]))::text[] @> '{test}'::text[]) > -> Bitmap Index Scan on foo_root_element_text_gin > (cost=0.00..8.49 rows=65 width=0) (actual time=0.020..0.020 rows=1 > loops=1) > Index Cond: ((xpath('/root/element/text()'::text, doc, > '{}'::text[]))::text[] @> '{test}'::text[]) > Total runtime: 0.046 ms > (5 rows) > > The GIN index lets you search for documents that have both "test" and > "testing": > (xpath('/root/element/text()', doc)::text[]) @> array['test','testing']; > > Regards, > Marti >