I was not hoping that indexing, per se, would help me. In fact, indexing smiles would be of virtually no use to me, except for exact matches, e.g. where smiles = 'CCCOC'; I was only trying to subvert the use of indexing for my own purposes, to store the parsed smiles somewhere automatic for the sql user, yet transparently available to my functions for quick searches.
I think I've thought about this enough and gotten enough advice to realize I should do this the straightforward way. I should store the parsed smiles in a separate column, have a trigger to keep it up to date, and require the user to pass me the parsed_smiles column for quick searches. And the user could maintain the parsed_smiles in a separate table, if he so desired, with foreign key relations.
Thanks to everyone for all your advice. This is my first postgresql project and I'm liking what I've seen so far.
TJ
Tom Lane wrote:
"TJ O'Donnell" <[EMAIL PROTECTED]> writes:
The only type of search will be of the type:
Select smiles,id from structure where oe_matches(smiles,'c1ccccc1C(=O)N');
You haven't really said much about how you expect an index to be able to help you with this, but I think if any index type can help you it will be GiST. What you would do is define an operator on top of the oe_matches function, so that the above query is written say
Select smiles,id from structure where smiles ~~ 'c1ccccc1C(=O)N';
and then construct a GiST operator class that accepts ~~ as an indexable operator. There's not a huge amount of plain-old-documentation about GiST but there are quite a few examples available in the contrib/ tree.
I don't think you can completely hide the existence of the parsed version of the smiles data. The easiest way to go at it would be to write the queries like
Select smiles,id from structure where smiles_parsed ~~ 'c1ccccc1C(=O)N';
where smiles_parsed is the extra column holding the parsed data, and the ~~ operator is grabbed by a GiST index over that column.
Plan B would be to construct the index as a functional index and write
Select smiles,id from structure where parsed(smiles) ~~ 'c1ccccc1C(=O)N';
However plan B doesn't readily support applying any other operations to the parsed data, since it doesn't exist anywhere except inside the index. Since you mentioned having other things you wanted to do with it, I think you'll end up wanting the separate column.
regards, tom lane
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly