Marc G. Fournier wrote:

Now, if I knock off the LIKE, so that I'm returning all rows from ndict8, join'd to all the URLs that contain them, you get:

Can't you build seperate databases for each domain you want to index? Than you wouldn't need the like operator at all.


The like-operator doesn't seem to allow a very scalable production environment. And besides that point, I don't really believe a "record per word/document-couple" is very scalable (not in SQL, not anywhere).

Anyway, that doesn't help you much, perhaps decreasing the size of the index-tables can help, are they with OIDs ? If so, wouldn't it help to recreate them without, so you save yourselves 4 bytes per word-document couple, therefore allowing it to fit in less pages and by that speeding up the seqscans.

Are _all_ your queries with the like on the url? Wouldn't it help to create an index on both the wordid and the urlid for ndict8?

Perhaps you can create your own 'host table' (which could be filled using a trigger or a slightly adjusted indexer), and a foreign key from your url table to that, so you can search on url.hostid = X (or a join with that host table) instead of the like that is used now?

By the way, can a construction like (tablefield || '') ever use an index in postgresql?

Best regards and good luck,

Arjen van der Meijden



---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Reply via email to