[PERFORM] Many to many join seems slow?

Drew Wilson Tue, 15 May 2007 07:00:09 -0700

I'm trying to debug a query that gets all the French translations forall US string values. Ultimately, my goal is to rank them all by editdistance, and only pick the top N.

However, I cannot get the basic many-to-many join to return all theresults in less than 3 seconds, which seems slow to me. (Mycompetition is an in-memory perl hash that runs on client machinesproviding results in around 3 seconds, after a 30 second startup time.)


The simplified schema is :
        source ->> translation_pair <<- translation

The keys are all sequence generated oids. I do wonder if theperformance would be better if I used the string values as keys toget better data distribution. Would this help speed up performance?


There are 159283 rows in source
There are 1723935 rows in translation, of which 159686 are French

=# explain SELECT s.source_id, s.value AS sourceValue, t.value AStranslationValue

      FROM
          source s,
          translation_pair tp,
          translation t,
          language l
      WHERE
          s.source_id = tp.source_id
          AND tp.translation_id = t.translation_id
          AND t.language_id = l.language_id
          AND l.name = 'French' ;

                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------

Merge Join  (cost=524224.49..732216.29 rows=92447 width=97)
   Merge Cond: (tp.source_id = s.source_id)
   ->  Sort  (cost=524224.49..524455.60 rows=92447 width=55)
         Sort Key: tp.source_id
         ->  Nested Loop  (cost=1794.69..516599.30 rows=92447 width=55)

-> Nested Loop (cost=1794.69..27087.87 rows=86197width=55)-> Index Scan using language_name_key on"language" l (cost=0.00..8.27 rows=1 width=4)

                           Index Cond: ((name)::text = 'French'::text)

-> Bitmap Heap Scan on translation t(cost=1794.69..25882.43 rows=95774 width=59)Recheck Cond: (t.language_id =l.language_id)-> Bitmap Index Scan ontranslation_language_l_key (cost=0.00..1770.74 rows=95774 width=0)Index Cond: (t.language_id =l.language_id)-> Index Scan using translation_pair_translation_idon translation_pair tp (cost=0.00..5.67 rows=1 width=8)

                     Index Cond: (tp.translation_id = t.translation_id)

-> Index Scan using source_pkey on source s(cost=0.00..206227.65 rows=159283 width=46)

(15 rows)

I'm running Postgres 8.2.3 on latest Mac OSX 10.4.x. The CPU is a3Ghz Dual-Core Intel Xeon, w/ 5G ram. The drive is very fast althoughI don't know the configuration (I think its an XRaid w/ 3 SAS/SCSI70G Seagate drives).


The regular performance configurable values are:
work_mem           32MB
shared_buffers     32MB
max_fsm_pages      204800
max_fsm_relations  1000


Thanks for any advice,

Drew

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

[PERFORM] Many to many join seems slow?

Reply via email to