If we stored the actual queries and the EXPLAIN ANALYZE results (when generated) in the database, what would be the purpose of the node_name, db_object, and condition_detail columns? They don't seem like they would be useful for statistical analysis, and it seems like the information would be more useful in context. Are these column really needed?
For a given node_type, are there mutiple valid condition_type values? If so, I need to modify my python script to capture this. If not, I don't see a need to store it. -Kevin >>> Josh Berkus <josh@agliodbs.com> >>> [snip] so, for example, the query step: " -> Seq Scan on detail0009 (cost=0.00..20500.11 rows=26 width=1005) (actual time=453.000..5983.000 rows=53588 loops=1)" " Filter: ((txcontenttype ~~* '%html%'::text) AND ((vchost)::text ~~* '%www.%'::text))" Could be displayed as: query_instance 12008 step_id 14701 parent_step 14698 node_name Seq Scan on detail0009 node_type Seq Scan cost_start 0 cost_end 20500.11 est_rows 26 time_start 453.0 time_end 5983.0 actual_rows 53588 loops 1 db_object detail009 condition_type Filter condition_detail ((txcontenttype ~~* '%html%'::text) AND ((vchost)::text ~~* '%www.%'::text)) By collecting all of this data, you make it possible to perform other sorts of analysis on the cost estimates. For example, statistical analysis might tell us that 3-or-more-condition filters take significantly longer to execute than single-condition filters, which would be important to know for the cost model. Limiting it to collecting only 3 of the 13 bits of node data produced by EA would very much limit the usefulness of the tool and the reliability of its statistics. ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster