Hello,

this week I worked on optimizing algorithms. The first one builds the full tree of an email thread from a (possibly incomplete) transitive closure of a directed acyclic graph. I changed how topological sorting is used, so it should now properly handle threads where some In-Reply-To headers are missing but References headers are present. I also changed the code to use arrays instead of hashes where possible, which should speed up some operations on bigger threads, and there are no more warnings about using uninitialized values. Finally, I fixed problems with email threads that contain loops (normally this should not happen, but somebody can generate emails whose In-Reply-To or References headers form a loop).

Next I optimized the SQL code in the CGI script that generates the roots of the trees. The debian-user mailing list now contains about 740 000 mails, and the original SQL code, which used B-trees for ordering, started to be slow. It was quite fast for a database with 100 000 - 300 000 mails, but not for 800 000. To fix this I created a new SQL table that caches the needed information and has indexes on the ordering columns. This let me simplify and speed up the SELECT statements that select and order the roots of email trees, at the cost of inserting or updating more rows when a new email is added to the database.

Generating HTML pages (from the CGI script) for debian-user now takes about 0.1 s; before, it took more than one second. I think this is good enough.
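
To illustrate the thread-building step described above, here is a minimal Python sketch. It is not the actual code from the project: the function name build_thread_parents and the input shape (a dict mapping each message id to the set of ids it names in In-Reply-To or References) are assumptions made for the example. It processes messages in topological order, attaches each one to its deepest known ancestor, and breaks loops by forcing one message of the loop out when nothing else is ready.

```python
from collections import defaultdict


def build_thread_parents(ancestors):
    """Pick one parent per message from a possibly incomplete ancestor relation.

    `ancestors` maps a message id to the set of message ids it names in
    In-Reply-To or References (hypothetical input shape for this sketch).
    Returns a dict mapping each message id to its parent id, or None for roots.
    """
    # Keep only edges between known messages and drop self-references.
    graph = {m: {a for a in anc if a in ancestors and a != m}
             for m, anc in ancestors.items()}

    # pending[m] counts the ancestors of m that are not processed yet.
    pending = {m: len(anc) for m, anc in graph.items()}
    children = defaultdict(list)
    for m, anc in graph.items():
        for a in anc:
            children[a].append(m)

    remaining = set(graph)
    ready = sorted(m for m, n in pending.items() if n == 0)
    depth, parent = {}, {}

    while remaining:
        if not ready:
            # Only messages in (or behind) a loop are left: break the loop
            # by releasing the smallest id even though ancestors are pending.
            ready.append(min(remaining))
        m = ready.pop()
        remaining.discard(m)

        # The deepest already-placed ancestor is the closest one in the thread.
        placed = [a for a in graph[m] if a in depth]
        best = max(placed, key=lambda a: (depth[a], a), default=None)
        parent[m] = best
        depth[m] = 0 if best is None else depth[best] + 1

        for c in children[m]:
            pending[c] -= 1
            if pending[c] == 0 and c in remaining:
                ready.append(c)

    return parent
```

For example, a message whose References list names "a" and "c" but not the intermediate "b" still ends up under "c", because "c" is its deepest known ancestor.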
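
The cache table for thread roots described above can be sketched in the same way. This is only an illustration under stated assumptions: it uses Python's sqlite3 module so that it runs standalone, and the table names (mails, thread_roots), the column names, and the choice of ordering threads by the date of their latest mail are all hypothetical, not the real schema. The point is the trade-off: every insert also touches the cache row, and in return the root listing becomes a single indexed SELECT.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a mail table and a root cache."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mails (
            id      INTEGER PRIMARY KEY,
            root_id INTEGER NOT NULL,  -- id of the first mail in the thread
            subject TEXT,
            date    INTEGER NOT NULL   -- Unix timestamp
        );
        -- Hypothetical cache: one row per thread with its ordering key.
        CREATE TABLE thread_roots (
            root_id    INTEGER PRIMARY KEY,
            last_date  INTEGER NOT NULL,
            mail_count INTEGER NOT NULL
        );
        CREATE INDEX thread_roots_last_date ON thread_roots (last_date);
    """)
    return conn


def add_mail(conn, mail_id, root_id, subject, date):
    """Insert a mail and keep the cache row of its thread up to date."""
    with conn:  # one transaction: the mail and the cache change together
        conn.execute("INSERT INTO mails VALUES (?, ?, ?, ?)",
                     (mail_id, root_id, subject, date))
        updated = conn.execute(
            "UPDATE thread_roots "
            "SET last_date = MAX(last_date, ?), mail_count = mail_count + 1 "
            "WHERE root_id = ?",
            (date, root_id)).rowcount
        if updated == 0:
            # First mail seen for this thread: create its cache row.
            conn.execute("INSERT INTO thread_roots VALUES (?, ?, 1)",
                         (root_id, date))


def newest_roots(conn, limit, offset=0):
    """Return thread root ids, most recently active first, for one page."""
    rows = conn.execute(
        "SELECT root_id FROM thread_roots "
        "ORDER BY last_date DESC LIMIT ? OFFSET ?",
        (limit, offset))
    return [row[0] for row in rows]
```

Because the ordering column lives in the small cache table and is indexed, the listing query no longer has to sort or aggregate over the full mail table.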
-- Pali Rohár [email protected]
_______________________________________________ Soc-coordination mailing list [email protected] http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/soc-coordination
