On 01/02/2010 17:28, Tom Lane wrote:
> Matteo Beccati <p...@beccati.com> writes:
>> My main concern is that we'd need to overcomplicate the thread detection
>> algorithm so that it better deals with delayed messages: as it currently
>> works, the replies to a missing message get linked to the
>> "grand-parent". Injecting the missing message afterwards will put it at
>> the same level as its replies. If it happens only once in a while I
>> guess we can live with it, but definitely not if it happens tens of
>> times a day.
>
> That's quite common unfortunately --- I think you're going to need to
> deal with the case. Even getting a direct feed from the mail relays
> wouldn't avoid it completely: consider cases like
> * A sends a message
> * B replies, cc'ing A and the list
> * B's reply to list is delayed by greylisting
> * A replies to B's reply (cc'ing list)
> * A's reply goes through immediately
> * B's reply shows up a bit later
> That happens pretty frequently IME.
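To make the problem concrete, here is a hypothetical Python sketch of a threader that links each message to the closest referenced message already in the archive and never revisits that choice, which is the "grand-parent" behaviour described above. The function names and the dict layout are assumptions for illustration, not the archive's actual code. Replaying the greylisting scenario, A's reply ends up under the grand-parent and B's late reply lands at the same level:

```python
from typing import Optional


def closest_known_ancestor(parents, in_reply_to, references):
    """Return the nearest referenced Message-Id already archived, or None.

    Assumed lookup order: In-Reply-To first, then References newest to oldest.
    """
    candidates = ([in_reply_to] if in_reply_to else []) + list(reversed(references))
    for ref in candidates:
        if ref in parents:
            return ref
    return None


def naive_add(parents, msg_id, in_reply_to=None, references=()):
    # The parent is fixed at arrival time and never revisited.
    parents[msg_id] = closest_known_ancestor(parents, in_reply_to, references)


# Hypothetical Message-Ids: a1 = A's message, b1 = B's reply, a2 = A's reply to B.
parents: dict[str, Optional[str]] = {}
naive_add(parents, "a1")
naive_add(parents, "a2", in_reply_to="b1", references=["a1", "b1"])  # arrives first
naive_add(parents, "b1", in_reply_to="a1", references=["a1"])        # greylisted, arrives later
print(parents)  # {'a1': None, 'a2': 'a1', 'b1': 'a1'}
```

Both a2 and b1 end up as siblings under a1, even though a2 is really a reply to b1.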
I've improved the threading algorithm by keeping an ordered backlog of
unresolved references, i.e. when a message arrives:
1. Search for a parent message using:
1a. In-Reply-To header. If the referenced message is not found, insert its
Message-Id into the backlog table with position 0
1b. References header. For each missing referenced message, insert its
Message-Id into the backlog table with position N
1c. MS Exchange Thread-Index and Thread-Topic headers
2. Message is stored along with its parent ID, if any.
3. Compare the Message-Id header with the backlog table. Update the
parent field of any referencing message and clean up positions >= n in
the references table.
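A minimal in-memory sketch of the backlog approach in steps 1-3, written in Python for illustration only. The `Threader` class and its dict-based `parent` and `backlog` fields are hypothetical stand-ins for the archive's message and backlog tables; positions are assumed to count from the closest ancestor (In-Reply-To first, then References from newest to oldest), and step 1c (Thread-Index/Thread-Topic) is left out:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Threader:
    # Hypothetical stand-ins for the message table and the backlog table.
    parent: dict[str, Optional[str]] = field(default_factory=dict)
    # missing Message-Id -> [(referencing Message-Id, position)]
    backlog: dict[str, list[tuple[str, int]]] = field(default_factory=dict)

    @staticmethod
    def _ancestors(in_reply_to, references):
        # Assumed ordering: closest ancestor first, duplicates removed.
        seen, out = set(), []
        for ref in ([in_reply_to] if in_reply_to else []) + list(reversed(references)):
            if ref not in seen:
                seen.add(ref)
                out.append(ref)
        return out

    def add(self, msg_id, in_reply_to=None, references=()):
        # Step 1: find the closest archived ancestor; backlog the missing
        # ones that are closer than it, recording their position.
        found = None
        for pos, ref in enumerate(self._ancestors(in_reply_to, references)):
            if ref in self.parent:
                found = ref
                break
            self.backlog.setdefault(ref, []).append((msg_id, pos))
        # Step 2: store the message with its (possibly provisional) parent.
        self.parent[msg_id] = found
        # Step 3: re-parent every message that was waiting for this Message-Id,
        # then drop that message's entries at positions >= n (more distant ancestors).
        for child, n in self.backlog.pop(msg_id, []):
            self.parent[child] = msg_id
            for key in list(self.backlog):
                kept = [(c, p) for c, p in self.backlog[key] if not (c == child and p >= n)]
                if kept:
                    self.backlog[key] = kept
                else:
                    del self.backlog[key]


# Replaying the greylisting scenario with hypothetical Message-Ids.
t = Threader()
t.add("a1")
t.add("a2", in_reply_to="b1", references=["a1", "b1"])  # provisionally under a1
t.add("b1", in_reply_to="a1", references=["a1"])        # late arrival fixes a2
print(t.parent)  # {'a1': None, 'a2': 'b1', 'b1': 'a1'}
```

Because backlog entries farther away than a message's current parent are never kept, any later resolution is always a closer ancestor, so a parent only ever moves down the thread.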
Now I just need some time for a final cleanup and then I'll be ready to
publish the code, which hopefully will be clearer than my words ;)
Cheers
--
Matteo Beccati
Development & Consulting - http://www.beccati.com/
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers