This really isn't the fault of the "feedparser" module, but it's worth mentioning.
I have an application which needs to read each new item from a feed as it shows up, as efficiently as possible, because it's monitoring multiple feeds. I want exactly one copy of each item as it comes in.

In theory, this is easy. Each time the feed is polled, pass in the timestamp and etag from the previous poll, and if nothing has changed, a 304 status should come back.

Results are spotty. It mostly works for Reuters. It doesn't work for Twitter at all; Twitter updates the timestamp even when nothing changes, so items are routinely re-read. (That has to be costing Twitter a huge amount of bandwidth from useless polls.) Some sites have changing feed etags because they're using multiple servers and a load balancer. These can be recognized because the same etags will show up again after a change.

Items can supposedly be de-duplicated by using each item's "id" value. This almost works, but it's trickier than one might think. On some feeds, an item might go away, yet come back in a later fetch of the feed. This happens with news feeds from major news sources, because they have priorities that don't show up in RSS. High priority stories might push a low priority story off the feed, but it may come back later. Also, every night at 00:00, some feeds like Reuters re-number everything.

The only thing that works reliably is comparing the story text.

John Nagle
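
A minimal sketch of the approach described above, not a definitive implementation: poll with the previous etag and timestamp, return nothing on a 304, and otherwise keep only items whose story text has not been seen before. The helper names (text_key, new_entries, poll), the choice of SHA-256, and the whitespace and case normalization are assumptions made for illustration. The only feedparser calls relied on are feedparser.parse(url, etag=..., modified=...) and the "status", "etag", "modified" and "entries" keys of its result. The "parse" parameter is a hypothetical hook so the logic can be exercised without a network connection.

```python
import hashlib


def text_key(entry):
    """Hash the story text so re-numbered or re-appearing items compare equal."""
    text = " ".join([entry.get("title", ""), entry.get("summary", "")])
    # Assumption: collapse whitespace and ignore case before hashing.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def new_entries(entries, seen):
    """Return entries whose text is not yet in `seen`, recording them as seen."""
    fresh = []
    for entry in entries:
        key = text_key(entry)
        if key not in seen:
            seen.add(key)
            fresh.append(entry)
    return fresh


def poll(url, state, seen, parse=None):
    """Poll one feed with a conditional GET, then filter items by story text."""
    if parse is None:
        import feedparser  # third-party module, imported lazily
        parse = feedparser.parse
    result = parse(url, etag=state.get("etag"), modified=state.get("modified"))
    if result.get("status") == 304:
        return []  # server reports no change since the last poll
    # Remember the validators for the next poll; either may be missing.
    state["etag"] = result.get("etag")
    state["modified"] = result.get("modified")
    return new_entries(result.get("entries", []), seen)
```

Keeping one "state" dict per feed and one shared "seen" set means an unreliable 304 only costs bandwidth. An item that drops off a feed and returns, or comes back under a new number, is still filtered out by its text hash.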