On Jun 26, 12:30 pm, [EMAIL PROTECTED] wrote:
> I wrote my own feed reader using feedparser.py, but it takes about 14
> seconds to process 7 feeds (on a Windows box), which seems slow on my
> DSL line. Does anyone see how I can optimize the script below? Thanks
> in advance, Bill
>
> # UTF-8
> import feedparser
>
> rss = [
>     'http://feeds.feedburner.com/typepad/alleyinsider/silicon_alley_insider',
>     'http://www.techmeme.com/index.xml',
>     'http://feeds.feedburner.com/slate-97504',
>     'http://rss.cnn.com/rss/money_mostpopular.rss',
>     'http://rss.news.yahoo.com/rss/tech',
>     'http://www.aldaily.com/rss/rss.xml',
>     'http://ezralevant.com/atom.xml'
>     ]
>
> s = '<html>\n<head>\n<title>C:/x/test.htm</title>\n'
>
> s += '<style>\n'\
>      'h3{margin:10px 0 0 0;padding:0}\n'\
>      'a.x{color:black}'\
>      'p{margin:5px 0 0 0;padding:0}'\
>      '</style>\n'
>
> s += '</head>\n<body>\n<br />\n'
>
> for url in rss:
>     d = feedparser.parse(url)
>     title = d.feed.title
>     link = d.feed.link
>     s += '\n<h3><a href="'+ link +'" class="x">'+ title +'</a></h3>\n'
>     # aldaily.com has a weird feed
>     if link.find('aldaily.com') != -1:
>         description = d.entries[0].description
>         s += description + '\n'
>     for x in range(0, 3):
>         if link.find('aldaily.com') != -1:
>             continue
>         title = d.entries[x].title
>         link = d.entries[x].link
>         s += '<a href="'+ link +'">'+ title +'</a><br />\n'
>
> s += '<br /><br />\n</body>\n</html>'
>
> f = open('c:/scripts/myFeeds.htm', 'w')
> f.write(s)
> f.close()
>
> print
> print 'myFeeds.htm written'
I can 100% guarantee you that the extended run time is network I/O
bound. Investigate using a thread pool to load the feeds in parallel.
Some code you might be able to shim in:

# Extra imports
import threading
import Queue

# Function that fetches one feed and pushes the parsed result onto a queue
def parse_and_put(url, queue_):
    parsed_feed = feedparser.parse(url)
    queue_.put(parsed_feed)

# Set up some variables
my_queue = Queue.Queue()
threads = []

# Set up a thread for fetching each URL
for url in rss:
    url_thread = threading.Thread(target=parse_and_put, name=url,
                                  args=(url, my_queue))
    threads.append(url_thread)
    url_thread.setDaemon(False)  # non-daemonic is the default; explicit here
    url_thread.start()

# Wait for all the fetch threads to finish
for thread in threads:
    thread.join()

# Drain the queue into a list
feeds_list = []
while not my_queue.empty():
    feeds_list.append(my_queue.get())

# Do what you were doing before, replacing "for url in rss"
# with "for d in feeds_list"
for d in feeds_list:
    title = d.feed.title
    link = d.feed.link
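The snippet above actually spawns one thread per feed, which is fine for
seven URLs. If the list grows, a true pool with a fixed number of workers
pulling URLs from a queue keeps the thread count bounded. A minimal
sketch, assuming the `rss` list from the original script; NUM_WORKERS is
an arbitrary choice, not anything feedparser requires:

import threading
import Queue

import feedparser

NUM_WORKERS = 4  # arbitrary pool size; tune for your bandwidth

def worker(url_queue, result_queue):
    # Each worker pulls URLs until the queue is drained, then exits.
    # All URLs are enqueued before the workers start, so an empty
    # queue really does mean there is no more work.
    while True:
        try:
            url = url_queue.get_nowait()
        except Queue.Empty:
            return
        result_queue.put(feedparser.parse(url))

url_queue = Queue.Queue()
result_queue = Queue.Queue()
for url in rss:  # `rss` as defined in the original post
    url_queue.put(url)

workers = [threading.Thread(target=worker, args=(url_queue, result_queue))
           for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
for w in workers:
    w.join()

feeds_list = []
while not result_queue.empty():
    feeds_list.append(result_queue.get())

One caveat with either version: results come off the queue in completion
order, not in the order of the `rss` list, so if the output ordering
matters, push (url, parsed_feed) pairs and sort afterwards.
--
http://mail.python.org/mailman/listinfo/python-list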