On 30/04/2019 00:23, nathan tech wrote:

> The results were as follows:
>
> tim( a url): 2.9 seconds
>
> tim(the downloaded file): 1.8 seconds
>
>
> That tells me that roughly 1.1 seconds is network related, fair enough.
Or roughly 38% of the total time. Since the network element will
increase as data size increases, as will the parse time, it may be a
near linear relationship. Only more extensive tests would tell.

> entire thing again, they all say use ETAG and Modified, but my feeds
> never, have them.
>
> I've tried feeds from several sources, and none have them in the http
> header.

Have you looked at the headers to see what they do have?

> To that end, that is why I mentioned in the previous email about .date,
> because that seemed the most likely, but even that failed.

Again you tell us that something failed, but you don't say how it
failed. Do you mean that date did not exist? Why did you think it would
if you had already inspected the headers? Can you share some actual code
that you used to check these fields? And show us the actual headers you
are reading?

> 1, download a feed to the computer.
>
> 2. Occasionally, check the website to see if the downloaded feed is out
> of date if it is, redownload it.

Seems a good plan. You just need to identify when changes occur. Even
better would be if the sites provided a web API to access the data
programmatically, but of course few sites do that...

> I did think about using threading for this, for example:
> user sees downloaded feed data only, in the background, the program
> checks for updates on each feed, and the user may see them gradually
> start to update.
>
> This would work, in that execution would not fail at any time, but it
> seems... clunky, to me I suppose? And rather data heavy for the end
> user, especially if, as you suggest, a feed is 10 MB in size.

Only data heavy if you download everything. If you only fetch the
headers and you only have relatively few feeds, it's a good scheme.

As an alternative, is there anything in the feed body that identifies
its creation date? Could you change your parsing mechanism to parse the
data as it arrives and stop if the date/time has not changed? That
minimises the download data.
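To make the "only fetch the headers" idea concrete, here is a minimal
sketch using only the standard library. The names fetch_headers,
pick_validators, feed_changed and VALIDATORS are made up for
illustration, and treating Content-Length as a change hint is my own
assumption; it is a weak signal at best. It is not meant as the
definitive way to do this, just something to experiment with.

```python
import urllib.request

# Headers that can hint at a change; a feed may send none, some, or all.
# Including Content-Length is an assumption, not a reliable validator.
VALIDATORS = ("ETag", "Last-Modified", "Content-Length")


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return the response headers as a dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


def pick_validators(headers):
    """Keep only the change-hinting headers (names are case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        name: lowered[name.lower()]
        for name in VALIDATORS
        if name.lower() in lowered
    }


def feed_changed(old, new):
    """Return True or False if the headers allow a decision, else None."""
    old_v, new_v = pick_validators(old), pick_validators(new)
    shared = set(old_v) & set(new_v)
    if not shared:
        return None  # nothing comparable: fall back to downloading
    return any(old_v[name] != new_v[name] for name in shared)
```

Printing the result of fetch_headers() for one of your feed URLs shows
exactly which headers that server sends. Store the dict next to the
downloaded file and pass the stored and fresh dicts to feed_changed() on
the next check. Some servers reject HEAD requests, in which case
urlopen() raises urllib.error.HTTPError and you would have to fall back
to a normal GET.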
> Furthering to that, how many threads is safe?

You have a lot of I/O going on so you could run quite a few threads
without blocking issues. How many feeds do you watch? Logic would say
have one thread per feed.

But how real time does this really need to be? Would it be terrible if
updates were, say, 1 minute late? If not, a single threaded solution may
be fine (and much simpler).

I'd certainly focus on a single threaded solution initially. Get it
working first, then think about performance tuning.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos