On Thu, Aug 20, 2020, 05:45 Lucas Nussbaum <lu...@debian.org> wrote:

> Hi Asheesh,

Hi! :)

> I think that the changes compared to the current table structure should
> be minimized, to avoid rewriting all the tools that use this data.
> Improvements are welcome of course, but please don't make changes if
> there's no good reason for them.

Good call. I'll prioritize that.

> Did you confirm with DSA that parsing the online list archives is the
> preferred way to go? I fear that we will hit some HTTP rate limiting at
> some point and will have to reconsider the implementation.

I haven't yet! I can do so. I will try to optimize the current approach
first, since I'm enthusiastic about it, but good call on checking with
DSA.

> How optimized is your code for running every few minutes? Ideally we
> would like near-real-time updates of this data, which will require
> polling the list archives (previously, email was received directly on
> ullmann.debian.org via a special email address).

Good question. Let me update you on that once I've optimized further. I
think I can get down to one HTTP call per run when nothing has changed
(the mailing list index page), and two calls (index page plus message
page) when there is a change. Running every 2 minutes would mean
24*30 = 720 requests per day, which seems well below any rate limit I
can think of, but obviously zero unnecessary requests is nicer. It's a
good topic to discuss with DSA, and I can do that.

Even if inbound email is used for fresh data, historic data needs to
come from somewhere. I think the email archives on the web are a good
place to import it from, based on my preference for developing in a
context that doesn't require any special setup.

Hope you're doing well!

Asheesh.
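For what it's worth, here is a minimal sketch of the "one request when idle, two on change" polling idea described above. The index-page markup, the `msg00123.html` naming scheme, and the function names are all assumptions for illustration, not the real lists.debian.org format:

```python
import re

def latest_message_href(index_html):
    """Return the href of the newest message on an archive index page.

    Assumes (hypothetically) that the index links messages as
    <a href="msgNNNNN.html">...</a> with zero-padded numbers, so the
    lexicographically largest href is the newest message.
    """
    hrefs = re.findall(r'href="(msg\d+\.html)"', index_html)
    return max(hrefs) if hrefs else None

def poll(index_html, last_seen):
    """Decide whether a second request (the message page) is needed.

    Returns (new_last_seen, fetch_needed). Each cycle fetches only the
    index page; the message page is fetched only when something changed.
    """
    newest = latest_message_href(index_html)
    if newest is None or newest == last_seen:
        return last_seen, False   # 1 HTTP call this cycle, nothing new
    return newest, True           # 2 HTTP calls: index + new message page

# Example: the first poll discovers msg00002; the second poll of the
# same index page sees nothing new and stops after one request.
index = '<a href="msg00001.html">a</a> <a href="msg00002.html">b</a>'
state, fetch = poll(index, None)   # -> ("msg00002.html", True)
state, fetch = poll(index, state)  # -> ("msg00002.html", False)
```

Pairing this with HTTP conditional requests (If-Modified-Since / ETag) on the index page could shrink the idle case further, since an unchanged page then costs only a 304 response with no body.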