On Thu, Aug 20, 2020, 05:45 Lucas Nussbaum <lu...@debian.org> wrote:

> Hi Asheesh,
>

Hi! :)


>
> I think that the changes compared to the current table structure should
> be minimized, to avoid rewriting all the tools that use this data.
> Improvements are welcome, of course, but please don't make changes if
> there's no good reason for them.
>

Good call. I'll prioritize that.


> Did you confirm with DSA that parsing the online list archives is the
> preferred way to go? I fear that we will hit some HTTP rate limiting at
> some point and will have to reconsider the implementation.
>

I haven't yet! I can do so. I will try to optimize the current approach
first since I'm enthusiastic about it, but good call on checking with DSA.


> How optimized is your code for running every few minutes? Ideally we
> would like near-real-time updates of this data, which will require
> polling the list archives (previously, email was received directly on
> ullmann.debian.org via a special email address)
>

It's a good question. Let me update you about that once I've optimized
further. I think I can get down to one HTTP call per poll when nothing has
changed (the mailing list index page), and two (index page plus message
page) when there is a change.
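A minimal sketch of that polling scheme, using conditional GETs so an
unchanged index page costs one cheap request. The URL, function names, and
the assumption that the archive server honours ETags are all mine, not
anything confirmed with DSA:

```python
import urllib.error
import urllib.request

# Hypothetical index URL; the real list name and archive layout are assumptions.
INDEX_URL = "https://lists.debian.org/debian-devel-announce/2020/"

def build_index_request(etag=None):
    """Build a conditional GET for the index page.

    With a cached ETag, an unchanged page answers 304 Not Modified,
    so polling an idle list transfers no page body at all.
    """
    req = urllib.request.Request(INDEX_URL)
    if etag:
        req.add_header("If-None-Match", etag)
    return req

def poll_index(etag=None):
    """Return (body, new_etag); body is None when nothing changed."""
    try:
        with urllib.request.urlopen(build_index_request(etag)) as resp:
            return resp.read(), resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: no new messages since last poll
            return None, etag
        raise
```

If the server doesn't send ETags, the same idea works with
If-Modified-Since against the Last-Modified header.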

Running every 2 minutes (say) would mean 30 requests/hour * 24 hours = 720
requests per day, which seems well below any rate limit I can think of,
but obviously zero unnecessary requests is nicer. It's a good topic to
discuss with DSA, and I can do that.
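For completeness, that arithmetic as a tiny helper (the function name and
parameters are mine, just to make the numbers checkable):

```python
def requests_per_day(poll_interval_minutes, requests_per_poll=1):
    """Daily request count when polling every N minutes.

    requests_per_poll covers the case where a change costs an extra
    fetch (index page plus message page).
    """
    polls_per_hour = 60 // poll_interval_minutes
    return 24 * polls_per_hour * requests_per_poll

# Polling one index page every 2 minutes: 24 * 30 = 720 requests/day.
```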

Even if inbound email is used for fresh data, historic data needs to
come from somewhere. I think the email archives on the web are a good
place to import it from, since I prefer to develop in a context that
doesn't require any special setup.

Hope you're doing well!

Asheesh.
