On 10-Sep-21 08:36, Victoria Risk wrote:
>
>> On Sep 10, 2021, at 7:24 AM, Timothe Litt <l...@acm.org
>> <mailto:l...@acm.org>> wrote:
>>
>> Clearly map format solved a big problem for some users. Asking
>> whether it's OK to drop it with no statement of what those users
>> would give up today is not reasonable.
>>
> Actually, we are not sure there ARE any users. In fact, the one
> example I could come up with was Anand, who has replied to the list
> that he is in fact NOT using map zone. I should have asked directly -
> is anyone on this list USING MAP ZONE format?
>
Well, if the answer is "no one", that simplifies matters :-)

I do remember that startup time was a big issue before map came out, and that the complaints subsided thereafter. No personal knowledge as to whether that was cause and effect or a realignment of the planets. In general, I don't look to Astrology for answers :-)

>> After all the "other improvements in performance" that you cited,
>> what is the performance difference between map and the other formats?
>
> I don’t know that, to be honest. We don’t have the resources to
> benchmark everything. Maybe someone on this list could? We would also
> like to be able to embark on a wholesale update to the rbtdb next year
> and this is the sort of thing that might complicate refactoring
> unnecessarily.

IIRC, when I did some work on the stats channel & was concerned with scalability, Evan said that you keep some large datasets (1M+ zones) around for testing and produced some numbers for that. So it ought to be possible to get some basic data.

I'm not suggesting a full benchmarking campaign - but one or two datapoints are a lot better than none. E.g. if there's no difference with 1 or 10M zones with, say, 10K records each, it's pretty clear that map's time is past. If it's orders of magnitude faster (and it's used), it's not.

I don't remember - did your user survey ask about how many/how large zones people serve? I vaguely think so, but it's been a while...

>> For a case which took 'several hours' before map was introduced, what
>> would the restart time be for named if raw format was used now?
>>
>>> If I knew that I would have said. 'Raw' was much faster than the
>>> text version. Map was faster than raw. Raw is apparently not a
>>> problem to maintain. I believe the improvement with raw was ~3x.
>>>
>
I think the questions are:
(a) Is startup time an issue (however it's solved)?
(b) If so, is map format the solution?
(c) If it is and people are using it, what would the consequences be to them if it went away?
(d) If it is, and people aren't using it - is the documentation too scary (as Anand said it is for him)?

>> It's pretty clear to me that if map format saves a few seconds in the
>> worst case, it's not worth keeping. If it saves hours for large
>> operators, then the alternative isn't adequate. Maybe "map" isn't
>> the answer - how might 'raw' compare to a tuned database back end?
>> (Which has other advantages for some.) What if operators specified a
>> priority order for loading zones? Or zones were loaded on demand
>> during startup, with low activity zones added as a background task?
>> Or???
>
> Well, back when we added map zone format, startup time was a major
> pain point for some users. Now, it seems as though large operators are
> updating their zones all the time (also updating RPZ feeds) and
> efficiency in transfers seems to be a bigger issue.
>
What I was getting at is how hard the definition of "startup time" is. Time to serving all zones? Important zones? Is it OK for responses to be slow during startup, or is startup only complete when responses are at nominal speed?

I wonder if this comes from large operators using a database (DLZ) back end. Database developers tend to have a single-minded focus on performance, and direct updates are probably faster than going thru named & its generalized authentication/validation. Plus, depending on how you set up your server architecture, DB replication can replace DNS zone transfers.

> We don’t have any direct data on what features are being used, we can
> only judge based on complaints we receive via bug tickets or posts on
> this list.

You did a survey a while back...

>>
>> A fair question for users would be what restart times are acceptable
>> for their environment - obviously a function of the number and
>> size/content of zones. And is a restart "all or nothing", or would
>> some priority/sequencing of zone availability meet requirements?
>>
> That is a good question. Can you answer it for yourself?

Sure.

I'm not a large operator, but I've always thought big and implemented smaller. About 350 zones, 2 real views and 2 static-stub recursive views. From 50 to a couple of hundred records/zone - not counting the DNSSEC signatures & overhead that named generates. ~10 servers. Plus a 3rd party backup service.

Anything under a minute is a reasonable startup time for named - though most of my servers are underpowered (e.g. RPi class machines with USB disks that sleep a lot). Two minutes is tolerable. Longer than that, I'd have issues.

If I were a larger operator and had to choose, I'd prioritize external views so that key services (e.g. e-mail, webservers, VPNs,...) aren't seen to be slow/down. The internal network has plenty of redundancy & tolerance for slow resolution. The external views are smaller, with fewer servers. Another priority would be zones for which a server is primary, since it's required for updates.

If I were a DNS provider/registrar, I'd guess that of the (hopefully) millions of zones that I sold, only a few actually get a lot of traffic. So a scheme where historical query stats drove reload order would be attractive. And since I'd sell SLAs, prioritizing the higher-paying customers would be good business.

Of course, none of that matters if reload times are small enough to cover expected outage durations with an affordable number of servers. The key would be the downtime on the database primaries (masters) - that would prevent my customers from activating/updating their zones. That is also a reason for a database back-end rather than named-managed files - since DB persistence, consistency, and replication are solved problems in that world.

Since you're lucky to get through to a (competent technical) help desk in 10s of minutes, a total downtime (meaning from rebooting a server thru named serving at least key zones and updates) on the order of 15 minutes is probably the outer limit. That's a thumb-in-the-air number, not science.

Hope this helps.

>
> Thank you!
>
> Vicky
>
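
To make the load-ordering idea above concrete (external views first, then zones for which the server is primary, then higher-paying customers, then historically busy zones), here is a minimal sketch. It is purely illustrative: named offers no such option as far as this thread says, and the `Zone` fields, the `sla_tier` number, and the sample zones are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    # All fields are hypothetical; they are not named configuration options.
    name: str
    view: str             # "external" or "internal"
    is_primary: bool      # True if this server is primary for the zone
    queries_per_day: int  # historical query statistics
    sla_tier: int = 0     # higher number = higher-paying customer


def load_order(zones):
    """Return zones sorted so that the most important ones load first."""
    return sorted(
        zones,
        key=lambda z: (
            z.view != "external",  # external views before internal ones
            not z.is_primary,      # primaries next, since updates need them
            -z.sla_tier,           # then higher SLA tiers
            -z.queries_per_day,    # then historically busy zones
            z.name,                # stable tie-breaker
        ),
    )


zones = [
    Zone("intranet.example", "internal", True, 5000),
    Zone("example.com", "external", True, 90000, sla_tier=2),
    Zone("parked.example", "external", False, 3),
    Zone("shop.example", "external", True, 40000, sla_tier=1),
]

ordered = [z.name for z in load_order(zones)]
print(ordered)
```

The sort key is just one possible weighting. An operator who cares more about traffic than about SLA tier would swap those two terms.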