Hal Murray <halmur...@sonic.net>:
> Eric said:
> > Talk to me about what you think the effect of very occasional
> > stop-the-world pauses of 600 microseconds or less would be on sync
> > accuracy. By "very occasionally" let's say once every ten minutes or
> > so, that being what I think is a *very* pessimistic estimate of GC
> > frequency for a program with NTP's memory-usage pattern.
>
> Could you please say more? How did you get 600 microseconds? What
> assumptions were you making?

You appear to have missed one of my earlier posts on this topic.

There is a suite of tools called dragonboat ("A feature complete and
high performance multi-group Raft library in Go.") that has done careful
measurement of STW pauses in Go 1.11 and Go 1.12 under its workload.
Given the description of what it does, that workload has to be a much
tougher one than NTPsec's: much more packet volume, hence more GC churn.

Take a look at https://github.com/lni/dragonboat

If you go far enough down the page you'll find a graph

https://github.com/lni/dragonboat/blob/master/docs/stw.png

which shows that their measured STW pauses are bounded, to about 95%, by
600us and are typically less than 400us. This is consistent with other
reports I've seen, and that's why I took 600us as the worst-case STW
we're likely to see.

> The real cost of using a GC is that we have to keep thinking about what
> it might do, or if the code we want to write might change the
> assumptions used to conclude that an occasional GC was OK.

Agreed.

> If we get a sample that is off by enough to be interesting, is that
> because the network was busy or the GC did its thing?

There is a way we can spot possible latency spikes. We can query the
runtime to get a timestamp of the last GC. One brutally simple way to
prevent GC-induced latency spikes from distorting time sync is to take
timestamps before and after each critical region, then check for an
intervening GC and throw out the associated sample if we find one.

> Suppose Eric got up on the other side of the bed one morning, and I
> proposed using some new system that had a GC and waved my hands and
> claimed that it wouldn't be a problem. I wouldn't be at all surprised
> if Eric claimed it was ugly, not appropriate, and we should find
> something better.

But I'm not just claiming it won't be a problem - I've already thought
through multiple mitigation strategies in case it really is a problem.
This plan didn't come out of nowhere; I started thinking and doing the
research years ago.

1. Guard small critical regions by turning off GC.

2. Schedule GCs during quiet periods.

3. Detect when GC spikes might have collided with sample reads and throw
   out those samples.

> I'm assuming the goal is more than just to convert our current code
> base to a safe language. We also need to get the structure and
> environment right. An environment with a GC seems like a bad start.

It poses some challenges, but I think they are surmountable. And those
challenges have to be put in the context of what's available in the way
of safe languages. Good options for us are rather limited.

> You haven't commented about my Rust vs Go question.

I have, previously. I guess you missed the post where I explained that.

> A friend commented that he might use Go over Rust because it would be
> easier for others to pick up.

Yes, that is an important factor. There is one other that is more
important: Rust does not have a stable API, certainly not one that we
can count on to be solid on decadal scales. Nor does it have the kind of
development culture that is conducive to API stability over decades.
It's a very young language, still in "move fast and break things" mode.

Go, on the other hand, has an ironclad forward-portability guarantee
that its development culture takes very seriously. I think we need that.

> I'm learning Rust, or trying to. I find it not-easy, but picking up new
> languages is not one of my strong points. The type-checking is really
> picky.

Picking up new languages *is* one of my strong points, yet I found Rust
rebarbative in the extreme. This did nothing to make me optimistic about
finding developers to work in it.

On the other hand, we already have two developers expert in Go.

> It doesn't do recv time stamps. As far as I can tell, there isn't a
> clean way to do that in Rust.

I think it can be done in Go but will take some fancy dancing. I have
this under active investigation now.

> Crazy thought dept...
>
> Assuming we want to use Go, can we split things up such that the
> timing critical code runs in separate processes without GC?

Maybe. But I know from previous experience that trying to make major
changes to a program's architecture *while you're porting it to a new
language* is an invitation to disaster. The only strategy that works is
to do a stupid, literal, unidiomatic port first, verify it, then clean it
up and make it idiomatic.

This means that changes of the kind you're proposing need to be on hold
until we have a more or less literal translation of the present C code
working.

> I know how to split out the server side of ntpd.
>
> Suppose we come up with an API for refclocks. Would that, or something
> similar also work for network servers?
>
> I think the timing critical code would be small enough that we could
> write it in C and inspect it carefully. That may not be valid if we
> include the crypto stuff.
>
> Converting that sort of code to Rust seems reasonable.

I want to stay away from mixing languages if at all possible. The joints
between them are always *serious* defect attractors and major sources of
maintenance complexity.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

_______________________________________________
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel