On Mon, 17 Feb 2003, Terry Lambert wrote:
> First, I just have a slight editorial comment, about cheating on
> Polygraph.
Terry,
This is not the place to start a long discussion about our
Polygraph testing methodology, but I have to say, with all due
respect, that many of your statements are either misleading or based
on misinformation about Web Polygraph and the way standard tests are
executed. I have to respond because I both love and understand cache
benchmarking. I apologize to the majority of the audience for what may
be considered an out-of-scope thread.
> One issue I have with Polygraph is that it intentionally works for a
> very long time to get worst case performance out of caches;
> basically, it cache-busts on purpose. Then the test runs.
This is plain wrong. I assume that you are referring to PolyMix
workloads that have a cache-filling phase and measurement phases.
The filling phase does not bust the cache. Its primary purpose is to
bring the cache's storage to a steady state (hopefully). If you have
tested many caches, including Squid, then you know that cache
performance "on an empty stomach" often differs from sustained
performance by 50%. Since we must start from scratch, we must pump
enough data to approach a steady state.
You might have been misinformed that all the fill objects are used
during the measurement phases; this is not true. Polygraph keeps the
size of the working set constant. That size is usually much smaller
than the amount of traffic during the fill phase. Again, the fill
phase is there to reach a steady state after you start with an empty
disk.
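To make the fill-versus-working-set distinction concrete, here is a
rough sketch in Python (my own illustration, not Polygraph code; the
constants and the simple revisit rule are invented, and PolyMix's
actual popularity and recurrence models are far more elaborate). The
point is only that an arbitrarily large amount of fill traffic can
flow while revisits stay confined to a fixed-size set of URLs:

  # Illustrative sketch only; all constants here are made up.
  import random

  WORKING_SET_SIZE = 10000    # URLs eligible for revisits (kept constant)
  FILL_REQUESTS    = 1000000  # total requests pumped during the fill phase
  REVISIT_PROB     = 0.55     # roughly an "offered hit ratio"

  working_set = []            # the bounded set of revisitable URLs
  total_urls  = 0

  def next_url():
      """Either revisit the bounded working set or mint a brand-new URL."""
      global total_urls
      if working_set and random.random() < REVISIT_PROB:
          return random.choice(working_set)            # a potential hit
      url = "http://origin.test/obj/%d" % total_urls   # a guaranteed miss
      if len(working_set) < WORKING_SET_SIZE:
          working_set.append(url)
      else:
          # overwrite the oldest slot so the set size stays constant
          working_set[total_urls % WORKING_SET_SIZE] = url
      total_urls += 1
      return url

  for _ in range(FILL_REQUESTS):
      next_url()
  # total_urls is now about 450000, yet only 10000 URLs remain revisitable.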
> This seems to be an editorial comment on end-to-end guarantees, much
> more than it seems a valid measurement of actual cache performance.
Not sure what end-to-end guarantees you are referring to here.
> If you change squid to force a random page replacement, then you
> end up with a bounded worst case which is a better number than you
> would be able to get with your best (in terms of the real-world
> performance) algorithm (e.g. LRU or whatever), because you make it
> arbitrarily hard to characterize what that would be.
Random page replacement should not cause better performance.
Polygraph simulates hot subsets (aka flash crowds), which you would
not be able to take advantage of if you replace randomly. Also,
random replacement loses part of the advantage of the temporal
locality that Polygraph also simulates (e.g., the same HTML container
always embeds the same images).
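If you want to convince yourself, a toy simulation along these lines
(again my own sketch, not a Polygraph workload; the hot-subset mix,
cache size, and request counts are invented for illustration) will
normally show random replacement trailing a recency-based policy once
a hot subset is present:

  # Toy comparison of LRU vs. random replacement under a "hot subset" load.
  import random
  from collections import OrderedDict

  UNIVERSE   = 50000   # distinct objects the clients may request
  HOT_SET    = 1000    # small hot subset (a "flash crowd")
  HOT_PROB   = 0.5     # half of all requests go to the hot subset
  CACHE_SIZE = 3000    # objects the cache can hold
  REQUESTS   = 50000

  def make_stream():
      stream = []
      for _ in range(REQUESTS):
          if random.random() < HOT_PROB:
              stream.append(random.randrange(HOT_SET))            # hot object
          else:
              stream.append(random.randrange(HOT_SET, UNIVERSE))  # cold object
      return stream

  def lru_hit_ratio(stream):
      cache, hits = OrderedDict(), 0
      for obj in stream:
          if obj in cache:
              hits += 1
              cache.move_to_end(obj)           # refresh recency
          else:
              cache[obj] = True
              if len(cache) > CACHE_SIZE:
                  cache.popitem(last=False)    # evict the least recently used
      return hits / len(stream)

  def random_hit_ratio(stream):
      cache, hits = set(), 0
      for obj in stream:
          if obj in cache:
              hits += 1
          else:
              if len(cache) >= CACHE_SIZE:
                  cache.remove(random.choice(tuple(cache)))  # evict at random
              cache.add(obj)
      return hits / len(stream)

  random.seed(42)
  stream = make_stream()
  print("LRU hit ratio:    %.2f" % lru_hit_ratio(stream))
  print("random hit ratio: %.2f" % random_hit_ratio(stream))

With these made-up numbers, the recency-based policy keeps most of
the hot subset resident while random eviction keeps discarding parts
of it, so its hit ratio comes out noticeably lower.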
> NetApp has a tunable in their cache product which might as well be
> labelled "get a good Polygraph score"; all it does is turn on random
> page replacement, so that the Polygraph code is unable to
> characterize "what would constitute worst case performance on this
> cache?", and then intentionally exercise that code path, which is
> what it would do, otherwise (i.e. pick a working set slightly larger
> than the cache size so everything's a miss, etc.).
I am unaware of any tunables of that kind. Moreover, I suspect they
simply would not work (see above). Are you rich? If not, you may
want to sell a proof of the above to a NetApp competitor. I, myself,
would be very interested to hear it as well. Keep in mind that NetApp
and most other vendors use Polygraph for day-to-day regression tests,
so they are interested in making the tests realistic.
Also, offered Polygraph traffic does not depend on cache performance.
Polygraph code does not "characterize" anything at run time, at least
not during PolyMix tests.
> Basically, most of the case numbers are 99.xx% miss rates. With
> this modification, that number drops down to closer to 80%.
Actually, the measured miss ratio is usually about 50% (a hit ratio
of 50+%), which is quite realistic. The offered hit ratio is about
55%. The byte hit ratio is lower. I am not sure where you got the 99%
or 80% numbers. See the cache-off results for the true values.
> That's kind of evil; but at least it's a level playing field, and
> we can make a FreeBSD-specific patch for SQUID to get better numbers
> for FreeBSD. 8-) 8-).
I would not encourage you to cheat, even if there is a way. I would
recommend that you suggest ways to improve the benchmark instead.
Chances are, Polygraph can already do what you want.
> > > options MAXFILES=16384
> > > options NMBCLUSTERS=32678
>
> These I understand, though I think they are on the low end.
We have never run out of related resources with these settings during
a valid test. Keep in mind that we have to keep the number of open
concurrent HTTP connections below 5-8K to get robust performance given
PolyMix burstiness and other factors.
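For what it is worth, a back-of-the-envelope check like the one below
(the per-connection figures are my assumptions for illustration, not
numbers measured in our tests; I am only assuming the usual 2 KB mbuf
cluster size) suggests why those limits have been sufficient at 5-8K
concurrent connections, even if they look low:

  # Rough headroom estimate for the tunables quoted above. The
  # per-connection figures are assumptions, not measurements.
  MAXFILES          = 16384
  NMBCLUSTERS       = 32678   # as quoted above
  CLUSTER_BYTES     = 2048    # common mbuf cluster size on FreeBSD
  CONNECTIONS       = 8000    # upper end of the concurrency range above
  FDS_PER_CONN      = 2       # assume client-side plus server-side socket
  CLUSTERS_PER_CONN = 3       # assumed in-flight buffering per connection

  print("descriptors: %5d used of %d" % (CONNECTIONS * FDS_PER_CONN, MAXFILES))
  print("clusters:    %5d used of %d (%d MB of cluster space)" %
        (CONNECTIONS * CLUSTERS_PER_CONN, NMBCLUSTERS,
         NMBCLUSTERS * CLUSTER_BYTES // (1024 * 1024)))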
> > > options HZ=1000
>
> This one, I don't understand at all. The web page says it's for faster
> dummynet processing. But maybe this is an artifact of using NETISR.
This setting is a must-have if you use dummynet. We did not invent
it; it was suggested by the dummynet author himself, and it did solve
performance problems we experienced with s