I used a simple method for analysing the working set of URLs in our
logs. I took the MD5 sum of each URL and bucketed the sums into
hour-long groups. Then I compared consecutive periods: first each
one-hour period against the previous hour, then doubling the period
length up to 16 hours. I also put in a cut-off for the maximum document
size (16MB, the -d 16777216 argument). Below are the first few lines of
output from each run.
I was surprised to see that the sets of unique URLs don't overlap more
than they do.
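A minimal Python sketch of the approach above, in case anyone wants to try it on their own logs. It is not traffic_workingset.pl (that script isn't shown here); the record format (unix_timestamp, url, object_bytes), the hour index timestamp // 3600, and the name working_set_overlap are all assumptions. Each period is labelled with its start hour rounded down to a multiple of the period length, which is consistent with how the labels in the runs line up (hour 2837 lands in 2836 for two hours and in 2832 for eight hours). The percent is computed from bytes relative to the current period, which is inferred from the posted numbers rather than taken from the script.

```python
import hashlib
from collections import defaultdict


def working_set_overlap(records, period_hours=1, max_doc_size=16 * 1024 * 1024):
    """Bucket unique URLs into fixed-length periods and compare each period
    with the one immediately before it.

    records: iterable of (unix_timestamp, url, object_bytes) tuples.
    This log format is an assumption, not what traffic_workingset.pl reads.
    """
    # period label -> {md5 hex digest of URL -> object size in bytes}
    buckets = defaultdict(dict)
    for ts, url, size in records:
        if size > max_doc_size:
            continue  # skip documents above the size cut-off
        hour = int(ts) // 3600  # assumed hour index
        label = hour // period_hours * period_hours  # round down to period start
        key = hashlib.md5(url.encode("utf-8")).hexdigest()
        buckets[label][key] = size

    results = []
    for label in sorted(buckets):
        cur = buckets[label]
        row = {"hour": label, "objects": len(cur), "bytes": sum(cur.values())}
        prev = buckets.get(label - period_hours)  # .get does not create entries
        if prev is not None:
            shared = cur.keys() & prev.keys()
            overlap_bytes = sum(cur[k] for k in shared)
            row["overlap_objects"] = len(shared)
            row["overlap_bytes"] = overlap_bytes
            # Assumed: integer share of this period's bytes also seen last period.
            row["percent"] = overlap_bytes * 100 // row["bytes"] if row["bytes"] else 0
        results.append(row)
    return results


if __name__ == "__main__":
    log = [
        (0, "http://example.com/a", 100),
        (10, "http://example.com/b", 300),
        (3600, "http://example.com/a", 100),
        (3700, "http://example.com/c", 50),
        (3800, "http://example.com/huge", 32 * 1024 * 1024),  # over the cut-off
    ]
    for row in working_set_overlap(log, period_hours=1):
        print(row)
```

Running it with period_hours set to 1, 2, 4, 8 and 16 mirrors the doubling of periods described above.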
one hour:
[bcall@d5 ~]$ ./traffic_workingset.pl -t 1 -d 16777216 post2
hour: 2837 objects: 5, bytes: 21469
hour: 2838 objects: 30809, bytes: 355626853
overlap with previous period - objects: 5, total bytes: 21469, percent: 0
hour: 2839 objects: 24154, bytes: 296768548
overlap with previous period - objects: 7289, total bytes: 95066185, percent: 32
hour: 2840 objects: 20250, bytes: 246836984
overlap with previous period - objects: 6005, total bytes: 88160038, percent: 35
hour: 2841 objects: 17631, bytes: 208717157
overlap with previous period - objects: 5177, total bytes: 68819734, percent: 32
hour: 2842 objects: 20482, bytes: 261871070
overlap with previous period - objects: 5259, total bytes: 65139470, percent: 24
hour: 2843 objects: 26853, bytes: 359263335
overlap with previous period - objects: 6348, total bytes: 90346330, percent: 25
hour: 2844 objects: 41284, bytes: 530875217
overlap with previous period - objects: 9114, total bytes: 113678069, percent: 21
hour: 2845 objects: 56788, bytes: 714659824
overlap with previous period - objects: 14709, total bytes: 168634391, percent: 23
hour: 2846 objects: 69527, bytes: 827435497
overlap with previous period - objects: 18888, total bytes: 190586144, percent: 23
hour: 2847 objects: 104125, bytes: 849059653
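Working back from the one-hour numbers, the percent column appears to be the overlap bytes as a share of the current hour's bytes, truncated to an integer. For hour 2839, 95066185 / 296768548 is about 32.03%, matching the reported 32; dividing by the previous hour's 355626853 bytes instead would give 26. This is inferred from the output, not read from the script. A quick Python check against the posted rows (overlap_percent is a hypothetical helper name):

```python
# (hour, period bytes, overlap bytes, reported percent) copied from the one-hour run
rows = [
    (2838, 355626853, 21469, 0),
    (2839, 296768548, 95066185, 32),
    (2840, 246836984, 88160038, 35),
    (2841, 208717157, 68819734, 32),
    (2842, 261871070, 65139470, 24),
    (2843, 359263335, 90346330, 25),
    (2844, 530875217, 113678069, 21),
    (2845, 714659824, 168634391, 23),
    (2846, 827435497, 190586144, 23),
]


def overlap_percent(period_bytes, overlap_bytes):
    """Assumed formula: share of this period's bytes also seen last period, truncated."""
    return overlap_bytes * 100 // period_bytes


for hour, period_bytes, overlap_bytes, reported in rows:
    computed = overlap_percent(period_bytes, overlap_bytes)
    print(hour, reported, computed, "ok" if computed == reported else "MISMATCH")
```

Every row matches under this reading, so the percentages measure bytes rather than object counts.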
two hours:
[bcall@d5 ~]$ ./traffic_workingset.pl -t 2 -d 16777216 post2
hour: 2836 objects: 5, bytes: 21469
hour: 2838 objects: 47674, bytes: 557329216
overlap with previous period - objects: 5, total bytes: 21469, percent: 0
hour: 2840 objects: 32704, bytes: 386734407
overlap with previous period - objects: 10966, total bytes: 132746383, percent: 34
hour: 2842 objects: 40987, bytes: 530788075
overlap with previous period - objects: 10265, total bytes: 114514769, percent: 21
hour: 2844 objects: 83363, bytes: 1076900650
overlap with previous period - objects: 16046, total bytes: 162057227, percent: 15
hour: 2846 objects: 152505, bytes: 1466180080
overlap with previous period - objects: 31607, total bytes: 277027232, percent: 18
hour: 2848 objects: 161981, bytes: 1539186710
overlap with previous period - objects: 38284, total bytes: 342293473, percent: 22
hour: 2850 objects: 146606, bytes: 1502598116
overlap with previous period - objects: 38451, total bytes: 353381905, percent: 23
hour: 2852 objects: 140935, bytes: 1615811852
overlap with previous period - objects: 38650, total bytes: 356447519, percent: 22
hour: 2854 objects: 127372, bytes: 1492043037
overlap with previous period - objects: 37189, total bytes: 326768221, percent: 21
four hours:
[bcall@d5 ~]$ ./traffic_workingset.pl -t 4 -d 16777216 post2
hour: 2836 objects: 47674, bytes: 557329216
hour: 2840 objects: 63426, bytes: 803007713
overlap with previous period - objects: 15873, total bytes: 173225580, percent: 21
hour: 2844 objects: 204261, bytes: 2266053498
overlap with previous period - objects: 29116, total bytes: 257040913, percent: 11
hour: 2848 objects: 270136, bytes: 2688402921
overlap with previous period - objects: 63174, total bytes: 527400828, percent: 19
hour: 2852 objects: 231118, bytes: 2781086668
overlap with previous period - objects: 68491, total bytes: 584011398, percent: 20
hour: 2856 objects: 230739, bytes: 2753795619
overlap with previous period - objects: 66077, total bytes: 557332950, percent: 20
hour: 2860 objects: 124443, bytes: 1438856617
overlap with previous period - objects: 47696, total bytes: 417322768, percent: 29
hour: 2864 objects: 64674, bytes: 772265468
overlap with previous period - objects: 23657, total bytes: 220822926, percent: 28
hour: 2868 objects: 176254, bytes: 2148820072
overlap with previous period - objects: 25529, total bytes: 250619141, percent: 11
hour: 2872 objects: 199550, bytes: 2788444805
overlap with previous period - objects: 57998, total bytes: 518807894, percent: 18
eight hours:
[bcall@d5 ~]$ ./traffic_workingset.pl -t 8 -d 16777216 post2
hour: 2832 objects: 47674, bytes: 557329216
hour: 2840 objects: 238571, bytes: 2812020298
overlap with previous period - objects: 25722, total bytes: 246526135, percent: 8
hour: 2848 objects: 432763, bytes: 4885478191
overlap with previous period - objects: 89138, total bytes: 709437386, percent: 14
hour: 2856 objects: 307486, bytes: 3775329468
overlap with previous period - objects: 102289, total bytes: 818169627, percent: 21
hour: 2864 objects: 215399, bytes: 2670466399
overlap with previous period - objects: 71682, total bytes: 593670076, percent: 22
hour: 2872 objects: 346799, bytes: 5432255520
overlap with previous period - objects: 83435, total bytes: 714189092, percent: 13
hour: 2880 objects: 296848, bytes: 4399969376
overlap with previous period - objects: 107980, total bytes: 897157271, percent: 20
hour: 2888 objects: 293829, bytes: 3481795517
overlap with previous period - objects: 65063, total bytes: 530234761, percent: 15
16 hours:
[bcall@d5 ~]$ ./traffic_workingset.pl -t 16 -d 16777216 post2
hour: 2832 objects: 260523, bytes: 3122823379
hour: 2848 objects: 637960, bytes: 7842638032
overlap with previous period - objects: 112224, total bytes: 873272314, percent: 11
hour: 2864 objects: 478763, bytes: 7388532827
overlap with previous period - objects: 159546, total bytes: 1202308406, percent: 16
hour: 2880 objects: 525614, bytes: 7351530132
overlap with previous period - objects: 144261, total bytes: 1135653343, percent: 15
hour: 2896 objects: 342600, bytes: 3518992577
overlap with previous period - objects: 91463, total bytes: 711931497, percent: 20
-Bryan
On 04/27/2012 05:06 PM, John Plevyak wrote:
Interesting. Maybe there is a bug in the overflow code. Halving the
misses is significant. Is the hot set stable or does it change? Does
the working set fit in the RAM cache? CLFUS has more overhead but should
be more robust (sans bugs).
On Apr 27, 2012 4:10 PM, "Bryan Call"<bc...@yahoo-inc.com> wrote:
I wanted to know if anyone else has done testing comparing how CLFUS
and LRU perform on real-world traffic.
I noticed that a server using CLFUS whose cache had not yet filled was
getting a higher RAM cache hit rate than a server whose cache was full.
That led me to run a test with LRU and CLFUS in production. All I did
was switch the configuration option on one box and wait for the cache
to fill up. The server running LRU was getting half as many RAM cache
misses. This was also confirmed by the amount of disk access the CLFUS
server was doing to fetch objects from the disk cache.
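For anyone wanting to repeat the comparison offline against a log, here is a minimal byte-bounded LRU RAM cache simulator in Python that counts hits and misses over a request trace. It only covers the LRU side: CLFUS is considerably more involved and isn't modelled here, and this is not the Traffic Server implementation. The trace format ((url, object_bytes) pairs) and the class name LRURamCache are assumptions for illustration.

```python
from collections import OrderedDict


class LRURamCache:
    """Hypothetical byte-bounded LRU cache that tracks keys and sizes only (no payloads)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> size, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, key, size):
        """Record one request; return True on a hit, False on a miss."""
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        self.misses += 1
        if size > self.capacity:
            return False  # can never fit, so it is not cached
        # Evict least recently used entries until the new object fits.
        while self.used + size > self.capacity:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size
        self.entries[key] = size
        self.used += size
        return False


if __name__ == "__main__":
    cache = LRURamCache(capacity_bytes=100)
    trace = [("a", 40), ("b", 40), ("a", 40), ("c", 40), ("b", 40), ("a", 40)]
    for url, size in trace:
        cache.access(url, size)
    print("hits:", cache.hits, "misses:", cache.misses)
```

Replaying the same trace through a second policy and comparing the miss counts would give an offline analogue of the production test, with the caveat that a simulator ignores memory overhead per entry.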
-Bryan