Hi there,

For those of you who are crazy about facts and
numbers, I have attached some data size info for
three large FSFS repositories. They are 1.8-format
mirrors of the Apache, KDE and WordPress
repositories. I used my new fsfs-stats tool to
extract the info.

Some of my findings:

* Apache: lots of large zip files added lately
  (low overall compression ratio, but the tool does
   not yet list zip files etc. as the reason)
* KDE: still larger than Apache, with an excellent
  compression ratio (lots of large .po files);
  >1TB expanded
* WordPress: directory compression eliminated the
  directory storage overhead (5000% => <10%)

* rep sharing is most effective when you have many
  "casual" users (> factor 2 in WordPress; 25% savings
  for Apache; insignificant for KDE since .po files
  are not shared / identical between branches)
* noderevs + changes lists take up 10..30% of
  the total repo size, i.e. the actual content is
  already well compressed

* more distinct file property reps than I expected
  (probably due to old per-file mergeinfo)
* >50% of all nodes in Apache repo have props
* rep sharing + deltification brings prop info down
  to ~10 bytes / rev for Apache
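
The ratios quoted above can be re-derived from the attached figures.
The following is only a sketch: it assumes the three dumps are the
Apache, KDE and WordPress mirrors in that order (the attachments do
not say so; the revision counts and the findings suggest it), and it
reads "rep-sharing factor" as "bytes with rep-sharing off" divided by
"bytes expanded size" for file reps. The dictionary keys and the
summarize() helper are made-up names for illustration.

```python
# Figures copied from the attached fsfs-stats output.
# ASSUMPTION: dump 1 = Apache, dump 2 = KDE, dump 3 = WordPress.
REPOS = {
    "Apache": dict(
        total=43_571_200_544, revisions=1_407_978,
        changes=1_719_438_790, noderevs=8_527_042_341,
        reps=32_606_404_032, expanded=175_991_665_585,
        file_expanded=160_724_606_881, file_unshared=215_992_591_222,
        prop_reps=7_936_967 + 5_808_928,  # dir + file property reps
    ),
    "KDE": dict(
        total=42_516_758_377, revisions=1_325_037,
        changes=2_112_852_964, noderevs=9_918_750_627,
        reps=29_614_818_603, expanded=1_114_881_994_595,
        file_expanded=1_087_898_597_508, file_unshared=1_126_563_329_834,
        prop_reps=1_465_071 + 1_369_889,
    ),
    "WordPress": dict(
        total=8_233_212_081, revisions=507_189,
        changes=336_363_580, noderevs=1_205_197_688,
        reps=6_610_608_683, expanded=416_559_053_291,
        file_expanded=18_096_237_254, file_unshared=42_360_997_646,
        prop_reps=116_243 + 24_251,
    ),
}


def summarize(r):
    """Derive the ratios discussed in the findings from one dump."""
    return {
        # expanded representation size vs. on-disk representation size
        "compression_factor": r["expanded"] / r["reps"],
        # share of the repo taken by noderevs + changes lists
        "metadata_share": (r["noderevs"] + r["changes"]) / r["total"],
        # file data without rep sharing vs. with rep sharing
        "file_sharing_factor": r["file_unshared"] / r["file_expanded"],
        # on-disk property representation bytes per revision
        "prop_bytes_per_rev": r["prop_reps"] / r["revisions"],
    }


for name, r in REPOS.items():
    s = summarize(r)
    print(f"{name:10s} compression x{s['compression_factor']:.1f}, "
          f"metadata {s['metadata_share']:.0%}, "
          f"file rep sharing x{s['file_sharing_factor']:.2f}, "
          f"props {s['prop_bytes_per_rev']:.1f} bytes/rev")
```

Under this reading, the savings from rep sharing are 1 - 1/factor,
so a factor of about 1.34 corresponds to the ~25% quoted for Apache.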

-- Stefan^2.

[Attachment 1, apparently the Apache mirror]
Global statistics:
      43,571,200,544 bytes in    1,407,978 revisions
       1,719,438,790 bytes in   11,919,461 changes
       8,527,042,341 bytes in   28,631,286 node revision records
      32,606,404,032 bytes in   26,042,259 representations
     175,991,665,585 bytes expanded representation size
     232,589,088,405 bytes with rep-sharing off

Noderev statistics:
       8,527,042,341 bytes in   28,631,286 nodes total
       4,529,280,752 bytes in   18,195,547 directory noderevs
       3,997,761,589 bytes in   10,435,739 file noderevs

Representation statistics:
      32,606,404,032 bytes in   26,042,259 representations total
       1,206,577,410 bytes in   17,999,442 directory representations
      31,386,080,727 bytes in    7,866,975 file representations
           7,936,967 bytes in      102,123 directory property representations
           5,808,928 bytes in       73,719 file property representations
         703,824,567 bytes in header & footer overhead

Directory representation statistics:
       1,206,577,410 bytes in   17,999,442 reps
           7,198,044 bytes in       76,251 shared reps
      14,900,076,043 bytes expanded size
          54,380,469 bytes expanded shared size
      15,067,384,452 bytes with rep-sharing off
             140,449 shared references

File representation statistics:
      31,386,080,727 bytes in    7,866,975 reps
       6,957,017,837 bytes in    1,308,907 shared reps
     160,724,606,881 bytes expanded size
      26,699,217,946 bytes expanded shared size
     215,992,591,222 bytes with rep-sharing off
           2,568,681 shared references

Directory property representation statistics:
           7,936,967 bytes in      102,123 reps
           2,435,475 bytes in       30,208 shared reps
         236,898,639 bytes expanded size
          48,224,988 bytes expanded shared size
         959,652,575 bytes with rep-sharing off
           3,267,341 shared references

File property representation statistics:
           5,808,928 bytes in       73,719 reps
             691,141 bytes in        8,936 shared reps
         130,084,022 bytes expanded size
           4,241,945 bytes expanded shared size
         569,460,156 bytes with rep-sharing off
           6,554,789 shared references

[Attachment 2, apparently the KDE mirror]
Global statistics:
      42,516,758,377 bytes in    1,325,037 revisions
       2,112,852,964 bytes in   18,163,503 changes
       9,918,750,627 bytes in   31,461,675 node revision records
      29,614,818,603 bytes in   29,269,280 representations
   1,114,881,994,595 bytes expanded representation size
   1,155,846,558,984 bytes with rep-sharing off

Noderev statistics:
       9,918,750,627 bytes in   31,461,675 nodes total
       3,641,226,857 bytes in   14,233,846 directory noderevs
       6,277,523,770 bytes in   17,227,829 file noderevs

Representation statistics:
      29,614,818,603 bytes in   29,269,280 representations total
       1,411,801,736 bytes in   14,143,671 directory representations
      28,200,181,907 bytes in   15,087,277 file representations
           1,465,071 bytes in       17,885 directory property representations
           1,369,889 bytes in       20,447 file property representations
         856,408,582 bytes in header & footer overhead

Directory representation statistics:
       1,411,801,736 bytes in   14,143,671 reps
           5,670,142 bytes in       51,339 shared reps
      26,884,721,654 bytes expanded size
          61,486,365 bytes expanded shared size
      26,955,905,794 bytes with rep-sharing off
              63,390 shared references

File representation statistics:
      28,200,181,907 bytes in   15,087,277 reps
       3,087,013,223 bytes in    1,136,350 shared reps
   1,087,898,597,508 bytes expanded size
      23,485,645,700 bytes expanded shared size
   1,126,563,329,834 bytes with rep-sharing off
           2,140,551 shared references

Directory property representation statistics:
           1,465,071 bytes in       17,885 reps
             782,037 bytes in        8,669 shared reps
          93,340,801 bytes expanded size
          30,873,623 bytes expanded shared size
       1,374,010,811 bytes with rep-sharing off
           8,070,095 shared references

File property representation statistics:
           1,369,889 bytes in       20,447 reps
             188,512 bytes in        3,028 shared reps
           5,334,632 bytes expanded size
             855,812 bytes expanded shared size
         953,312,545 bytes with rep-sharing off
           9,041,782 shared references

[Attachment 3, apparently the WordPress mirror]
Global statistics:
       8,233,212,081 bytes in      507,189 revisions
         336,363,580 bytes in    3,473,008 changes
       1,205,197,688 bytes in    5,125,527 node revision records
       6,610,608,683 bytes in    3,175,300 representations
     416,559,053,291 bytes expanded representation size
     440,976,526,859 bytes with rep-sharing off

Noderev statistics:
       1,205,197,688 bytes in    5,125,527 nodes total
         403,048,125 bytes in    2,263,745 directory noderevs
         802,149,563 bytes in    2,861,782 file noderevs

Representation statistics:
       6,610,608,683 bytes in    3,175,300 representations total
         428,471,684 bytes in    2,111,717 directory representations
       6,181,996,505 bytes in    1,061,535 file representations
             116,243 bytes in        1,742 directory property representations
              24,251 bytes in          306 file property representations
          75,980,107 bytes in header & footer overhead

Directory representation statistics:
         428,471,684 bytes in    2,111,717 reps
           5,577,314 bytes in       36,636 shared reps
     398,462,596,403 bytes expanded size
          79,861,877 bytes expanded shared size
     398,549,277,881 bytes with rep-sharing off
              42,953 shared references

File representation statistics:
       6,181,996,505 bytes in    1,061,535 reps
       3,029,368,482 bytes in      446,128 shared reps
      18,096,237,254 bytes expanded size
       7,064,016,710 bytes expanded shared size
      42,360,997,646 bytes with rep-sharing off
           1,800,236 shared references

Directory property representation statistics:
             116,243 bytes in        1,742 reps
              78,252 bytes in        1,100 shared reps
             193,351 bytes expanded size
             106,096 bytes expanded shared size
           4,082,036 bytes with rep-sharing off
              68,921 shared references

File property representation statistics:
              24,251 bytes in          306 reps
              18,453 bytes in          239 shared reps
              26,283 bytes expanded size
              18,931 bytes expanded shared size
          62,169,296 bytes with rep-sharing off
           1,213,859 shared references
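
For anyone who wants to process these dumps further, here is a rough
sketch of a parser for the text layout shown above. It is not part of
fsfs-stats; parse_stats() and the regular expressions are hypothetical
helpers written against the lines visible in this mail, and the
function expects a single dump (section names repeat across dumps).

```python
import re

# "  43,571,200,544 bytes in    1,407,978 revisions"
BYTES_IN = re.compile(r"^\s*([\d,]+) bytes in\s+([\d,]+) (.+?)\s*$")
# " 175,991,665,585 bytes expanded representation size"
# "     703,824,567 bytes in header & footer overhead"
BYTES_ONLY = re.compile(r"^\s*([\d,]+) bytes (?:in )?(.+?)\s*$")
# "         140,449 shared references"
COUNT_ONLY = re.compile(r"^\s*([\d,]+) (.+?)\s*$")


def to_int(text):
    """Convert a number with thousands separators, e.g. '1,407,978'."""
    return int(text.replace(",", ""))


def parse_stats(text):
    """Parse one dump into {section: {label: {'bytes': .., 'count': ..}}}."""
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        # Section headers start in column 0 and end with a colon.
        if not line[0].isspace() and line.rstrip().endswith(":"):
            current = sections[line.strip()[:-1]] = {}
            continue
        if current is None:
            continue  # data line before any header: ignore
        m = BYTES_IN.match(line)
        if m:
            current[m.group(3)] = {"bytes": to_int(m.group(1)),
                                   "count": to_int(m.group(2))}
            continue
        m = BYTES_ONLY.match(line)
        if m:
            current[m.group(2)] = {"bytes": to_int(m.group(1))}
            continue
        m = COUNT_ONLY.match(line)
        if m:
            current[m.group(2)] = {"count": to_int(m.group(1))}
    return sections


SAMPLE = """Global statistics:
      43,571,200,544 bytes in    1,407,978 revisions
     175,991,665,585 bytes expanded representation size

Representation statistics:
         703,824,567 bytes in header & footer overhead

Directory representation statistics:
             140,449 shared references
"""

stats = parse_stats(SAMPLE)
print(stats["Global statistics"]["revisions"])
```

The three patterns are tried from most to least specific, so a line
with both a byte total and an item count is never mistaken for a
plain count.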
