On Aug 9, 2010, at 2:55 AM, Henry Yen wrote:

> On Fri, Aug 06, 2010 at 10:48:10AM +0200, Christian Gaul wrote:
>> Even when catting to /dev/dsp i use /dev/urandom.. Blocking on
>> /dev/random happens much too quickly.. and when do you really need that
>> much randomness.
>
> I get about 40 bytes on a small server before blocking.
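
A quick way to see the difference for yourself (a rough sketch; the byte
counts below are only illustrative):

    # how much entropy the kernel currently has available
    cat /proc/sys/kernel/random/entropy_avail

    # stalls as soon as the entropy pool is drained (often within a few
    # dozen bytes on an idle machine, as Henry observes)
    dd if=/dev/random of=/dev/null bs=1 count=512

    # never blocks; keeps producing pseudorandom data for as long as you ask
    dd if=/dev/urandom of=/dev/null bs=1M count=512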

On Linux, /dev/random will block if there is insufficient entropy in the
pool. Unlike /dev/random, /dev/urandom will not block on Linux, but will
reuse entropy in the pool. Thus, /dev/random produces higher-quality but
lower-quantity random data than /dev/urandom. For the purposes of
compressibility tests, the pseudorandom data of /dev/urandom is perfectly
fine. The /dev/random device is better used, e.g., for generating
cryptographic keys.

>
>>> Reason 1: the example I gave yields a file size for "tempchunk" of 512MB,
>>> not 1MB, as given in your counter-example. I agree that (at least
>>> now-a-days) catting 1MB chunks into a 6MB chunk is likely (although not
>>> assured) to lead to greatly reduced size during later compression, but I
>>> disagree that catting 512MB chunks into a 3GB chunk is likely to be
>>> compressible by any general-purpose compressor.
>>
>> Which is what i meant with "way bigger than the library size of the
>> algorithm". Mostly my "Information" was pitfalls to look out for when
>> testing the speed of your equipment; if you went ahead and cat-ted 3000
>> x 1MB, i believe the hardware compression would make something highly
>> compressed out of it.
>> My guess is it would work for most chunks around half as large as the
>> buffer size of the drive (totally guessing).
>
> I think that the tape drive manufacturers don't make large buffer/CPU
> capacity in their drives yet. I finally did a test on an SDLT2 (160GB)
> drive; admittedly, it's fairly old as tape drives go, but tape technology
> appears to be rather a bit slower than disk technology, at least as far
> as raw capacity is concerned. I created two files from /dev/urandom;
> one was 1GB, the other a mere 10K. I then created two identically-sized
> files corresponding to each of these two chunks (4 of the first and approx.
> 400k of the second). Writing them to the SDLT2 drive using 60k blocksize,
> with compression on, yielded uncanny results: the writable capacity before
> hitting EOT was within 0.01%, and the elapsed time was within 0.02%.

As I posted here recently, even modern LTO tape drives use only a 1 KB
(1024-byte) history buffer for their sliding-window compression algorithm.
So any repeated random chunk larger than 1 KB will be incompressible to an
LTO tape drive.

> I see there's a reason to almost completely ignore the so-called "compressed
> capacity" claims by tape drive manufacturers...

By definition, random data are not compressible. It's my understanding that
the "compressed capacity" of tapes is based explicitly on an expected 2:1
compression ratio for the source data (and this is usually cited somewhere
in the small print). That is a reasonable estimate for text; other data may
compress better or worse, and already-compressed or encrypted data will be
incompressible to the tape drive. In other words, "compressed capacity" is
heavily dependent on your source data.

Cheers,
Paul.
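
P.S. For anyone who wants to run a similar test, a rough sketch of one way
to set it up is below. The device name /dev/nst0, the file names, the
doubling loop, and the mt-st "compression" subcommand are all assumptions;
adjust them for your own drive and tools.

    # two chunks of incompressible pseudorandom data: 1GB and 10K
    dd if=/dev/urandom of=chunk-1g bs=1M count=1024
    dd if=/dev/urandom of=chunk-10k bs=1K count=10

    # a 4GB test file made of 4 copies of the 1GB chunk
    cat chunk-1g chunk-1g chunk-1g chunk-1g > test-from-1g

    # an identically sized file made of repeated 10K chunks:
    # double until past 4GiB, then trim to exactly 4GiB
    cp chunk-10k test-from-10k
    while [ $(stat -c %s test-from-10k) -lt $((4*1024*1024*1024)) ]; do
        cat test-from-10k test-from-10k > tmp.bin && mv tmp.bin test-from-10k
    done
    truncate -s 4G test-from-10k

    # write each file to tape with a 60k blocksize and hardware compression
    # on; repeat the dd until the drive reports EOT to compare capacity.
    # /dev/nst0 (non-rewinding tape device) is an assumption.
    mt -f /dev/nst0 compression 1
    mt -f /dev/nst0 rewind
    time dd if=test-from-1g of=/dev/nst0 bs=60K

    mt -f /dev/nst0 rewind
    time dd if=test-from-10k of=/dev/nst0 bs=60K

The doubling loop is only there to avoid invoking cat a few hundred thousand
times; any way of producing an identically sized file of repeated 10K chunks
will do.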