Bill, I too think that we should defer to Kern and his list of priorities.

Kern has been doing a great job of prioritizing and running this project. I
think that people who are that tight on storage should prune the files a few
days earlier and let Kern work on functionality; disk space is fairly
inexpensive.

-Jason

On Tue, 2008-02-12 at 09:52 -0500, Bill Moran wrote:
> In response to Cousin Marc <[EMAIL PROTECTED]>:
>
> [snip]
>
> > > One hash is not possible without seriously restricting the user's
> > > flexibility -- the MD5 field, though not totally used as planned in
> > > 2.2 (hopefully it will be in 2.2), is a critical field for security
> > > and certain government legal requirements for the storage of data.
> > > As legal requirements become more strict with increasing computer
> > > speed/technology, we can expect the need for larger and larger hash
> > > codes. In any case, it certainly needs to be user-configurable
> > > without rebuilding the database -- thus variable. In fact, as it
> > > stands, the user can have multiple different MD5 sizes in any given
> > > Job, i.e. some files may be more important to verify than others.
> >
> > I understand that md5 is required, as it's the only way of reliably
> > checking that a file has not been modified. But only one type of
> > checksum may be more efficient from a database point of view, as it
> > could be fixed size (no need to waste 4 bytes, for instance in
> > postgresql, telling the engine: be careful, the next field is
> > variable length, here is its size). Of course, going from md5 to
> > sha256 sacrifices 16 bytes... I don't know if there could be an
> > efficient way of doing this. Anyway, base64 "wastes" more space in
> > this scheme, so a transparent conversion at the database level may be
> > useful.
> >
> > > As already explained, I would be very reluctant to make it a
> > > requirement to be a multiple of 32 bits. It just takes one genius
> > > to come up with a new super fast algorithm that uses 129 bits to
> > > break the code.
> >
> > Okay, I'll experiment with both. For us right now, a byte per record
> > is only 300MB in database size :)
>
> Any time you look at complicating things to improve efficiency, there's
> the question "is it worth it?"
>
> On the larger of our two Bacula servers, the database size is 8.5G. The
> file table contains 35 million rows. If you can save 16 bytes per row,
> that means an on-disk savings of 1/2G.
>
> My reaction to that would be "big friggin deal". Considering the fact
> that we've got 750G of file volumes on a RAID 5, saving 500M on the
> database doesn't really seem worth the effort to me.
>
> Let's say Bacula moves to using SHA-256 hashes instead of md5. Now the
> savings in storage space is 32 bytes instead of 16 bytes. So, I'd be
> saving a whole G on the total database size. I still say, "why bother?"
>
> Bacula works just dandy for us with these sizes. If I do a "list jobs
> where a given file is saved" on the largest of our servers, the
> response is fast enough that I don't even consider it a wait. Quite
> honestly, it's fast enough that I have trouble believing it doesn't
> take longer. Nearly instantaneous.
>
> Just my opinion, of course. I'd be interested to hear how much effect
> this would have on others and whether they think it's worthwhile to
> even investigate.
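As a rough sanity check on the numbers above, here is a minimal Python
sketch -- my own illustration, not anything in Bacula's code. The
35-million-row count comes from Bill's mail, the 4-byte variable-length
header from Marc's, and the sample input is made up. It compares a
base64-encoded variable-length column against a hypothetical fixed-size
binary one:

    import base64
    import hashlib

    ROWS = 35_000_000   # file-table row count from Bill's mail
    HEADER = 4          # per-value length header Marc mentions for postgresql

    for algo in ("md5", "sha256"):
        # made-up input; only the digest length matters here
        digest = hashlib.new(algo, b"sample file contents").digest()
        encoded = base64.b64encode(digest)
        # bytes saved per row: variable-length base64 column (data plus
        # length header) versus a fixed-size raw binary column (no header)
        per_row = len(encoded) + HEADER - len(digest)
        total = per_row * ROWS
        print(f"{algo}: raw {len(digest)} bytes fixed vs base64 "
              f"{len(encoded)}+{HEADER} bytes; saves {per_row} bytes/row, "
              f"~{total / 2**20:.0f}M over {ROWS:,} rows")

Under those assumptions, this works out to about 12 bytes per row for MD5
(~400M here) and 16 bytes per row for SHA-256 (~1/2G), which is roughly the
range Bill describes -- real, but small next to 750G of file volumes.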
----------------------------------------------------------------------------
Jason A. Kates ([EMAIL PROTECTED])
Fax: 208-975-1514    Phone: 212-400-1670 x2
============================================================================