On Tuesday 12 February 2008 15.59:10 Jason A. Kates wrote:
> Bill,
>
> I too think that we should defer to Kern and his list of priorities.

Thanks for your vote of confidence :-)

It is interesting because this is the first time I have not worked on the
highest priority job (Item 1: Accurate restoration of renamed/deleted
files), which was also the project that interested me the most. Instead, I
am working on plugins (Item 12: Add Plug-ins to the FileSet Include
statements), which I decided to work on because it is the #1 most requested
feature for enterprises (they want to be able to back up MS Exchange with a
"module"). Well, I wasn't much enjoying the project, because it is a lot of
*really* heavy design and delicate integration with Bacula, but now that I
am into it, it is getting really interesting.

And the real nice part is that a very kind programmer came along and is
making very good progress on the Accurate Backup project :-)

Also another kind programmer came along and is making great progress on
Item h7: Commercial database support, which is the #2 most demanded
enterprise feature -- I still have to figure out how to make this work
legally/morally with our Open Source license ...

Kern

>
> Kern has been doing a great job of prioritizing and running this
> project. I think that people that are that tight on storage should
> prune the files a few days earlier and let Kern work on functionality;
> disk space is fairly inexpensive.
>
> -Jason
>
> On Tue, 2008-02-12 at 09:52 -0500, Bill Moran wrote:
> > In response to Cousin Marc <[EMAIL PROTECTED]>:
> >
> > [snip]
> >
> > > > One hash is not possible without seriously restricting user's
> > > > flexibility -- the MD5 field though not totally used as planned in
> > > > 2.2. (hopefully it will be in 2.2) is a critical field for security
> > > > and certain government legal requirements for the storage of data.
> > > > As legal requirements become more strict with increasing computer
> > > > speed/technology we can expect the need for larger and larger hash
> > > > codes.
> > > > In any case, it certainly needs to be user configurable
> > > > without rebuilding the database -- thus variable. In fact, as it
> > > > stands, the user can have multiple different MD5 sizes in any given
> > > > Job. I.e. some files may be more important to verify than others.
> > >
> > > I understand that md5 is required, as it's the only way of reliably
> > > checking that a file has not been modified. But only one type of
> > > checksum may be more efficient from a database point of view, as it
> > > could be fixed size (no need to waste 4 bytes for instance in
> > > postgresql telling the engine: be careful, next field is variable
> > > length, here is its size). Of course, going from md5 to sha256
> > > sacrifices 16 bytes... I don't know if there could be an efficient way
> > > of doing this. Anyway, base64 "wastes" more space in this scheme, so a
> > > transparent conversion at database level may be useful.
> > >
> > > > As already explained, I would be very reluctant to make it a
> > > > requirement to be a multiple of 32 bits. It just takes one genius to
> > > > come up with a new super fast algorithm that uses 129 bits to break
> > > > the code.
> > >
> > > Okay, I'll experiment with both. For us right now, a byte per record is
> > > only 300MB in database size :)
> >
> > Any time you look at complicating things to improve efficiency, there's
> > the question "is it worth it".
> >
> > On the larger of our two Bacula servers, the database size is 8.5G. The
> > file table contains 35 million rows. If you can save 16 bytes per row,
> > that means an on-disk savings of 1/2G.
> >
> > My reaction to that would be "big friggin deal". Considering the fact
> > that we've got 750G of file volumes on a RAID 5, saving 500M on the
> > database doesn't really seem worth the effort to me.
> >
> > Let's say Bacula moves to using SHA-256 hashes instead of md5. Now the
> > savings in storage space is 32 bytes instead of 16 bytes.
> > So, I'd be saving a whole G on the total database size. I still say,
> > "why bother"
> >
> > Bacula works just dandy for us with these sizes. If I do a "list jobs
> > where a given file is saved" on the largest of our servers, the response
> > is fast enough that I don't even consider it a wait. Quite honestly,
> > it's fast enough that I have trouble believing that it doesn't take
> > longer. Nearly instantaneous.
> >
> > Just my opinion, of course. I'd be interested to hear how much effect
> > this would have on others and whether they think it's worthwhile to even
> > investigate.

_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
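
The storage estimates quoted in this thread can be reproduced with a few
lines of arithmetic. The sketch below is only a rough check and is not part
of Bacula: the helper names (savings_bytes, base64_len) are hypothetical, the
row count and per-row byte savings are the figures Bill quotes, and the
base64 lengths assume standard padded encoding, which may not match how
Bacula actually stores its digests.

```python
import base64

# Figure quoted in the thread: rows in the File table on the larger server.
ROWS = 35_000_000


def savings_bytes(rows: int, bytes_per_row: int) -> int:
    """Total bytes saved if every row shrinks by bytes_per_row."""
    return rows * bytes_per_row


def base64_len(raw_len: int) -> int:
    """Length of a raw digest of raw_len bytes after padded base64 encoding.

    Assumption: standard RFC 4648 base64 with '=' padding.
    """
    return len(base64.b64encode(bytes(raw_len)))


if __name__ == "__main__":
    # 16 bytes per row (MD5-sized) and 32 bytes per row (SHA-256-sized).
    for label, per_row in (("16 bytes/row", 16), ("32 bytes/row", 32)):
        total = savings_bytes(ROWS, per_row)
        print(f"{label}: {total / 1e9:.2f} GB saved over {ROWS:,} rows")

    # Raw digest size versus its base64 text form.
    for name, raw in (("MD5", 16), ("SHA-256", 32)):
        print(f"{name}: {raw} raw bytes, {base64_len(raw)} base64 characters")
```

With 35 million rows, 16 bytes per row comes to roughly half a gigabyte and
32 bytes per row to roughly one gigabyte, in line with the estimates in the
thread. The second loop shows why storing digests as base64 text takes more
space than storing the raw bytes.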