Kern Sibbald wrote:
> Hello,
>
> Here are a couple of notes about this feature, then some related ideas ...
>
> - There is a clear need for a feature like this. If you have a Job whose
> File records have been pruned, and it was a backup of 1TB but you want only
> a tiny portion of that, the only alternative to a solution like this is to
> scan the Volume, which is terribly slow.
>
> - As Martin points out, this code gives the SD a bit more knowledge of the
> records it has stored, but unless someone has a better idea, I see no
> alternative.
>
> - One aspect of this code I haven't looked at yet is whether it really
> needs to be added in read_record.c rather than match_bsr.c, where all the
> other bsr filtering code is located. To be investigated ...
>
> ========
>
> On a similar but slightly different subject: one user brought up a problem
> that we are surely going to see quite a lot in the near future. He has 600
> million File records in his Bacula catalog, and he is required to have at
> least a 7 year retention period, which means the database is growing (I
> believe it is currently at 100GB), and it will continue to grow.
>
> He has proposed to improve performance by having a separate File table for
> each client. This would very likely improve performance quite a lot,
> because with, say, 60 clients, one gigantic File table would be split into
> 60 smaller tables. For example, instead of referencing File, Bacula would
> reference FD1Files and FD2Files for clients named FD1 and FD2, and so on --
> identical tables, each containing only the data for a single client.
>
> The problem I have with the suggestion is that it would require rather
> massive changes to the current SQL code, and it would break all external
> programs that reference the File table of the database.
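For concreteness, the per-client split Kern describes would look roughly like the sketch below. This is only an illustration of the proposal, not the actual Bacula schema -- the column set is simplified and the table names follow the FD1Files/FD2Files convention from the quoted text:

```sql
-- Hypothetical per-client layout: one File table per client, all with
-- identical structure, each holding only that client's records.
CREATE TABLE FD1Files (
    FileId     BIGINT PRIMARY KEY,
    JobId      INTEGER NOT NULL,
    PathId     INTEGER NOT NULL,
    FilenameId INTEGER NOT NULL,
    LStat      TEXT,
    MD5        TEXT
);
-- FD2Files, FD3Files, ... would be created identically, one per client.

-- The pain point: every query must now select the right table by client
-- name, so code like this needs the table name computed at runtime:
SELECT FileId FROM FD1Files WHERE JobId = 1234;
```

This is exactly why the change would ripple through all of the SQL code and break external tools that hard-code the File table name.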
> The first important point is that in version 3.0.0 we are planning to
> switch to using a 64 bit Id for the File table by default -- this will
> remove the current restriction of 4G File records (it can be manually
> enabled in the current version, so the main change is to make it
> automatic).
>
> The second thing that could help a lot is the "Selective restore" patch
> submitted by Kjetil, because although a user may have a requirement for
> long retention periods, that does not necessarily mean that all the File
> records must be kept -- what is probably most important is retaining the
> data and being able to extract it in a reasonable amount of time. This
> patch will allow some users to prune the File records even though the
> Volumes must be kept a long time. Obviously this will not satisfy all
> requirements.
>
> Another suggestion that I have for the problem of growing File tables is a
> sort of compromise. Suppose that we implement two File retention periods:
> one, as currently exists, that defines when the records are deleted, and a
> new one that defines when the records are moved out of the File table and
> placed in a secondary table, perhaps called OldFiles. This would keep the
> efficiency for active files high while still allowing the delete retention
> period to be quite long. The database would still grow, but there would be
> a lot less overhead. The name of the table for these "expired" File
> records could even be defined on a client-by-client or Job-by-Job basis,
> which would allow for multiple "OldFiles" tables.
>
> Another advantage of my suggestion is that within Bacula itself, switching
> from the File table to the OldFiles table could be made totally automatic
> (it will require a bit of code, but no massive changes). External programs
> would still function normally in most cases, but if they wanted to access
> older data, they would need some modification.
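The two-retention-period idea could be prototyped roughly as below. This is a minimal sketch against a simplified stand-in schema (the real Bacula File table has more columns, and the cutoffs would be computed from Job timestamps and the configured retention periods rather than passed in as raw JobIds):

```python
import sqlite3

# Sketch of the proposed two-tier retention: records older than the
# "archive" cutoff are moved from File to OldFiles; records older than
# the (longer) "delete" cutoff are purged entirely.  Schema and cutoff
# mechanics are illustrative assumptions, not the actual Bacula code.

def archive_old_file_records(conn, archive_cutoff, delete_cutoff):
    cur = conn.cursor()
    # Move expired rows into the secondary table ...
    cur.execute("INSERT INTO OldFiles SELECT * FROM File WHERE JobId < ?",
                (archive_cutoff,))
    cur.execute("DELETE FROM File WHERE JobId < ?", (archive_cutoff,))
    # ... then apply the longer delete retention to the archive.
    cur.execute("DELETE FROM OldFiles WHERE JobId < ?", (delete_cutoff,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE File    (FileId INTEGER PRIMARY KEY, JobId INTEGER, Name TEXT);
    CREATE TABLE OldFiles(FileId INTEGER PRIMARY KEY, JobId INTEGER, Name TEXT);
""")
conn.executemany("INSERT INTO File VALUES (?, ?, ?)",
                 [(1, 10, "a"), (2, 50, "b"), (3, 200, "c")])
conn.commit()

# Jobs below 100 get archived; jobs below 20 are old enough to delete.
archive_old_file_records(conn, archive_cutoff=100, delete_cutoff=20)
active = conn.execute("SELECT COUNT(*) FROM File").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM OldFiles").fetchone()[0]
```

The key property is that the hot File table stays small while restores of older data only need a table-name switch, which is what would make the change automatic inside Bacula.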
> We could also envision moving the "expired" File records to a different
> database, which would in the end be much more efficient, but would require
> considerably more work to implement.
>
> Whatever is finally decided, it is clear to me that it is unlikely to be
> implemented in time for the next major release (planned for the end of the
> year).
>
> I would appreciate your comments on either the "Selective restore" feature
> and/or the "multiple File table" feature.
The cleanest way of splitting up tables is probably to use partitioning.
This lets you create a single logical table backed by multiple physical
partitions, with a ruleset on the logical table that determines which
partition a given row is stored in. This might let you, for example, sort
the File table rows into partitions on a per-year basis.

To the majority of standard insert/select statements the partitioning isn't
visible, so it should have far less impact on the SQL code in Bacula than
managing multiple tables manually.

Partitioning is supported by PostgreSQL:

http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

and is already in the 5.1 development version of MySQL:

http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

I don't see any support for it in SQLite, but if you're worrying about a
catalog that large, you probably shouldn't be using SQLite anyway.

--
Frank Sweetser fs at wpi.edu | For every problem, there is a solution that
WPI Senior Network Engineer  | is simple, elegant, and wrong. - HL Mencken
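In the PostgreSQL 8.1 docs linked above, partitioning is built from table inheritance plus CHECK constraints. A per-year split of the File table might be sketched as follows (the column set and partitioning key are illustrative assumptions, not Bacula's schema; note that 8.1 also needs a rule or trigger on the parent to route INSERTs to the right child):

```sql
-- Parent: the single logical table that queries reference.
CREATE TABLE File (
    FileId     BIGINT,
    JobId      INTEGER,
    BackupYear INTEGER   -- hypothetical partitioning key, derived from the Job
);

-- Children: one physical partition per year, constrained so the planner
-- knows which rows each one can hold.
CREATE TABLE File_y2007 (CHECK (BackupYear = 2007)) INHERITS (File);
CREATE TABLE File_y2008 (CHECK (BackupYear = 2008)) INHERITS (File);

-- With constraint exclusion enabled, a query against the parent only
-- scans the partitions whose CHECK constraint can match:
SET constraint_exclusion = on;
SELECT FileId FROM File WHERE BackupYear = 2008;
```

Since SELECTs still name only the File table, most of Bacula's SQL and the external programs would keep working unchanged, which is the main advantage over per-client tables.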