Hello,

Now that Bacula version 2.2.0 has been released, I thought I would give you a brief review of the direction that I see Bacula taking over the next year.
1. Of course, there will probably be a few maintenance releases that fix bugs and add minor new features. I believe that Eric already has several projects implemented ...

2. One major change is that, as I have previously noted, I will be decreasing the time I spend on the project from 100% to 40-50%. The remaining 50-60% I will devote to the new Bacula services endeavor, which should be operational by the beginning of next year. At some point (a year or two from now), I will probably return full time to the project. As a consequence, development of the project will probably slow down temporarily unless the contribution rate increases. However, in the long run, the Bacula services endeavor is, in my opinion, the best and fastest way to accelerate Bacula development.

3. Normally after a major release, we hold a vote on the Projects list so that the developers will have your input as to what is important and what is not. This does not guarantee that the developers will implement all the high-priority projects and none of the low-priority ones, but the user-assigned priority is certainly the largest factor in deciding what to work on. For this particular release, unfortunately, the #1 project on the list was taken by a developer who recently left the project, which means it was not implemented. As a consequence, in my opinion, it is not absolutely necessary to hold a new vote, as there are enough high-priority projects to work on. That said, if Arno would like to do a vote on the project list, that is perfectly fine with me, and perhaps some of your priorities have changed.

In any case, I have reviewed the old project list, removed the items that were completed in 2.2.0, combined several projects that were similar, and eliminated (put into a hold area) projects that are either developer optimizations, not well enough explained for me to implement, projects that I don't know how to implement, or projects that require proprietary code and so cannot be implemented in Bacula (at the current moment). This cut the number of projects in the voting list down from 44 to 25; they are numbered 1-25. There are 10 projects in the hold list, numbered h1-h10. For all the projects that I placed on hold, I made notes, so if one of your projects was placed on hold, you will know why, and if it was placed on hold because I didn't understand what you want or need additional information, please feel free to supply it. In addition, I stopped keeping track of Feature Requests some time ago (about 3 months ago), so any Feature Requests submitted after that point are not included in the current list.

To sum it up, I've reproduced the list below, and if you feel it is important to vote again on the items, please discuss it with Arno, work out the details, and let me know.
Best regards,

Kern

Projects:
Bacula Projects Roadmap
Status updated 18 August 2007
After removing items completed in version 2.2.0 and renumbering

Items Completed:

Summary:
Item  1: Accurate restoration of renamed/deleted files
Item  2: Allow FD to initiate a backup
Item  3: Merge multiple backups (Synthetic Backup or Consolidation)
Item  4: Implement Catalog directive for Pool resource in Director
Item  5: Add an item to the restore option where you can select a Pool
Item  6: Deletion of disk Volumes when pruned
Item  7: Implement Base jobs
Item  8: Implement Copy pools
Item  9: Scheduling syntax that permits more flexibility and options
Item 10: Message mailing based on backup types
Item 11: Cause daemons to use a specific IP address to source communications
Item 12: Add Plug-ins to the FileSet Include statements
Item 13: Restore only file attributes (permissions, ACL, owner, group...)
Item 14: Add an override in Schedule for Pools based on backup types
Item 15: Implement more Python events and functions
Item 16: Allow inclusion/exclusion of files in a fileset by creation/mod times
Item 17: Automatic promotion of backup levels based on backup size
Item 18: Better control over Job execution
Item 19: Automatic disabling of devices
Item 20: An option to operate on all pools with update vol parameters
Item 21: Include timestamp of job launch in "stat clients" output
Item 22: Implement Storage daemon compression
Item 23: Improve Bacula's tape and drive usage and cleaning management
Item 24: Multiple threads in file daemon for the same job
Item 25: Archival (removal) of User Files to Tape

Item 1: Accurate restoration of renamed/deleted files
Date: 28 November 2005
Origin: Martin Simmons (martin at lispworks dot com)
Status: Robert Nelson will implement this

What: When restoring a fileset for a specified date (including "most recent"), Bacula should give you exactly the files and directories that existed at the time of the last backup prior to that date. Currently this only works if the last backup was a Full backup. When the last backup was Incremental/Differential, files and directories that have been renamed or deleted since the last Full backup are not currently restored correctly. Ditto for files with extra/fewer hard links than at the time of the last Full backup.

Why: Incremental/Differential would be much more useful if this worked.

Notes: Merging of multiple backups into a single one seems to rely on this working, otherwise the merged backups will not be truly equivalent to a Full backup.

Notes: Kern: notes shortened. This can be done without the need for inodes. It is essentially the same as the current Verify job, except that one additional database record must be written, which does not require any database change.

Notes: Kern: see if we can correct restoration of directories if replace=ifnewer is set. Currently, if the directory does not exist, a "dummy" directory is created, then when all the files are updated, the dummy directory is newer, so the real values are not updated.

Item 2: Allow FD to initiate a backup
Origin: Frank Volf (frank at deze dot org)
Date: 17 November 2005
Status:

What: Provide some means, possibly via a restricted console, that allows an FD to initiate a backup, and that uses the connection established by the FD to the Director for the backup, so that a Director that is firewalled can do the backup.

Why: Makes backup of laptops much easier.
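To make the restricted-console idea a bit more concrete, here is a minimal sketch of a Director-side Console resource that a laptop could be given. The ACL directives shown already exist in Bacula; the new, still-hypothetical part is the FD logging in over its own connection and triggering the run. All names are invented for illustration:

Console {
  Name = "laptop-console"          # identity the laptop's FD would use
  Password = "console-secret"
  JobACL = "LaptopBackup"          # may only run its own backup job
  ClientACL = "laptop-fd"          # and only against its own client
  CommandACL = run, status, quit   # restricted command set
}

The FD would then, in effect, issue "run job=LaptopBackup" over the connection it has already established to the Director.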
Item 3: Merge multiple backups (Synthetic Backup or Consolidation)
Origin: Marc Cousin and Eric Bollengier
Date: 15 November 2005
Status:

What: A merged backup is a backup made without connecting to the Client. It would be a merge of existing backups into a single backup; in effect, it is like a restore, but to the backup medium. For instance, say that last Sunday we made a full backup. Then all week long, we created incremental backups, in order to do them fast. Now comes Sunday again, and we need another full. The merged backup makes it possible to do an incremental backup instead (during the night, for instance), and then create a merged backup during the day by using the full and the incrementals from the week. The merged backup will be exactly like a full made Sunday night on the tape, but the production interruption on the Client will be minimal, as the Client only has to send incrementals. In fact, if it is done correctly, you could merge all the Incrementals into a single Incremental, or all the Incrementals and the last Differential into a new Differential, or the Full, the last Differential, and all the Incrementals into a new Full backup. And there is no need to involve the Client.

Why: The benefits are that:
- the Client just does an incremental;
- the merged backup on tape is just like a single full backup, and can be restored very fast.
This is also a way of reducing the backup data, since the old data can then be pruned (or not) from the catalog, possibly allowing older volumes to be recycled.

Item 4: Implement Catalog directive for Pool resource in Director
Origin: Alan Davis [EMAIL PROTECTED]
Date: 6 March 2007
Status: Submitted

What: The current behavior is for the Director to create all pools found in the configuration file in all catalogs. Add a Catalog directive to the Pool resource to specify which catalog to use for each pool definition.

Why: This allows different catalogs to have different pool attributes and eliminates the side effect of adding pools to catalogs that don't need/use them.

Notes: Kern: I think this is relatively easy to do, and it is really a prerequisite to a number of the Copy pool, ... projects that are listed here.

Item 5: Add an item to the restore option where you can select a Pool
Origin: kshatriyak at gmail dot com
Date: 1/1/2006
Status:

What: In the restore option (select the most recent backup for a client), it would be useful to add an option to limit the selection to a certain pool.

Why: When using cloned jobs, most of the time you have two pools - a disk pool and a tape pool. People who have two pools would like to select the most recent backup from disk, not from tape (tape would only be needed in an emergency). However, the most recent backup (which may differ by just a second from the disk backup) may be on tape and would be selected. The problem becomes bigger if you have a full and a differential - the most "recent" full backup may be on disk, while the most recent differential may be on tape (even though the differential on disk may differ by only a second or so). Bacula will then complain that the backups reside on different media. For now, the only solution when restoring with two pools is to manually search for the right JobIds and enter them by hand, which is a bit error-prone.

Notes: Kern: This is a nice idea. It could also be the way to support Jobs that have been Copied (similar to migration, but not yet implemented).
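To illustrate Item 5, the restore command might simply grow a pool keyword. This is hypothetical syntax - no such option exists in the current restore command, and the pool name is invented:

* restore client=myserver pool=DiskPool select current

The "most recent backup" search would then consider only Jobs whose Volumes belong to DiskPool, so the disk copy is chosen even if a tape copy is a second newer.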
Item 6: Deletion of disk Volumes when pruned
Date: Nov 25, 2005
Origin: Ross Boylan <RossBoylan at stanfordalumni dot org> (edited by Kern)
Status:

What: Provide a way for Bacula to automatically remove Volumes from the filesystem, or optionally to truncate them. Obviously, the Volume must be pruned prior to removal.

Why: This would give users more control over their Volumes and prevent disk-based volumes from consuming too much space.

Notes: The following two directives might do the trick:

  Volume Data Retention = <time period>
  Remove Volume After = <time period>

The migration project should also remove a Volume that is migrated. This might also work for tape Volumes.

Item 7: Implement Base jobs
Date: 28 October 2005
Origin: Kern
Status:

What: A base job is sort of like a Full save except that you will want the FileSet to contain only files that are unlikely to change in the future (i.e. a snapshot of most of your system after installing it). After the base job has been run, when you do a Full save, you specify one or more Base jobs to be used. All files that have been backed up in the Base job(s) but not modified will then be excluded from the backup. During a restore, the Base jobs will be automatically pulled in where necessary.

Why: This is something none of the competition does, as far as we know (except perhaps BackupPC, which is a Perl program that saves to disk only). It is a big win for the user; it makes Bacula stand out as offering a unique optimization that immediately saves time and money. Basically, imagine that you have 100 nearly identical Windows or Linux machines containing the OS and user files. For the OS part, a Base job will be backed up once, and rather than making 100 copies of the OS, there will be only one. If one or more of the systems have some files updated, no problem; they will be automatically restored.

Notes: Huge savings in tape usage even for a single machine. Will require more resources, because the DIR must send the FD a list of files/attributes, and the FD must search the list and compare it for each file to be saved.

Item 8: Implement Copy pools
Date: 27 November 2005
Origin: David Boyes (dboyes at sinenomine dot net)
Status:

What: I would like Bacula to have the capability to write copies of backed-up data on multiple physical volumes selected from different pools without transferring the data multiple times, and to accept any of the copy volumes as valid for restore.

Why: In many cases, businesses are required to keep offsite copies of backup volumes, or just wish for simple protection against a human operator dropping a storage volume and damaging it. The ability to generate multiple volumes in the course of a single backup job allows customers to simply check out one copy and send it offsite, marking it as out of changer or otherwise unavailable. Currently, the library and magazine management capability in Bacula does not make this process simple.

Restores would use the copy of the data on the first available volume, in order of the Copy pool chain definition.

This is also a major scalability issue -- as the number of clients increases beyond several thousand, and the volume of data increases, transferring the data multiple times to produce additional copies of the backups will become physically impossible due to transfer speed issues. Generating multiple copies at the server side will become the only practical option.
How: I suspect that this will require adding a multiplexing SD that appears as an SD to a specific FD, but as 1-n FDs to the specific back-end SDs managing the primary and copy pools. Storage pools will also need to acquire parameters to define the pools to be used for copies.

Notes: I would commit some of my developers' time if we can agree on the design and behavior.

Notes: I get the idea, but would like more details on the precise syntax of the necessary directives and what they would do.

Item 9: Scheduling syntax that permits more flexibility and options
Date: 15 December 2006
Origin: Gregory Brauer (greg at wildbrain dot com) and Florian Schnabel <florian.schnabel at docufy dot de>
Status:

What: Currently, Bacula only understands how to deal with weeks of the month or weeks of the year in schedules. This makes it impossible to do a true weekly rotation of tapes. There will always be a discontinuity that requires disruptive manual intervention at least monthly or yearly, because week boundaries never align with month or year boundaries. A solution would be to add a new syntax that defines (at least) a start timestamp and a repetition period. Also useful: an easy option to skip a certain job on a certain date.

Why: Rotated backups done at weekly intervals are useful, and Bacula cannot currently do them without extensive hacking. You could then also easily skip tape backups on holidays. Especially if you have no autochanger and can only fit one backup on a tape, that would be really handy; other jobs could proceed normally, and you won't get errors that way.

Notes: Here is an example syntax showing a 3-week rotation where Full backups would be performed every week on Saturday, and an Incremental would be performed every week on Tuesday. Each set of tapes could be removed from the loader for the following two cycles before coming back and being reused on the third week. Since the execution times are determined by intervals from a given point in time, there will never be any issues with having to adjust to any sort of arbitrary time boundary. In the example provided, I even define the starting schedule as crossing both a year and a month boundary, but the run times would be based on the "Repeat" value and would therefore happen weekly as desired.

Schedule {
  Name = "Week 1 Rotation"
  # Saturday. Would run Dec 30, Jan 20, Feb 10, etc.
  Run {
    Options {
      Type   = Full
      Start  = 2006-12-30 01:00
      Repeat = 3w
    }
  }
  # Tuesday. Would run Jan 2, Jan 23, Feb 13, etc.
  Run {
    Options {
      Type   = Incremental
      Start  = 2007-01-02 01:00
      Repeat = 3w
    }
  }
}

Schedule {
  Name = "Week 2 Rotation"
  # Saturday. Would run Jan 6, Jan 27, Feb 17, etc.
  Run {
    Options {
      Type   = Full
      Start  = 2007-01-06 01:00
      Repeat = 3w
    }
  }
  # Tuesday. Would run Jan 9, Jan 30, Feb 20, etc.
  Run {
    Options {
      Type   = Incremental
      Start  = 2007-01-09 01:00
      Repeat = 3w
    }
  }
}

Schedule {
  Name = "Week 3 Rotation"
  # Saturday. Would run Jan 13, Feb 3, Feb 24, etc.
  Run {
    Options {
      Type   = Full
      Start  = 2007-01-13 01:00
      Repeat = 3w
    }
  }
  # Tuesday. Would run Jan 16, Feb 6, Feb 27, etc.
  Run {
    Options {
      Type   = Incremental
      Start  = 2007-01-16 01:00
      Repeat = 3w
    }
  }
}

Notes: Kern: I have merged the previously separate project of skipping jobs (via Schedule syntax) into this one.

Item 10: Message mailing based on backup types
Origin: Evan Kaufman <[EMAIL PROTECTED]>
Date: January 6, 2006
Status:

What: In the "Messages" resource definitions, allow messages to be mailed based on the type (backup, restore, etc.) and level (full, differential, etc.) of the job that created the originating message(s).
Why: It would, for example, allow someone's boss to be emailed automatically only when a Full Backup job runs, so he can retrieve the tapes for offsite storage, even if the IT dept. doesn't (or can't) explicitly notify him. At the same time, his mailbox wouldn't be filled by notifications of Verifies, Restores, or Incremental/Differential Backups (which would likely be kept onsite).

Notes: One way this could be done is through additional message types, for example:

Messages {
  # email the boss only on full system backups
  Mail = [EMAIL PROTECTED] = full, !incremental, !differential, !restore, !verify, !admin
  # email us only when something breaks
  MailOnError = [EMAIL PROTECTED] = all
}

Notes: Kern: This should be rather trivial to implement.

Item 11: Cause daemons to use a specific IP address to source communications
Origin: Bill Moran <[EMAIL PROTECTED]>
Date: 18 Dec 2006
Status:

What: Cause Bacula daemons (dir, fd, sd) to always use the IP address specified in the [DIR|FD|SD]Addr directive as the source IP for initiating communication.

Why: On complex networks, as well as extremely secure networks, it's not unusual to have multiple possible routes through the network. Often, each of these routes is secured by different policies (effectively, firewalls allow or deny different traffic depending on the source address). Unfortunately, it can sometimes be difficult or impossible to represent this in a system routing table, as the result is excessive subnetting that quickly exhausts available IP space. The best available workaround is to provide multiple IPs to a single machine that are all on the same subnet. In order for this to work properly, applications must support the ability to bind outgoing connections to a specified address; otherwise the operating system will always choose the first IP that matches the required route.

Notes: Many other programs support this. For example, the following can be configured in BIND:

  query-source address 10.0.0.1;
  transfer-source 10.0.0.2;

This means queries from this server will always come from 10.0.0.1 and zone transfers will always originate from 10.0.0.2.

Item 12: Add Plug-ins to the FileSet Include statements
Date: 28 October 2005
Origin: Kern
Status: Partially coded in 1.37 -- much more to do.

What: Allow users to specify wild-card and/or regular expressions to be matched in both the Include and Exclude directives in a FileSet. At the same time, allow users to define plug-ins to be called (based on regular expression/wild-card matching).

Why: This would give users the ultimate ability to control how files are backed up/restored. A user could write a plug-in that knows how to back up his Oracle database without stopping/starting it, for example.

Item 13: Restore only file attributes (permissions, ACL, owner, group...)
Origin: Eric Bollengier
Date: 30/12/2006
Status:

What: The goal of this project is to be able to restore only the rights and attributes of files without overwriting the file contents.

Why: Who has never had to repair a chmod -R 777, or a wild recursive rights update under Windows? At present, you must have enough space to restore the data, dump the attributes (easy with ACLs, more complex with Unix/Windows rights), and apply them to your broken tree. With this option, it would also be very easy to compare rights or ACLs over time.

Notes: If the file is present, we skip the restore and change only the rights. If the file is absent, we can either create an empty one and apply the rights, or do nothing.
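For Item 13, one could imagine the feature surfacing as an extra keyword on the restore command. The keyword and its behavior below are purely hypothetical, sketched only to pin down the idea:

* restore client=myserver where=/ attributesonly=yes select current

With attributesonly=yes, files that already exist would keep their data and only have owner, group, mode, and ACLs reapplied; missing files would be skipped or optionally created empty, as discussed in the Notes above.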
Item 14: Add an override in Schedule for Pools based on backup types
Date: 19 Jan 2005
Origin: Chad Slater <[EMAIL PROTECTED]>
Status:

What: Adding a FullStorage=BigTapeLibrary in the Schedule resource would help those of us who use different storage devices for different backup levels cope with the "auto-upgrade" of a backup.

Why: Assume I add several new devices to be backed up, i.e. several hosts with 1 TB RAID. To avoid tape-switching hassles, incrementals are stored in a disk set on a 2 TB RAID. If you add these devices in the middle of the month, the incrementals are upgraded to "full" backups, but they try to use the same storage device as requested in the incremental job, filling up the RAID holding the differentials. If we could override the Storage parameter for full and/or differential backups, then the Full job would use the proper Storage device, which has more capacity (e.g. an 8 TB tape library).

Item 15: Implement more Python events and functions
Date: 28 October 2005
Origin: Kern
Status:

What: Allow Python scripts to be called at more places within Bacula and provide additional access to Bacula internal variables. Implement an interface for Python scripts to access the catalog through Bacula.

Why: This will permit users to customize Bacula through Python scripts.

Notes: Possible new events:
- Recycle event
- Scratch pool event
- NeedVolume event
- MediaFull event
Also add a way to get a listing of currently running jobs (and possibly also scheduled jobs).

Item 16: Allow inclusion/exclusion of files in a fileset by creation/mod times
Origin: Evan Kaufman <[EMAIL PROTECTED]>
Date: January 11, 2006
Status:

What: In the vein of the Wild and Regex directives in a FileSet's Options, it would be helpful to allow a user to include or exclude files and directories by creation or modification times. You could factor in the Exclude=yes|no option in much the same way it affects the Wild and Regex directives. For example, you could exclude all files modified before a certain date:

Options {
  Exclude = yes
  Modified Before = ####
}

Or you could exclude all files created/modified since a certain date:

Options {
  Exclude = yes
  Created Modified Since = ####
}

The format of the time/date could be done several ways, say the number of seconds since the epoch:

  1137008553 = Jan 11 2006, 1:42:33PM   # result of `date +%s`

Or a human-readable date in a cryptic form:

  20060111134233 = Jan 11 2006, 1:42:33PM   # YYYYMMDDhhmmss

Why: I imagine a feature like this could have many uses. It would allow a user to do a full backup while excluding the base operating system files, so if I installed a Linux snapshot from a CD yesterday, I'll *exclude* all files modified *before* today. If I need to recover the system, I use the CD I already have, plus the tape backup. Or if, say, a Windows client is hit by a particularly corrosive virus, I might need to *exclude* any files created/modified *since* the time of infection.

Notes: Of course, this feature would work in concert with other include/exclude rules, and wouldn't override them (or each other).

Notes: The directives I'd imagine would be along the lines of "[Created] [Modified] [Before|Since] = <date>". So one could compare against 'ctime' and/or 'mtime', but ONLY 'before' or 'since'.

Item 17: Automatic promotion of backup levels based on backup size
Date: 19 January 2006
Origin: Adam Thornton <[EMAIL PROTECTED]>
Status:

What: Amanda has a feature whereby it estimates the space that a differential, incremental, and full backup would take.
If the difference in space required between the scheduled level and the next level up is beneath some user-defined critical threshold, the backup level is bumped to the next type. Doing this minimizes the number of volumes necessary during a restore, with a fairly minimal cost in backup media space.

Why: I know at least one (quite sophisticated and smart) user for whom the absence of this feature is a deal-breaker in terms of using Bacula; if we had it, it would eliminate the one cool thing Amanda can do that we can't (at least, the one cool thing I know of).

Item 18: Better control over Job execution
Date: 18 August 2007
Origin: Kern
Status:

What: Bacula needs a few extra features for better Job execution:
1. A way to prevent multiple Jobs of the same name from being scheduled at the same time (usually happens when a job is missed because a client is down).
2. Directives that permit easier upgrading of Job types based on a period of time, i.e. "do a Full at least once every 2 weeks", or "do a Differential at least once a week". If a lower-level job is scheduled, when it begins to run it will be upgraded depending on the specified criteria. (A hypothetical sketch of such directives follows Item 21 below.)

Why: Obvious.

Item 19: Automatic disabling of devices
Date: 2005-11-11
Origin: Peter Eriksson <peter at ifm.liu dot se>
Status:

What: After a configurable number of fatal errors with a tape drive, Bacula should automatically disable further use of that tape drive. There should also be "disable"/"enable" commands in the "bconsole" tool.

Why: On a multi-drive jukebox there is a possibility of tape drives going bad during large backups (needing a cleaning tape run, tapes getting stuck). It would be advantageous if Bacula would automatically disable further use of a problematic tape drive after a configurable number of errors has occurred. An example: I have a multi-drive jukebox (6 drives, 380+ slots) where tapes occasionally get stuck inside the drive. Bacula will notice that the "mtx-changer" command fails and then fail any backup jobs trying to use that drive. However, it will still keep trying to run new jobs using that drive, and fail - forever, thus failing lots and lots of jobs... Since we have many drives, Bacula could have simply disabled further use of that drive and used one of the other ones instead.

Item 20: An option to operate on all pools with update vol parameters
Origin: Dmitriy Pinchukov <[EMAIL PROTECTED]>
Date: 16 August 2006
Status:

What: When I do update -> Volume parameters -> All Volumes from Pool, I have to select pools one by one. I'd like the console to have an option like "0: All Pools" in the list of defined pools.

Why: I have many pools and am therefore unhappy with manually updating each of them using update -> Volume parameters -> All Volumes from Pool -> pool #.

Item 21: Include timestamp of job launch in "stat clients" output
Origin: Mark Bergman <[EMAIL PROTECTED]>
Date: Tue Aug 22 17:13:39 EDT 2006
Status:

What: The "stat clients" command doesn't include any detail on when the active backup jobs were launched.

Why: Including the timestamp would make it much easier to decide whether a job is running properly.

Notes: It may be helpful to have the output from "stat clients" formatted more like that from "stat dir" (and other commands), in a column format. The per-client information that's currently shown (level, client name, JobId, Volume, pool, device, Files, etc.) is good, but somewhat hard to parse (both programmatically and visually), particularly when there are many active clients.
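To make the Notes for Item 21 concrete, a columnar "stat clients" layout including the launch timestamp might look something like the following. This is invented example output, not what Bacula currently prints:

 JobId  Level  Client     Started        Files    Bytes   Volume
 -----  -----  ---------  -------------  -------  ------  -------
  1201  Full   web1-fd    22-Aug 17:02   123,456  9.8 GB  Vol0042
  1202  Incr   db1-fd     22-Aug 17:13     1,022  310 MB  Vol0043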
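And returning to Item 18, as promised above, the two requested controls might appear as Job resource directives along these lines. The directive names are hypothetical, sketched only to pin down the intended semantics:

Job {
  Name = "nightly-ws"
  # ... usual Job directives (Client, FileSet, Schedule, ...) ...
  Allow Duplicate Jobs = no    # hypothetical: a second queued instance of
                               # this Job is dropped instead of run twice
  Max Full Interval = 2 weeks  # hypothetical: upgrade to Full if the last
                               # Full is older than two weeks
  Max Diff Interval = 1 week   # hypothetical: likewise for Differential
}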
Item 22: Implement Storage daemon compression
Date: 18 December 2006
Origin: Vadim A. Umanski, e-mail [EMAIL PROTECTED]
Status:

What: The ability to compress backup data on the SD receiving the data instead of on the client sending it.

Why: The need is practical. I've got some machines that can send data to the network 4 or 5 times faster than they can compress it (I've measured that). They're using fast enough SCSI/FC disk subsystems but rather slow CPUs (e.g. UltraSPARC II). The backup server, on the other hand, has quite fast CPUs (e.g. dual P4 Xeons) and quite a low load. When you have 20, 50, or 100 GB of raw data, running a job 4 to 5 times faster really matters. On the other hand, the data can be compressed 50% or better, so wasting twice the space for disk backup is not good at all. And the network is all mine (I have a dedicated management/provisioning network), and I can get as much bandwidth as I need - 100 Mbps, 1000 Mbps... That's why the server-side compression feature is needed!

Notes:

Item 23: Improve Bacula's tape and drive usage and cleaning management
Date: 8 November 2005, November 11, 2005
Origin: Adam Thornton <athornton at sinenomine dot net>, Arno Lehmann <al at its-lehmann dot de>
Status:

What: Make Bacula manage tape life-cycle information, tape reuse times, and drive cleaning cycles.

Why: All three parts of this project are important when operating backups. We need to know which tapes need replacement, and we need to make sure the drives are cleaned when necessary. While many tape libraries and even autoloaders can handle all this automatically, support by Bacula can be helpful for smaller (older) libraries and single drives. Limiting the number of times a tape is used might prevent the tape errors you get when using tapes until the drives can't read them any more. Also, checking drive status during operation can prevent some failures (as I [Arno] had to learn the hard way...).

Notes: First, Bacula could (and even does, to some limited extent) record tape and drive usage. For tapes, the number of mounts, the amount of data, and the time the tape has actually been running could be recorded. Data fields for Read and Write time and Number of mounts already exist in the catalog (I'm not sure if VolBytes is the sum of all bytes ever written to that volume by Bacula). This information can be important when determining which media to replace. The ability to mark Volumes as "used up" after a given number of write cycles should also be implemented, so that a tape is never actually worn out. For the tape drives known to Bacula, similar information is interesting for determining the device status and expected lifetime: time spent Reading and Writing, and the number of tape Loads / Unloads / Errors. This information is not yet recorded as far as I [Arno] know. A new volume status would be necessary for the new state, like "Used up" or "Worn out". Volumes with this state could be used for restores, but not for writing. These volumes should be migrated first (assuming migration is implemented) and, once they are no longer needed, could be moved to a Trash pool. The next step would be to implement a drive cleaning setup. Bacula already has knowledge about cleaning tapes. Once it has some information about cleaning cycles (measured in drive run time, number of tapes used, or calendar days, for example) it can automatically execute tape cleaning (with an autochanger, obviously) or ask for operator assistance loading a cleaning tape.
The final step would be to implement TAPEALERT checks not only when changing tapes and only sending the information to the administrator, but rather checking after each tape error, checking on a regular basis (for example after each tape file), and also before unloading and after loading a new tape. Then, depending on the drive's TAPEALERT state and the known drive cleaning state, Bacula could automatically schedule later cleaning, clean immediately, or inform the operator. Implementing this would perhaps require another catalog change and perhaps major changes in SD code and the DIR-SD protocol, so I'd only consider this worth implementing if it would actually be used or even needed by many people. Implementation of these projects could happen in three distinct sub-projects: measuring tape and drive usage, retiring volumes, and handling drive cleaning and TAPEALERTs.

Item 24: Multiple threads in file daemon for the same job
Date: 27 November 2005
Origin: Ove Risberg (Ove.Risberg at octocode dot com)
Status:

What: I want the file daemon to start multiple threads for a backup job so the fastest possible backup can be made. The file daemon could parse the FileSet information and start one thread for each File entry located on a separate filesystem. A configuration option in the Job section should be used to enable or disable this feature. The configuration option could specify the maximum number of threads in the file daemon. If the threads could spool the data to separate spool files, the restore process would not be much slower.

Why: Multiple concurrent backups of a large fileserver with many disks and controllers will be much faster.

Item 25: Archival (removal) of User Files to Tape
Date: Nov. 24/2005
Origin: Ray Pengelly [ray at biomed dot queensu dot ca]
Status:

What: The ability to archive data to storage based on certain parameters such as age, size, or location. Once the data has been written to storage and logged, it is then pruned from the originating filesystem. Note! We are talking about users' files and not Bacula Volumes.

Why: This would allow fully automatic storage management, which becomes useful for large datastores. It would also allow for auto-staging from one media type to another.

Example 1) Medical imaging needs to store large amounts of data. They decide to keep data on their servers for 6 months and then put it away for long-term storage. The server then finds all files older than 6 months and writes them to tape. The files are then removed from the server.

Example 2) All data that hasn't been accessed in 2 months could be moved from high-cost fibre-channel disk storage to a low-cost, large-capacity SATA disk storage pool, which doesn't have as quick an access time. Then, after another 6 months (or possibly as one storage pool gets full), the data is migrated to tape.

========== Items put on hold by Kern ============================

Item h1: Split documentation
Origin: Maxx <maxxatworkat gmail dot com>
Date: 27th July 2006
Status: Approved, awaiting implementation

What: Split the documentation into several books.

Why: The Bacula manual now has more than 600 pages, and looking for implementation details is getting complicated.
I think it would be good to split the single volume into two or maybe three parts:
1) Introduction, requirements, and tutorial - typically useful only up to first installation time
2) Basic installation and configuration, with all the gory details about the directives supported
3) Advanced Bacula: testing, troubleshooting, GUI and ancillary programs, security management, scripting, etc.

Notes: This is a project that needs to be done and will be implemented, but it is really a developer issue of timing, and does not need to be included in the voting.

Item h2: Implement support for stacking arbitrary stream filters, sinks
Date: 23 November 2006
Origin: Landon Fuller <[EMAIL PROTECTED]>
Status: Planning. Assigned to landonf.

What: Implement support for the following:
- Stacking arbitrary stream filters (e.g. encryption, compression, sparse data handling)
- Attaching file sinks to terminate stream filters (i.e. write out the resultant data to a file)
- Refactoring the restoration state machine accordingly

Why: The existing stream implementation suffers from the following:
- All state (compression, encryption, stream restoration) is global across the entire restore process, for all streams. There are multiple entry and exit points in the restoration state machine, and thus multiple places where state must be allocated, deallocated, initialized, or reinitialized. This results in exceptional complexity for the author of a stream filter.
- The developer must enumerate all possible combinations of filters and stream types (i.e. win32 data with encryption, without encryption, with encryption AND compression, etc.).

Notes: This feature request only covers implementing the stream filters/sinks and refactoring the file daemon's restoration implementation accordingly. If I have extra time, I will also rewrite the backup implementation. My intent in implementing the restoration first is to solve pressing bugs in the restoration handling, and to ensure that the new restore implementation handles existing backups correctly. I do not plan on changing the network or tape data structures to support defining arbitrary stream filters, but supporting that functionality is the ultimate goal. Assistance with either code or testing would be fantastic.

Notes: Kern: this project has a lot of merit, and we need to do it, but it is really an issue for developers rather than a new feature for users, so I have removed it from the voting list but kept it here; at some point it will be implemented.

Item h3: Filesystem watch triggered backup
Date: 31 August 2006
Origin: Jesper Krogh <[EMAIL PROTECTED]>
Status:

What: With inotify and similar filesystem-triggered notification systems, it is possible to have the file daemon monitor filesystem changes and initiate a backup.

Why: There are two situations where this is nice to have:
1) It is possible to get a much finer-grained backup than with the fixed schedules used now. A file created and deleted a few hours later can automatically be caught.
2) The load introduced on the system will probably be distributed more evenly.

Notes: This can be combined with configuration that specifies something like: "at most every 15 minutes or when changes consume XX MB".
Notes: Kern: I would rather see this implemented by an external program that monitors the filesystem changes and then uses the console to start the appropriate job.

Item h4: Directive/mode to backup only file changes, not entire file
Date: 11 November 2005
Origin: Joshua Kugler <joshua dot kugler at uaf dot edu>, Marek Bajon <mbajon at bimsplus dot com dot pl>
Status:

What: Currently when a file changes, the entire file will be backed up in the next incremental or full backup. To save space on the tapes, it would be nice to have a mode whereby only the changes to the file would be backed up when it is changed.

Why: This would save lots of space when backing up large files such as logs, mbox files, Outlook PST files, and the like.

Notes: This would require the use of disk-based volumes, as comparing files would not be feasible using a tape drive.

Notes: Kern: I don't know how to implement this. Put on hold until someone provides a detailed implementation plan.

Item h5: Implement multiple numeric backup levels as supported by dump
Date: 3 April 2006
Origin: Daniel Rich <[EMAIL PROTECTED]>
Status:

What: Dump allows specification of backup levels numerically instead of just "full", "incr", and "diff". In this system, at any given level, all files are backed up that were modified since the last backup of a higher level (with 0 being the highest and 9 being the lowest). A level 0 is therefore equivalent to a full, a level 9 to an incremental, and the levels 1 through 8 are varying levels of differentials. For Bacula's sake, these could be represented as "full", "incr", and "diff1", "diff2", etc.

Why: Support of multiple backup levels would provide for more advanced backup rotation schemes such as "Towers of Hanoi". This would allow better flexibility in performing backups, and can lead to shorter recovery times.

Notes: Legato Networker supports a similar system with full, incr, and 1-9 as levels.

Notes: Kern: I don't see the utility of this, and it would be a *huge* modification to existing code.

Item h6: Implement NDMP protocol support
Origin: Alan Davis
Date: 06 March 2007
Status:

What: Network Data Management Protocol is implemented by a number of NAS filer vendors to enable backups using third-party software.

Why: This would allow NAS filer backups in Bacula without incurring the overhead of NFS or SMB/CIFS.

Notes: Further information is available:
  http://www.ndmp.org
  http://www.ndmp.org/wp/wp.shtml
  http://www.traakan.com/ndmjob/index.html
There are currently no viable open-source NDMP implementations. There is a reference SDK and example app available from ndmp.org, but it has problems compiling on recent Linux and Solaris OSes. The ndmjob reference implementation from Traakan is known to compile on Solaris 10.

Notes: Kern: I am not at all in favor of this until NDMP becomes an open standard or until there are open-source libraries that interface to it.

Item h7: Commercial database support
Origin: Russell Howe <russell_howe dot wreckage dot org>
Date: 26 July 2006
Status:

What: It would be nice for the database backend to support more databases. I'm thinking of SQL Server at the moment, but I guess Oracle, DB2, MaxDB, etc. are all candidates. SQL Server would presumably be implemented using FreeTDS or maybe an ODBC library?

Why: We only really have one database server, which is MS SQL Server 2000, and maintaining a second one just for the backup software is a burden (we grew out of SQLite, which I liked, but which didn't work so well with our database size).
We don't really have a machine with the resources to run postgres, and would rather only maintain a single DBMS. We're stuck with SQL Server because pretty much all the company's custom applications (written by consultants) are locked into SQL Server 2000. I can imagine this scenario is fairly common, and it would be nice to use the existing, properly specced database server for storing Bacula's catalog, rather than having to run a second DBMS.

Notes: This might be nice, but someone other than me will probably need to implement it, and at the moment, proprietary code cannot legally be mixed with Bacula's GPLed code. This would be possible only if the vendors provide GPLed (or open-source) interface code.

Item h8: Incorporation of XACML2/SAML2 parsing
Date: 19 January 2006
Origin: Adam Thornton <[EMAIL PROTECTED]>
Status: Blue sky

What: XACML is the "eXtensible Access Control Markup Language" and SAML is the "Security Assertion Markup Language" - XML standards for making statements about identity and authorization. Having these would give us a framework to approach ACLs in a generic manner, and in a way flexible enough to support the four major sorts of ACLs I see as a concern to Bacula at this point, as well as (probably) to deal with new sorts of ACLs that may appear in the future.

Why: Bacula is beginning to need to back up systems with ACLs that do not map cleanly onto traditional Unix permissions. I see four sets of ACLs - in general, mutually incompatible with one another - that we're going to need to deal with: NTFS ACLs, POSIX ACLs, NFSv4 ACLs, and AFS ACLs. (Some may question the relevance of AFS; AFS is one of Sine Nomine's core consulting businesses, and having a reputable file-level backup and restore technology for it (as Tivoli is probably going to drop AFS support soon, since IBM no longer supports AFS) would be of huge benefit to our customers; we'd most likely create the AFS support at Sine Nomine for inclusion into the Bacula core code (and perhaps make some changes to the OpenAFS volserver).) Now, obviously, Bacula already handles NTFS just fine. However, I think there's a lot of value in implementing a generic ACL model, so that it's easy to support whatever particular instances of ACLs come down the pike: POSIX ACLs (think SELinux) and NFSv4 are the obvious things arriving in the Linux world in a big way in the near future. XACML, although overcomplicated for our needs, provides this framework, and we should be able to leverage other people's implementations to minimize the amount of work *we* have to do to get a generic ACL framework. Basically, the costs of implementation are high, but they're largely both external to Bacula and already sunk.

Notes: As you indicate, this is a bit of "blue sky", or in other words, at the moment it is a bit esoteric to consider for Bacula.

Item h9: Archive data
Date: 15/5/2006
Origin: calvin streeting calvin at absentdream dot com
Status:

What: The ability to archive to media (DVD/CD) in an uncompressed format for dead filing (archiving, not backing up).

Why: At work, when jobs are finished, they are moved off of the main file servers (RAID-based systems) onto a simple Linux file server (IDE-based system) so users can find old information without contacting the IT dept. This data doesn't really change, it only gets added to, but it also needs backing up.
At the moment it takes about 8 hours to back up our servers (working data), so rather than add more time to existing backups, I am trying to implement a system where we back up the archive data to CD/DVD; these disks would only need to be appended to (burn only new/changed files to new disks for off-site storage). Basically: understand the difference between archive data and live data.

Notes:
- Scan the data and email me when it needs burning
- Divide into predefined chunks
- Keep a record of what is on what disk
- Make me a label (simple php->mysql=>pdf stuff) - I could do this bit
- Ability to save data uncompressed so it can be read on any other system (future-proof the data)
- Save the catalog with the disk as some kind of menu system

Notes: Kern: I don't understand this item, and in any case, if it is specific to DVDs/CDs, which we do not recommend using, it is unlikely to be implemented except as a user-submitted patch.

Item h10: Clustered file daemons
Origin: Alan Brown ajb2 at mssl dot ucl dot ac dot uk
Date: 24 July 2006
Status:

What: A "virtual" file daemon, which is actually a cluster of real ones.

Why: In the case of clustered filesystems (SAN setups, GFS, OCFS2, etc.), multiple machines may have access to the same set of filesystems. For performance reasons, one may wish to initiate backups from several of these machines simultaneously, instead of just using one backup source for the common clustered filesystem. For obvious reasons, backups of $A-FD/$PATH and $B-FD/$PATH are normally treated as different backup sets. In this case they are the same communal set. Likewise, when restoring, it would be easier to just specify one of the cluster machines and let Bacula decide which to use. This can be faked to some extent using DNS round-robin entries and a virtual IP address; however, it means "status client" will always give bogus answers. Additionally, there is no way of spreading the load evenly among the servers. What is required is something similar to the storage daemon autochanger directives, so that Bacula can keep track of in-progress backups/restores and direct new jobs to a "free" client.

Notes: Kern: I don't understand the request well enough to be able to implement it. A lot more design detail should be presented before voting on this project.

========== Already implemented ================================

Item n: Make changing "spooldata=yes|no" possible for manual/interactive jobs
Origin: Marc Schiffbauer <[EMAIL PROTECTED]>
Date: 12 April 2007
Status: Already implemented by Eric

What: Make it possible to modify the spooldata option for a job when it is run from within the console. Currently it is possible to modify the backup level and the spooldata setting in a Schedule resource. It is also possible to modify the backup level when using the "run" command in the console. But it is currently not possible to do the same with "spooldata=yes|no", like:

  run job=MyJob level=incremental spooldata=yes

Why: In some situations it would be handy to be able to switch spooldata on or off for interactive/manual jobs, based on which data the admin expects or how fast the LAN/WAN connection currently is.

Notes: ./.

============= Empty Feature Request form ===========
Item n: One line summary ...
Date: Date submitted
Origin: Name and email of originator.
Status:
What: More detailed explanation ...
Why: Why it is important ...
Notes: Additional notes or features (omit if not used)
============== End Feature Request form ==============

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users