On 25/03/2010 11:36 a.m., James Harper wrote:
>>
>> We've used Legato Networker before (we still do, as we've not yet
>> successfully completed the migration, and it's looking a more and more
>> grim prospect by the day), and on approximately the same dataset (about
>> 500 million records spread over 100 servers) and somewhat weaker
>> hardware, it would let users start selecting files to restore in a
>> matter of *seconds* (and it was using its simple db6 files, with no
>> server/database tuning required at all).
>>
>> Now with Bacula 5.0.1, we have to wait several *hours* before we can
>> start selecting files to restore, and that is considered "normal"?!
>>
>
> I've always thought that Bacula could do this a bit better. If you are
> selecting files rather than restoring everything, then the chances are
> you only want a small subset of all files (there are always exceptions,
> of course), so why read the whole tree in at once? Why not read it in as
> required, or read it in layer by layer in the background, so the user can
> start selecting files immediately?
>
> Complexity is probably the reason why it hasn't been done, but it would
> be an interesting project.

Actually it's not that hard (I've done it), but some of the queries can be
quite slow, particularly the one that finds all the subdirectories of a
chosen directory.

I'm working on our own internal Web GUI for doing restores (ExtJS with a
tree-based view of the filesystem for selecting files), and I've found that
you can either:

a) Do it with low memory usage (not building a tree, just doing ad-hoc
   queries as you go and recursing through the selected directories), but
   it'll be quite slow, or

b) Spend memory and pre-build time to build the directory tree in memory,
   then select relatively quickly (rough sketch below).
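For the curious, here's a very rough sketch of the shape of (b), in Python
purely for brevity (the real thing is PHP/Perl). It's not the actual GUI
code, and it completely ignores how you get the full paths out of the
catalogue in the first place; it just shows the in-memory tree and the
"select everything under this directory" walk:

    def build_tree(paths):
        # Turn an iterable of '/a/b/c' strings into nested dicts:
        # {'a': {'b': {'c': {}}}}
        root = {}
        for p in paths:
            node = root
            for part in p.strip('/').split('/'):
                node = node.setdefault(part, {})
        return root

    def files_under(node, prefix):
        # Everything below a node: what you want when the user ticks a
        # whole directory. (A leaf is an empty dict here, so empty dirs
        # look like files; good enough for a sketch.)
        result = []
        for name, child in node.items():
            path = prefix + '/' + name
            if child:
                result.extend(files_under(child, path))
            else:
                result.append(path)
        return result

    paths = ['/etc/passwd', '/etc/ssh/sshd_config', '/var/log/syslog']
    tree = build_tree(paths)
    print(files_under(tree['etc'], '/etc'))

The point is that once the tree is built, expanding a node or selecting a
whole subtree never goes back to the database; the cost is all paid up
front in memory and build time.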
(a) is better for small restores, (b) is better for restores of more than a
few hundred files; and if your database is grinding to a halt building the
tree, it's gonna truly suck doing the lots of small queries over large
datasets that (a) requires. In the end I'm going to give the users a choice
of method, so they can make a human decision. It's hard to make that choice
in code, because when the user has just selected a top-level directory, the
code doesn't know how deep the tree below it is. Could be 10 files, could
be 10 million :)

And for anyone interested: doing (b) in PHP is not a good idea. There's a
160-byte overhead just to create an empty object, and another 60+ bytes per
stored integer. Blech. Perl is not amazingly better, but it can be wrangled
down to more tolerable memory sizes with some trickery. The tree data
storage really needs to be done in some language/mechanism that actually
uses only 4 bytes to store a 32-bit integer :) (a rough illustration of
what I mean is below my sig).

Craig Miskell
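As promised, an illustration of the boxed-versus-packed difference, again
in Python just because it's short; the exact numbers are whatever your
interpreter reports, not anything I've measured against the Bacula
catalogue:

    import sys
    from array import array

    n = 1000000
    boxed = list(range(n))          # one full integer object per file-id
    packed = array('I', range(n))   # same values, raw 32-bit unsigned ints

    # List plus every int object it points at: tens of bytes per value.
    print(sys.getsizeof(boxed) + sum(sys.getsizeof(i) for i in boxed))

    # Packed buffer: roughly 4 bytes per value.
    print(packed.buffer_info()[1] * packed.itemsize)

Anything along those lines (packed arrays, or a language that stores plain
ints natively) keeps the pre-built tree from eating the server alive.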