On Tue, May 22, 2012 at 6:24 PM, Simon Matthews <[email protected]> wrote:
> Some googling on Berkeley DB shows that it is safe for concurrent access by
> different users, so it should be safe to run db_checkpoint without shutting
> down the qmaster.

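For reference, the external cleanup discussed in this thread could be scripted roughly as below. This is only a sketch under assumptions: the function names and the spool path are hypothetical, and it assumes the Berkeley DB utilities are on the PATH under their plain names (some distributions install them with a version prefix) and match the BDB version that Grid Engine was built against. It uses `db_checkpoint -1` to write a single checkpoint and `db_archive -d` to delete log files that are no longer needed.

```python
import subprocess


def bdb_maintenance_commands(spool_dir):
    """Build the two Berkeley DB utility invocations for one maintenance pass."""
    home = str(spool_dir)
    return [
        # -1: write a single checkpoint and exit; -h: database environment home
        ["db_checkpoint", "-1", "-h", home],
        # -d: delete transaction log files that are no longer needed
        ["db_archive", "-d", "-h", home],
    ]


def run_bdb_maintenance(spool_dir, dry_run=True):
    """Run the commands in order, or only return them when dry_run is True."""
    commands = bdb_maintenance_commands(spool_dir)
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a utility exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical path; substitute the cluster's real BDB spool directory.
    for cmd in run_bdb_maintenance("/path/to/spooldb", dry_run=True):
        print(" ".join(cmd))
```

With `dry_run=True` the script only prints the commands, which makes it easy to check the spool directory before anything is run against a live qmaster.
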
I just quickly scanned the Berkeley DB spooling code. If you are not
using the BDB RPC server, then Grid Engine (at least all versions of
Grid Engine distributed by the Open Grid Scheduler project) should be
able to handle checkpointing and archiving within qmaster, i.e. without
using external commands.

It's not an issue to use external BDB commands to clear the transaction
logs (BDB is designed to have external transaction log cleanup
commands), but I was wondering whether you were using BDB RPC spooling
at some point in time?

Rayson

>
> Simon
>
>>
>> Rayson
>>
>> On Fri, May 18, 2012 at 2:39 PM, Simon Matthews
>> <[email protected]> wrote:
>> > Thanks for pointing this out to me
>> >
>> > The documentation says that it should be used every minute if the
>> > configuration uses a BDB server. I don't use a BDB server, but the
>> > storage method I use is BDB (not flat files). If I should use this
>> > checkpoint script, how often should I run it, and should I shut down
>> > the qmaster to run it?
>> >
>> >>
>> >> Rayson
>> >>
>> >> On Fri, May 18, 2012 at 1:17 PM, Simon Matthews
>> >> <[email protected]> wrote:
>> >> > After SGE was killed by the OOM killer, the file (a Berkeley DB
>> >> > file) in my cluster was 1.4GB. I did a db_dump and db_load on this
>> >> > file, resulting in a much smaller file.
>> >> >
>> >> > However, this then raised the question -- how is this file
>> >> > maintained? Presumably, it holds the information on jobs in all
>> >> > states (queued, running and finished). How do the finished jobs
>> >> > get removed from this file? Obviously, I don't want the file to
>> >> > grow without limit.
>> >> >
>> >> > We are now putting about 50k jobs into our small cluster every day
>> >> > (many finish running in a fraction of a second).
>> >> >
>> >> > Simon

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
