Dear all,

I would appreciate any guidance on this situation:

* Version GE2011p1 running on a RedHat6 server whose hard disk reached 100%. The system stayed up, but qsub started to fail; the error reported is "cannot connect to Berkeley database".

* Space was released on the hard disk, but qsub still fails. sge_qmaster is still running. qconf fails.

* Decided to restart services: sgeexecd softstopped and sgemaster stopped, then started. sgemaster fails to come up. "messages" in $SGE_ROOT/$SGE_CELL/spool/qmaster says:

    main|frontend0|E|couldn't open berkeley database "sge": (22) Invalid argument
    main|frontend0|E|startup of rule "default rule" in context "berkeleydb spooling" failed
    main|frontend0|C|setup failed

* Decided to repair the database according to this post. At first db_verify gave:

    db_verify: Page 21: invalid next_pgno 25
    db_verify: sge: DB_VERIFY_BAD: Database verification failed

  (The report is consistent with the idea that the database could not expand for lack of space, so the next-page pointer is out of sync.) Then followed the procedure in this post:
  https://arc.liv.ac.uk/pipermail/gridengine-users/2008-October/020911.html
  However, the new "sge" bdb is very small: empty except for some headers. Still, it passes db_verify fine.

* sgemaster still fails to come up. "messages" in $SGE_ROOT/$SGE_CELL/spool/qmaster now says:

    main|frontend0|W|local configuration frontend0 not defined - using global configuration
    main|frontend0|E|global configuration not defined
    main|frontend0|C|setup failed

* This seems to exonerate the database, but I'm not so sure. The database repair was not "satisfying".

* How to get the global configuration? With qconf, right? Yes, but it fails of course, since sge_qmaster is not up. sgemaster does not stay up; in fact the sge_qmaster binary completes and returns $?=0 very quickly, leaving no processes on the system at all. Unusual.

* Current lines of inquiry:
  0. BDB repaired, but GE2011 somehow retains some state of the corrupt database.
  1. Install a new Gridengine, but not before trying this on another server. Beware of clobbering the current GE2011.
  2. Access the corrupt database manually, through the API perhaps, just to gain more knowledge.

Many thanks for reading.

Cheers / Ramon.
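
P.S. For reference, a generic verify / salvage / reload pass with the standard Berkeley DB utilities (db_verify, db_dump, db_load) might look roughly like the sketch below. This is only an illustration and not necessarily the exact procedure from the linked post. The file names "sge.dump" and "sge.new" and the helper names are hypothetical, and it assumes the utilities on the PATH match the Berkeley DB version the spool database was created with. The count_dump_records helper is one way to check whether a salvage dump actually recovered any key/data pairs, or only headers as described above.

```python
import shutil
import subprocess


def salvage_commands(db_path, dump_path, new_db_path, aggressive=False):
    """Build the verify / dump / load / verify command lines.

    db_dump -r salvages what it can from a possibly corrupt file;
    -R is the more aggressive variant and may return stale data.
    The rebuilt database goes to a new file so the original is kept.
    """
    dump_flag = "-R" if aggressive else "-r"
    return [
        ["db_verify", str(db_path)],
        ["db_dump", dump_flag, "-f", str(dump_path), str(db_path)],
        ["db_load", "-f", str(dump_path), str(new_db_path)],
        ["db_verify", str(new_db_path)],
    ]


def count_dump_records(dump_text):
    """Count key/data pairs in db_dump output.

    Each section is a header ending in HEADER=END, followed by
    alternating key and data lines, terminated by DATA=END.
    """
    in_data = False
    data_lines = 0
    for line in dump_text.splitlines():
        if line == "HEADER=END":
            in_data = True
        elif line == "DATA=END":
            in_data = False
        elif in_data:
            data_lines += 1
    return data_lines // 2


def run_salvage(db_path, dump_path, new_db_path, aggressive=False):
    """Run each step in order and report its exit status.

    The first db_verify is expected to fail on a corrupt file,
    so a non-zero exit status does not abort the sequence.
    """
    for cmd in salvage_commands(db_path, dump_path, new_db_path, aggressive):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(cmd[0] + " not found on PATH")
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(" ".join(cmd), "-> exit", result.returncode)
        if result.stderr:
            print(result.stderr.strip())


# Example (run on a COPY of the spool directory, with sge_qmaster stopped):
#   run_salvage("sge", "sge.dump", "sge.new")
#   print(count_dump_records(open("sge.dump").read()))
```

If the record count from the dump is zero or close to it, the salvage recovered essentially nothing, which would be consistent with qmaster then finding no global configuration in the rebuilt database.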
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users