Dear all,

Would appreciate any guidance on this situation:

* Version GE2011p1 running on a RedHat 6 server whose hard disk reached 100% ...
the system stays up, but qsub starts to fail; the error reported is
"cannot connect to Berkeley database".
* Space released on the hard disk, but qsub still fails. sge_qmaster is still
running. qconf fails.
* Decided to restart services: sgeexecd softstopped and sgemaster stopped,
then started: it fails to come up. "messages" in $SGE_ROOT/$SGE_CELL/spool/qmaster
says:

main|frontend0|E|couldn't open berkeley database "sge": (22) Invalid argument
main|frontend0|E|startup of rule "default rule" in context "berkeleydb spooling" failed
main|frontend0|C|setup failed
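
For anyone wanting to pull only the error and critical entries out of a long "messages" file, here is a minimal Python sketch. It assumes the pipe-separated layout seen in the excerpt (source|host|level|text), optionally preceded by extra fields such as a timestamp. The names parse_line, parse_messages and count_by_level are hypothetical helpers, not part of Grid Engine, and the set of severity letters is partly an assumption. Lines that do not parse are treated as continuations of the previous entry, which helps when an excerpt has been wrapped by a mail client.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional

# Severity letters: W, E and C appear in this post; I is an assumption.
LEVELS = {"I", "W", "E", "C"}


class LogEntry(NamedTuple):
    source: str
    host: str
    level: str
    text: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one 'source|host|level|text' line; return None if it does not fit."""
    parts = line.rstrip("\n").split("|")
    # Look for the first single-letter level field that has two fields
    # before it (source, host) and at least one after it (the text).
    for i in range(2, len(parts) - 1):
        if parts[i] in LEVELS:
            text = "|".join(parts[i + 1:])
            return LogEntry(parts[i - 2], parts[i - 1], parts[i], text)
    return None


def parse_messages(lines: Iterable[str]) -> List[LogEntry]:
    """Parse many lines, folding unparseable lines into the previous entry."""
    entries: List[LogEntry] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        entry = parse_line(line)
        if entry is not None:
            entries.append(entry)
        elif entries:
            # Continuation of a wrapped line: append to the last entry's text.
            last = entries[-1]
            entries[-1] = last._replace(text=last.text + " " + line)
    return entries


def count_by_level(entries: Iterable[LogEntry]) -> Counter:
    """Count entries per severity letter."""
    return Counter(e.level for e in entries)


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py /path/to/messages
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        parsed = parse_messages(fh)
    for e in parsed:
        if e.level in {"E", "C"}:
            print(f"{e.level} {e.host}: {e.text}")
    print(dict(count_by_level(parsed)))
```

This only reads the log; it does not touch the spool database or any Grid Engine state.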

* Decided to repair the database following the post linked below.

At first db_verify gave

db_verify: Page 21: invalid next_pgno 25
db_verify: sge: DB_VERIFY_BAD: Database verification failed

(This report is consistent with the idea that the database could not expand due
to lack of space, leaving the next-page pointer out of sync.) I then followed
the procedure in this post:

https://arc.liv.ac.uk/pipermail/gridengine-users/2008-October/020911.html

However, the new "sge" BDB is very small ... empty except for some headers. Still,
it passes db_verify fine.

* sgemaster still fails to come up. "messages" in
$SGE_ROOT/$SGE_CELL/spool/qmaster
now says:

main|frontend0|W|local configuration frontend0 not defined - using global configuration
main|frontend0|E|global configuration not defined
main|frontend0|C|setup failed

* This seems to exonerate the database, but I'm not so sure ... the database
repair was not "satisfying".
* How to get the global configuration? With qconf, right? Yes, but it fails of
course, since sge_qmaster is not up.

sgemaster does not stay up ... in fact the sge_qmaster binary completes and
returns $?=0 very quickly. It leaves no processes on the system at all. Unusual.

* Current lines of inquiry:
0. BDB repaired, but GE2011 somehow retains some state of the corrupt
database.
1. Install a new Grid Engine, but not before trying this on another server.
Beware of clobbering the current GE2011.
2. Access the corrupt database manually, through the API perhaps, just to gain
more knowledge.

Many thanks for reading.

Cheers / Ramon.
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
