Hi,

I have a 2-Node OCFS2 Cluster on top of DRBD 8.0.4. 
The kernel version I use is:

uname -a
Linux webhost1 2.6.18-028stab039 #2 SMP Tue Aug 21 17:49:05 UTC 2007 i686 GNU/Linux

Both nodes are in the same bladecenter and are directly connected at 1 Gbit/s 
through the bladecenter's internal Ethernet switch.

One of the nodes stops working at least once a day with the following messages:

Nov 23 19:05:02 webhost2 kernel: (4424,3):o2net_sendpage:827 ERROR: sendpage of size 24 to node webhost1 (num 0) at 10.2.0.70:7777 failed with 4294967264
Nov 23 19:05:02 webhost2 kernel: (6774,0):dlm_send_remote_convert_request:395 ERROR: status = -107
Nov 23 19:05:02 webhost2 kernel: (4997,2):dlm_send_remote_convert_request:395 ERROR: status = -107
Nov 23 19:05:02 webhost2 kernel: (4997,2):dlm_wait_for_node_death:374 225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of node 0
Nov 23 19:05:02 webhost2 kernel: (6774,0):dlm_wait_for_node_death:374 225202289F954729807AACECEBB2D2AC: waiting 5000ms for notification of death of node 0
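
For reference, the numeric status values in these messages can be decoded as errno codes. The sketch below assumes Linux errno numbering and assumes that the large value in the sendpage message is a negative errno printed as an unsigned 32-bit integer; the helper name decode_kernel_status is only illustrative. Under those assumptions, 4294967264 is -32 (EPIPE, broken pipe) and -107 is ENOTCONN (transport endpoint is not connected), which would suggest that the o2net TCP connection to node 0 was already gone when the DLM tried to use it.

```python
import errno
import os


def decode_kernel_status(value, bits=32):
    """Return (signed_value, errno_name, message) for a kernel status code.

    Hypothetical helper: assumes the value is a negative errno, possibly
    printed as an unsigned integer of the given width.
    """
    # Reinterpret an unsigned value as a signed two's-complement integer.
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    # Kernel code returns errors as negative errno numbers.
    code = -value if value < 0 else value
    name = errno.errorcode.get(code, "UNKNOWN")
    return value, name, os.strerror(code)


if __name__ == "__main__":
    # The two status values that appear in the log messages above.
    for raw in (4294967264, -107):
        print(raw, "->", decode_kernel_status(raw))
```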


After that the node hangs and does not even reboot, although 
/proc/sys/kernel/panic and /proc/sys/kernel/panic_on_oops are both set to 1.

Can anybody please help me understand the error messages and make that node 
more stable?


Thanks,
- Rainer

_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users