How many times have you failed over in those 2 years? It always works great if you never fail over :-)
______________________________
John Monahan
Senior Consultant Enterprise Solutions
Computech Resources, Inc.
Office: 952-833-0930 ext 109
Cell: 952-221-6938
http://www.computechresources.com

"Bell, Charles (Chip)" <[EMAIL PROTECTED].COM>
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED].EDU>
To: [email protected]
Subject: Re: Authentication problems :^(
12/22/2005 12:21 PM
Please respond to "ADSM: Dist Stor Manager" <[EMAIL PROTECTED].EDU>

Well, I figure they were set up properly, because they have been working properly before this most recent failover. By the way...

TSM server, AIX 5.2, TSM v5.3.1.2
TSM client v5.3.0, W2K

I mean, if it was not done right from the start, why would it have worked this long (2+ years)?

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of John Monahan
Sent: Thursday, December 22, 2005 12:12 PM
To: [email protected]
Subject: Re: [ADSM-L] Authentication problems :^(

"ADSM: Dist Stor Manager" <[email protected]> wrote on 12/22/2005 11:53:11 AM:

> In a MSCS cluster, an admin of one of our higher profile client machines
> failed over from one machine (OLALPHA) back to the other (OLBRAVO) after
> BRAVO crashed this morning.
>
> Since then, I've been having a devil of a time with a MSCS cluster resource
> that serves as the scheduler for the cluster drive on BRAVO not coming
> up. To begin with, it posted ANS1835E, ANS1025E, ANS1570E, all of which
> point to authentication problems. I updated the node password, issued a
> 'q ses -optfile...', and it would authenticate fine. When I try to bring
> the cluster resource back online, it stays up for a few seconds, fails,
> and when I check the registry, the password has disappeared! What in
> the world?
> It has also posted ANS1029E and ANS2050E since I've been
> playing around trying to get the cluster resource to work, and also the
> base client (to back up C/D/system state) has been issuing ANS1977E with
> the "ccCreateTimerFile: Unable to create timer file" and "errno=13
> error: Permission denied".
>

It sounds like the services weren't set up properly from the start, or the service password somehow got out of sync. When setting up the services in the cluster, it is very important to fully set them up on each node of the cluster and be sure they are working BEFORE setting up the service in the cluster manager.

I think your only solution is to remove the service from the cluster configuration, then remove and set up the services again on one node, restart the service several times, and make sure it works OK. Then fail over to the other node and repeat. Once you are sure both work, add the service back into the cluster and make sure the right registry key is set up to replicate during failover. Fail back and forth a couple of times to make sure all is working properly.

The big drawback here is that you will need to do this during downtime, when you can fail over nodes quite a few times. That is why it is so important to ensure it is done right from the start.

Every time I have seen the disappearing password in a cluster, it was because the services weren't set up right initially, or weren't fully set up before configuring them in the cluster. In one rare case, special characters in the node password also caused a problem and the password wouldn't replicate properly. For this reason I always use only letters or numbers in cluster node passwords (no underscores, dashes, etc.).
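
The letters-and-numbers rule for cluster node passwords mentioned above is easy to check before updating a node password. Here is a minimal Python sketch, assuming "safe" means ASCII letters and digits only. The function names are made up for illustration and are not part of TSM or any of its tools.

```python
import string

# Assumption: "safe" means ASCII letters and digits only, following the
# advice in this thread (no underscores, dashes, etc.).
ALLOWED = set(string.ascii_letters + string.digits)


def unsafe_chars(password: str) -> list[str]:
    """Return the distinct disallowed characters, in order of first appearance."""
    seen: list[str] = []
    for ch in password:
        if ch not in ALLOWED and ch not in seen:
            seen.append(ch)
    return seen


def is_cluster_safe(password: str) -> bool:
    """True if the password is non-empty and contains only letters and digits."""
    return bool(password) and not unsafe_chars(password)


if __name__ == "__main__":
    # Hypothetical example password, not taken from the thread.
    print(unsafe_chars("ol_bravo-1"))  # ['_', '-']
```

This only flags characters outside the letters-and-digits set. It says nothing about whether a given character actually breaks registry replication in a particular cluster.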
