I've got a weird problem. This is a shot in the dark, but I'm going to ask all the kind folks on the ADSM mailing list a weird question. Maybe someone out there will have some ideas.
Please bear in mind that we do have a PMR open with TSM support, and service calls open with the IBM hardware CEs and with Brocade. It appears that we've tracked the problem to the 3584s, but we are not positive about that yet.

About every 2-4 days we have to power cycle all the drives, or at least a good subset of them, in our 3584 libraries. Both of the 3584s are four frames each, and each has 27 LTO2 drives. The tape SAN consists of two Brocade M14s. Each drive is plugged directly into an M14 (not into another edge switch). We thought that there might be a bad connection between the two M14s, so last week we disabled the system boards in both that were giving us some issues. Then today the issue happened again.

The TSM servers (library managers) that run the two libraries are both at v5.2.4, on AIX 5.2 ML4. There are about 15 TSM library clients that talk to their respective library managers, as well as somewhere around 50 storage agents. All are at current TSM levels. We are now at the latest firmware on all the drives and the libraries as well. We are at Atape 8.4.9.0.

Today, while the problem was happening, we also could not eject tapes from the tape drives in frame four of one of the libraries to empty slots. This was through the 3584's web interface, so TSM was not involved. It told me that there weren't any empty slots to put the tape into. But I could move the tape from that drive to drive 1 in frame 1, and then it would eject it to an empty slot like normal. It did this with five tapes.

The two things we have done to temporarily fix the issue have been:
1. Restarting the TSM library managers.
2. Power cycling the drives.
   2A. Sometimes just the two drives that are the control paths into the library,
   2B. and sometimes it appears to need every drive power cycled.
Since I was having trouble ejecting tapes earlier, and TSM was not involved in that scenario, I am inclined to think that TSM really isn't part of the root problem, but that it somehow gets confused and needs to be restarted at times in order to, shall we say, clear its head. Because the problem seems to affect the library even when it is not talking to TSM or AIX, I don't think upgrading the Atape driver to whatever 9.X.X.X version is out there would fix it. But I'm open to arguments that say I'm full of it there.

Does anybody out there have any ideas?

Thank you,
David N. Reiss
Unix/TSM System Engineer
Caterpillar, Inc.
(309)/494-3749