On 04/30/2012 08:37 PM, Tim Serong wrote:
> On 05/01/2012 08:04 AM, Seth Galitzer wrote:
>> This was a bit trickier to get worked out, but I have made some
>> progress. It turns out just putting the metadata on a shared disk
>> resource and symlinking wasn't quite enough. nmbd (the netbios
>> management daemon that samba uses) complained that the symlink to its
>> working directory wasn't a real directory. On top of that, you can
>> specify the path for the nmbd working dir, but only at compile time, not
>> at run time. To work around this, I added a bind mount for that dir
>> (/var/run/samba for debian/ubuntu) and now samba will start. It will
>> even fail over if I put the primary into standby. So there's the progress.
>>
>> However, a client still can't reconnect to the share once the node has
>> failed over until I rerun "net ads join" on the secondary (new primary).
>> I've been running the join command using the dns name for the floating
>> IP, but maybe that's not good enough. I'll look more deeply into net
>> tomorrow, and see if I can specify the IP, too.
>
> Have you got "/var/lib/samba" on shared storage (or linked to, or
> "private dir" in smb.conf set to some directory on shared storage)?
> IIRC when you do "net ads join", various secrets and whatnot are saved
> somewhere in that directory. If that's not persistent across failover,
> it'd explain what you're seeing.

The following dirs are all on shared storage:

/var/cache/samba
/var/lib/samba
/var/log/samba
/var/run/samba

The last is a bind mount, the rest are symlinks. It turns out that in
Debian, /var/run is a symlink to /run. In my fs resource for the bind
mount, I indicated /var/run/samba as the target, but for some reason the
system mounted it at /run/samba instead. This meant that when I tried to
fail over the resource, it wouldn't unmount and would fail silently. I
changed the resource to use /run/samba as the target and now it fails
over smoothly. Not sure who to blame for this behavior, but I've at
least got it working now.

>
>>
>> The other new oddity is that after I've put the primary into standby and
>> everything has failed over to the secondary, as soon as I bring the
>> primary back online, the resources try to switch back, i.e. they don't
>> stay on the secondary (new primary) as expected. Granted, if I setup
>> STONITH, this shouldn't be an immediate problem, but it still will be
>> when I go to bring the node back online. I believe this is only the
>> case with the samba resource enabled, but I'll test this more tomorrow
>> to make sure.
>
> Do you have any constraints that make the resources prefer one node?
> Also look at resource stickiness.

Thanks for the tip. I set the stickiness on the LVM+fs+samba+exportfs
group to 100 and that seems to have done the trick.

>
>>
>> I'm starting to wonder if samba is practical for failover or not. I
>> don't really have much choice about using it. Because of my mixed
>> environment, I need to be able to export nfs and samba shares from this
>> server. Manual failover is better than what I have now, which is no
>> redundancy at all. At least I'd be able to get my users back up more
>> quickly on the cloned node. It just won't be as smooth as I'd like with
>> automated failover. It still seems like it should be doable, I just
>> haven't found the proper incantation just yet.
>>
>> Any further advice is welcome.
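
[Illustrative note, not part of the original exchange.] The /var/run
versus /run mismatch described above comes from the configured mount
target passing through a symlink, so the mount ends up listed under the
resolved path. A minimal sketch of one way to check a path before using
it as a mount target follows. The function names are hypothetical, and
the demo builds a throwaway directory tree that only mimics the Debian
layout rather than touching the real /var/run.

```python
import os
import tempfile


def resolve_mount_target(path):
    """Return the path with every symlink component resolved."""
    return os.path.realpath(path)


def needs_rewrite(configured):
    """True when the configured target differs from its resolved form."""
    return os.path.normpath(configured) != resolve_mount_target(configured)


def demo():
    # Assumption: a temp tree standing in for the real filesystem,
    # with var/run set up as a symlink to run, as on Debian.
    root = os.path.realpath(tempfile.mkdtemp())
    os.makedirs(os.path.join(root, "run", "samba"))
    os.makedirs(os.path.join(root, "var"))
    os.symlink(os.path.join(root, "run"), os.path.join(root, "var", "run"))

    configured = os.path.join(root, "var", "run", "samba")
    return root, configured, resolve_mount_target(configured)


if __name__ == "__main__":
    root, configured, resolved = demo()
    print("configured:", configured)
    print("resolved:  ", resolved)
    print("rewrite?   ", needs_rewrite(configured))
```

If needs_rewrite() reports True for a path, using the resolved form as
the target (here, the equivalent of /run/samba) is the same change that
made the failover work above.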
>
> It is (or should be) ultimately possible. I have actually done it
> before, just not for rather a while, which is why I'm being a bit vague
> (sorry!)
>
> Regards,
>
> Tim
>

Thanks for the help. I'm still plugging away at it.

Seth

-- 
Seth Galitzer
Systems Coordinator
Computing and Information Sciences
Kansas State University
http://www.cis.ksu.edu/~sgsax
[email protected]
785-532-7790
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
