Thanks for the feedback, Nick and Zoltan.

I have been seeing periodic kernel panics when using LIO. It was either due to LIO or the kernel rbd mapping. I have seen this on Ubuntu precise with kernel 3.14.14 and again on Ubuntu trusty with the utopic kernel (currently 3.16.0-28). Ironically, this is the primary reason I started exploring a redundancy solution for my iSCSI proxy node. So, yes, these crashes have nothing to do with running the Active/Active setup.
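For context, the kernel rbd mapping I'm referring to is the usual map-then-export flow, where the image is mapped with the kernel client and LIO exports it as a block backstore. Roughly (image, backstore and target names here are just illustrative):

    rbd map rbd/vmware-lun0 --id admin      # shows up as e.g. /dev/rbd0
    targetcli /backstores/block create lun0 /dev/rbd0
    targetcli /iscsi/iqn.2015-01.com.example:target1/tpg1/luns create /backstores/block/lun0

A panic anywhere in that path takes the whole proxy node with it.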
I am moving my entire setup from LIO to rbd-enabled tgt, which I've found to be much more stable and which gives equivalent performance.

I've been testing active/active LIO since July of 2014 with VMware and I've never seen any VMFS corruption, but I am now convinced (thanks Nick) that corruption is possible. The reason I have not seen any may have to do with how VMware happens to be configured. Originally, I had made a point of using round robin path selection on the VMware hosts, but as I did performance testing I found that it actually didn't help performance. When the host switches iSCSI targets there is a short "spin up time" before LIO gets to 100% IO capability, and since round robin switches targets every 30 seconds (60 seconds? I forget), this seemed to be significant. A secondary goal for me was to end up with a config that required minimal tuning of VMware and the target software, so the obvious choice is to leave VMware's path selection at the default, which is Fixed and picks the first target in ASCII-betical order. That means I am actually functioning in Active/Passive mode.
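On the tgt side, the relevant part of targets.conf is just a backing-store using the rbd bs-type. A minimal sketch, where the IQN, pool/image name and IDs are illustrative (my real files are in the github repo linked in the quoted thread below):

    <target iqn.2015-01.com.example:vmware-lun0>
        driver iscsi
        <backing-store rbd/vmware-lun0>
            # backing-store is <pool>/<image> when bs-type is rbd
            bs-type rbd
            lun 1
            scsi_id 0001405jake00lun0
            scsi_sn 0001405jake00lun0
        </backing-store>
    </target>

Setting the same scsi_id/scsi_sn on both proxy nodes is what makes VMware treat the two portals as two paths to one device rather than as two separate devices.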
Jake

On Fri, Jan 23, 2015 at 8:46 AM, Zoltan Arnold Nagy <zol...@linux.vnet.ibm.com> wrote:

> Just to chime in: it will look fine, feel fine, but underneath it's quite easy to get VMFS corruption. Happened in our tests.
> Also if you're running LIO, from time to time expect a kernel panic (haven't tried with the latest upstream, as I've been using Ubuntu 14.04 on my "export" hosts for the test, so might have improved...).
>
> As of now I would not recommend this setup without being aware of the risks involved.
>
> There have been a few upstream patches getting the LIO code in better cluster-aware shape, but no idea if they have been merged yet. I know RedHat has a guy on this.
>
> On 01/21/2015 02:40 PM, Nick Fisk wrote:
> > Hi Jake,
> >
> > Thanks for this, I have been going through this and have a pretty good idea of what you are doing now; however, I may be missing something looking through your scripts, but I'm still not quite understanding how you are managing to make sure locking is happening with the ESXi ATS SCSI command.
> >
> > From this slide:
> >
> > http://xo4t.mjt.lu/link/xo4t/gzyhtx3/1/_9gJVMUrSdvzGXYaZfCkVA/aHR0cHM6Ly93aWtpLmNlcGguY29tL0BhcGkvZGVraS9maWxlcy8zOC9oYW1tZXItY2VwaC1kZXZlbC1zdW1taXQtc2NzaS10YXJnZXQtY2x1c3RlcmluZy5wZGY (Page 8)
> >
> > It seems to indicate that for a true active/active setup the two targets need to be aware of each other and exchange locking information for it to work reliably. I've also watched the video from the Ceph developer summit where this is discussed, and it seems that Ceph+Kernel need changes to allow this locking to be pushed back to the RBD layer so it can be shared. From what I can see browsing through the Linux Git Repo, these patches haven't made the mainline kernel yet.
> >
> > Can you shed any light on this? As tempting as having active/active is, I'm wary about using the configuration until I understand how the locking is working and if fringe cases involving multiple ESXi hosts writing to the same LUN on different targets could spell disaster.
> >
> > Many thanks,
> > Nick
> >
> > *From:* Jake Young [mailto:jak3...@gmail.com <jak3...@gmail.com>]
> > *Sent:* 14 January 2015 16:54
> > *To:* Nick Fisk
> > *Cc:* Giuseppe Civitella; ceph-users
> > *Subject:* Re: [ceph-users] Ceph, LIO, VMWARE anyone?
> >
> > Yes, it's active/active and I found that VMWare can switch from path to path with no issues or service impact.
> >
> > I posted some config files here: github.com/jak3kaj/misc <http://xo4t.mjt.lu/link/xo4t/gzyhtx3/2/_P2HWj3RxQZC1v5DQ_206Q/aHR0cDovL2dpdGh1Yi5jb20vamFrM2thai9taXNj>
> >
> > One set is from my LIO nodes, both the primary and secondary configs so you can see what I needed to make unique. The other set (targets.conf) are from my tgt nodes. They are both 4 LUN configs.
> >
> > Like I said in my previous email, there is no performance difference between LIO and tgt. The only service I'm running on these nodes is a single iscsi target instance (either LIO or tgt).
> >
> > Jake
> >
> > On Wed, Jan 14, 2015 at 8:41 AM, Nick Fisk <n...@fisk.me.uk> wrote:
> > Hi Jake,
> >
> > I can't remember the exact details, but it was something to do with a potential problem when using the pacemaker resource agents. I think it was to do with a potential hanging issue when one LUN on a shared target failed and then it tried to kill all the other LUNs to fail the target over to another host. This then leaves the TCM part of LIO locking the RBD, which also can't fail over.
> >
> > That said, I did try multiple LUNs on one target as a test and didn't experience any problems.
> >
> > I'm interested in the way you have your setup configured though. Are you saying you effectively have an active/active configuration with a path going to either host, or are you failing the iSCSI IP between hosts? If it's the former, have you had any problems with SCSI locking/reservations etc. between the two targets?
> >
> > I can see the advantage to that configuration as you reduce/eliminate a lot of the troubles I have had with resources failing over.
> >
> > Nick
> >
> > *From:* Jake Young [mailto:jak3...@gmail.com]
> > *Sent:* 14 January 2015 12:50
> > *To:* Nick Fisk
> > *Cc:* Giuseppe Civitella; ceph-users
> > *Subject:* Re: [ceph-users] Ceph, LIO, VMWARE anyone?
> >
> > Nick,
> >
> > Where did you read that having more than 1 LUN per target causes stability problems?
> >
> > I am running 4 LUNs per target.
> >
> > For HA I'm running two linux iscsi target servers that map the same 4 rbd images. The two targets have the same serial numbers, T10 address, etc. I copy the primary's config to the backup and change IPs. This way VMWare thinks they are different target IPs on the same host. This has worked very well for me.
> >
> > One suggestion I have is to try using rbd enabled tgt. The performance is equivalent to LIO, but I found it is much better at recovering from a cluster outage. I've had LIO lock up the kernel or simply not recognize that the rbd images are available, where tgt will eventually present the rbd images again.
> >
> > I have been slowly adding servers and am expanding my test setup to a production setup (nice thing about ceph). I now have 6 OSD hosts with 7 disks on each. I'm using the LSI Nytro cache raid controller, so I don't have a separate journal, and have 40Gb networking. I plan to add another 6 OSD hosts in another rack in the next 6 months (and then another 6 next year). I'm doing 3x replication, so I want to end up with 3 racks.
> >
> > Jake
> >
> > On Wednesday, January 14, 2015, Nick Fisk <n...@fisk.me.uk> wrote:
> > Hi Giuseppe,
> >
> > I am working on something very similar at the moment. I currently have it working on some test hardware and it seems to be working reasonably well.
> >
> > I say reasonably as I have had a few instabilities, but these are on the HA side; the LIO and RBD side of things have been rock solid so far.
> > The main problems I have had seem to be around recovering from failure, with resources ending up in an unmanaged state. I'm not currently using fencing, so this may be part of the cause.
> >
> > As a brief description of my configuration:
> >
> > 4 hosts, each having 2 OSDs, also running the monitor role
> > 3 additional hosts in a HA cluster which act as iSCSI proxy nodes
> >
> > I'm using the IP, RBD, iSCSITarget and iSCSILUN resource agents to provide an HA iSCSI LUN which maps back to an RBD. All the agents for each RBD are in a group so they follow each other between hosts.
> >
> > I'm using 1 LUN per target as I read somewhere there are stability problems using more than 1 LUN per target.
> >
> > Performance seems OK, I can get about 1.2k random IOs out of the iSCSI LUN. This seems to be about right for the Ceph cluster size, so I don't think the LIO part is causing any significant overhead.
> >
> > We should be getting our production hardware shortly, which will have 40 OSDs with journals and an SSD caching tier, so within the next month or so I will have a better idea of running it in a production environment and the performance of the system.
> >
> > Hope that helps, if you have any questions, please let me know.
> >
> > Nick
> >
> > *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com <ceph-users-boun...@lists.ceph.com>] *On Behalf Of *Giuseppe Civitella
> > *Sent:* 13 January 2015 11:23
> > *To:* ceph-users
> > *Subject:* [ceph-users] Ceph, LIO, VMWARE anyone?
> >
> > Hi all,
> >
> > I'm working on a lab setup regarding Ceph serving rbd images as iSCSI datastores to VMWARE via a LIO box. Is there someone that already did something similar wanting to share some knowledge? Any production deployments? What about LIO's HA and LUNs' performance?
> >
> > Thanks
> > Giuseppe
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-us...@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
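For anyone trying to reproduce the per-LUN failover groups Nick describes above, a rough crm configure sketch would look something like the following. The resource agent names and parameters are only illustrative and from memory (the LUN agent is actually called iSCSILogicalUnit, and the exact rbd agent parameters depend on which rbd RA you have installed), so treat it as a starting point rather than a working config:

    # one group per exported LUN; the whole group fails over together
    primitive p_vip_lun0 ocf:heartbeat:IPaddr2 \
        params ip=192.168.1.200 cidr_netmask=24
    primitive p_rbd_lun0 ocf:ceph:rbd \
        params volume=iscsi-lun0 pool=rbd user=admin
    primitive p_target_lun0 ocf:heartbeat:iSCSITarget \
        params iqn=iqn.2015-01.com.example:lun0 implementation=lio
    primitive p_lu_lun0 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2015-01.com.example:lun0 lun=0 path=/dev/rbd/rbd/iscsi-lun0
    group g_lun0 p_rbd_lun0 p_target_lun0 p_lu_lun0 p_vip_lun0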
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com