Thanks for the feedback, Nick and Zoltan.

I have been seeing periodic kernel panics when using LIO. It was either due to LIO or the kernel rbd mapping. I have seen this on Ubuntu precise with kernel 3.14.14 and again on Ubuntu trusty with the utopic kernel (currently 3.16.0-28). Ironically, this is the primary reason I started exploring a redundancy solution for my iSCSI proxy node. So, yes, these crashes have nothing to do with running the Active/Active setup.
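For context, the kernel rbd mapping I'm referring to is the usual map-then-export flow, where the image is mapped with the kernel client and LIO exports it as a block backstore. Roughly (image, backstore and target names here are just illustrative):

    rbd map rbd/vmware-lun0 --id admin      # shows up as e.g. /dev/rbd0
    targetcli /backstores/block create lun0 /dev/rbd0
    targetcli /iscsi/iqn.2015-01.com.example:target1/tpg1/luns create /backstores/block/lun0

A panic anywhere in that path takes the whole proxy node with it.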
I am moving my entire setup from LIO to rbd-enabled tgt, which I've found to be much more stable and which gives equivalent performance.

I've been testing active/active LIO since July of 2014 with VMware and I've never seen any VMFS corruption, but I am now convinced (thanks Nick) that corruption is possible. The reason I have not seen any may have to do with how VMware happens to be configured. Originally, I had made a point of using round robin path selection on the VMware hosts, but as I did performance testing I found that it actually didn't help performance. When the host switches iSCSI targets there is a short "spin up time" before LIO gets to 100% IO capability, and since round robin switches targets every 30 seconds (60 seconds? I forget), this seemed to be significant. A secondary goal for me was to end up with a config that required minimal tuning of VMware and the target software, so the obvious choice is to leave VMware's path selection at the default, which is Fixed and picks the first target in ASCII-betical order. That means I am actually functioning in Active/Passive mode.
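On the tgt side, the relevant part of targets.conf is just a backing-store using the rbd bs-type. A minimal sketch, where the IQN, pool/image name and IDs are illustrative (my real files are in the github repo linked in the quoted thread below):

    <target iqn.2015-01.com.example:vmware-lun0>
        driver iscsi
        <backing-store rbd/vmware-lun0>
            # backing-store is <pool>/<image> when bs-type is rbd
            bs-type rbd
            lun 1
            scsi_id 0001405jake00lun0
            scsi_sn 0001405jake00lun0
        </backing-store>
    </target>

Setting the same scsi_id/scsi_sn on both proxy nodes is what makes VMware treat the two portals as two paths to one device rather than as two separate devices.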
Jake

On Fri, Jan 23, 2015 at 8:46 AM, Zoltan Arnold Nagy <zol...@linux.vnet.ibm.com> wrote:

> Just to chime in: it will look fine, feel fine, but underneath it's quite easy to get VMFS corruption. Happened in our tests.
> Also if you're running LIO, from time to time expect a kernel panic (haven't tried with the latest upstream, as I've been using Ubuntu 14.04 on my "export" hosts for the test, so might have improved...).
>
> As of now I would not recommend this setup without being aware of the risks involved.
>
> There have been a few upstream patches getting the LIO code in better cluster-aware shape, but no idea if they have been merged yet. I know RedHat has a guy on this.
>
> On 01/21/2015 02:40 PM, Nick Fisk wrote:
> > Hi Jake,
> >
> > Thanks for this, I have been going through this and have a pretty good idea of what you are doing now; however, I may be missing something looking through your scripts, but I'm still not quite understanding how you are managing to make sure locking is happening with the ESXi ATS SCSI command.
> >
> > From this slide:
> >
> > http://xo4t.mjt.lu/link/xo4t/gzyhtx3/1/_9gJVMUrSdvzGXYaZfCkVA/aHR0cHM6Ly93aWtpLmNlcGguY29tL0BhcGkvZGVraS9maWxlcy8zOC9oYW1tZXItY2VwaC1kZXZlbC1zdW1taXQtc2NzaS10YXJnZXQtY2x1c3RlcmluZy5wZGY (Page 8)
> >
> > It seems to indicate that for a true active/active setup the two targets need to be aware of each other and exchange locking information for it to work reliably. I've also watched the video from the Ceph developer summit where this is discussed, and it seems that Ceph+Kernel need changes to allow this locking to be pushed back to the RBD layer so it can be shared. From what I can see browsing through the Linux Git Repo, these patches haven't made the mainline kernel yet.
> >
> > Can you shed any light on this? As tempting as having active/active is, I'm wary about using the configuration until I understand how the locking is working and if fringe cases involving multiple ESXi hosts writing to the same LUN on different targets could spell disaster.
> >
> > Many thanks,
> > Nick
> >
> > *From:* Jake Young [mailto:jak3...@gmail.com <jak3...@gmail.com>]
> > *Sent:* 14 January 2015 16:54
> > *To:* Nick Fisk
> > *Cc:* Giuseppe Civitella; ceph-users
> > *Subject:* Re: [ceph-users] Ceph, LIO, VMWARE anyone?
> >
> > Yes, it's active/active and I found that VMWare can switch from path to path with no issues or service impact.
> >
> > I posted some config files here: github.com/jak3kaj/misc <http://xo4t.mjt.lu/link/xo4t/gzyhtx3/2/_P2HWj3RxQZC1v5DQ_206Q/aHR0cDovL2dpdGh1Yi5jb20vamFrM2thai9taXNj>
> >
> > One set is from my LIO nodes, both the primary and secondary configs so you can see what I needed to make unique. The other set (targets.conf) are from my tgt nodes. They are both 4 LUN configs.
> >
> > Like I said in my previous email, there is no performance difference between LIO and tgt. The only service I'm running on these nodes is a single iscsi target instance (either LIO or tgt).
> >
> > Jake
> >
> > On Wed, Jan 14, 2015 at 8:41 AM, Nick Fisk <n...@fisk.me.uk> wrote:
> > Hi Jake,
> >
> > I can't remember the exact details, but it was something to do with a potential problem when using the pacemaker resource agents. I think it was to do with a potential hanging issue when one LUN on a shared target failed and then it tried to kill all the other LUNs to fail the target over to another host. This then leaves the TCM part of LIO locking the RBD, which also can't fail over.
> >
> > That said, I did try multiple LUNs on one target as a test and didn't experience any problems.
> >
> > I'm interested in the way you have your setup configured though. Are you saying you effectively have an active/active configuration with a path going to either host, or are you failing the iSCSI IP between hosts? If it's the former, have you had any problems with SCSI locking/reservations etc. between the two targets?
> >
> > I can see the advantage to that configuration as you reduce/eliminate a lot of the troubles I have had with resources failing over.
> >
> > Nick
> >
> > *From:* Jake Young [mailto:jak3...@gmail.com]
> > *Sent:* 14 January 2015 12:50
> > *To:* Nick Fisk
> > *Cc:* Giuseppe Civitella; ceph-users
> > *Subject:* Re: [ceph-users] Ceph, LIO, VMWARE anyone?
> >
> > Nick,
> >
> > Where did you read that having more than 1 LUN per target causes stability problems?
> >
> > I am running 4 LUNs per target.
> >
> > For HA I'm running two linux iscsi target servers that map the same 4 rbd images. The two targets have the same serial numbers, T10 address, etc. I copy the primary's config to the backup and change IPs. This way VMWare thinks they are different target IPs on the same host. This has worked very well for me.
> >
> > One suggestion I have is to try using rbd enabled tgt. The performance is equivalent to LIO, but I found it is much better at recovering from a cluster outage. I've had LIO lock up the kernel or simply not recognize that the rbd images are available, where tgt will eventually present the rbd images again.
> >
> > I have been slowly adding servers and am expanding my test setup to a production setup (nice thing about ceph). I now have 6 OSD hosts with 7 disks on each. I'm using the LSI Nytro cache raid controller, so I don't have a separate journal, and have 40Gb networking. I plan to add another 6 OSD hosts in another rack in the next 6 months (and then another 6 next year). I'm doing 3x replication, so I want to end up with 3 racks.
> >
> > Jake
> >
> > On Wednesday, January 14, 2015, Nick Fisk <n...@fisk.me.uk> wrote:
> > Hi Giuseppe,
> >
> > I am working on something very similar at the moment. I currently have it working on some test hardware and it seems to be working reasonably well.
> >
> > I say reasonably as I have had a few instabilities, but these are on the HA side; the LIO and RBD side of things have been rock solid so far.
> > The main problems I have had seem to be around recovering from failure, with resources ending up in an unmanaged state. I'm not currently using fencing, so this may be part of the cause.
> >
> > As a brief description of my configuration:
> >
> > 4 hosts, each having 2 OSDs, also running the monitor role
> > 3 additional hosts in a HA cluster which act as iSCSI proxy nodes
> >
> > I'm using the IP, RBD, iSCSITarget and iSCSILUN resource agents to provide an HA iSCSI LUN which maps back to an RBD. All the agents for each RBD are in a group so they follow each other between hosts.
> >
> > I'm using 1 LUN per target as I read somewhere there are stability problems using more than 1 LUN per target.
> >
> > Performance seems OK, I can get about 1.2k random IOs out of the iSCSI LUN. This seems to be about right for the Ceph cluster size, so I don't think the LIO part is causing any significant overhead.
> >
> > We should be getting our production hardware shortly, which will have 40 OSDs with journals and an SSD caching tier, so within the next month or so I will have a better idea of running it in a production environment and the performance of the system.
> >
> > Hope that helps, if you have any questions, please let me know.
> >
> > Nick
> >
> > *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com <ceph-users-boun...@lists.ceph.com>] *On Behalf Of *Giuseppe Civitella
> > *Sent:* 13 January 2015 11:23
> > *To:* ceph-users
> > *Subject:* [ceph-users] Ceph, LIO, VMWARE anyone?
> >
> > Hi all,
> >
> > I'm working on a lab setup regarding Ceph serving rbd images as iSCSI datastores to VMWARE via a LIO box. Is there someone that already did something similar wanting to share some knowledge? Any production deployments? What about LIO's HA and LUNs' performance?
> >
> > Thanks
> > Giuseppe
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-us...@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
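For anyone trying to reproduce the per-LUN failover groups Nick describes above, a rough crm configure sketch would look something like the following. The resource agent names and parameters are only illustrative and from memory (the LUN agent is actually called iSCSILogicalUnit, and the exact rbd agent parameters depend on which rbd RA you have installed), so treat it as a starting point rather than a working config:

    # one group per exported LUN; the whole group fails over together
    primitive p_vip_lun0 ocf:heartbeat:IPaddr2 \
        params ip=192.168.1.200 cidr_netmask=24
    primitive p_rbd_lun0 ocf:ceph:rbd \
        params volume=iscsi-lun0 pool=rbd user=admin
    primitive p_target_lun0 ocf:heartbeat:iSCSITarget \
        params iqn=iqn.2015-01.com.example:lun0 implementation=lio
    primitive p_lu_lun0 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2015-01.com.example:lun0 lun=0 path=/dev/rbd/rbd/iscsi-lun0
    group g_lun0 p_rbd_lun0 p_target_lun0 p_lu_lun0 p_vip_lun0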
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com