On Sat, Oct 15, 2011 at 6:57 PM, Jim Klimov <jimkli...@cos.ru> wrote:
> Thanks to all who replied. I hope we may continue the discussion, but I'm afraid the overall verdict so far is disapproval of the idea. It is my understanding that those active in the discussion considered it either too limited (in application - for VMs, or for hardware cfg) or too difficult to implement, so that we should rather use some alternative solutions, or at least research them better (thanks Nico).
>
> I guess I am happy not to have seen replies like "won't work at all, period" or "useless, period". I get "difficult" and "limited" and hope these can be worked around sometime, and hopefully this discussion will spark some interest among other software authors or customers to suggest more solutions and applications - to make some shared ZFS a possibility ;)
>
> Still, I would like to clear up some misunderstandings in the replies - because at times we seemed to be speaking about different architectures. Thanks to Richard, I stated what exact hardware I had in mind (and wanted to use most efficiently) while thinking about this problem, and how it is different from "general" extensible computers or server+NAS networks.
>
> Namely, with the shared storage architecture built into the Intel MFSYS25 blade chassis and the lack of expandability of the servers beyond that, some suggested solutions are not applicable (10GbE, FC, InfiniBand), but some networking problems are already solved in hardware (full and equal connectivity between all servers and all shared storage LUNs).
>
> So some combined replies follow below:
>
> 2011-10-15, Richard Elling, Edward Ned Harvey and Nico Williams wrote:
>
>>> #1 - You seem to be assuming storage is slower when it's on a remote storage server as opposed to a local disk. While this is typically true over ethernet, it's not necessarily true over infiniband or fibre channel.
>>
>> Many people today are deploying 10GbE and it is relatively easy to get wire speed for bandwidth and < 0.1 ms average access for storage.
>
> Well, I am afraid I have to reiterate: for a number of reasons, including price, our customers are choosing some specific and relatively fixed hardware solutions. So, time and again, I am afraid I'll have to remind you of the sandbox I'm tucked into - I have to make do with these boxes, and I want to do the best with them.
>
> I understand that Richard comes from a background where HW is the flexible part of the equation and software is designed to be used for years. But for many people (especially those oriented toward fast-evolving free software) the hardware is something they have to BUY, and it then works unchanged for as long as possible. This covers not only enthusiasts like the proverbial "red-eyed linuxoids", but also many small businesses. I still maintain several decade-old computers running infrastructure tasks (luckily, floorspace and electricity are near-free there) which were never virtualized because "if it ain't broken - don't touch it!" ;)
>
> In particular, the blade chassis in my example, which I hoped to utilize to its best using shared ZFS pools, has no extension slots. There is no 10GbE on either the external RJ45 ports or the internal ones (technically there is a 10GbE interlink between the two switch modules), so each server blade is limited to either 2 or 4 1Gbps ports. There is no FC. No InfiniBand. There may be one extSAS link on each storage controller module, and that's it.
>
>> I think the biggest problem lies in requiring full connectivity from every server to every LUN.
>
> This is exactly the sort of connectivity (and the only sort) available to server blades in this chassis.
>
> I think this is just as applicable to networked storage where there is a mesh of reliable connections between disk controllers and disks (or at least LUNs), be it switched FC or dual-link SAS or whatnot.
>
>> Doing something like VMotion would be largely pointless if the VM storage still remains on the node that was previously the compute head.
>
> True. However, in these Intel MFSYS25 boxes no server blade has any local disks (unlike most other blades I know). Any disk space is fed to them - and is equally accessible over an HA link - from the storage controller modules (which are in turn connected to the built-in array of hard disks) that are part of the chassis shared by all servers, like the networking switches are.
>
>> If you do the same thing over ethernet, then the performance will be degraded to ethernet speeds. So take it for granted, no matter what you do, you either need a bus that performs just as well remotely versus locally... Or else performance will be degraded... Or else it's kind of pointless because the VM storage lives only on the system that you want to VMotion away from.
>
> Well, while this is no Infiniband, in terms of disk access this paragraph is applicable to the MFSYS chassis: disk access via the storage controller modules can be considered a fast common bus - if this comforts readers into understanding my idea better. And yes, I do also think that channeling disk access over ethernet via one of the servers is a bad thing, bound to degrade performance compared to what can be had anyway with direct disk access.
>
>> Ethernet has *always* been faster than a HDD. Even back when we had 3/180s, 10Mbps Ethernet was faster than the 30ms average access time of the disks of the day. I tested a simple server the other day and the round trip for 4KB of data on a busy 1GbE switch was 0.2ms. Can you show a HDD as fast? Indeed many SSDs have trouble reaching that rate under load.
>
> As noted by other posters, access times are not bandwidth, so these are two different "faster"s ;) Besides, (1Gbps) Ethernet is faster than a single HDD stream, but it is not quite faster than an array of 14 HDDs...
>
> And if the Ethernet is utilized for its direct tasks - whatever they may be, say video streaming off this server to 5000 viewers, or whatever is needed to saturate the network - disk access over the same ethernet link would have to compete. And whatever the QoS settings, the viewers would lose: either the real-time multimedia signal would lag, or the disk data feeding it would.
>
> Moreover, using an external NAS (a dedicated server with an Ethernet connection to the blade chassis) would give us an external box dedicated and perhaps optimized for storage tasks (i.e. with ZIL/L2ARC), and would free up a blade for VM farming needs, but it would consume much of the LAN bandwidth of the blades using its storage services.
>
>> Today, HDDs aren't fast, and are not getting faster.
>>  -- richard
>
> Well, typical consumer disks did get about 2-3 times faster in linear RW speeds over the past decade; but for random access they still lag a lot. So, "agreed" ;)
>
> //Jim
>

Quite frankly, your choice in blade chassis was a horrible design decision.
From your description of its limitations, it should never have been the building block for a VMware cluster in the first place. I would start by rethinking that decision instead of trying to pound a round ZFS peg into a square hole. --Tim
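
For readers skimming the bandwidth-versus-latency exchange quoted above, here is a rough back-of-envelope sketch in Python. It assumes round per-disk figures (roughly 100 MB/s sequential and roughly 8 ms average access per HDD) that are illustrative guesses, not numbers from this thread; the 0.2 ms 1GbE round trip and the 14-disk array are the figures the thread does mention. It shows how both sides can be right: a 1 Gbps link beats a single HDD on latency and on streaming rate, but not the aggregate sequential bandwidth of a 14-disk array.

# Back-of-envelope comparison of the "access time vs. bandwidth" point above.
# Per-disk figures below are assumed, not measured in this thread.

GBE_LINE_RATE_MB_S = 1000 / 8   # 1 Gbps link ~ 125 MB/s before protocol overhead
HDD_SEQ_MB_S = 100              # assumed sustained sequential rate of one HDD
HDD_AVG_ACCESS_MS = 8.0         # assumed average seek + rotational latency
GBE_RTT_MS = 0.2                # 4KB round trip on a busy 1GbE switch (from the thread)
DISKS_IN_ARRAY = 14             # the 14-HDD array mentioned in the thread

array_seq_mb_s = DISKS_IN_ARRAY * HDD_SEQ_MB_S

print(f"1GbE line rate:            ~{GBE_LINE_RATE_MB_S:.0f} MB/s")
print(f"Single HDD (sequential):   ~{HDD_SEQ_MB_S} MB/s  -> the link is faster")
print(f"14-HDD array (sequential): ~{array_seq_mb_s} MB/s -> ~{array_seq_mb_s / GBE_LINE_RATE_MB_S:.0f}x more than the link")
print(f"Access time: 1GbE ~{GBE_RTT_MS} ms vs HDD ~{HDD_AVG_ACCESS_MS} ms -> the network wins on latency")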
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss