>>>>> "bs" == Bill Sommerfeld <[EMAIL PROTECTED]> writes:

    bs> In an ip network, end nodes generally know no more than the
    bs> pipe size of the first hop -- and in some cases (such as true
    bs> CSMA networks like classical ethernet or wireless) only have
    bs> an upper bound on the pipe size.

yeah, but the most complicated and well-studied queueing disciplines
(like everything implemented in ALTQ, and I think everything
implemented by the two different Cisco queueing frameworks: the CBQ
process-switched one, and the diffserv-like cat6500 ASIC-switched
one) are (a) hop-by-hop, so the algorithm one discusses applies only
to a single hop, a single transmit queue, never to a whole path, and
(b) assume a unidirectional link of known fixed size, not a broadcast
link or token ring or anything like that.

For wireless they are not using the fancy algorithms.  They're doing
really primitive things like ``unsolicited grants''---basically just
TDMA channels.

I wouldn't think of ECN as part of QoS exactly, because it separates
so cleanly from your choice of queue discipline.

    bs> hmm.  I don't think the back pressure makes it all the way up
    bs> to zfs

I guess I was thinking of the lossless fabrics, which might change
some of the assumptions that went into designing the schedulers for
IP QoS.  For example, most of the IP QoS systems divide the usual
one-big-queue into many smaller queues.  A ``classifier'' picks some
packets as pink ones and some as blue, and assigns them to queues
accordingly; a classified packet always joins the tail of its queue.
The ``scheduler'' then decides from which queue to take the next
packet.  The primitive QoS in Ethernet chips might give you 4 queues
that are either strict-priority or weighted-round-robin.  Link-sharing
schedulers like CBQ or HFSC make a hierarchy of queues where, to the
extent that they're work-conserving, child queues borrow unused
transmission slots from their ancestors.  WFQ uses a flat array of
256 hash-bucket queues and just tries to separate one job from
another.
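
To make the split concrete, here is a toy sketch in Python (the
pink/blue classes, the dict-shaped packets, and the strict-priority
choice are all just illustrative, not any particular implementation):

    from collections import deque

    # Two traffic classes, each with its own FIFO queue.
    queues = {"pink": deque(), "blue": deque()}

    def classify(pkt):
        # A real classifier matches header fields; this one just
        # looks at a made-up 'dscp' key.
        return "pink" if pkt.get("dscp", 0) > 0 else "blue"

    def enqueue(pkt):
        # Classification only picks the queue; the packet always
        # joins the tail of whichever queue it lands in.
        queues[classify(pkt)].append(pkt)

    def dequeue():
        # The scheduler decides which queue to serve next.  Strict
        # priority here: pink drains completely before blue gets a
        # transmission slot.  WRR would rotate with weights instead.
        for cls in ("pink", "blue"):
            if queues[cls]:
                return queues[cls].popleft()
        return None  # link idle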

but no matter which of those you choose, within each of the smaller
queues you get an orthogonal choice of drop policy: RED or plain
tail-drop FIFO.  There's no such thing as RED or tail-drop for queues
in storage networks, because there is no packet dropping.
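
The per-queue RED decision is roughly this (a cartoon with made-up
thresholds; real RED runs on an EWMA of the queue length and spaces
its drops more evenly than a plain coin flip):

    import random

    MIN_TH, MAX_TH, MAX_P = 5, 15, 0.1  # made-up thresholds

    def red_accept(avg_qlen):
        # Below the min threshold: always enqueue.  Above the max:
        # always drop.  In between: drop with probability rising
        # linearly toward MAX_P, to signal congestion early.
        if avg_qlen < MIN_TH:
            return True
        if avg_qlen >= MAX_TH:
            return False
        drop_p = MAX_P * (avg_qlen - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() >= drop_p

Tail-drop FIFO, by contrast, is just ``enqueue until full, then
drop,'' and neither branch makes sense on a fabric that isn't allowed
to drop at all.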

This complicates the implementation of the upper queueing discipline:
what happens when one of the small queues fills up?  How can you push
up the stack, ``I will not accept another CDB if I would classify it
as a Pink CDB, because the Pink queue is full.  I will still accept
Blue CDBs, though''?  Needing to express this destroys the modularity
of the IP QoS model.  We can only say ``block---no more CDBs
accepted,'' but that defeats the whole purpose of the QoS!  So how do
we say ``no more CDBs of the pink kind''?  With normal hop-by-hop
QoS, I don't think we can.
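
In interface terms, the mismatch looks something like this (all
names hypothetical, purely to make the modularity point):

    PINK_MAX = 8  # made-up per-class queue limit
    pink_q, blue_q = [], []

    def classify(cdb):
        return "pink" if cdb.get("priority") else "blue"

    # What hop-by-hop back pressure can actually express: one bit of
    # ``block / don't block'' for the whole link.
    def link_can_accept():
        return len(pink_q) + len(blue_q) < 2 * PINK_MAX

    # What the queue discipline would need to push up the stack, but
    # can't: acceptance that depends on how the CDB would classify.
    def class_can_accept(cdb):
        q = pink_q if classify(cdb) == "pink" else blue_q
        return len(q) < PINK_MAX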

This inexpressibility of ``no more pink CDBs'' is the same reason
enterprise Ethernet switches never actually use the gigabit Ethernet
``flow control'' mechanism.  Yeah, they negotiate flow control and
obey received flow control signals, but they never _assert_ a flow
control signal, at least not for normal output-queue congestion,
because this would block reception of packets that would get switched
to uncongested output ports, too.  Proper enterprise switches would
assert flow control only for rare pathological cases like backplane
saturation or cheap oversubscribed line cards.  No matter what
overzealous powerpoint monkeys claim, CEE/FCoE is _not_ going to use
``pause frames.''

I guess you're right that some of the ``queues'' in storage are sort
of arbitrarily sized, like the write queue, which could take up the
whole buffer cache, so back pressure might not be the right way to
imagine it.
