On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter <zbit...@redhat.com> wrote:
> On 22/09/14 10:11, Gordon Sim wrote:
>
>> On 09/19/2014 09:13 PM, Zane Bitter wrote:
>>
>>> SQS offers very, very limited guarantees, and it's clear that the reason
>>> for that is to make it massively, massively scalable in the way that
>>> e.g. S3 is scalable while also remaining comparably durable (S3 is
>>> supposedly designed for 11 nines, BTW).
>>>
>>> Zaqar, meanwhile, seems to be promising the world in terms of
>>> guarantees. (And then taking it away in the fine print, where it says
>>> that the operator can disregard many of them, potentially without the
>>> user's knowledge.)
>>>
>>> On the other hand, IIUC Zaqar does in fact have a sharding feature
>>> ("Pools") which is its answer to the massive scaling question.
>>>
>>
>> There are different dimensions to the scaling problem.
>>
>
> Many thanks for this analysis, Gordon. This is really helpful stuff.
>
>> As I understand it, pools don't help scaling a given queue since all the
>> messages for that queue must be in the same pool. At present traffic
>> through different Zaqar queues are essentially entirely orthogonal
>> streams. Pooling can help scale the number of such orthogonal streams,
>> but to be honest, that's the easier part of the problem.
>>
>
> But I think it's also the important part of the problem. When I talk about
> scaling, I mean 1 million clients sending 10 messages per second each, not
> 10 clients sending 1 million messages per second each.
>
> When a user gets to the point that individual queues have massive
> throughput, it's unlikely that a one-size-fits-all cloud offering like
> Zaqar or SQS is _ever_ going to meet their needs. Those users will want to
> spin up and configure their own messaging systems on Nova servers, and at
> that kind of size they'll be able to afford to. (In fact, they may not be
> able to afford _not_ to, assuming per-message-based pricing.)
>

Running a message queue that has a high guarantee of not losing a message is
hard, and SQS promises exactly that: it *will* deliver your message. If a use
case can handle occasionally dropping messages, then running your own MQ makes
more sense.

SQS is designed to handle massive queues as well. While I haven't found any
examples of queues that have 1 million messages/second being sent or received,
30k to 100k messages/second is not unheard of [0][1][2].

[0] https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s
[1] http://java.dzone.com/articles/benchmarking-sqs
[2] http://www.slideshare.net/AmazonWebServices/massive-message-processing-with-amazon-sqs-and-amazon-dynamodb-arc301-aws-reinvent-2013-28431182

>> There is also the possibility of using the sharding capabilities of the
>> underlying storage. But the pattern of use will determine how effective
>> that can be.
>>
>> So for example, on the ordering question, if order is defined by a
>> single sequence number held in the database and atomically incremented
>> for every message published, that is not likely to be something where
>> the database's sharding is going to help in scaling the number of
>> concurrent publications.
>>
>> Though sharding would allow scaling the total number of messages on the
>> queue (by distributing them over multiple shards), the total ordering of
>> those messages reduces its effectiveness in scaling the number of
>> concurrent getters (e.g. the concurrent subscribers in pub-sub) since
>> they will all be getting the messages in exactly the same order.
>>
>> Strict ordering impacts the competing consumers case also (and is in my
>> opinion of limited value as a guarantee anyway). At any given time, the
>> head of the queue is in one shard, and all concurrent claim requests
>> will contend for messages in that same shard. Though the unsuccessful
>> claimants may then move to another shard as the head moves, they will
>> all again try to access the messages in the same order.
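To make Gordon's point concrete, here is a minimal in-memory sketch. It is
not Zaqar's implementation; the class name, the modulo placement of messages
on shards, and the single lock (standing in for the atomic increment in the
database) are all assumptions made for illustration.

```python
import threading


class TotallyOrderedQueue:
    """Hypothetical model: one sequence counter per queue, messages spread over shards."""

    def __init__(self, num_shards):
        # One lock for the whole queue: it stands in for the atomically
        # incremented sequence number held in the database (an assumption).
        self._lock = threading.Lock()
        self._next_seq = 0   # next sequence number to hand out
        self._head = 0       # sequence number of the oldest unclaimed message
        self._shards = [{} for _ in range(num_shards)]

    def publish(self, body):
        # Every publisher serializes here, however many shards there are.
        with self._lock:
            seq = self._next_seq
            self._next_seq += 1
            self._shards[seq % len(self._shards)][seq] = body
        return seq

    def claim(self):
        # Every claimant wants the same message: the current head.
        with self._lock:
            if self._head == self._next_seq:
                return None
            seq = self._head
            shard = seq % len(self._shards)
            body = self._shards[shard].pop(seq)
            self._head += 1
        return seq, shard, body
```

In this model, adding shards increases how many messages the queue can hold,
but every publish and every claim still passes through the same serialization
point, and at any instant all claimants are aimed at the one shard that holds
the head.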
>>
>> So if Zaqar's goal is to scale the number of orthogonal queues, and the
>> number of messages held at any time within these, the pooling facility
>> and any sharding capability in the underlying store for a pool would
>> likely be effective even with the strict ordering guarantee.
>>
>
> IMHO this is (or should be) the goal - support enormous numbers of
> small-to-moderate sized queues.

If 50,000 messages per second doesn't count as small-to-moderate, then Zaqar
does not fulfill a major SQS use case.

>> If scaling the number of communicants on a given communication channel
>> is a goal however, then strict ordering may hamper that. If it does, it
>> seems to me that this is not just a policy tweak on the underlying
>> datastore to choose the desired balance between ordering and scale, but
>> a more fundamental question on the internal structure of the queue
>> implementation built on top of the datastore.
>>
>
> I agree with your analysis, but I don't think this should be a goal.
>
> Note that the user can still implement this themselves using
> application-level sharding - if you know that in-order delivery is not
> important to you, then randomly assign clients to a queue and then poll all
> of the queues in the round-robin. This yields _exactly_ the same semantics
> as SQS.
> The reverse is true of SQS - if you want FIFO then you have to implement
> re-ordering by sequence number in your application. (I'm not certain, but
> it also sounds very much like this situation is ripe for losing messages
> when your client dies.)
>
> So the question is: in which use case do we want to push additional
> complexity into the application? The case where there are truly massive
> volumes of messages flowing to a single point? Or the case where the
> application wants the messages in order?
>
> I'd suggest both that the former applications are better able to handle
> that extra complexity and that the latter applications are probably more
> common.
> So it seems that the Zaqar team made a good decision.
>

If Zaqar is supposed to be comparable to Amazon SQS, then it has made the
wrong choice.

> (Aside: it follows that Zaqar probably should have a maximum throughput
> quota for each queue; or that it should report usage information in such a
> way that the operator could sometimes bill more for a single queue than
> they would for the same amount of usage spread across multiple queues; or
> both.)
>
>> I also get the impression, perhaps wrongly, that providing the strict
>> ordering guarantee wasn't necessarily an explicit requirement, but was
>> simply a property of the underlying implementation(?).
>>
>
> I wasn't involved, but I expect it was a bit of both (i.e. it is a
> chicken/egg question).
>
> cheers,
> Zane.
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
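For reference, the two application-level workarounds Zane describes can be
sketched roughly as follows. This is an in-memory illustration only: the
ShardedQueue class, the resequence helper and their parameters are
hypothetical names, and neither the Zaqar nor the SQS API is used.

```python
import random
from collections import deque


class ShardedQueue:
    """Hypothetical client-side wrapper: several independent FIFO queues."""

    def __init__(self, num_shards, rng=None):
        self._shards = [deque() for _ in range(num_shards)]
        self._rng = rng or random.Random()
        self._assignment = {}  # client id -> shard index, fixed once chosen
        self._cursor = 0       # where the next round-robin poll starts

    def send(self, client_id, message):
        # Randomly assign each client to one queue on first use.
        if client_id not in self._assignment:
            self._assignment[client_id] = self._rng.randrange(len(self._shards))
        self._shards[self._assignment[client_id]].append(message)

    def receive(self):
        # Poll the queues round-robin; return None if all are empty.
        for _ in range(len(self._shards)):
            shard = self._shards[self._cursor]
            self._cursor = (self._cursor + 1) % len(self._shards)
            if shard:
                return shard.popleft()
        return None


def resequence(messages, first_seq=0):
    """Yield (seq, body) pairs in sequence order from an out-of-order stream."""
    pending = {}  # messages that arrived ahead of their turn
    expected = first_seq
    for seq, body in messages:
        pending[seq] = body
        while expected in pending:
            yield expected, pending.pop(expected)
            expected += 1
```

ShardedQueue gives up global ordering and keeps only per-client order, which
is the trade Zane proposes for users who do not need FIFO. resequence is the
reverse case: anything sitting in `pending` lives only in the client's memory,
so if those messages have already been deleted from the queue and the client
dies, they are gone, which is the risk Zane raises.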