Hi Mark,

On Sat, Feb 19, 2011 at 11:18:26AM -0500, Mark Washenberger wrote:
> It seems like put and post on ../queue[/id] are reversed from the usual
> sense. I am probably just not familiar with the idioms for previous queue
> services, but I imagine the following scheme.
>
> POST .../queue     # create a message without specifying the id
> PUT  .../queue/id  # create a message with a specific id,
>                    # or modify existing message's metadata
> PUT  .../queue     # create a queue or modify existing queue's metadata
> POST .../queue/id  # drop this command (?)
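To make the comparison concrete, the mapping I have in mind looks roughly like
the client calls below, with the reasoning following. This is only a sketch
using the Python requests library; the host, paths, and the 'hide' parameter
are placeholders rather than a committed API:

    import requests

    BASE = 'http://localhost:8080/account1'  # hypothetical endpoint and account

    # Insert (or replace) a message with a caller-chosen id.
    requests.put(BASE + '/myqueue/msg1', data='first payload')

    # Insert a message and let the server generate the id.
    requests.put(BASE + '/myqueue', data='second payload')

    # Modify the queue and/or all messages in it in one call, for example
    # an atomic "hide matching messages and return them" operation.  The
    # 'hide' parameter is illustrative, not a settled name.
    requests.post(BASE + '/myqueue', params={'hide': 60})

    # Modify a single existing message's metadata.
    requests.post(BASE + '/myqueue/msg1', params={'hide': 0})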
I was wondering if someone would point out this discrepancy with traditional REST semantics. :)

The reason for the change is the following: we need both an insert and an update method at the queue and message levels. Some map well, but "update all messages in a queue" (or the set matching some criteria) is the oddball. A PUT .../queue doesn't quite fit because we're not inserting or replacing a queue, we're modifying all messages in it. I wanted to keep GET/HEAD read-only and side-effect free, so I didn't want to overload those. Another option was to use reserved words, possibly prefixed with an underscore (such as POST ../queue/_all), but this seemed unnecessary.

Instead, I propose using POST ../queue to mean modify the queue (and possibly all messages in it). POST implies more of a modify/append/process operation than PUT, which should strictly be insert/replace. Using this form meant not using POST ../queue for the traditional "insert but auto-generate ID" semantics, which is why I used PUT instead. In some ways this is more consistent too, since you are still inserting a message just like PUT ../queue/message, only without the id.

If this divergence from traditional REST semantics is not welcome, we can certainly change it. We do need to find the best way to represent an atomic modify/get operation at the queue level, though.

> Is the FIFO constraint on GET .../queue a strong constraint? or more of an
> approximation or eventual consistency type of weaker guarantee?

For a single queue server, yes. When using an aggregate of all queue servers from a proxy or multi-connection worker, you'll need to handle ordering in the worker code, since messages may be interleaved from multiple queue servers.

> Looking at the public cloud service deployment diagram and notes, I started
> to wonder about the performance demands of different use cases. I imagine
> that a given account or a given queue could vary widely through time in the
> number of messages produced and messages consumed. So an individual queue
> might have a very high variance in the distribution of cpu and disk resources
> consumed. I am worried about sharding hardware resources on the basis of the
> account id alone. It seems like this would prevent the queue service from
> scaling out a single queue to cluster level and would create hot-spots. Do we
> need to worry about this type of use case? Am I missing how this would be
> handled in your deployment example?

I suppose there is no reason you couldn't perform the first-level hash on account/queue instead of just account. In fact, I think the plugin point for controlling distribution should do just this: take the full URI and the server set, and return some subset of servers. It can hash on whatever it likes (a rough sketch follows below). A distribution plugin could also use consistent hashing so we could optionally grow or shrink the queue server set if we notice hot spots for a particular set.
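As a rough sketch of what that plugin point could look like (the function name, signature, and hashing scheme here are all hypothetical, and this uses simple modulo placement rather than a real consistent hash):

    import hashlib

    def pick_servers(uri, servers, copies=3):
        # Hypothetical plugin point: given the full request URI and the
        # ordered list of queue servers, return the subset of servers
        # that should hold this queue.  Hashing on account/queue (the
        # first two path segments) instead of account alone lets a busy
        # queue land on a different server set than its account's other
        # queues.
        account_queue = '/'.join(uri.strip('/').split('/')[:2])
        digest = hashlib.md5(account_queue.encode('utf-8')).hexdigest()
        start = int(digest, 16) % len(servers)
        # Simple modulo placement; a real plugin could use a consistent
        # hash ring so the server set can grow or shrink without
        # remapping every existing queue.
        return [servers[(start + i) % len(servers)] for i in range(copies)]

Every message URI under the same account/queue maps to the same server set, so per-server FIFO still holds within a queue, while two busy queues under one account can end up on different servers.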
> Going way back to a previous discussion on this thread--you had rejected IPC
> because it would limit performance. It seems to me if we used an IPC-based
> model of coordination across cores, it would be analogous to the model used
> for scaling across multiple nodes in a zone. This approach would help us
> scale an individual queue beyond the limitations of a single machine or HA
> triplet. If I recall, that is sort of like the way erlang scales across
> multiple machines. Not that I would want to deal with erlang's syntactic salt.

Erlang doesn't use IPC across machines (well, not traditional System V IPC mechanisms); a small daemon called epmd (the Erlang Port Mapper Daemon) lets nodes find one another, and the nodes then pass messages over direct socket connections. Within a single machine, Erlang uses a multi-threaded event loop that maps all Erlang processes on top of it. This lets you efficiently use all cores and provides very lightweight message passing between the Erlang processes running on those cores (much lighter weight than any IPC mechanism).

Having said all that, I suppose we could run multiple independent queue servers per machine (one per core), each on its own port. Each queue server process would represent a different node in the distribution set. The argument against this is that it limits resource sharing on a machine, since the processes would not know of one another. For example, if we have local persistent storage, this means multiple processes fighting for I/O rather than coordinating it within one process (or possibly using IPC to a single process that manages all I/O, but that comes with overhead).

This could be OK for deployments using a proxy, but for simpler deployments without a load balancer/proxy, it means managing more queue servers (one per core) if you want to utilize all cores in a machine. So if I have two 8-core machines running my queue, that is 16 queue servers for my clients/workers to use, not just 2. In any deployment scenario, this also increases the number of multiplexed connections to the queue servers (you need to multiplex per process/port, not per machine) and increases the network chatter. This may not be a limiting factor, but it is something to keep in mind.

> Thanks for reading this far and for the docs. Cheers!

Thanks for the feedback! Looking forward to hearing more. :)

-Eric

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp