Gordon Sim wrote:
> On 07/07/2015 05:48 PM, Clint Byrum wrote:
>> all of the call sites I checked _do not appear to resend_, they
>> simply explode on timeout waiting for a reply. This is how calling code
>> should work and I'm ok with code in nova, cinder, et al. being
>> written this way, because I'd expect my messaging layer to be at
>> least somewhat reliable.

> In my opinion, the calling code has better context for determining
> whether or not to retry. Tackling reliability issues end-to-end is
> also often much more efficient.

>> [...]
>> I think you'll find that once you try to make oslo.messaging handle the
>> retrying, with the broker simply being ack'd all the time, you risk
>> duplicating RPC calls if you retry in a loop.

> Resending the request will always risk duplicating the call (unless the
> caller can verify, in some call-specific way, that the previous request
> was not executed). Whether or not you acknowledge the request (and
> whether you do it before or after processing the request), the
> response can still get lost (neither requests nor responses are
> currently confirmed by the broker).
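
A toy, in-process simulation of that failure mode (plain Python, not oslo.messaging code; `ToyServer`, `LostReply` and `call_with_retry` are hypothetical names): the server executes the request, the reply is lost, and a caller that resends on timeout runs the operation a second time.

```python
class LostReply(Exception):
    """Hypothetical: raised when the caller times out waiting for a reply."""


class ToyServer:
    """Executes requests; can be told to 'lose' the next N replies."""

    def __init__(self, replies_to_lose=0):
        self.replies_to_lose = replies_to_lose
        self.executions = []  # every request that was actually executed

    def handle(self, request_id, payload):
        # The side effect happens here, before any reply is sent back.
        self.executions.append((request_id, payload))
        if self.replies_to_lose > 0:
            self.replies_to_lose -= 1
            raise LostReply(request_id)  # reply never reaches the caller
        return "ok"


def call_with_retry(server, request_id, payload, attempts=3):
    """Naive caller-side retry: resend the same request on timeout."""
    for _ in range(attempts):
        try:
            return server.handle(request_id, payload)
        except LostReply:
            continue  # caller cannot tell a lost request from a lost reply
    raise TimeoutError(request_id)


server = ToyServer(replies_to_lose=1)
result = call_with_retry(server, "req-1", "create_volume")
print(result, len(server.executions))  # prints: ok 2
```

The caller sees a single "ok", yet the side effect happened twice. Acking earlier or later on the server side does not change this, since the loss here is on the reply path.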

> There is a message id 'cache' used to try to detect (and then ignore)
> duplicates. It's not clear to me how effective that is in practice, as it
> only tracks the last 16 ids for a given listener. In any case, if the
> listener process is restarted, or if the call is redelivered to a
> different server in a group, then the id cache would be of no use.
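
A minimal sketch of a bounded recent-id window, assuming a per-process structure that holds the last 16 ids. Only the window size comes from this thread; `RecentIdCache` is a hypothetical name and this is not the oslo.messaging implementation. It shows both weaknesses: an old duplicate slips through once evicted, and a fresh listener knows nothing.

```python
from collections import deque


class RecentIdCache:
    """Hypothetical sketch: remembers only the most recent `capacity` ids."""

    def __init__(self, capacity=16):
        self._ids = deque(maxlen=capacity)  # oldest id is dropped when full

    def seen_before(self, msg_id):
        """Return True if msg_id is a recent duplicate; otherwise record it."""
        if msg_id in self._ids:
            return True
        self._ids.append(msg_id)
        return False


cache = RecentIdCache()
cache.seen_before("msg-0")
dup_detected_early = cache.seen_before("msg-0")  # still inside the window

for i in range(1, 17):  # 16 newer ids push "msg-0" out of the window
    cache.seen_before(f"msg-{i}")
dup_detected_late = cache.seen_before("msg-0")  # evicted, so treated as new

# A restarted listener (or another server in the group) starts empty.
fresh_listener = RecentIdCache()
dup_after_restart = fresh_listener.seen_before("msg-16")

print(dup_detected_early, dup_detected_late, dup_after_restart)  # prints: True False False
```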

The 16 ids stuff always makes me chuckle (at how it's so weird and, IMHO,
useless); I remember that review, ha:
https://review.openstack.org/#/c/20567/ (IMHO, a list of the past 16 ids
just hides the problem)... Maybe we can finally address the real problem
here (projects not being able to handle duplicate messages without
corrupting all the things...)

>> The pattern is well established in RabbitMQ that acks should happen
>> _AFTER_ the message has been consumed (and thus should not be
>> duplicated), not before.

> That is the pattern for at-least-once delivery, where either processing
> is able to detect that a resent message was already processed, or
> reprocessing it is preferable to not processing it at all.
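
A sketch of at-least-once consumption, assuming a toy queue that keeps a message until it is acked and a consumer that records processed ids in storage that survives restarts (the `processed_ids` set stands in for that storage; all names are hypothetical, not RabbitMQ or oslo.messaging APIs). The ack is sent only after processing, so a crash in between causes redelivery, and the idempotency check absorbs it.

```python
class ToyQueue:
    """Hypothetical at-least-once queue: a message stays until it is acked."""

    def __init__(self, messages):
        self.pending = list(messages)  # (msg_id, body) tuples

    def peek(self):
        return self.pending[0] if self.pending else None

    def ack(self, msg_id):
        self.pending = [m for m in self.pending if m[0] != msg_id]


class IdempotentConsumer:
    def __init__(self):
        # Stands in for durable storage (e.g. a DB table) surviving restarts.
        self.processed_ids = set()
        self.side_effects = []

    def process(self, msg_id, body):
        if msg_id in self.processed_ids:
            return  # redelivery of work already done: skip the side effect
        self.side_effects.append(body)
        self.processed_ids.add(msg_id)  # ideally atomic with the side effect


def consume_one(queue, consumer, crash_before_ack=False):
    msg = queue.peek()
    if msg is None:
        return False
    consumer.process(*msg)  # 1. process first
    if crash_before_ack:
        return True  # simulated crash: the ack is never sent
    queue.ack(msg[0])  # 2. ack only after processing succeeded
    return True


queue = ToyQueue([("m1", "resize instance")])
consumer = IdempotentConsumer()
consume_one(queue, consumer, crash_before_ack=True)  # processed, not acked
consume_one(queue, consumer)  # redelivered, skipped, then acked
print(consumer.side_effects, queue.pending)  # prints: ['resize instance'] []
```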

> I *believe* oslo.messaging (or impl_rabbit at least) was aiming for an
> at-most-once guarantee (i.e. avoiding duplication at the expense of
> dropped messages). That may be why the acknowledgement is done before
> processing, though since the acknowledgement is asynchronous, that only
> narrows the window; it doesn't eliminate it.
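
For contrast, a sketch of at-most-once consumption under similar toy assumptions (hypothetical names, not impl_rabbit code): the message is acked, and so forgotten by the broker, before it is processed, so a crash mid-processing drops it. The ack here is synchronous; with an asynchronous ack, the ack may not have reached the broker when the consumer crashes, so redelivery and duplication remain possible, only less likely.

```python
class ToyQueue:
    """Hypothetical queue where taking a message also acks it."""

    def __init__(self, messages):
        self.pending = list(messages)  # (msg_id, body) tuples

    def take_and_ack(self):
        """At-most-once: remove (ack) the message *before* it is processed."""
        return self.pending.pop(0) if self.pending else None


def consume_one(queue, handled, crash_during_processing=False):
    msg = queue.take_and_ack()  # 1. ack first: the broker forgets the message
    if msg is None:
        return
    if crash_during_processing:
        return  # simulated crash: the message is gone for good
    handled.append(msg[1])  # 2. process afterwards


queue = ToyQueue([("m1", "delete volume"), ("m2", "attach volume")])
handled = []
consume_one(queue, handled, crash_during_processing=True)  # m1 is dropped
consume_one(queue, handled)  # m2 is handled
print(handled, queue.pending)  # prints: ['attach volume'] []
```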

> I may of course be wrong. It would be great to have someone more
> qualified to comment on the intentions of the design provide some clarity.
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev