Trying to group and compress responses here.

> If the consumer controls the specific execution time of a delayed
> message, we must trust the consumer's clock. This can cause a delayed
> message to be processed ahead of time, and some applications cannot
> tolerate that condition.
This is a problem that cannot be solved. Even assuming the timestamps are
assigned by brokers and are guaranteed to be monotonic, this won't prevent
2 brokers from having clock skews. That would result in different delivery
delays. Similarly, the broker timestamp might be assigned later compared
to when a publisher was "intending" to start the clock.

Barring super-precise clock synchronization techniques (which are way out
of the scope of this discussion), the only reasonable way to think about
this is that delays need to be orders of magnitude bigger than the average
clock skew experienced with common techniques (eg: NTP). NTP clock skew
will generally be in the 10s of millis. Any delay > 1 second will hardly
be noticeably affected by these skews.

Additionally, any optimization of the timeout handling (like the
hash-wheel timer proposed in PIP-26) will trade off precision for
efficiency. In that case, the delays are managed in buckets, and can
result in higher delays than what was requested.

> 1. Fixed timeout, e.g. (with 10s, 30s, 10min delays): this is the
> largest proportion of delayed-message throughput. A subscription with a
> fixed delay time can cover this scene.

I don't think that, for fixed delays, any server-side implementation would
provide any advantage compared to doing:

```
while (true) {
    Message msg = consumer.receive();
    long delayMillis = calculateDelay(msg);
    if (delayMillis > 0) {
        Thread.sleep(delayMillis);
    }
    // Do something with the message
    consumer.acknowledge(msg);
}
```

This will not need any support from the broker. Also, there will be no
redeliveries. It could be wrapped in the client API, although I don't see
that as big of a problem.

> My concern with this category of approaches is "bandwidth" usage. It is
> basically trading bandwidth for complexity.

With mixed delays on a single topic, in any case there has to be some kind
of time-based sorting of the messages, which needs to happen either at the
broker or at the client.
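The precision-for-efficiency trade-off of a bucketed timer mentioned above can be made concrete with a minimal sketch. This is illustrative only, not PIP-26's actual implementation; all names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the bucketing idea behind a hashed-wheel timer: deadlines
// are rounded UP to the next tick boundary, so scheduling is cheap but
// the effective delay can exceed the requested one by up to one tick.
class WheelSketch {
    private final long tickMillis;
    private final Map<Long, List<String>> buckets = new HashMap<>();

    WheelSketch(long tickMillis) {
        this.tickMillis = tickMillis;
    }

    // Returns the actual fire time, which is >= the requested deadline.
    long schedule(String taskId, long delayMillis, long nowMillis) {
        long deadline = nowMillis + delayMillis;
        long tick = (deadline + tickMillis - 1) / tickMillis; // round up
        buckets.computeIfAbsent(tick, t -> new ArrayList<>()).add(taskId);
        return tick * tickMillis;
    }
}
```

With a 1-second tick, a 1.5-second delay fires at the 2-second boundary, which is exactly the "higher delays than requested" effect described above.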
Functionally, I believe that either place is equivalent (from a user point
of view), barring the different implementation requirements.

In my view, the bigger cost here is not bandwidth but rather the disk IO,
which will happen in exactly the same way in both cases. Messages can be
cached, up to a certain point, either in the broker or in the client
library. After that, in both cases, the messages will have to be fetched
from bookies.

Also, when implementing the delay feature in the client, the existing flow
control mechanism is naturally applied to limit the overall amount of
information that we have to keep track of (the "currently tracked"
messages). Some other mechanism would have to be built in the broker as
well.

Again, in general I'm more concerned about stuff that happens in the
broker, because it will have to be scaled up 10s of thousands of times in
a single process, while in the client the requirements are typically much
simpler.

If the goal is to minimize the amount of redeliveries from broker ->
client, there are multiple ways to achieve that with the client-based
approach (eg. send the message id and delay time instead of the full
payload to consumers, as Ivan proposed). This seems simpler and with less
overhead than having to persist the whole hash-wheel timer state into a
ledger.

--
Matteo Merli <matteo.me...@gmail.com>

On Fri, Jan 18, 2019 at 6:35 AM Ezequiel Lovelle
<ezequiellove...@gmail.com> wrote:
>
> Hi All! And sorry for the delay :)
>
> Probably I'm going to say some things already said, so sorry beforehand.
>
> The two main needed features, I think, are the proposed:
> A. Producer delay, PIP-26.
> B. Consumer delay, PR #3155.
>
> Of course PIP-26 would result in consumers receiving delayed messages,
> but the important thing here is which one of them makes the decision
> about the delay.
>
> First, the easy one, PR #3155.
> Consumer delay:
>
> As others have stated before, this is a more trivial approach because
> every message has exactly the same period of delay, which is
> predictable.
>
> I agree that adding logic at the broker should be avoided, but for this
> specific feature, #3155, which I don't think is complex, I believe
> there are other serious advantages:
>
> 1. Simplicity at the client side: we don't need to add any code, which
> is less error prone.
> 2. Avoids clock issues from an outdated client-side clock, which cause
> headaches for users to detect.
> 3. Avoids huge overhead from delivering non-expired messages across the
> network unnecessarily.
> 4. Consumers are free to decide to consume messages with a delay
> regardless of the producer.
> 5. The delay is uniform for all messages, which sometimes is the
> solution to the problem rather than arbitrary delays.
>
> I think it would be great if Pulsar could provide this kind of feature
> without relying on users needing to know heavy details about the
> mechanism.
>
> For PIP-26:
>
> I think we can offer this for messages with a longer delay, in terms of
> hours or days.
>
> If this is the case, we can assume a small granularity of time, like
> 1 minute, making ledgers represent 1 minute of time, truncating each
> message's time to its corresponding minute and storing it in that
> special ledger. Users wanting to receive messages scheduled for some
> days in the future would rarely care about a margin of error of
> 1 minute.
>
> Of course, we need to somehow make the broker aware of this in order to
> only process the ledgers for the current corresponding minute and
> consume them. And the broker would be the one responsible for closing
> the current minute's truncated, processed ledger.
>
> One problem I can think of with this approach: is it painful for
> BookKeeper to have a lot of open ledgers? (one for each minute per
> topic)
>
> Another problem here might be: what happens if the consumer was not
> started?
> At startup time the broker should look for ledgers potentially older
> than its current time, and this might be expensive.
>
> Another, more trivial issue: we might need to refactor the current
> mechanism which deletes closed ledgers older than the time configured
> on the namespace.
>
> As a final note, I think it would be great to have both features in
> Pulsar, but sometimes not everything desired is achievable.
> And please correct me if I said something senseless.
>
> --
> *Ezequiel Lovelle*
>
>
> On Fri, 18 Jan 2019 at 05:51, PengHui Li <codelipeng...@gmail.com> wrote:
>
> > > So rather than specifying the absolute timestamp that the message
> > > should appear to the user, the dispatcher can specify the relative
> > > delay after dispatch that it should appear to the user.
> >
> > As Matteo said, the worst case would be that the applied delay is
> > higher for some of the messages. If we specify the relative delay to
> > the consumer, and the consumer is offline for a period of time, the
> > consumer will receive many delayed messages after connecting to the
> > broker again, which makes the worst case more serious. It's difficult
> > to keep consumers always online.
> >
> > From my personal perspective, I prefer to use `delay level topics` to
> > handle smaller-delay scenes, e.g. (10s-topic, 30s-topic); this will
> > not be too many topics. And we are using the dead letter topic to
> > simulate the delayed message feature, with delayed topics having
> > different delay levels.
> >
> > For very long delay scenes, in our practice, the user may cancel it or
> > restart it. After the previous discussions, I agree that PIP-26 will
> > make the broker more complex, so I had the idea to consider it as a
> > separate mechanism.
> >
> >
> > Sijie Guo <guosi...@gmail.com> wrote on Fri, Jan 18, 2019 at 3:22 PM:
> >
> > > On Fri, Jan 18, 2019 at 2:51 PM Ivan Kelly <iv...@apache.org> wrote:
> > >
> > > > One thing missing from this discussion is details on the
> > > > motivating use-case. How many delayed messages per second are we
> > > > expecting?
> > > > And what is the payload size?
> > > >
> > > > > If the consumer controls the specific execution time of a
> > > > > delayed message, we must trust the consumer's clock. This can
> > > > > cause a delayed message to be processed ahead of time, and some
> > > > > applications cannot tolerate this condition.
> > > >
> > > > This can be handled in a number of ways. Consumer clocks can be
> > > > skewed with regard to other clocks, but it is generally safe to
> > > > assume that clocks advance at the same rate, especially at the
> > > > granularity of a couple of hours. So rather than specifying the
> > > > absolute timestamp at which the message should appear to the
> > > > user, the dispatcher can specify the relative delay after
> > > > dispatch at which it should appear to the user.
> > > >
> > > > > > My concern with this category of approaches is "bandwidth"
> > > > > > usage. It is basically trading bandwidth for complexity.
> > > > >
> > > > > @Sijie Guo <si...@apache.org> Agree with you; such a trade-off
> > > > > can put more strain on the broker's outgoing network.
> > > >
> > > > I don't think PIP-26's approach uses less bandwidth in this
> > > > regard. With PIP-26, the msg ids are stored in a ledger, and when
> > > > the timeout triggers it dispatches? Are all the delayed messages
> > > > being cached at the broker? If so, that is using a lot of memory,
> > > > and it's exactly the kind of memory usage pattern that is very
> > > > bad for JVM garbage collection. If not, then you have to read the
> > > > message back in from bookkeeper, so the bandwidth usage is the
> > > > same, though on a different path.
> > > >
> > > > In the client-side approach, the message could be cached to avoid
> > > > a redispatch. When I was discussing with Matteo, we discussed
> > > > this. The redelivery logic has to be there in any case, as any
> > > > cache (broker or client side) must have a limited size.
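The relative-delay idea above (the broker sends the delay remaining at dispatch, so the consumer only needs a clock that advances at the right rate, not one synchronized with the broker's) can be sketched as follows. This is an illustration, not Pulsar's actual API; both method names are hypothetical:

```java
// Sketch of converting an absolute deliver-at time into a relative
// delay at dispatch, and applying it against the consumer's local
// monotonic clock so wall-clock skew between the two hosts is harmless.
final class RelativeDelay {
    // Broker side: how much delay is still remaining at dispatch time.
    static long remainingAtDispatch(long deliverAtMillis, long brokerNowMillis) {
        return Math.max(0, deliverAtMillis - brokerNowMillis);
    }

    // Consumer side: turn that relative delay into a local deadline
    // using a monotonic clock such as System.nanoTime().
    static long localDeadlineNanos(long remainingMillis, long localNowNanos) {
        return localNowNanos + remainingMillis * 1_000_000L;
    }
}
```

The residual error is then only the network latency of the dispatch itself, rather than the full clock skew between broker and consumer.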
> > > > Another option would be to skip sending the payload for delayed
> > > > messages, and only send it when the client requests redelivery,
> > > > but this has the same issue with regard to the entry likely
> > > > falling out of the cache at the broker side.
> > >
> > > There is bandwidth usage with either approach, for sure. The main
> > > difference between the broker-side and client-side approaches is
> > > which part of the bandwidth is used.
> > >
> > > In the broker-side approach, it is using the bookies' egress and
> > > the broker's ingress bandwidth. In a typical Pulsar deployment,
> > > bookies' egress is mostly idle unless there are consumers falling
> > > behind.
> > >
> > > In the client-side approach, it is using the broker's egress
> > > bandwidth and potentially the bookies' egress bandwidth. The
> > > broker's egress is critical since it is shared across consumers.
> > > So if the broker egress is doubled, it is a red flag.
> > >
> > > Although I agree that the bandwidth usage depends on the workload,
> > > in theory the broker-side approach is more friendly to resource
> > > usage and a better way to use the resources in a multi-layered
> > > architecture, because it uses less bandwidth at the broker side. A
> > > client-side approach can cause more bandwidth usage at the broker
> > > side.
> > >
> > > Also, as penghui pointed out, clock skew can be another factor
> > > causing more traffic in a fanout case. In a broker-side approach,
> > > the deferral is done at a central point, so when the deferred time
> > > point kicks in, the broker just needs to read the data one time
> > > from the bookies. However, in a client-side approach, the messages
> > > are asked for by different subscriptions, and different
> > > subscriptions can ask for the deferred message at any time based on
> > > their own clocks.
> > >
> > > > -Ivan
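The "metadata only" variant raised in several replies above (send just the message id and its deliver-at time, and fetch the payload only once the delay elapses) could be tracked on the client with a small priority queue. This is a hedged sketch under assumed names, not Pulsar's client API:

```java
import java.util.PriorityQueue;

// Sketch of a client-side tracker for (messageId, deliverAt) pairs:
// the broker would send only this metadata up front, and the consumer
// requests redelivery of a payload only when its delay has elapsed.
final class DelayTracker {
    record Pending(String messageId, long deliverAtMillis) {}

    private final PriorityQueue<Pending> queue = new PriorityQueue<>(
            (a, b) -> Long.compare(a.deliverAtMillis(), b.deliverAtMillis()));

    void track(String messageId, long deliverAtMillis) {
        queue.add(new Pending(messageId, deliverAtMillis));
    }

    // Returns the id of the next due message, or null if nothing is due.
    // The caller would then ask the broker for just that payload.
    String pollDue(long nowMillis) {
        Pending head = queue.peek();
        if (head != null && head.deliverAtMillis() <= nowMillis) {
            queue.poll();
            return head.messageId();
        }
        return null;
    }
}
```

The queue size is naturally bounded by the existing flow-control permits, which is the point Matteo makes above about the client-side approach.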