Hmm... interesting use case. I can see the usefulness of persistence in that scenario. Thanks very much for the feedback! I just added your idea to Dory's wiki page http://dory.wikidot.com . Progress on adding features has been slow lately, since I work on Dory in my limited spare time, but I welcome all the outside help I can get. If you or others are interested in working on a persistence feature, that would be great. Otherwise, hopefully I'll get a chance to work on it myself at some point. Thanks again for the suggestion!

Regards,
Dave

-----Original Message-----
From: "Jason J. W. Williams" <jasonjwwilli...@gmail.com>
Sent: Monday, June 13, 2016 10:45am
To: users@kafka.apache.org
Subject: Re: Introducing Dory

Hi Dave,

Dory sounds very exciting. Without persistence it's less useful for clients connected over a WAN, since if the WAN goes wonky you could build up quite a queue until it comes back.

-J

On Mon, Jun 13, 2016 at 3:00 AM, Dave Peterson <d...@dspeterson.com> wrote:

> Hi Arya,
>
> In the case of a kernel panic or other failure serious enough to bring
> a machine down (hardware failure, power outage, etc.), message loss is
> definitely possible, and not totally preventable regardless of what
> precautions software takes to avoid data loss. To ensure high
> throughput, Dory batches messages together before forwarding them to
> Kafka. The amount of batching Dory performs is configurable on a
> per-topic basis, and in practice will typically cause a delay of
> between several hundred milliseconds and several seconds before
> messages are forwarded to Kafka. Thus, if a machine on which Dory is
> running crashes, you can expect to lose up to a few seconds of batched
> messages depending on the batching configuration. Additionally, you
> may lose some sent messages for which Dory has not yet received an ACK
> from Kafka indicating successful receipt and persistence to disk.
>
> To narrow the window of possible message loss, one can set a very
> short batching delay, or even completely disable batching for one or
> more topics. However this is not recommended since it will have a
> serious adverse effect on the performance of your Kafka cluster.
> Another possibility would be for Dory to immediately persist messages
> to local disk on receipt, even before they are batched, and then
> delete them only after a non-error ACK has been received from Kafka.
> This would greatly narrow, although not totally eliminate, the window
> for message loss.
> However it would add substantial complexity to
> Dory, and impose a substantial disk I/O performance cost. An
> important consideration is that when you write to a file, the kernel's
> default action is to not immediately write the data to disk. It
> buffers the data in its page cache for a period of time (typically up
> to 30 seconds) to improve throughput. If the machine crashes, all of
> this unwritten data is lost. You can get around this by calling
> fsync() on a file descriptor or specifying O_DIRECT | O_SYNC when
> opening a file, but this will degrade I/O performance. Dory avoids
> persisting data to disk with the view that these downsides outweigh
> the benefits of avoiding up to a few seconds of data loss when a
> machine crashes. However a feature I intend to add (described later
> in this email) will provide useful information about which messages
> were successfully delivered to Kafka in the event of a machine crash.
>
> Dory takes great care to avoid message loss due to circumstances
> other than machine crashes. When Dory starts up, it preallocates a
> fixed amount of memory (configurable with the --msg_buffer_max N
> command line option) for storing messages received from clients.
> Dory holds on to each message it receives until one of the following
> has occurred:
>
> - It receives an ACK from Kafka indicating that the message has
>   been successfully received and persisted to disk.
>
> - It receives an error ACK from Kafka that causes the message to
>   be immediately discarded. Certain error ACKs cause Dory to
>   immediately discard messages, and others cause it to pause,
>   update its metadata, and attempt redelivery based on the new
>   metadata. ACK error 2 (invalid message, i.e. CRC error) causes
>   Dory to immediately attempt redelivery without updating its
>   metadata.
>   The actions Dory takes for various error ACK values
>   are documented here:
>
>   https://github.com/dspeterson/dory/blob/master/doc/design.md#dispatcher
>
> - As mentioned above, certain error ACKs will cause Dory to
>   attempt redelivery. Each time this occurs for a given message,
>   Dory increments a failed delivery attempt count on the message.
>   Once the count exceeds a certain value (specified by the
>   --max_failed_delivery_attempts N command line option), Dory
>   discards the message.
>
> - It has determined that the message is undeliverable due to a
>   problem such as the message being malformed (see
>   https://github.com/dspeterson/dory/blob/master/doc/sending_messages.md
>   which documents Dory's input message format) or having an
>   invalid topic. In this case, the message is discarded.
>
> - It has exhausted its preallocated memory for storing messages.
>   This may occur in the case where a network-related outage or
>   serious problem with the Kafka cluster persists for an extended
>   period of time, preventing normal message delivery. In this
>   case, new messages received from clients are discarded.
>
> - Dory receives a shutdown signal (SIGTERM or SIGINT). In this
>   case, Dory immediately stops accepting new messages from
>   clients, and for a certain configurable period of time attempts
>   to deliver any pending messages. Once the time limit expires,
>   any remaining messages are lost. This type of message loss
>   can be avoided by not shutting Dory down until no more clients
>   are sending it messages, and Dory has been given enough time to
>   process all pending messages.
>
> A critical aspect of Dory's behavior is that every discard is tracked
> along with the reason why the discard occurred. Dory provides a web
> interface from which periodic discard reports and other status
> information can be obtained. Thus, in the rare event when discards
> occur, they can be tracked and recorded on a per-topic basis.
> The intent is for monitoring infrastructure to periodically ask Dory for
> a discard report, alert an administrator if message loss occurred, and
> optionally store the discard reports in a database so that a queryable
> history of data quality is available.
>
> A relevant feature I intend to add at some point is described here:
>
> http://dory.wikidot.com/wiki:commit-points
>
> This notion of a commit point can be useful in the event of a machine
> crash. If monitoring infrastructure periodically queries Dory for
> commit point information, then in the event of a machine crash, we
> know that any resulting message loss is limited to messages with
> timestamps > T, where T is the latest commit point.
>
> I hope this helps answer your questions, and that I haven't inundated
> you with too much information. Let me know if you have more
> questions.
>
> Regards,
> Dave
>
> -----Original Message-----
> From: "Arya Ketan" <ketan.a...@gmail.com>
> Sent: Sunday, June 12, 2016 11:52am
> To: users@kafka.apache.org
> Subject: Re: Introducing Dory
>
> Hi Dave,
> Dory looks pretty interesting. I had a few further questions on it
> a) How does Dory handle kernel panics?
> b) What kind of message guarantees does dory provide and also if you can
> share some design decisions taken to enable the guarantees whatever they
> are.
>
> Thanks
> Arya
>
> Arya
>
> On Sun, Jun 12, 2016 at 8:10 PM, Dave Peterson <d...@dspeterson.com>
> wrote:
>
> > Thanks! Enjoy :-)
> >
> > On 6/12/2016 12:24 AM, Gwen Shapira wrote:
> >
> >> Dory is pretty cool (even though it is named after a somewhat dorky
> >> fish). Thank you for sharing :)
> >>
> >> On Sun, Jun 12, 2016 at 1:24 AM, Dave Peterson <d...@dspeterson.com>
> >> wrote:
> >>
> >>> Hello Kafka users,
> >>>
> >>> Version 1.1.0 of Dory is now available. See
> >>> https://github.com/dspeterson/dory for details.
> >>> Dory is the successor
> >>> to Bruce (https://github.com/tagged/bruce), a Kafka producer daemon I
> >>> created while working at if(we) (http://www.ifwe.co/). The code has
> >>> seen a number of improvements since its initial release in September
> >>> 2014. The list of example clients for various programming languages
> >>> has also been extended. Dory maintains full backward compatibility
> >>> with Bruce, so existing users can easily switch.
> >>>
> >>> The latest release adds support for receiving messages from clients by
> >>> UNIX domain stream socket or local TCP. Although UNIX domain
> >>> datagrams are still the preferred means of sending messages in most
> >>> cases, the option of using stream sockets facilitates sending messages
> >>> too large to fit in a single datagram. The local TCP option
> >>> facilitates adding support for clients written in programming
> >>> languages that do not provide easy access to UNIX domain sockets.
> >>>
> >>> Dory's wiki page http://dory.wikidot.com/start contains a list of
> >>> ideas for additional features and other improvements. Community
> >>> feedback is welcomed and appreciated. If you have ideas for things
> >>> you would like to see in future releases, please add them to the list.
> >>> Also, please contribute code if you can afford the time.
> >>>
> >>> Thanks,
> >>> Dave Peterson
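
Note on the fsync() point raised in the reply to Arya: the page-cache behavior described there can be illustrated with a small sketch. This is not Dory's code (per the thread, Dory deliberately does not persist messages to disk); it is a minimal Python illustration, and the function name append_durably and the journal-file idea are assumptions made up for the example. It only shows the "write, then fsync" step, not a complete persistence design.

```python
import os


def append_durably(path: str, payload: bytes) -> None:
    """Append payload to the file at path, then fsync before returning.

    Hypothetical helper for illustration only; not part of Dory.
    """
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(payload)
        # os.write may write fewer bytes than requested, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may still sit in the kernel's page
        # cache and be lost if the machine crashes.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Calling append_durably once per message makes the cost mentioned in the thread visible: each call waits for the storage device, which is why per-message syncing tends to hurt throughput compared with letting the kernel buffer writes.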