Hi Steven,

Speaking only for myself, I agree with you. I think these settings/tweaks are the easiest short-term way to get some proper non-blocking behavior. Long term, it seems like the client should have a secondary queue that holds raw messages until metadata is available, and then starts blocking or dropping messages once too many are queued.
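To make the secondary-queue idea concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: the class name PendingSendQueue and its methods are hypothetical, they are not part of the Kafka client or of any patch, and this version drops on overflow rather than blocking.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/**
 * Hypothetical holding area for records whose topic metadata is not yet
 * available. Not a Kafka class; names and behavior are assumptions.
 */
final class PendingSendQueue<T> {
    private final Queue<T> pending = new ArrayDeque<>();
    private final int capacity;
    private long dropped = 0;

    PendingSendQueue(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Never blocks. Returns false and counts a drop when the queue is full. */
    synchronized boolean offer(T record) {
        if (pending.size() >= capacity) {
            dropped++;
            return false;
        }
        pending.add(record);
        return true;
    }

    /**
     * Intended to be called (for example from a background thread) once
     * metadata is available. Hands queued records to the real send path in
     * FIFO order and returns how many were handed over.
     */
    int drainTo(Consumer<T> sender) {
        List<T> batch;
        synchronized (this) {
            // Copy under the lock, then send outside it so offer() stays fast.
            batch = new ArrayList<>(pending);
            pending.clear();
        }
        batch.forEach(sender);
        return batch.size();
    }

    synchronized int size() {
        return pending.size();
    }

    synchronized long droppedCount() {
        return dropped;
    }
}
```

A caller would offer() each record while metadata is missing, and a background thread would call drainTo() with the real send path once metadata shows up. Whether to drop or block when the queue is full is the policy question raised above; the sketch picks dropping only because it keeps the caller's thread from ever waiting.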
For those interested, I submitted a patch to add the following options:

pre.initialize.topics
pre.initialize.timeout.ms

And then a new public method isInitialized() that the caller can check and make a decision to blow up or accept the failure and continue. If initialized is false, any sends will fast fail until the initialization completes.

Patch is attached here: https://issues.apache.org/jira/browse/KAFKA-1835

Not familiar with Kafka's processes, so any feedback welcome.

Thanks,
Paul

On Mon, Jan 5, 2015 at 1:47 PM, Steven Wu <stevenz...@gmail.com> wrote:

> "preinitialize.metadata=true/false" can help to a certain extent. If the
> Kafka cluster is down, then metadata won't be available for a long time
> (not just for the first msg). So to be safe, we have to set
> "metadata.fetch.timeout.ms=1" to fail fast, as Paul mentioned. I can also
> echo Jay's comment that on-demand fetch of metadata might be more
> efficient, since the cluster may have many topics that a particular
> producer may not care about.
>
> So I plan to do something similar to what Paul described:
> - metadata.fetch.timeout.ms=1
> - enqueue msg to a pending queue when topic metadata is not available
> - have a background thread check when metadata becomes available and
>   drain the pending queue
> - optionally, prime topic metadata asynchronously during init (if
>   configured)
>
> Just wondering whether the above should be the default behavior of
> best-effort non-blocking delivery in Kafka clients. Then we don't have to
> reinvent the wheel.
>
> Thanks,
> Steven
>
> On Mon, Dec 29, 2014 at 11:48 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > I don't think a separate queue will be a very simple solution to
> > implement.
> >
> > Could you describe your use case a little bit more. It does seem to me
> > that as long as the metadata fetch happens only once and the blocking
> > has a tight time bound this should be okay in any use case I can imagine.
> > And, of course, by default the client blocks anyway whenever you exhaust
> > the memory buffer space. But it sounds like you feel it isn't. Maybe you
> > could describe the scenario a bit?
> >
> > I think one thing we could do is what was discussed in another thread,
> > namely add an option like
> >   preinitialize.metadata=true/false
> > which would default to false. When true this would cause the producer to
> > just initialize metadata for all topics when it is created. Note that
> > this then brings back the opposite problem--doing remote communication
> > during initialization which tends to bite a lot of people. But since this
> > would be an option that would default to false perhaps it would be less
> > likely to come as a surprise.
> >
> > -Jay
> >
> > On Mon, Dec 29, 2014 at 8:38 AM, Steven Wu <stevenz...@gmail.com> wrote:
> >
> > > +1. it should be truly async in all cases.
> > >
> > > I understand some challenges that Jay listed in the other thread. But
> > > we need a solution nonetheless. e.g. can we maintain a separate
> > > list/queue/buffer for pending messages without metadata.
> > >
> > > On Tue, Dec 23, 2014 at 12:57 PM, John Boardman
> > > <boardmanjo...@gmail.com> wrote:
> > >
> > > > I was just fighting this same situation. I never expected the new
> > > > producer send() method to block as it returns a Future and accepts a
> > > > Callback. However, when I tried my unit test, just replacing the old
> > > > producer with the new, I immediately started getting timeouts
> > > > waiting for metadata. I struggled with this until I went into the
> > > > source code and found the wait() that waits for the metadata.
> > > >
> > > > At that point I realized that this new "async" producer would have
> > > > to be executed on its own thread, unlike the old producer, which
> > > > complicates my code unnecessarily.
> > > > I totally agree with Paul that the contract of send() is being
> > > > completely violated with internal code that can block.
> > > >
> > > > I did try fetching the metadata first, but that only worked for a
> > > > few calls before the producer decided it was time to update the
> > > > metadata again.
> > > >
> > > > Again, I agree with Paul that this API should be fixed so that it is
> > > > truly asynchronous in all cases. Otherwise, it cannot be used on the
> > > > main thread of an application as it will block and fail.