"preinitialize.metadata=true/false" can help to a certain extent. But if the Kafka cluster is down, metadata won't be available for a long time (not just for the first message). So to be safe, we have to set "metadata.fetch.timeout.ms=1" to fail fast, as Paul mentioned. I can also echo Jay's comment that on-demand fetching of metadata might be more efficient, since a cluster may have many topics that a particular producer does not care about.
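As a rough illustration of the fail-fast setting, the producer config could be built along the lines of the sketch below. Only "metadata.fetch.timeout.ms" comes from this thread. The class name, the helper method, and the "broker1:9092" address are placeholders, and "preinitialize.metadata" is left out because it is only a proposed option. A real producer would also need its other settings (serializers and so on).

```java
import java.util.Properties;

public class FailFastProducerConfig {
    // Builds producer properties that give up on a metadata fetch almost immediately.
    // The bootstrap address is supplied by the caller; nothing here contacts a broker.
    public static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Fail fast: wait at most 1 ms for topic metadata inside send().
        props.put("metadata.fetch.timeout.ms", "1");
        return props;
    }

    public static void main(String[] args) {
        // "broker1:9092" is a placeholder address, not taken from the thread.
        Properties props = build("broker1:9092");
        System.out.println(props.getProperty("metadata.fetch.timeout.ms"));
    }
}
```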
So I plan to do something similar to what Paul described:

- metadata.fetch.timeout.ms=1
- enqueue the message to a pending queue when topic metadata is not available
- have a background thread check when metadata becomes available and drain the pending queue
- optionally, prime topic metadata asynchronously during init (if configured)

Just wondering whether the above should be the default behavior for best-effort, non-blocking delivery in Kafka clients. Then we wouldn't have to reinvent the wheel.

Thanks,
Steven

On Mon, Dec 29, 2014 at 11:48 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> I don't think a separate queue will be a very simple solution to implement.
>
> Could you describe your use case a little bit more. It does seem to me that
> as long as the metadata fetch happens only once and the blocking has a
> tight time bound this should be okay in any use case I can imagine. And, of
> course, by default the client blocks anyway whenever you exhaust the memory
> buffer space. But it sounds like you feel it isn't. Maybe you could
> describe the scenario a bit?
>
> I think one thing we could do is what was discussed in another thread,
> namely add an option like
> preinitialize.metadata=true/false
> which would default to false. When true this would cause the producer to
> just initialize metadata for all topics when it is created. Note that this
> then brings back the opposite problem--doing remote communication during
> initialization which tends to bite a lot of people. But since this would be
> an option that would default to false perhaps it would be less likely to
> come as a surprise.
>
> -Jay
>
> On Mon, Dec 29, 2014 at 8:38 AM, Steven Wu <stevenz...@gmail.com> wrote:
>
> > +1. it should be truly async in all cases.
> >
> > I understand some challenges that Jay listed in the other thread. But we
> > need a solution nonetheless. e.g. can we maintain a separate
> > list/queue/buffer for pending messages without metadata.
> >
> > On Tue, Dec 23, 2014 at 12:57 PM, John Boardman <boardmanjo...@gmail.com>
> > wrote:
> >
> > > I was just fighting this same situation. I never expected the new producer
> > > send() method to block as it returns a Future and accepts a Callback.
> > > However, when I tried my unit test, just replacing the old producer with
> > > the new, I immediately started getting timeouts waiting for metadata. I
> > > struggled with this until I went into the source code and found the wait()
> > > that waits for the metadata.
> > >
> > > At that point I realized that this new "async" producer would have to be
> > > executed on its own thread, unlike the old producer, which complicates my
> > > code unnecessarily. I totally agree with Paul that the contract of send()
> > > is being completely violated with internal code that can block.
> > >
> > > I did try fetching the metadata first, but that only worked for a few calls
> > > before the producer decided it was time to update the metadata again.
> > >
> > > Again, I agree with Paul that this API should be fixed so that it is truly
> > > asynchronous in all cases. Otherwise, it cannot be used on the main thread
> > > of an application as it will block and fail.
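
For reference, the pending-queue plan at the top of this message could be sketched roughly as below. This is only an illustration of the idea, not existing Kafka client code: "PendingQueueSender", the "metadataAvailable" check, and the "sender" hand-off are all hypothetical stand-ins for whatever the real producer would expose, and messages are plain {topic, message} string pairs to keep it self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class PendingQueueSender {
    // Bounded queue of {topic, message} pairs waiting for topic metadata.
    private final BlockingQueue<String[]> pending;
    // Hypothetical check: does the client currently have metadata for this topic?
    private final Predicate<String> metadataAvailable;
    // Hypothetical hand-off to the real producer; assumed not to block.
    private final Consumer<String[]> sender;

    public PendingQueueSender(int capacity, Predicate<String> metadataAvailable,
                              Consumer<String[]> sender) {
        this.pending = new LinkedBlockingQueue<>(capacity);
        this.metadataAvailable = metadataAvailable;
        this.sender = sender;
    }

    // Never waits: sends directly when metadata is known, otherwise queues.
    // Returns false when the pending queue is full and the message is dropped.
    public boolean send(String topic, String message) {
        String[] record = {topic, message};
        if (metadataAvailable.test(topic)) {
            sender.accept(record);
            return true;
        }
        return pending.offer(record);
    }

    // One drain pass: forwards queued messages whose topic now has metadata
    // and re-queues the rest. Best effort: a re-queued message is dropped if
    // concurrent senders have filled the queue in the meantime.
    public int drainOnce() {
        List<String[]> batch = new ArrayList<>();
        pending.drainTo(batch);
        int sent = 0;
        for (String[] record : batch) {
            if (metadataAvailable.test(record[0])) {
                sender.accept(record);
                sent++;
            } else {
                pending.offer(record);
            }
        }
        return sent;
    }

    // Runs drainOnce() periodically on a background thread.
    // The caller is responsible for shutting the executor down.
    public ScheduledExecutorService startDrainer(long periodMs) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(this::drainOnce, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

The sketch does not preserve ordering between messages that were queued and messages sent directly once metadata shows up, and it drops messages when the queue is full. Both seem acceptable for best-effort delivery but would need more thought for anything stricter.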