Hey Ryanne, I see what you're saying about serde blocking, but I think we should consider it out of scope for this patch. Right now we've nailed down a couple of use cases where we can unambiguously say, "I can make progress now" or "I cannot make progress now", which makes it possible to offload to a different thread only if we are unable to make progress. Extending this to CPU work like serde would mean always offloading, which would be a really big performance change. It might be worth exploring anyway, but I'd rather keep this patch focused on improving ergonomics, rather than muddying the waters with evaluating performance very deeply.
I think if we really do want to support serde or interceptors that do IO on the send path (which seems like an anti-pattern to me), we should consider making that a separate SIP, and probably also consider changing the API to use Futures (or CompletionStages). But I would rather avoid scope creep, so that we have a better chance of fixing this part of the problem. Yes, I think some exceptions will move to being async instead of sync. They'll still be surfaced in the Future, so I'm not so confident that it would be that big a shock to users though. Best, Moses On Thu, May 13, 2021 at 7:44 PM Ryanne Dolan <ryannedo...@gmail.com> wrote: > re serialization, my concern is that serialization often accounts for a lot > of the cycles spent before returning the future. It's not blocking per se, > but it's the same effect from the caller's perspective. > > Moreover, serde impls often block themselves, e.g. when fetching schemas > from a registry. I suppose it's also possible to block in Interceptors > (e.g. writing audit events or metrics), which happens before serdes iiuc. > So any blocking in either of those plugins would block the send unless we > queue first. > > So I think we want to queue first and do everything off-thread when using > the new API, whatever that looks like. I just want to make sure we don't do > that for clients that wouldn't expect it. > > Another consideration is exception handling. If we queue right away, we'll > defer some exceptions that currently are thrown to the caller (before the > future is returned). In the new API, the send() wouldn't throw any > exceptions, and instead the future would fail. I think that might mean that > a new method signature is required. > > Ryanne > > > On Thu, May 13, 2021, 2:57 PM Nakamura <nakamura.mo...@gmail.com> wrote: > > > Hey Ryanne, > > > > I agree we should add an additional constructor (or else an additional > > overload in KafkaProducer#send, but the new constructor would be easier > to > > understand) if we're targeting the "user provides the thread" approach. > > > > From looking at the code, I think we can keep record serialization on the > > user thread, if we consider that an important part of the semantics of > this > > method. It doesn't seem like serialization depends on knowing the > cluster, > > I think it's incidental that it comes after the first "blocking" segment > in > > the method. > > > > Best, > > Moses > > > > On Thu, May 13, 2021 at 2:38 PM Ryanne Dolan <ryannedo...@gmail.com> > > wrote: > > > > > Hey Moses, I like the direction here. My thinking is that a single > > > additional work queue, s.t. send() can enqueue and return, seems like > the > > > lightest touch. However, I don't think we can trivially process that > > queue > > > in an internal thread pool without subtly changing behavior for some > > users. > > > For example, users will often run send() in multiple threads in order > to > > > serialize faster, but that wouldn't work quite the same if there were > an > > > internal thread pool. > > > > > > For this reason I'm thinking we need to make sure any such changes are > > > opt-in. Maybe a new constructor with an additional ThreadFactory > > parameter. > > > That would at least clearly indicate that work will happen off-thread, > > and > > > would require opt-in for the new behavior. > > > > > > Under the hood, this ThreadFactory could be used to create the worker > > > thread that process queued sends, which could fan-out to per-partition > > > threads from there. > > > > > > So then you'd have two ways to send: the existing way, where serde and > > > interceptors and whatnot are executed on the calling thread, and the > new > > > way, which returns right away and uses an internal Executor. As you > point > > > out, the semantics would be identical in either case, and it would be > > very > > > easy for clients to switch. > > > > > > Ryanne > > > > > > On Thu, May 13, 2021, 9:00 AM Nakamura <nny...@gmail.com> wrote: > > > > > > > Hey Folks, > > > > I just posted a new proposal > > > > < > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=181306446 > > > > > > > > > in the wiki. I think we have an opportunity to improve the > > > > KafkaProducer#send user experience. It would certainly make our > lives > > > > easier. Please take a look! There are two subproblems that I could > > use > > > > guidance on, so I would appreciate feedback on both of them. > > > > Best, > > > > Moses > > > > > > > > > >