Going just by your description of the problem, I'd say yes, this looks like a typical use case for Kafka Streams. As of 0.11, Kafka Streams also supports exactly-once semantics.
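
As a rough illustration of the batching described in the question, here is a minimal plain-Python sketch of grouping records per tenant into fixed (tumbling) time windows and flushing each window once it has closed. It is not Kafka Streams API code; `TenantBatcher`, `window_start`, and the `send` callback are hypothetical names made up for this sketch, and the 1-minute window is just one value from the 1 to 5 minute range mentioned. Note that exactly-once in Kafka Streams applies to reads and writes within Kafka, so a call to a third-party API sits outside that guarantee; the sketch assumes the API call can be retried safely.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # assumed 1-minute tumbling window (the thread mentions 1 to 5 minutes)


def window_start(timestamp_ms, window_ms=WINDOW_MS):
    """Return the start of the tumbling window that contains timestamp_ms."""
    return timestamp_ms - (timestamp_ms % window_ms)


class TenantBatcher:
    """Groups records by (tenant, window start) and flushes windows that have closed."""

    def __init__(self, send, window_ms=WINDOW_MS):
        # Hypothetical callback standing in for the third-party API request:
        # send(tenant, window_start_ms, records)
        self.send = send
        self.window_ms = window_ms
        self.batches = defaultdict(list)

    def add(self, tenant, timestamp_ms, record):
        key = (tenant, window_start(timestamp_ms, self.window_ms))
        self.batches[key].append(record)

    def flush_closed(self, now_ms):
        """Send every batch whose window ended at or before now_ms."""
        closed = sorted(k for k in self.batches if k[1] + self.window_ms <= now_ms)
        for key in closed:
            tenant, start = key
            # The batch is removed only after send() returns, so a failed request
            # leaves it in place for the next flush (at-least-once towards the API).
            self.send(tenant, start, self.batches[key])
            del self.batches[key]
```

Calling `flush_closed` periodically with the current time would issue one request per tenant per closed window. If `send` raises, the batch stays in memory and is retried on the next call, which is why the receiving side would need to tolerate duplicates.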

Cheers
Eno

> On 12 Jul 2017, at 17:06, Stephen Powis <spo...@salesforce.com> wrote:
>
> Hey! I was hoping I could get some input from people more experienced with
> Kafka Streams to determine if they'd be a good use case/solution for me.
>
> I have multi-tenant clients submitting data to a Kafka topic that they want
> ETL'd to a third party service. I'd like to batch and group these by
> tenant over a time window, somewhere between 1 and 5 minutes. At the end
> of a time window then issue an API request to the third party service for
> each tenant sending the batch of data over.
>
> Other points of note:
> - Ideally we'd have exactly-once semantics, sending data multiple times
> would typically be bad. But we'd need to gracefully handle things like API
> request errors / service outages.
>
> - We currently use Storm for doing stream processing, but the long running
> time-windows and potentially large amount of data stored in memory make me
> a bit nervous to use it for this.
>
> Thoughts? Thanks in Advance!
> Stephen