Thanks Matthias and Eno. For at least once guarantees to be effective in
any system, the source (Kafka receiver) needs to know that the message has
been successfully processed. What I understood is that since processing
happens in single thread, if another record is polled, it is implicitly
assumed that previous record has been successfully processed. Given that,

- When I am batching records in memory for kafka producer, is there a
possibility of data loss since records have not been committed yet and
source has already polled for more records.

- Similarly for operations such as aggregations, records have to be
buffered and they may be lost if source has already committed the offset.
The documentation suggests that the buffered memory is actually a KTable
backed by local state store and each single update to KTable is separately
into changelog topic (doesn't that add latency)


   - When the task fails and re-spawns, does it read the change-log topic
   from beginning?

Thank you for your patience.







On Aug 26, 2016 3:35 AM, "Matthias J. Sax" <matth...@confluent.io> wrote:

> Just want to add something:
>
> I you use Kafka Streams DSL, the library is Kafka centric.
>
> However, you could use low-level Processor API to get data into your
> topology from other systems. The problem will be missing fault-tolerance
> that you would need to code by yourself. When reading from Kafka,
> fault-tolerance is "for free" because Kafka Streams uses Java
> KafkaConsumer client internally.
>
> So as Eno said, it is recommended to use Kafka Connect to get data in
> and out from the Kafka cluster. Of course, you can also use a different
> tool than Connect for data import and export to/from Kafka.
>
> From Eno's answer
>
> > Multiple threads would make sense to run separate DAGs.
>
> If I understand this correctly, he means that you can write multiple
> application and connect them via a topic to get multiple threads for
> different processors (ie, the downstream application consumes the output
> from the upstream application)
>
>
> -Matthias
>
> On 08/25/2016 07:45 PM, Eno Thereska wrote:
> > Good question. All of them would run in a single thread. That is the
> model. Multiple threads would make sense to run separate DAGs.
> >
> > Eno
> >
> >
> >> On 25 Aug 2016, at 18:32, Abhishek Agarwal <abhishc...@gmail.com>
> wrote:
> >>
> >> Hi Eno,
> >>
> >> Thanks for your reply. If my application DAG has three stream
> processors,
> >> first of which is source, would all of them run in single thread? There
> may
> >> be scenarios wherein I want to have different number of threads for
> >> different processors since some may be CPU bound and some may be IO
> bound.
> >>
> >> On Thu, Aug 25, 2016 at 10:49 PM, Eno Thereska <eno.there...@gmail.com>
> >> wrote:
> >>
> >>> Hi Abhishek,
> >>>
> >>> - Correct on connecting to external stores. You can use Kafka Connect
> to
> >>> get things in or out. (Note that in the 0.10.1 release KIP-67 allows
> you to
> >>> directly query Kafka Stream's stores so, for some kind of data you
> don't
> >>> need to move it to an external store. This is pushed in trunk.)
> >>>
> >>> - You can definitely use more threads than partitions, but that will
> not
> >>> buy you much since some threads will be idle. No two threads will work
> on
> >>> the same partition, so you don't have to worry about them repeating
> work.
> >>>
> >>> Hope this helps.
> >>> Eno
> >>>
> >>>> On 25 Aug 2016, at 16:50, Abhishek Agarwal <abhishc...@gmail.com>
> wrote:
> >>>>
> >>>> Hi,
> >>>> I was reading up on kafka streams for a project and came across this
> blog
> >>>> https://softwaremill.com/kafka-streams-how-does-it-fit-strea
> m-landscape/
> >>>> I wanted to validate some assertions made in blog, with kafka
> community
> >>>>
> >>>> - Kafka streams is kafka-in, kafka-out application. Does the user need
> >>>> kafka connect to transfer data from kafka to any external store?
> >>>> - No support for asynchronous processing - Can I use more threads than
> >>>> number of partitions for processors without sacrificing at-least once
> >>>> guarantees?
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Regards,
> >>>> Abhishek Agarwal
> >>>
> >>>
> >>
> >>
> >> --
> >> Regards,
> >> Abhishek Agarwal
> >
>
>

Reply via email to