Re: Can state stores function as a caching layer for persistent storage

2017-05-16 Thread João Peixoto
Thanks Matthias. My question is on a more fundamental level, but I'll ask on a separate thread as per Eno's recommendation. On Tue, May 16, 2017 at 3:26 PM Matthias J. Sax wrote: > Just to add to this: > > Streams creates re-partitioning topics lazily, meaning when the key is > (potentially) changed

Re: Can state stores function as a caching layer for persistent storage

2017-05-16 Thread Matthias J. Sax
Just to add to this: Streams creates re-partitioning topics lazily, meaning that when the key is (potentially) changed, we only set a flag but don't add the re-partitioning topic. On groupByKey() or join() we check if this "re-partitioning required" flag is set and, if so, add the topic. Thus, if you have a st
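
The lazy behaviour described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not Kafka Streams code: the class ToyStream, the helper build(), and the string labels in the topology list are all hypothetical names made up for illustration. It shows a key-changing step that only sets a flag, and a key-based step that adds a repartition topic only when the flag is set.

```python
class ToyStream:
    """Toy model (hypothetical, not the Kafka Streams API) of lazy repartitioning."""

    def __init__(self, topology, repartition_required=False):
        self.topology = topology                      # shared list of step labels
        self.repartition_required = repartition_required

    def map_values(self):
        # Values change but keys do not: the flag is carried over unchanged.
        self.topology.append("mapValues")
        return ToyStream(self.topology, self.repartition_required)

    def select_key(self):
        # The key may change: only set the flag, do not add a topic yet.
        self.topology.append("selectKey")
        return ToyStream(self.topology, True)

    def group_by_key(self):
        # Key-based operation: add the repartition topic only if flagged.
        if self.repartition_required:
            self.topology.append("repartition-topic")
        self.topology.append("groupByKey")
        return ToyStream(self.topology, False)


def build(*ops):
    """Apply the named toy operations in order and return the step labels."""
    stream = ToyStream([])
    for op in ops:
        stream = getattr(stream, op)()
    return stream.topology
```

In this model, build("map_values", "group_by_key") adds no repartition step, while build("select_key", "map_values", "group_by_key") adds one just before the grouping. A key change that is never followed by a key-based operation adds none at all.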

Re: Can state stores function as a caching layer for persistent storage

2017-05-16 Thread Eno Thereska
(it's preferred to create another email thread for a different topic to make it easier to look back) Yes, there could be room for optimizations, e.g., see this: http://mail-archives.apache.org/mod_mbox/kafka-users/201705.mbox/%3cCAJikTEUHR=r0ika6vlf_y+qajxg8f_q19og_-s+q-gozpqb...@mail.gmail.com%

Re: Can state stores function as a caching layer for persistent storage

2017-05-16 Thread João Peixoto
Follow-up question (let me know if a new thread should be created). Do we always need a repartitioning topic? If I'm reading things correctly, when we change the key of a record we need to make sure the new key falls on the same partition that we are processing. This makes a lot of sense if after such

Re: Can state stores function as a caching layer for persistent storage

2017-05-14 Thread João Peixoto
Very useful links, thank you. Part of my original misunderstanding was that the at-least-once guarantee was considered fulfilled if the record reached a sink node. Thanks for all the feedback, you may consider my question answered. Feel free to ask further questions about the use case if found in

Re: Can state stores function as a caching layer for persistent storage

2017-05-14 Thread Matthias J. Sax
Yes. It is basically "documented", as Streams guarantees at-least-once semantics. Thus, we make sure you will not lose any data in case of failure (i.e., the overall guarantee is documented). To achieve this, we always flush before we commit offsets. (This is not explicitly documented as it's an
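
The flush-before-commit ordering mentioned above can be sketched with a toy model. Everything here is a hypothetical illustration rather than Kafka Streams internals: ToyTask, its fields, and run_with_one_failed_flush() are invented names, and the model assumes that a failed flush aborts the commit, so that a restart resumes from the last committed offset.

```python
class ToyTask:
    """Toy model (hypothetical) of flush-before-commit and at-least-once replay."""

    def __init__(self, records):
        self.records = records        # list of (key, value); list index = offset
        self.committed_offset = 0     # where processing resumes after a restart
        self.position = 0             # next offset to process
        self.cache = {}               # buffered writes, lost on failure
        self.store = {}               # stands in for the persistent store
        self.processed_count = 0      # counts reprocessing as well

    def process_all(self):
        while self.position < len(self.records):
            key, value = self.records[self.position]
            self.cache[key] = value   # last write wins per key
            self.position += 1
            self.processed_count += 1

    def commit(self, flush_fails=False):
        # Flush first; if it fails, the offset below is never committed.
        if flush_fails:
            raise RuntimeError("flush failed")
        self.store.update(self.cache)
        self.cache.clear()
        self.committed_offset = self.position

    def restart(self):
        # Buffered writes are gone; resume from the last committed offset.
        self.cache.clear()
        self.position = self.committed_offset


def run_with_one_failed_flush(records):
    task = ToyTask(records)
    task.process_all()
    try:
        task.commit(flush_fails=True)
    except RuntimeError:
        task.restart()
    task.process_all()                # records are processed a second time
    task.commit()
    return task
```

With three input records and one failed flush, the toy task processes six records in total but the store ends up with every key present. That is the at-least-once trade-off: duplicates in processing, no loss in the store.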

Re: Can state stores function as a caching layer for persistent storage

2017-05-14 Thread João Peixoto
I think I now understand what Matthias meant when he said "If you use a global remote store, you would not need to back your changes in a changelog topic, as the store would not be lost in case of failure". I had the misconception that if a state store threw an exception during "flush", all messag

Re: Can state stores function as a caching layer for persistent storage

2017-05-13 Thread João Peixoto
Replies in line as well On Sat, May 13, 2017 at 3:25 AM Eno Thereska wrote: > Hi João, > > Some answers inline: > > > On 12 May 2017, at 18:27, João Peixoto wrote: > > > > Thanks for the comments, here are some clarifications: > > > > I did look at interactive queries, if I understood them cor

Re: Can state stores function as a caching layer for persistent storage

2017-05-13 Thread Eno Thereska
Hi João, Some answers inline: > On 12 May 2017, at 18:27, João Peixoto wrote: > > Thanks for the comments, here are some clarifications: > > I did look at interactive queries, if I understood them correctly it means > that my state store must hold all the results in order for it to be > querie

Re: Can state stores function as a caching layer for persistent storage

2017-05-12 Thread João Peixoto
Thanks for the comments, here are some clarifications: I did look at interactive queries; if I understood them correctly, it means that my state store must hold all the results in order for it to be queried, either in memory or through disk (RocksDB). 1. The retention policy on my aggregate operati

Re: Can state stores function as a caching layer for persistent storage

2017-05-12 Thread Matthias J. Sax
Hi, I am not sure about your overall setup. Are your stores local (similar to RocksDB) or are you using one global remote store? If you use a global remote store, you would not need to back your changes in a changelog topic, as the store would not be lost in case of failure. Also (in case that yo

Re: Can state stores function as a caching layer for persistent storage

2017-05-12 Thread Eno Thereska
Hi there, A couple of general comments, plus some answers: - general comment: have you thought of using Interactive Queries to directly query the aggregate data, without needing to store them to an external database (see this blog: https://www.confluent.io/blog/unifying-stream-processing-and-i

Can state stores function as a caching layer for persistent storage

2017-05-12 Thread João Peixoto
In a stream definition I perform an "aggregate" which is configured with a state store. *Goal*: Persist the aggregation results into a database, e.g. MySQL or MongoDB. *Working approach*: I have created a custom StateStore backed by a changelog topic like the built-in state stores. Whenever the sto