Hi Ori,

Maybe an example would be useful. We use Samza to transform data for materialization in Druid, because Druid is built to index and aggregate a single event stream, but our raw data actually lives in a bunch of streams and tables that need joining. So we have Samza handle the joining and then send the result off to Druid for indexing. In my mind the advantage of this approach is that a stream processor is better at generating derived views of streaming data than most databases are (if the DB can do it at all; many can't).
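To make that concrete, here is a minimal sketch using Samza's low-level StreamTask API, not our actual job: the stream names ("page-views", "user-profiles", "enriched-page-views"), the local store name, and the map-shaped messages are all made up for the example. The task keeps the "table" side of the join in a local key-value store, joins each page view against it, and writes the joined record to a derived topic that Druid can then ingest.

import java.util.HashMap;
import java.util.Map;

import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

/**
 * Illustrative stream-table join: page-view events are enriched with the
 * latest user profile (held in a local key-value store fed by a
 * "user-profiles" topic) and the result is emitted to a derived Kafka
 * topic that Druid indexes. Names and message shapes are hypothetical.
 */
public class PageViewEnricherTask implements StreamTask, InitableTask {
  // Derived topic that acts as the changelog of the materialized view.
  private static final SystemStream OUTPUT =
      new SystemStream("kafka", "enriched-page-views");

  private KeyValueStore<String, Map<String, Object>> profiles;

  @Override
  @SuppressWarnings("unchecked")
  public void init(Config config, TaskContext context) {
    // Local store holding the latest profile per user id (the "table" side).
    profiles = (KeyValueStore<String, Map<String, Object>>) context.getStore("user-profiles");
  }

  @Override
  @SuppressWarnings("unchecked")
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector,
                      TaskCoordinator coordinator) {
    String stream = envelope.getSystemStreamPartition().getStream();
    String userId = (String) envelope.getKey();
    Map<String, Object> message = (Map<String, Object>) envelope.getMessage();

    if ("user-profiles".equals(stream)) {
      // Table side: remember the most recent profile for this user.
      profiles.put(userId, message);
    } else {
      // Stream side: join the page view against the stored profile.
      Map<String, Object> joined = new HashMap<>(message);
      Map<String, Object> profile = profiles.get(userId);
      if (profile != null) {
        joined.putAll(profile);
      }
      // Emit the enriched event; Druid ingests this topic directly.
      collector.send(new OutgoingMessageEnvelope(OUTPUT, userId, joined));
    }
  }
}

In a real job the local store would be configured with its own changelog topic so it can be restored after a failure, and the input streams would need to be partitioned by the join key (user id here) so matching records land in the same task.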
You're right that the way this looks is that Samza writes a new Kafka topic that is basically a changelog of the materialized view. Derived topics like that are common when using Samza in this way.

Gian

On Thu, Mar 26, 2015 at 3:15 AM, Ori Cohen <o...@ori-cohen.com> wrote:
> Hi everyone
>
> Based on Martin's StrangeLoop talk "turning the database inside out", what I
> understand is that he meant for Samza to be a tool to pull sequential event
> data from a pub-sub such as Kafka, then process the data to generate
> materialized views. The next piece of the puzzle I couldn't figure out.
> The materialized views are meant to be used as a cache level and to be
> read from instead of reading directly from the DB or another application
> cache. So the data store to which Samza would output the data is another
> DB layer, maybe memory-based like Redis. This is like the traditional DB,
> because it also gets both reads and writes, so what is the point? If it is
> optimized for reads, then data consistency will take more time.
> Is it because the store's data is eventually consistent and the store holds
> data already processed by business logic?
> If views are meant to subscribe to changes in the stores (the actual
> materialized views), then we need additional Kafka topics or another pub-sub
> mechanism that will contain change events of the materialized views, which
> complicates things even more.
>
> Please help me out here.
>
> Regards,
> Ori
>