A colleague and I are trying to understand Samza a bit better, along with
the ideas behind it, whether already realized or still to come. We've been
through some of the referenced videos, articles, and documentation, and
were discussing a couple of use cases that we weren't sure how to solve.

Use case 1 is about how a multi-join query in SQL would be represented in
Samza. On the state management[1] page, the join examples all involve 2
tables. We have use cases that join a much larger number of tables in
order to flatten (denormalize) data to store in our reporting database. If
N tables had to be joined, would this be a single Samza job with N input
streams, or N-1 jobs each with at most 2 input streams, where each job
from layer 2 onward takes one of its input streams from the output of the
previous job?
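
To make the second option concrete, here is a minimal sketch in plain
Python of what I mean by chaining two-input joins. It does not use any
Samza API; JoinStage, run_chain, and the stream names (orders, customers,
items) are made up for illustration, and the in-memory dictionaries only
stand in for whatever local state a real job would keep.

```python
from collections import defaultdict


class JoinStage:
    """Hypothetical two-input join stage that buffers both sides by key."""

    def __init__(self, key):
        self.key = key
        self.left = defaultdict(list)   # records seen on the left input
        self.right = defaultdict(list)  # records seen on the right input

    def on_left(self, rec):
        k = rec[self.key]
        self.left[k].append(rec)
        # Emit one joined record per matching right-side record seen so far.
        return [{**rec, **r} for r in self.right[k]]

    def on_right(self, rec):
        k = rec[self.key]
        self.right[k].append(rec)
        # Emit one joined record per matching left-side record seen so far.
        return [{**l, **rec} for l in self.left[k]]


def run_chain(events):
    """Three tables joined by two chained stages (N-1 stages for N tables)."""
    stage1 = JoinStage("customer_id")  # orders joined with customers
    stage2 = JoinStage("order_id")     # stage1 output joined with items
    out = []
    for stream, rec in events:
        if stream == "items":
            out.extend(stage2.on_right(rec))
            continue
        if stream == "orders":
            joined = stage1.on_left(rec)
        else:  # "customers"
            joined = stage1.on_right(rec)
        # The output of stage 1 is the left input of stage 2.
        for j in joined:
            out.extend(stage2.on_left(j))
    return out


events = [
    ("items", {"item_id": 7, "order_id": 1, "sku": "A"}),
    ("orders", {"order_id": 1, "customer_id": 9}),
    ("customers", {"customer_id": 9, "name": "Ann"}),
]
rows = run_chain(events)
print(rows)
```

In this sketch a flattened row only appears once all three records have
arrived, in whatever order. The other option would put both stages'
state inside one job with three inputs.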

Use case 2 is about how to apply referential integrity. Take a shopping
cart as an example, where I add a product to my cart. The cart is
represented by an order record, and the product being added is represented
by an item record with a foreign key to the order record. In a traditional
DB setting, if I try to insert an item record with an order id in it and
the order with that id doesn't exist, the insert fails the DB's referential
integrity checks. How does this work in the Samza case? Do all writes to
the log (Kafka) succeed, with the integrity check done later when I create
my view?
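
As a rough illustration of the "check later" reading, here is a small
plain-Python sketch in which every write to the log is accepted and the
view builder holds back items whose order has not been seen yet. This is
only my guess at how it might look, not something taken from the Samza
docs; build_view and the record layout are hypothetical.

```python
def build_view(log):
    """Replay a log of (kind, record) pairs into a view of valid items.

    Items that reference an unknown order are parked in `pending`
    instead of being rejected at write time.
    """
    orders = {}   # order_id -> order record
    pending = {}  # order_id -> items waiting for that order
    view = []     # items whose order is known
    for kind, rec in log:
        if kind == "order":
            orders[rec["order_id"]] = rec
            # Release any items that arrived before their order.
            view.extend(pending.pop(rec["order_id"], []))
        elif rec["order_id"] in orders:
            view.append(rec)
        else:
            pending.setdefault(rec["order_id"], []).append(rec)
    return view, pending


log = [
    ("item", {"item_id": 1, "order_id": 10}),  # arrives before its order
    ("order", {"order_id": 10}),
    ("item", {"item_id": 2, "order_id": 11}),  # order 11 never arrives
]
view, pending = build_view(log)
print(view, pending)
```

Here item 1 reaches the view once order 10 shows up, while item 2 stays
in pending indefinitely, which is the case I am unsure how to handle.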

[1]
http://samza.apache.org/learn/documentation/0.9/container/state-management.html

Thanks
Josh
