A colleague and I are trying to understand Samza better, along with the ideas behind it, both those already realized and those still to come. We've been through some of the referenced videos, articles, and documentation, and were discussing a couple of use cases where we weren't sure how they would be solved.
Use case 1 is about how a multi-join SQL query would be represented in Samza. The join examples on the state management page [1] involve only two tables. We have use cases with many more joins, which we use to flatten (denormalize) data for storage in our reporting database. If a query joined N tables, would this be a single Samza job with N input streams, or N-1 jobs with at most two input streams each, where every job from the second layer onward takes one of its input streams from the output of the previous job?

Use case 2 is about how to enforce referential integrity. To use a shopping-cart analogy, say I add a product to my cart. The cart is represented by an order record, and the product being added is represented by an item record that contains a foreign key to the order record. In a traditional database, if I try to insert an item record with an order ID and no order with that ID exists, the referential integrity check fails. How does this work with Samza? Do all writes to the log (Kafka) succeed, with the integrity check done later when I build my view?

[1] http://samza.apache.org/learn/documentation/0.9/container/state-management.html

Thanks,
Josh
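To make the two questions concrete, here is a minimal sketch in plain Python of the layered option from use case 1 (N-1 two-input joins chained together), combined with the "accept every write, check integrity later" option from use case 2. It is not Samza's API: the class `KeyedJoin`, the dictionaries standing in for a job's local key-value store, and the field names `order_id`, `item_id`, `product_id` and `customer` are all hypothetical. An item that arrives before its order is buffered rather than rejected, and whatever is still buffered can be reported as an orphan.

```python
from collections import defaultdict


class KeyedJoin:
    """Joins a parent stream to a child stream on one key field.

    Hypothetical sketch: plain dicts stand in for a job's local key-value
    store; this is not Samza's API.
    """

    def __init__(self, key):
        self.key = key
        self.parents = {}                 # key value -> latest parent record
        self.pending = defaultdict(list)  # key value -> children with no parent yet

    def on_parent(self, record):
        k = record[self.key]
        self.parents[k] = record
        # Release any children that arrived before this parent.
        return [self._merge(record, child) for child in self.pending.pop(k, [])]

    def on_child(self, record):
        k = record[self.key]
        parent = self.parents.get(k)
        if parent is None:
            # No parent yet: accept and buffer instead of rejecting the write.
            self.pending[k].append(record)
            return []
        return [self._merge(parent, record)]

    def orphans(self):
        # Children still waiting for a parent, e.g. for an integrity report.
        return {k: list(v) for k, v in self.pending.items()}

    @staticmethod
    def _merge(parent, child):
        merged = dict(parent)
        merged.update(child)  # child fields win on key collisions
        return merged


# Layer 1 joins items to orders; layer 2 joins that output to products.
orders_items = KeyedJoin("order_id")
with_products = KeyedJoin("product_id")


def to_layer2(rows):
    out = []
    for row in rows:
        out.extend(with_products.on_child(row))
    return out


results = []
# Item for order 1 arrives before the order itself, so it is buffered.
results += to_layer2(orders_items.on_child({"order_id": 1, "item_id": 10, "product_id": "p1"}))
results += with_products.on_parent({"product_id": "p1", "name": "widget"})
# Order 1 arrives and releases the buffered item through both layers.
results += to_layer2(orders_items.on_parent({"order_id": 1, "customer": "c1"}))
# Item for order 2, which never arrives, stays an orphan.
results += to_layer2(orders_items.on_child({"order_id": 2, "item_id": 20, "product_id": "p1"}))

print(results)
print(orders_items.orphans())
```

Each `KeyedJoin` keeps its state keyed on a single field, so chaining them mirrors the N-1 job layout: the layer-1 output is keyed by `order_id` but layer 2 needs it by `product_id`, which in a partitioned system would presumably mean a repartitioning step between jobs. A single job with N inputs would instead hold N stores in one task, which seems straightforward only if every input can be partitioned on the same key. In this sketch the integrity check is deferred: nothing is rejected on write, and `orphans()` is where a view builder could decide what to do with items whose order never shows up.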