Re: Samza questions (downtime during deployment and num partition per task)

2017-10-31 xinyu liu
Hi Tony, for your questions: 1) Having a hot-standby job instance for fail-over may introduce certain operational complications. For example, if both instances produce to the same output, both will be running for a short period of time, which might lead to duplicates in the output. If the jobs have local s
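The duplicate-output risk xinyu mentions can be illustrated with a small downstream filter. This is a minimal sketch, not a Samza API: it assumes, hypothetically, that every output message carries a unique `msg_id` (for example built from the input partition and offset), so that copies emitted by both instances during the overlap share the same id. The `Deduplicator` class and the window size are illustrative choices.

```python
from collections import OrderedDict


class Deduplicator:
    """Drops messages whose id was already seen within a bounded window.

    Hypothetical helper: assumes each message has a unique msg_id,
    e.g. "<input partition>:<offset>" copied from the input record.
    """

    def __init__(self, max_ids: int = 10_000):
        self.max_ids = max_ids
        self.seen = OrderedDict()  # ids in order of last sighting

    def accept(self, msg_id: str) -> bool:
        """Return True the first time msg_id is seen, False for a repeat."""
        if msg_id in self.seen:
            self.seen.move_to_end(msg_id)  # refresh its position in the window
            return False
        self.seen[msg_id] = None
        if len(self.seen) > self.max_ids:
            self.seen.popitem(last=False)  # evict the oldest remembered id
        return True


# During the overlap, both the old and the standby instance emit p0:42.
outputs = [("p0:41", "a"), ("p0:42", "b"), ("p0:42", "b"), ("p0:43", "c")]
dedup = Deduplicator()
delivered = [payload for msg_id, payload in outputs if dedup.accept(msg_id)]
print(delivered)  # ['a', 'b', 'c']
```

Because the window is bounded, a duplicate that arrives after its id has been evicted slips through; the window only needs to cover the expected overlap between the two instances.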

Samza questions (downtime during deployment and num partition per task)

2017-10-30 Tony Du
Hi, we're looking into Samza for real-time processing. We have a couple of questions regarding Samza functionality: 1. One "must-have" requirement for us is zero/minimal downtime during deployment of Samza jobs. One approach that we're thinking of is to start a new instance of the same Samza job and m

Re: Samza questions

2015-03-26 Gian Merlino
Hi Ori, Maybe an example would be useful. We use Samza to transform data for materialization in Druid, because Druid is built to index and aggregate a single event stream, but our raw data actually exists in a bunch of streams and tables that need joining. So we have Samza handle the joining and t
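The joining step Gian describes can be sketched as a stream-table join: table updates are applied to local state as they arrive, and each event is enriched with the current row for its key before being handed to an indexer such as Druid. This is plain Python rather than Samza's API, and the message shape and field names (`user_id`, `country`, `page`) are hypothetical. In a real Samza job the table state would typically live in a local key-value store rather than a Python dict.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical message shape: (kind, key, payload), kind is "table" or "event".
Message = Tuple[str, str, dict]


def join_stream_with_table(messages: Iterable[Message]) -> Iterator[dict]:
    """Enrich each event with the latest table row for its key."""
    table: Dict[str, dict] = {}
    for kind, key, payload in messages:
        if kind == "table":
            table[key] = payload  # upsert the latest row for this key
        elif kind == "event":
            row = table.get(key, {})  # unknown key: event passes through unenriched
            yield {"user_id": key, **payload, **row}


stream: List[Message] = [
    ("table", "u1", {"country": "US"}),
    ("event", "u1", {"page": "/home"}),
    ("table", "u1", {"country": "CA"}),
    ("event", "u1", {"page": "/about"}),
    ("event", "u2", {"page": "/home"}),
]
joined = list(join_stream_with_table(stream))
for record in joined:
    print(record)
```

Each event sees the table as it stood at that point in the stream, so the second `u1` event picks up the updated country. The output is a single denormalized stream, which is the shape an indexer built around one event stream expects.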

Re: Samza questions

2015-03-26 Yi Pan
Hi Ori, my interpretation of the MV (materialized view) usage in Martin's talk is exactly what you mentioned: it is considered a "view" rather than a regular table in a DB, hence read-only, and possibly derived data that has already gone through the business logic.

Re: Samza questions

2015-03-26 Yan Fang
I guess you mean "Martin", not "Matrin". Here is the link to the talk behind Ori's question, to give everyone some background: https://thestrangeloop.com/sessions/turning-the-database-inside-out-with-apache-samza

Samza questions

2015-03-26 Ori Cohen
Hi everyone, based on Matrin's Strange Loop talk "Turning the Database Inside Out", what I understand is that he meant for Samza to be a tool to pull sequential event data from a pub-sub system such as Kafka, then process the data to generate materialized views. The next piece of the puzzle I couldn't figure out
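The pattern Ori summarizes, folding an ordered event log into a derived view, can be sketched in a few lines. The event shape (`offset`, `item`, `delta`) and the per-item count view are hypothetical. The point is that the view is a pure function of the log, so it can be rebuilt at any time by replaying events from the beginning, and readers only ever get a read-only mapping.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Iterable, Mapping


@dataclass(frozen=True)
class Event:
    offset: int  # position in the log, e.g. a Kafka offset
    item: str
    delta: int


def build_view(log: Iterable[Event]) -> Mapping[str, int]:
    """Fold events, in log order, into a read-only count-per-item view."""
    counts: dict = {}
    for event in log:
        counts[event.item] = counts.get(event.item, 0) + event.delta
    return MappingProxyType(counts)  # readers cannot modify the view


log = [Event(0, "apple", 5), Event(1, "pear", 2), Event(2, "apple", -3)]
view = build_view(log)
print(dict(view))  # {'apple': 2, 'pear': 2}
```

Writes go only to the log; the view is never updated directly, which is what keeps it consistent with the event history.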