Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-05-05 Thread Navina Ramesh
> On May 6, 2015, 12:10 a.m., Yan Fang wrote: > > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala, > > lines 110-112 > > > > > > personal opinion: should this logic be another method, such

Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-05-05 Thread Navina Ramesh
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33453/ --- (Updated May 6, 2015, 6:22 a.m.) Review request for samza, Yan Fang, Chris Ricc

Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-05-05 Thread Navina Ramesh
> On May 6, 2015, 12:10 a.m., Yan Fang wrote: > > samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfig.scala, > > line 55 > > > > > > 1. Is it also possible to retrieve this value from the config? Then

Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-05-05 Thread Navina Ramesh
> On May 5, 2015, 11:21 p.m., Naveen Somasundaram wrote: > > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, > > line 451 > > > > > > nit: Prefer changelogged over logged, as it seems consis

Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-05-05 Thread Yan Fang
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33453/#review82610 --- samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfi

Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-05-05 Thread Navina Ramesh
> On May 5, 2015, 11:21 p.m., Naveen Somasundaram wrote: > > samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfig.scala, > > line 55 > > > > > > Can you also please update the configuration docs (or som

Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-05-05 Thread Naveen Somasundaram
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33453/#review82593 --- samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfi

Re: Review Request 33761: Fix SAMZA-658

2015-05-05 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33761/#review82604 --- Overall looks good. Just some minor comments. samza-kv/src/main/sc

Re: What next for streaming SQL?

2015-05-05 Thread Julian Hyde
Sliding windows are so different from tumbling windows that it doesn’t make sense to give them a “window key”. With sliding windows, every row has its own window. So, given a row R you determine what rows are in R’s window based on its windowing criteria, e.g. “rows between 5 preceding and 5 fol

Re: What next for streaming SQL?

2015-05-05 Thread Yi Pan
Hi, Julian, Great! I am looking forward to it. Could you help answer my question regarding the sliding windows in the previous email? Thanks a lot! -Yi On Tue, May 5, 2015 at 10:46 AM, Julian Hyde wrote: > > On May 4, 2015, at 10:52 AM, Yi Pan wrote: > > > Just one observation that I w

Re: Local state in Samza - sharing data between tasks

2015-05-05 Thread Yi Pan
Hi, Andreas, Are you describing a use case where the *same* copy of data is shared among all tasks? That will depend on a lot of factors: 1. Is your data size huge? 2. Can your data be partitioned to work with a single partition of the input stream? 3. Do you have a means to bootstrap the data from a str

Re: Local state in Samza - sharing data between tasks

2015-05-05 Thread Andreas Simanowski
Hi Yan, thanks for the reply. So yes, you are correct it would not be random which partition a message hits. We would use a partition key (sorry I missed that). The "data" I was referring to is the local KV-store data for each task. Is there a way to synchronize or replicate the data from the KV-

Re: Questions regarding Samza in production

2015-05-05 Thread Yi Pan
Hi, Jose, Good to know that you chose Samza! I will embed my answers inline below: On Mon, May 4, 2015 at 5:02 PM, José Barrueta wrote: > > - I assume caching will help a lot with serialization/deserialization of > the Value, but have you guys used the value to be of type other than > primiti

Re: Input stream partition to task mapping

2015-05-05 Thread Yan Fang
If the rule does not change, we can get the rule when starting up StreamTasks and then localize it. Cheers, Fang, Yan yanfang...@gmail.com On Tue, May 5, 2015 at 2:41 PM, Yan Fang wrote: > "If I understand it correctly the only viable solution at the moment is to > create a new stream for the

Re: Review Request 33419: SAMZA-625: Provide tool to consume changelog and materialize a state store

2015-05-05 Thread Yan Fang
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33419/ --- (Updated May 5, 2015, 9:56 p.m.) Review request for samza. Changes --- r

Re: Local state in Samza - sharing data between tasks

2015-05-05 Thread Yan Fang
Hi Andreas, I don't quite understand this part: "Because the messages coming into the input stream are random (i.e. can hit any partition and therefore any task), each task will need its own copy of the data (i.e. the data needs to be duplicated across each task)." Messages come into the input stream

Re: Input stream partition to task mapping

2015-05-05 Thread Yan Fang
"If I understand it correctly the only viable solution at the moment is to create a new stream for the rules messages with as many partitions as the data stream and write each rules update message to all partitions of the new rules stream." If the data is constantly changing, yes, AFAIK, this is

Re: Input stream partition to task mapping

2015-05-05 Thread Ueli Gallizzi
Hi Yan, Thanks for your quick response. After I read the discussion on SAMZA-353 I think the best solution for my use case is a "shared state" store among StreamTasks described in SAMZA-402. To give you some background I have a stream with rules which are constantly changing and a data stream on

Local state in Samza - sharing data between tasks

2015-05-05 Thread Andreas Simanowski
Hello Samza community: I am very new to Samza and currently looking at how to use Samza and its key-value store. I have run into the following and was hoping someone could point me in the right direction. Say we have an input stream being consumed by more than one task (one task per partition). E

Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-05-05 Thread Navina Ramesh
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33453/ --- (Updated May 5, 2015, 6:47 p.m.) Review request for samza, Yan Fang, Chris Ricc

Re: What next for streaming SQL?

2015-05-05 Thread Julian Hyde
On May 4, 2015, at 10:52 AM, Yi Pan wrote: > Just one observation that I wanted to add in: I noted that actually any > range-based query clause on an ordered stream essentially means the need > for a windowing method in the ordered stream scan. Is it possible to > identify a common syntax expres

Re: Input stream partition to task mapping

2015-05-05 Thread Yan Fang
Hi Ueli, This feature is not currently supported by Samza. There were some discussions in the JIRA - SAMZA-353. But there are some workarounds for this, depending on what you want to achieve. If you can specify what your requirement is, we can help th

Input stream partition to task mapping

2015-05-05 Thread Ueli Gallizzi
Hi, Is it possible that multiple tasks read from the same input stream partition? example: task 0 stream A partition 0, stream B partition 0 task 1 stream A partition 1, stream B partition 0 task 2 stream A partition 3, stream B partition 0 In this example all messages in stream B partition 0 wo