[DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-23 Thread Dong Lin
Hi all, We created SEP-5: Enable partition expansion of input streams. Please find the SEP wiki at https://cwiki.apache.org/confluence/display/SAMZA/SEP-5%3A+Enable+partition+expansion+of+input+streams . Your feedback is appreciated! Thanks, Dong

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-24 Thread Dong Lin
JobModel is persisted. We need to persist the task-to-sspList mapping in the coordinator stream so that the job can derive the original number of partitions of each input stream regardless of how many times the partition has expanded. Does this make sense? I am not sure how this is related to the
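A minimal sketch of the idea in the snippet above, assuming the persisted mapping is a dictionary from task name to a list of (system, stream, partition) tuples, and assuming every expanded partition is assigned to a task that already consumed the same stream. Under those assumptions the number of distinct tasks holding a stream equals its original partition count, however often the stream has expanded. The function name and data layout are hypothetical and are not Samza's actual coordinator-stream format.

```python
from collections import defaultdict


def original_partition_counts(task_to_ssps):
    """Derive each stream's original partition count from a task-to-SSP-list mapping.

    task_to_ssps: dict of task name -> list of (system, stream, partition) tuples.
    Assumption (hypothetical layout): new partitions are only ever added to tasks
    that already consumed the stream, so the set of tasks per stream never grows.
    """
    tasks_per_stream = defaultdict(set)
    for task, ssps in task_to_ssps.items():
        for system, stream, _partition in ssps:
            tasks_per_stream[(system, stream)].add(task)
    return {stream: len(tasks) for stream, tasks in tasks_per_stream.items()}


# Example: "clicks" started with 2 partitions and was later expanded to 4.
persisted = {
    "Partition 0": [("kafka", "clicks", 0), ("kafka", "clicks", 2)],
    "Partition 1": [("kafka", "clicks", 1), ("kafka", "clicks", 3)],
}
counts = original_partition_counts(persisted)
```

Here `counts` maps `("kafka", "clicks")` to 2, the partition count before the expansion.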

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-29 Thread Dong Lin
Dong On Wed, May 24, 2017 at 11:15 PM, Dong Lin wrote: > Hey Navina, > > Thanks much for your comments. Please see my reply inline. > > On Wed, May 24, 2017 at 10:22 AM, Navina Ramesh (Apache) < > nav...@apache.org> wrote: > >> Thanks for the SEP, Dong. I ha

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-31 Thread Dong Lin
how this can be a "general" > solution for all input systems. That's my two cents. I would like to hear > alternative points of view for this from other devs. > > Please make sure you have enough eyes on this SEP. If you do, please start > a VOTE thread to approve this SE

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-01 Thread Dong Lin
meters: a) the previous task number read from the > > coordinator stream; b) the configured new-partition to old-partition > > mapping policy. Then, the grouper's interface method stays the same and > the > > behavior of the grouper is more configurable which is good to support a > > broader set

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-01 Thread Dong Lin
> broader set of use cases in addition to Kafka's built-in partition > expansion policies. > - Minor renaming suggestion to the new grouper class names: > GroupByPartitionWithFixedTaskNum > and GroupBySystemStreamPartitionWithFixedTaskNum > > Thanks! > > - Yi > >
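One way to picture a grouper that keeps the task count fixed is sketched below. It assumes keys are placed by hash(key) modulo the partition count and that the partition count only grows by an integer multiple of the previous task count, so a key that now lands in partition p previously landed in partition p modulo the previous count. The function names are hypothetical and this is not the code of the grouper classes named in the thread.

```python
def task_for_partition(partition, previous_task_count):
    """Map a (possibly new) partition to the task that owned its keys before expansion.

    Assumes hash-modulo partitioning and expansion by an integer multiple.
    """
    if previous_task_count <= 0:
        raise ValueError("previous_task_count must be positive")
    return partition % previous_task_count


def group_partitions(partition_count, previous_task_count):
    """Group all current partitions by task while keeping the task count fixed."""
    groups = {task: [] for task in range(previous_task_count)}
    for partition in range(partition_count):
        groups[task_for_partition(partition, previous_task_count)].append(partition)
    return groups


# Example: a stream expanded from 2 to 4 partitions still feeds 2 tasks.
groups = group_partitions(4, 2)
```

With these inputs task 0 receives partitions 0 and 2, and task 1 receives partitions 1 and 3, so each task keeps seeing the same keys it saw before the expansion.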

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-04 Thread Dong Lin
don't config/class for user to specify new-partition to old-partition mapping. Can you take another look at the proposal and let me know if there is any concern? Cheers, Dong On Thu, Jun 1, 2017 at 12:58 AM, Dong Lin wrote: > Hey Yi, > > Thanks much for the comment. I have up

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-06 Thread Dong Lin
d streams or a single stream. Would be nice to > formulate the proposal in these more general terms. > 3. When switching SSP groupers, how will the users avoid the > org.apache.samza.checkpoint.kafka.DifferingSystemStreamPartitionGrouperFactoryValues > exception? > 4. Partition to

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-07 Thread Dong Lin
Hey Jacob, Navina, Yi, I am wondering if my answer has addressed your concern. Can you let me know if there is any concern with SEP? Thanks, Dong On Tue, Jun 6, 2017 at 11:06 PM, Dong Lin wrote: > Hey Jacob, > > Thanks for taking time to review the SEP. > > I agree with you

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Dong Lin
er-to-host mappings are both meaningful in > context of the JobModel. Partition-to-task mapping is not meaningful > without some definition of the key-to-partition assignments. It's > incomplete information and therefore misleading. I think it only makes > sense to use this mapping

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Dong Lin
ignment > because > > > we don't need to know whether this is the first time a job is started > or > > > not. On the other hand, you can write topic-to-partition-count mapping > to > > > the coordinator stream only if this is the first time the job is run >

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Dong Lin
ture feature that utilizes this mapping > without accounting for the assumptions of this SEP is likely to > malfunction. > > I am not sure it is true that "any future feature that utilizes this mapping without accounting for the assumptions of this SEP is likely to malfuncti

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-14 Thread Dong Lin
". Suppose we allow user to specify new-to-old-partition > > mapping, then we can use the partition-to-task mapping correctly without > > relying on the assumption in this SEP, right? > > Right, but my point was that the partition->task mapping is not sufficient > by its

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-17 Thread Dong Lin
Jun 15, 2017 at 7:53 AM, Jacob Maes wrote: > > > Thanks, Dong. > > > > The summary looks accurate. > > > > I'll let the others chime in, as I believe my perspective has been > > adequately captured in this thread. > > > > -Jake > > > >

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-18 Thread Dong Lin
BTW, I will update the SEP-5 wiki with our latest discussion after I have got the wiki edit access. On Sat, Jun 17, 2017 at 11:36 PM, Dong Lin wrote: > Thanks everyone for the comment! > > I am currently leaning towards the current approach. I think Kartik raised > a good point th

[VOTE] SEP-5: Enable partition expansion of input streams

2017-06-19 Thread Dong Lin
Hi everyone, Can you please vote for SEP-5? The wiki can be found at https://cwiki.apache.org/confluence/display/SAMZA/SEP-5%3A+Enable+partition+expansion+of+input+streams . Thanks, Dong

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-21 Thread Dong Lin
uns stateful job with input from Kafka and the partition size of Kafka has become too large due to increase in throughput or increase in retention time. I am not sure what kind of feature can be classified as "utmost priority". I am also not sure why a feature needs to be "utmost p

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-21 Thread Dong Lin
't think increasing task count should be a rejected > alternative. > > > I am also not sure why a feature needs to be "utmost priority" in order > to be accepted. Can you explain a bit on that? > > I don't think I ever claimed that the feature needs to be of

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-22 Thread Dong Lin
> and "Rejected Alternative". There is no question about the future work > > *replacing* SEP-5. Iiuc, this SEP is a subset for the partition expansion > > solution. So, I don't think increasing task count should be a rejected > > alternative. > >

Re: [VOTE] SEP-5: Enable partition expansion of input streams

2017-06-23 Thread Dong Lin
we can further scale the task count if needed. > > > > Thanks, > > Xinyu > > > > On Mon, Jun 19, 2017 at 9:27 AM, Dong Lin wrote: > > > > > Hi everyone, > > > > > > Can you please vote for SEP-5? The wiki can be found at > > >