Re: Samza and sliding window

2015-07-06 Thread Shekar Tippur
Yi, I see incoming messages. I see that the counts are getting aggregated as well. But when I try to access them, I get null. - Shekar On Mon, Jul 6, 2015 at 4:50 PM, Yi Pan wrote: > Hi, Shekar, > > Did you take a look at the stats to see: > > 1) Are there any incoming messages? > 2) Are there any

Re: Review Request 35397: Fix Samza-697

2015-07-06 Thread Boris Shkolnik
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35397/#review90620 --- The latest diff seems incomplete. Missing some file(s). - Boris Shk

Re: Samza and sliding window

2015-07-06 Thread Yi Pan
Hi, Shekar, Did you take a look at the stats to see: 1) Are there any incoming messages? 2) Are there any messages in the changelog topic? Could you also try changing the log4j level to DEBUG to see whether anything shows up in the log? Thanks! -Yi On Mon, Jul 6, 2015 at 4:43 PM, Shekar Tip

Re: Samza and sliding window

2015-07-06 Thread Shekar Tippur
Martin, As seen below, I have only 1 partition. What else could be wrong?
$ kafka-topics.sh --describe --zookeeper localhost:2181 --topic parser
Topic:parser PartitionCount:1 ReplicationFactor:1 Configs:
Topic: parser Partition: 0 Leader: 0 Replicas: 0 Isr: 0
- Shekar On Mon, Jul 6, 2015

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Jay Kreps
Hey Martin, I agree coupling release schedules is a downside. Definitely we can try to solve some of the integration problems in Confluent Platform or in other distributions. But I think this ends up being really shallow. I guess I feel to really get a good user experience the two systems have to

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
{quote} There is a good chance that the different execution frameworks can be abstracted out {quote} Actually, I think that even if we define an abstraction layer, the Samza process won't need to be aware of its existence. The Samza container running as a process can be totally un

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Timothy Chen
Hi all, I remember there was a JIRA/patch about the Mesos integration a long time ago that depended on some Samza feature going in, and I haven't been tracking it since. Not sure what the current status is, but as I see folks are discussing abstracting the cluster manager and Mesos, bunch of

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
@Jay, you got my point. {quote} I think the question is whether for the "as a service" you are proposing actually trying to build some layer over YARN/Mesos/AWS that abstracts these away? {quote} I don't feel strongly about this, but I do see that as an option. My reasoning is: if all we need from

RE: Thoughts and obesrvations on Samza

2015-07-06 Thread Ken Krugler
Hi Martin, As a lurker here, this has been a very interesting thread. I would suggest talking to one of the Solr committers about their experience in merging with Lucene, as that's got many similarities to what you're discussing here…though Solr was more mature when that happened. From what I'

Review Request 36224: SAMZA-728: Samza job fails due to null pointer in JobCoordinator refreshJobModel

2015-07-06 Thread Navina Ramesh
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36224/ --- Review request for samza, Yan Fang, Naveen Somasundaram, and Yi Pan (Data Infras

Re: [VOTE] Apache Samza 0.9.1 RC1

2015-07-06 Thread Martin Kleppmann
+1 (binding) on RC1. Verified sig, built, tested with hello-samza. On 2 Jul 2015, at 19:22, Yi Pan wrote: > Hi, Jakob, > > A quick code search suggests that you were on Kafka. :) > > It sounds good to me to extend till 5pm 7/6 for the vote, to account for the > US holidays. Any objections? > >

Re: Samza and sliding window

2015-07-06 Thread Martin Kleppmann
Hi Shekar, The store.all() iterator ought to give you the entire contents of the store. However, note that each partition of the input topic results in a separate StreamTask instance, which in turn has a separate store. So there will be as many stores as there are input partitions. Perhaps you'
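The point about one store per input partition can be illustrated with a toy model. This is only a sketch in plain Python, not the Samza API: the `Task` class, its `all()` method, and the `run` helper are hypothetical names made up for the illustration. It shows that when each partition gets its own task and its own store, iterating one task's store returns only the keys that arrived on that partition.

```python
class Task:
    """Toy stand-in (hypothetical) for one StreamTask instance with its own store."""

    def __init__(self, partition):
        self.partition = partition
        self.store = {}  # local key-value store, visible only to this task

    def process(self, key):
        # Count how many times each key was seen on this partition.
        self.store[key] = self.store.get(key, 0) + 1

    def all(self):
        # Similar in spirit to iterating a store: covers only this task's entries.
        return sorted(self.store.items())


def run(messages, num_partitions):
    """Route (partition, key) messages to one Task per partition."""
    tasks = {p: Task(p) for p in range(num_partitions)}
    for partition, key in messages:
        tasks[partition].process(key)
    return tasks


messages = [(0, "a"), (1, "a"), (0, "b"), (1, "c"), (0, "a")]
tasks = run(messages, num_partitions=2)
for partition, task in tasks.items():
    print(partition, task.all())
```

In this model a lookup for key "c" in the partition 0 task finds nothing, even though the key exists in the partition 1 task. With a single-partition topic there is only one task and one store, so this effect would not by itself explain a missing value.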

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Jay Kreps
Hey Yi/Martin, With respect to state reuse: Yeah that is a better explanation of what I was trying to say. That prototype actually includes state reuse (each task checkpoints its offset for the various partitions during the commit and uses that to determine if the state is valid and can be reused

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Martin Kleppmann
Ok, thanks for the clarifications. Just a few follow-up comments. - I see the appeal of merging with Kafka or becoming a subproject: the reasons you mention are good. The risk I see is that release schedules become coupled to each other, which can slow everyone down, and large projects with many

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
Hi, Martin, Great to hear your voice! I will just try to focus on your questions regarding the "w/o YARN" part. {quote} For example, would host affinity (SAMZA-617) still be possible? {quote} It would be possible if we separate the job execution/process launching from the partition assignment amon

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Jay Kreps
Hey Martin, For the YARN/Mesos/etc decoupling I actually don't think it ties our hands at all, all it does is refactor things. The division of responsibility is that Samza core is responsible for task lifecycle, state, and partition management (using the Kafka co-ordinator) but it is NOT responsib

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
Hi, Guozhang, {quote} but I think if we decide to go this route we'd better do it now than later as the protocol is not officially "released" yet. This may delay the first release of the new consumer. {quote} I totally agree. Given the potentially heavy migration cost later, I think that a slight d

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
Hi, Gianmarco, {quote} However, I think the fundamental operation that Samza, Copycat, and Kafka consumers should agree upon is "how can I specify in a simple and transparent way which partitions I want to consume, and how?". {quote} I agree that some basic partition distribution mechanism can be

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Martin Kleppmann
Hi all, Lots of good thoughts here. I agree with the general philosophy of tying Samza more firmly to Kafka. After I spent a while looking at integrating other message brokers (e.g. Kinesis) with SystemConsumer, I came to the conclusion that SystemConsumer tacitly assumes a model so much like