Thanks Yan & Yi. On Wed, Oct 28, 2015 at 11:00 AM, Yi Pan <nickpa...@gmail.com> wrote:
> Hi, Chen,
>
> On Wed, Oct 28, 2015 at 4:05 AM, Yan Fang <yanfangw...@163.com> wrote:
>
> > > * Is there a tentative date for 0.10.0 release?
> > I think it's coming out soon. @Yi Pan, he should know more about that.
>
> There is a bit of a delay in the release date due to a bug we recently
> discovered in testing. The targeted date is in November.
>
> > > * I checked the checkpoint topic for the Samza job, and it seems the
> > > checkpoint topic is created with 1 partition by default. Given that
> > > each Samza task will need to read from the checkpoint topic, it is
> > > similar to what I need (each Samza task reading from the same
> > > partition of a topic). I am wondering how that is achieved?
> > In the current implementation, only the AM reads the checkpoint stream
> > and distributes the information to all the nodes using the HTTP server.
> > Not all the nodes consume the checkpoint stream. Correct me if I am
> > wrong.
>
> The checkpoint topic is a special one that the containers read only
> during the start-up phase. Hence, it is not considered part of the
> SystemStreamPartitions that are assigned to the tasks. As Yan mentioned,
> the broadcast stream in 0.10 is the solution to your use case.
>
> Thanks!
>
> > Thanks,
> > Yan
> >
> > At 2015-10-28 02:49:23, "Chen Song" <chen.song...@gmail.com> wrote:
> > >Thanks Yan.
> > >
> > >* Is there a tentative date for 0.10.0 release?
> > >* I checked the checkpoint topic for the Samza job, and it seems the
> > >checkpoint topic is created with 1 partition by default. Given that
> > >each Samza task will need to read from the checkpoint topic, it is
> > >similar to what I need (each Samza task reading from the same
> > >partition of a topic). I am wondering how that is achieved?
> > >
> > >Chen
> > >
> > >On Sat, Oct 24, 2015 at 5:52 AM, Yan Fang <yanfangw...@163.com> wrote:
> > >
> > >> Hi Chen Song,
> > >>
> > >> Sorry for the late reply.
> > >> What you describe is a typical bootstrap use case. Check
> > >> http://samza.apache.org/learn/documentation/0.9/container/streams.html,
> > >> the bootstrap configuration. By using it, Samza will always read
> > >> *topicR* from the beginning when it restarts, and then treat
> > >> *topicR* as a normal topic after reading the existing messages in
> > >> *topicR*.
> > >>
> > >> == can we configure each individual Samza task to read data from all
> > >> partitions of a topic?
> > >> It works in 0.10.0 by using the broadcast stream. In 0.9.0, you have
> > >> to "create topicR with the same number of partitions as *topicD*,
> > >> and replicate data to all partitions".
> > >>
> > >> Hope this still helps.
> > >>
> > >> Thanks,
> > >> Yan
> > >>
> > >> At 2015-10-22 04:44:41, "Chen Song" <chen.song...@gmail.com> wrote:
> > >> >In our Samza app, we need to read data from MySQL (a reference
> > >> >table) along with a stream. The requirements are:
> > >> >
> > >> >* Read the data into each Samza task before processing any message.
> > >> >* The Samza task should be able to listen to updates happening in
> > >> >MySQL.
> > >> >
> > >> >I did some research after scanning through relevant conversations
> > >> >and JIRAs in the community, but did not find a solution or a
> > >> >recommended way to do this.
> > >> >
> > >> >If my data stream comes from a topic called *topicD*, the options
> > >> >in my mind are:
> > >> >
> > >> > - Use Kafka
> > >> >   1. Use a CDC-based solution to replicate data from MySQL to a
> > >> >      Kafka topic
> > >> >      (https://github.com/wushujames/mysql-cdc-projects/wiki).
> > >> >      Say the topic is called *topicR*.
> > >> >   2. In my Samza app, read the reference table from *topicR* and
> > >> >      persist it in a cache in each Samza task's local storage.
> > >> >      - If the data in *topicR* is NOT partitioned in the same way
> > >> >        as *topicD*, can we configure each individual Samza task to
> > >> >        read data from all partitions of a topic?
> > >> >      - If the answer to the above question is no, do I need to
> > >> >        create *topicR* with the same number of partitions as
> > >> >        *topicD*, and replicate data to all partitions?
> > >> >      - On start, how do I make the Samza task block processing the
> > >> >        first message from *topicD* until it has read all data from
> > >> >        *topicR*?
> > >> >   3. Any new updates/deletes to *topicR* will be consumed to
> > >> >      update the local cache of each Samza task.
> > >> >   4. On failure or restart, each Samza task will read from the
> > >> >      beginning of *topicR*.
> > >> > - Not use Kafka
> > >> >   - Each Samza task reads a snapshot of the database, builds its
> > >> >     local cache, and then reads periodically to update the cache.
> > >> >     I have read a few blogs, and this doesn't sound like a solid
> > >> >     approach in the long term.
> > >> >
> > >> >Any thoughts?
> > >> >
> > >> >Chen
> > >> >
> > >> >--
> > >> >Chen Song
> > >
> > >
> > >--
> > >Chen Song

--
Chen Song
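[Editor's note: the bootstrap and broadcast settings Yan and Yi point to can be sketched in the job's properties file roughly as below. System name (kafka), stream names (topicR), and the partition range are placeholders for this thread's example; treat this as a sketch against the 0.9/0.10 configuration docs, not a verified job config.]

```properties
# Bootstrap (0.9+): topicR must be fully caught up before messages from
# other inputs (e.g. topicD) are processed, and it is re-read from the
# beginning on every restart.
systems.kafka.streams.topicR.samza.bootstrap=true
systems.kafka.streams.topicR.samza.reset.offset=true
systems.kafka.streams.topicR.samza.offset.default=oldest

# Broadcast (0.10+): deliver partition 0 of topicR to every task,
# regardless of how topicD is partitioned, so topicR does not need to
# match topicD's partition count.
task.broadcast.inputs=kafka.topicR#0
```

Combining the two addresses both of Chen's requirements: bootstrap gives the "block until the reference data is loaded" behavior, and broadcast lets every task see the full reference stream without replicating it across partitions.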