Wow, what great timing, and what a great thread! I definitely have some good starting points to work from here.
If it is helpful for everyone, once I get the low-level API + ZkJobCoordinator + Docker + K8s working, I'd be glad to put together an additional sample for hello-samza (a first rough sketch of the embedded setup is at the bottom of this mail). One thing I'm still curious about is the drawbacks or complexities of leveraging the Kafka high-level consumer + PassthroughJobCoordinator in a stand-alone setup like this. We do have Zookeeper (because of Kafka), so I think either would work, and the Kafka high-level consumer comes with other nice tools for monitoring offsets, lag, etc.

Thanks guys!
-Thunder

-----Original Message-----
From: Tom Davis [mailto:t...@recursivedream.com]
Sent: Wednesday, March 14, 2018 17:50
To: dev@samza.apache.org
Subject: Re: Old style "low level" Tasks with alternative deployment model(s)

Hey there! You are correct that this is focused on the higher-level API but doesn't preclude using the lower-level API. I was at the same point you were not long ago, in fact, and had a very productive conversation on the list: you should look for "Question about custom StreamJob/Factory" in the list archive for the past couple of months. I'll quote Jagadish Venkatraman from that thread:

> For the section on the low-level API, can you use
> LocalApplicationRunner#runTask()? It basically creates a new
> StreamProcessor and runs it. Remember to provide task.class and set it
> to your implementation of StreamTask or AsyncStreamTask. Please note
> that this is an evolving API and hence, subject to change.

I ended up just switching to the high-level API because I don't have any existing Tasks and the Kubernetes story is a little more straightforward there (there's only one container/configuration to deploy).

Best,

Tom

Thunder Stumpges <tstump...@ntent.com> writes:

> Hi all,
>
> We are using Samza (0.12.0) in about two dozen jobs implementing several
> processing pipelines. We have also begun a significant move of other
> services within our company to Docker/Kubernetes. Right now our
> Hadoop/Yarn cluster has a mix of stream and batch "Map Reduce" jobs (many
> reporting and other batch processing jobs). We would really like to move our
> stream processing off of Hadoop/Yarn and onto Kubernetes.
>
> When I just read about some of the new progress in 0.13 and 0.14, I got
> really excited! We would love to have our jobs run as simple libraries
> in our own JVM and use the Kafka high-level consumer for partition
> distribution and such. This would let us "dockerize" our application and
> run/scale it in Kubernetes.
>
> However, as I read it, this new deployment model is ONLY for the
> new(er) high-level API, correct? Is there a plan and/or resources for
> adapting this back to existing low-level tasks? How complicated a task is
> that? Do I have any other options to make this transition easier?
>
> Thanks in advance.
> Thunder
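
P.S. For reference, here is the minimal, untested sketch I'm starting from for the embedded low-level setup, based on Jagadish's LocalApplicationRunner#runTask() suggestion quoted above. The task class (com.example.MyStreamTask), topic name, and host addresses are placeholders of mine, and since this is an evolving API the exact runner constructor/method signatures may differ between 0.13 and 0.14. It also shows where the ZkJobCoordinatorFactory vs. PassthroughJobCoordinatorFactory choice plugs in:

// Minimal sketch of running an existing low-level StreamTask in-process
// via LocalApplicationRunner#runTask(). Task class, topic, and hosts are
// placeholders; exact runner API may vary across 0.13/0.14.
import java.util.HashMap;
import java.util.Map;

import org.apache.samza.config.Config;
import org.apache.samza.config.MapConfig;
import org.apache.samza.runtime.LocalApplicationRunner;

public class StandaloneTaskMain {
  public static void main(String[] args) {
    Map<String, String> cfg = new HashMap<>();
    cfg.put("job.name", "my-low-level-job");               // placeholder job name
    cfg.put("task.class", "com.example.MyStreamTask");     // your StreamTask/AsyncStreamTask impl
    cfg.put("task.inputs", "kafka.my-input-topic");        // placeholder input topic

    // Embedded coordination: ZK-based partition distribution across instances...
    cfg.put("job.coordinator.factory", "org.apache.samza.zk.ZkJobCoordinatorFactory");
    cfg.put("job.coordinator.zk.connect", "zk-host:2181");
    // ...or, for a single static instance, the pass-through coordinator instead:
    // cfg.put("job.coordinator.factory", "org.apache.samza.standalone.PassthroughJobCoordinatorFactory");

    // Kafka system (placeholder broker/ZK addresses).
    cfg.put("systems.kafka.samza.factory", "org.apache.samza.system.kafka.KafkaSystemFactory");
    cfg.put("systems.kafka.consumer.zookeeper.connect", "zk-host:2181");
    cfg.put("systems.kafka.producer.bootstrap.servers", "kafka-host:9092");

    Config config = new MapConfig(cfg);
    LocalApplicationRunner runner = new LocalApplicationRunner(config);
    runner.runTask();   // creates a StreamProcessor and runs the configured task.class
  }
}

The idea would be to build this main class into a Docker image and scale it by adding replicas, letting the job coordinator (or, with the pass-through coordinator, a fixed static assignment) handle partition distribution. Happy to be corrected if any of the config keys above aren't the right ones for 0.14.
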