Wow, what great timing, and what a great thread! I definitely have some good 
starting points to work from here.

If it is helpful for everyone, once I get the low-level API + ZkJobCoordinator 
+ Docker + K8s working, I'd be glad to formulate an additional sample for 
hello-samza. 

One thing I'm still curious about is: what are the drawbacks or complexities 
of leveraging the Kafka high-level consumer + PassthroughJobCoordinator in a 
stand-alone setup like this? We do have ZooKeeper (because of Kafka), so I 
think either would work. The Kafka high-level consumer also comes with other 
nice tooling for monitoring offsets, lag, etc.
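
In case it helps frame the question, here is roughly how I understand the two 
configurations would differ. This is only a sketch: the coordinator factory 
class names are what I believe ship in 0.13/0.14 and are worth double-checking, 
and the job name / ZK host are placeholders.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.samza.config.Config;
    import org.apache.samza.config.MapConfig;

    public class StandaloneConfigs {

        // Variant A: ZooKeeper-based coordination; Samza's ZkJobCoordinator
        // handles partition distribution among the running processors.
        static Config zkCoordinated() {
            Map<String, String> cfg = new HashMap<>();
            cfg.put("job.name", "my-stream-job");                  // placeholder
            cfg.put("job.coordinator.factory",
                    "org.apache.samza.zk.ZkJobCoordinatorFactory");
            cfg.put("job.coordinator.zk.connect", "zk-host:2181"); // our existing ZK
            return new MapConfig(cfg);
        }

        // Variant B: PassthroughJobCoordinator; Samza itself does no distributed
        // coordination, so partition distribution has to be handled some other
        // way (which is where the Kafka high-level consumer idea fits in).
        static Config passthrough() {
            Map<String, String> cfg = new HashMap<>();
            cfg.put("job.name", "my-stream-job");                  // placeholder
            cfg.put("job.coordinator.factory",
                    "org.apache.samza.standalone.PassthroughJobCoordinatorFactory");
            return new MapConfig(cfg);
        }
    }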

Thanks guys!
-Thunder

-----Original Message-----
From: Tom Davis [mailto:t...@recursivedream.com] 
Sent: Wednesday, March 14, 2018 17:50
To: dev@samza.apache.org
Subject: Re: Old style "low level" Tasks with alternative deployment model(s)

Hey there!

You are correct that this is focused on the higher-level API, but it doesn't 
preclude using the lower-level API. In fact, I was at the same point not long 
ago and had a very productive conversation on the list: look for "Question 
about custom StreamJob/Factory" in the list archive from the past couple of 
months.

I'll quote Jagadish Venkatraman from that thread:

> For the section on the low-level API, can you use 
> LocalApplicationRunner#runTask()? It basically creates a new 
> StreamProcessor and runs it. Remember to provide task.class and set it 
> to your implementation of StreamTask or AsyncStreamTask. Please note 
> that this is an evolving API and hence, subject to change.
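
For what it's worth, my reading of that advice boiled down to something like 
the sketch below. It is untested, the runTask() API is evolving as Jagadish 
noted, and MyStreamTask plus the config values are placeholders:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.samza.config.MapConfig;
    import org.apache.samza.runtime.LocalApplicationRunner;

    public class LowLevelMain {
        public static void main(String[] args) {
            Map<String, String> cfg = new HashMap<>();
            // Point Samza at the existing low-level task implementation.
            cfg.put("task.class", "com.example.MyStreamTask"); // placeholder StreamTask
            cfg.put("job.name", "my-low-level-job");
            cfg.put("job.coordinator.factory",
                    "org.apache.samza.zk.ZkJobCoordinatorFactory");
            cfg.put("job.coordinator.zk.connect", "zk-host:2181");
            // ... plus the usual system/stream and serde configs ...

            // runTask() creates a StreamProcessor for the configured task.class
            // and runs it in this JVM.
            LocalApplicationRunner runner =
                    new LocalApplicationRunner(new MapConfig(cfg));
            runner.runTask();
        }
    }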

I ended up just switching to the high-level API because I don't have any 
existing Tasks and the Kubernetes story is a little more straightforward there 
(there's only one container/configuration to deploy).
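
For comparison, the embedded high-level deployment ends up being roughly the 
shape below, which is why the container story is simpler: one main(), one 
image, scaled by running more replicas. Again just a sketch; MyApp and the 
config values are made up.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.samza.application.StreamApplication;
    import org.apache.samza.config.Config;
    import org.apache.samza.config.MapConfig;
    import org.apache.samza.operators.StreamGraph;
    import org.apache.samza.runtime.LocalApplicationRunner;

    public class HighLevelMain {

        // A trivial StreamApplication; the real pipeline logic goes in init().
        static class MyApp implements StreamApplication {
            @Override
            public void init(StreamGraph graph, Config config) {
                // graph.getInputStream(...).map(...).sendTo(...) etc.
            }
        }

        public static void main(String[] args) {
            Map<String, String> cfg = new HashMap<>();
            cfg.put("job.name", "my-high-level-job");
            cfg.put("job.coordinator.factory",
                    "org.apache.samza.zk.ZkJobCoordinatorFactory");
            cfg.put("job.coordinator.zk.connect", "zk-host:2181");
            // ... system/stream and serde configs ...

            LocalApplicationRunner runner =
                    new LocalApplicationRunner(new MapConfig(cfg));
            runner.run(new MyApp());
            runner.waitForFinish();
        }
    }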

Best,

Tom

Thunder Stumpges <tstump...@ntent.com> writes:

> Hi all,
>
> We are using Samza (0.12.0) in about two dozen jobs implementing several 
> processing pipelines. We have also begun a significant move of other 
> services within our company to Docker/Kubernetes. Right now our 
> Hadoop/YARN cluster has a mix of stream and batch "MapReduce" jobs (many 
> reporting and other batch processing jobs). We would really like to move 
> our stream processing off of Hadoop/YARN and onto Kubernetes.
>
> When I recently read about some of the new progress in 0.13 and 0.14, I got 
> really excited! We would love to have our jobs run as simple libraries 
> in our own JVM, and use the Kafka high-level consumer for partition 
> distribution and such. This would let us dockerize our application and 
> run/scale it in Kubernetes.
>
> However, as I read it, this new deployment model is ONLY for the 
> new(er) high-level API, correct? Is there a plan and/or resources for 
> adapting this back to existing low-level tasks? How complicated a task 
> would that be? Do I have any other options to make this transition easier?
>
> Thanks in advance.
> Thunder
