Hey Roger,

Try sending an email to this address:
dev-unsubscr...@samza.apache.org

The links for Samza mailing lists are provided on the mailing lists page:
http://samza.apache.org/community/mailing-lists.html

Hope this helps.

-Jake

On Fri, Mar 16, 2018 at 8:51 PM, Roger Sill <recruiter.ro...@gmail.com>
wrote:

> please unsubscribe me from this samza list -thx
>
> Also - your advice - please.  I am using the instructions:
> REQUEST ADDRESSES FOR [UN]SUBSCRIBING
> To get off a list, send a message to:
> list-unsubscr...@apache.org
>
> so, I send it to samza-unsubscr...@apache.org
> and I get a bounce back saying it does not recognize that
> mailbox. What am I doing wrong???
> I need to do this with some other lists too and the same
> thing happens.
>
> Roger Sill
> Field Recruiter
> recruiter.ro...@gmail.com
> 408-926-6212
>
> -----Original Message-----
> From: Thunder Stumpges <tstump...@ntent.com>
> Sent: Friday, March 16, 2018 2:43 PM
> To: dev@samza.apache.org; Jagadish Venkatraman
> <jagadish1...@gmail.com>
> Cc: t...@recursivedream.com; yi...@linkedin.com; Yi Pan
> <nickpa...@gmail.com>
> Subject: RE: Old style "low level" Tasks with alternative
> deployment model(s)
>
> Well, I have my stand-alone application in Docker and running
> in Kubernetes. I think something isn't wired up all the way,
> though, because my task never actually gets invoked. I see
> no errors; however, I'm not getting the usual startup logs
> (checking existing offsets, "entering run loop", ...). My logs
> look like this:
>
> 2018-03-16 21:05:55 logback 50797 [debounce-thread-0] INFO kafka.utils.VerifiableProperties - Verifying properties
> 2018-03-16 21:05:55 logback 50797 [debounce-thread-0] INFO kafka.utils.VerifiableProperties - Property client.id is overridden to samza_admin-test_stream_task-1
> 2018-03-16 21:05:55 logback 50798 [debounce-thread-0] INFO kafka.utils.VerifiableProperties - Property metadata.broker.list is overridden to test-kafka-kafka.test-svc:9092
> 2018-03-16 21:05:55 logback 50798 [debounce-thread-0] INFO kafka.utils.VerifiableProperties - Property request.timeout.ms is overridden to 30000
> 2018-03-16 21:05:55 logback 50799 [debounce-thread-0] INFO kafka.client.ClientUtils$ - Fetching metadata from broker BrokerEndPoint(0,test-kafka-kafka.test-svc,9092) with correlation id 0 for 1 topic(s) Set(dev_k8s.samza.test.topic)
> 2018-03-16 21:05:55 logback 50800 [debounce-thread-0] DEBUG kafka.network.BlockingChannel - Created socket with SO_TIMEOUT = 30000 (requested 30000), SO_RCVBUF = 179680 (requested -1), SO_SNDBUF = 102400 (requested 102400), connectTimeoutMs = 30000.
> 2018-03-16 21:05:55 logback 50800 [debounce-thread-0] INFO kafka.producer.SyncProducer - Connected to test-kafka-kafka.test-svc:9092 for producing
> 2018-03-16 21:05:55 logback 50804 [debounce-thread-0] INFO kafka.producer.SyncProducer - Disconnecting from test-kafka-kafka.test-svc:9092
> 2018-03-16 21:05:55 logback 50804 [debounce-thread-0] DEBUG kafka.client.ClientUtils$ - Successfully fetched metadata for 1 topic(s) Set(dev_k8s.samza.test.topic)
> 2018-03-16 21:05:55 logback 50813 [debounce-thread-0] INFO o.a.s.coordinator.JobModelManager$ - SystemStreamPartitionGrouper org.apache.samza.container.grouper.stream.GroupByPartition@1a7158cc has grouped the SystemStreamPartitions into 10 tasks with the following taskNames: [Partition 1, Partition 0, Partition 3, Partition 2, Partition 5, Partition 4, Partition 7, Partition 6, Partition 9, Partition 8]
> 2018-03-16 21:05:55 logback 50818 [debounce-thread-0] INFO o.a.s.coordinator.JobModelManager$ - New task Partition 0 is being assigned changelog partition 0.
> 2018-03-16 21:05:55 logback 50819 [debounce-thread-0] INFO o.a.s.coordinator.JobModelManager$ - New task Partition 1 is being assigned changelog partition 1.
> 2018-03-16 21:05:55 logback 50820 [debounce-thread-0] INFO o.a.s.coordinator.JobModelManager$ - New task Partition 2 is being assigned changelog partition 2.
> 2018-03-16 21:05:55 logback 50820 [debounce-thread-0] INFO o.a.s.coordinator.JobModelManager$ - New task Partition 3 is being assigned changelog partition 3.
> 2018-03-16 21:05:55 logback 50820 [debounce-thread-0] INFO o.a.s.coordinator.JobModelManager$ - New task Partition 4 is being assigned changelog partition 4.
> 2018-03-16 21:05:55 logback 50820 [debounce-thread-0] INFO o.a.s.coordinator.JobModelManager$ - New task Partition 5 is being assigned changelog partition 5.
> 2018-03-16 21:05:55 logback 50820 [debounce-thread-0] INFO o.a.s.coordinator.JobModelManager$ - New task Partition 6 is being assigned changelog partition 6.
> 2018-03-16 21:05:55 logback 50820 [debounce-thread-0] INFO o.a.s.coordinator.JobModelManager$ - New task Partition 7 is being assigned changelog partition 7.
> 2018-03-16 21:05:55 logback 50820 [debounce-thread-0] INFO o.a.s.coordinator.JobModelManager$ - New task Partition 8 is being assigned changelog partition 8.
> 2018-03-16 21:05:55 logback 50820 [debounce-thread-0] INFO o.a.s.coordinator.JobModelManager$ - New task Partition 9 is being assigned changelog partition 9.
> 2018-03-16 21:05:55 logback 50838 [main-SendThread(10.0.127.114:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x1622c8b5fc01ac7, packet:: clientPath:null serverPath:null finished:false header:: 23,4  replyHeader:: 23,14024,0  request:: '/app-test_stream_task-1/dev_test_stream_task-1-coordinationData/JobModelGeneration/jobModelVersion,T  response:: ,s{13878,13878,1521234010089,1521234010089,0,0,0,0,0,0,13878}
> 2018-03-16 21:05:55 logback 50838 [debounce-thread-0] INFO o.apache.samza.zk.ZkJobCoordinator - pid=a14a0434-a238-4ff6-935b-c78d906fe80dGenerated new Job Model. Version = 1
> 2018-03-16 21:06:05 logback 60848 [main-SendThread(10.0.127.114:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x1622c8b5fc01ac7 after 2ms
> 2018-03-16 21:06:15 logback 70856 [main-SendThread(10.0.127.114:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x1622c8b5fc01ac7 after 1ms
> 2018-03-16 21:06:25 logback 80865 [main-SendThread(10.0.127.114:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x1622c8b5fc01ac7 after 2ms ...
>
> The zk ping responses continue every 10 seconds, but no
> other activity or messages occur.
> It looks like it gets as far as confirming the JobModel and
> grouping the partitions, but nothing actually starts up.
>
> Any ideas?
> Thanks in advance!
> Thunder
>
>
> -----Original Message-----
> From: Thunder Stumpges [mailto:tstump...@ntent.com]
> Sent: Thursday, March 15, 2018 16:35
> To: Jagadish Venkatraman <jagadish1...@gmail.com>
> Cc: dev@samza.apache.org; t...@recursivedream.com;
> yi...@linkedin.com; Yi Pan <nickpa...@gmail.com>
> Subject: RE: Old style "low level" Tasks with alternative
> deployment model(s)
>
> Thanks a lot for the info. I have something basically working at this
> point! I have not integrated it with Docker or Kubernetes yet, but it
> does run from my local machine.
>
> I have determined that LocalApplicationRunner does NOT do config
> rewriting. I had to write my own little "StandAloneSamzaRunner" that
> handles the "main" entrypoint. It parses the command line, loads the
> config from the ConfigFactory, and performs rewriting before creating
> the new instance of LocalApplicationRunner. This is all my
> StandAloneSamzaRunner contains:
>
>
> object StandAloneSamzaRunner extends App with LazyLogging {
>
>   // parse command line args just like JobRunner.
>   val cmdline = new ApplicationRunnerCommandLine
>   val options = cmdline.parser.parse(args: _*)
>   val config = cmdline.loadConfig(options)
>
>   val runner = new LocalApplicationRunner(Util.rewriteConfig(config))
>   runner.runTask()
>   runner.waitForFinish()
> }
>
> The only config changes I needed to make to use this runner were
> (easily configured thanks to our central Consul config system and our
> rewriter):
>
> # use the ZK based job coordinator
> job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
> # need to use GroupByContainerIds instead of GroupByContainerCount
> task.name.grouper.factory=org.apache.samza.container.grouper.task.GroupByContainerIdsFactory
> # ZKJC config
> job.coordinator.zk.connect=<our_zk_connection>
>
> I did run into one potential problem; as you can see above, I started
> the task using runTask() and then, to prevent my main method from
> returning, called waitForFinish(). The first time I ran it, the job
> itself failed because I had forgotten to override the task grouper, and
> the container count was pulled from our staging environment. There are
> some failures logged and it appears the JobCoordinator fails, but it
> never returns from waitForFinish(). Stack trace and the continuation of
> the log are below:
>
> 2018-03-15 22:34:32 logback 77786 [debounce-thread-0] ERROR o.a.s.zk.ScheduleAfterDebounceTime - Execution of action: OnProcessorChange failed.
> java.lang.IllegalArgumentException: Your container count (4) is larger than your task count (2). Can't have containers with nothing to do, so aborting.
>        at org.apache.samza.container.grouper.task.GroupByContainerCount.validateTasks(GroupByContainerCount.java:212)
>        at org.apache.samza.container.grouper.task.GroupByContainerCount.group(GroupByContainerCount.java:62)
>        at org.apache.samza.container.grouper.task.TaskNameGrouper.group(TaskNameGrouper.java:56)
>        at org.apache.samza.coordinator.JobModelManager$.readJobModel(JobModelManager.scala:266)
>        at org.apache.samza.coordinator.JobModelManager.readJobModel(JobModelManager.scala)
>        at org.apache.samza.zk.ZkJobCoordinator.generateNewJobModel(ZkJobCoordinator.java:306)
>        at org.apache.samza.zk.ZkJobCoordinator.doOnProcessorChange(ZkJobCoordinator.java:197)
>        at org.apache.samza.zk.ZkJobCoordinator$LeaderElectorListenerImpl.lambda$onBecomingLeader$0(ZkJobCoordinator.java:318)
>        at org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:134)
>        at java.util.concurrent.Executors$RunnableAdapter.call$$$capture(Executors.java:511)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java)
>        at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>        at java.util.concurrent.FutureTask.run(FutureTask.java)
>        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>        at java.lang.Thread.run(Thread.java:745)
> 2018-03-15 22:34:32 logback 77787 [debounce-thread-0] DEBUG o.a.samza.processor.StreamProcessor - Container is not instantiated yet.
> 2018-03-15 22:34:32 logback 77787 [debounce-thread-0] DEBUG org.I0Itec.zkclient.ZkClient - Closing ZkClient...
> 2018-03-15 22:34:32 logback 77789 [ZkClient-EventThread-15-10.0.127.114:2181] INFO org.I0Itec.zkclient.ZkEventThread - Terminate ZkClient event thread.
>
> And then the application continues on with metric reporters,
> and other debug logging (not actually running the task
> though)
>
> Thanks in advance for the guidance, this has been easier
> than I imagined! I'll report back when I get more of the
> Dockerization/Kubernetes running and test it a bit more.
> Cheers,
> Thunder
>
>
> From: Jagadish Venkatraman [mailto:jagadish1...@gmail.com]
> Sent: Thursday, March 15, 2018 14:46
> To: Thunder Stumpges <tstump...@ntent.com>
> Cc: dev@samza.apache.org; t...@recursivedream.com;
> yi...@linkedin.com; Yi Pan <nickpa...@gmail.com>
> Subject: Re: Old style "low level" Tasks with alternative
> deployment model(s)
>
> >> Thanks for the info on the tradeoffs. That makes a lot of sense. I
> >> am on-board with using ZkJobCoordinator, sounds like some good
> >> benefits over just the Kafka high-level consumer.
>
> This certainly looks like the simplest alternative.
>
> For your other questions, please find my answers inline.
>
> >> Q1: If I use LocalApplicationRunner, It does not use
> >> "ProcessJobFactory" (or any StreamJob or *Job classes) correct?
>
> Your understanding is correct. It directly instantiates the
> StreamProcessor, which in turn creates and runs the SamzaContainer.
>
> >> Q2: If I use LocalApplicationRunner, I will need to code myself the
> >> loading and rewriting of the Config that is currently handled by
> >> JobRunner, correct?
>
> I don't think you'll need to do this. IIUC, the
> LocalApplicationRunner should automatically invoke rewriters
> and do the right thing.
>
> >> Q3: Do I need to also handle coordinator stream(s) and storing of
> >> config that is done in JobRunner (I don't think so as the ?
>
> I don't think this is necessary either. The creation of the coordinator
> stream and the persisting of configuration happen in the
> LocalApplicationRunner (more specifically, in
> StreamManager#createStreams).
>
> >> Q4: Where/How do I specify the Container ID for each instance? Is
> >> there a config setting that I can pass, (or pull from an env
> >> variable and add to the config) ? I am assuming it is my
> >> responsibility to ensure that each instance is started with a unique
> >> container ID..?
>
> Nope. If you are using the ZkJobCoordinator, you don't need to worry
> about assigning IDs to each instance. The framework will automatically
> take care of generating IDs and reaching consensus by electing a
> leader. If you are curious, please take a look at the implementations
> of the ProcessorIdGenerator interface.
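>
> If you ever do want to supply your own ids (say, from an env variable
> set by Kubernetes), a rough sketch of a custom generator could look
> like the following (Scala; the class name and env-var name are
> placeholders of mine, only the ProcessorIdGenerator interface itself
> is Samza's -- double-check the package for your version):
>
> import org.apache.samza.config.Config
> import org.apache.samza.runtime.ProcessorIdGenerator
>
> // Hypothetical generator: use an externally-provided id if present,
> // otherwise fall back to a random UUID (similar to the default).
> class EnvVarProcessorIdGenerator extends ProcessorIdGenerator {
>   override def generateProcessorId(config: Config): String =
>     sys.env.getOrElse("PROCESSOR_ID",
>       java.util.UUID.randomUUID().toString)
> }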
>
> Please let us know should you have further questions!
>
> Best,
> Jagdish
>
> On Thu, Mar 15, 2018 at 11:48 AM, Thunder Stumpges <tstump...@ntent.com> wrote:
>
> Thanks for the info on the tradeoffs. That makes a lot of
> sense. I am on-board with using ZkJobCoordinator, sounds
> like some good benefits over just the Kafka high-level
> consumer.
>
>
>
> To that end, I have made some notes on possible approaches
> based on the previous thread, and from my look into the
> code. I'd love to get feedback.
>
>
>
> Approach 1. Configure jobs to use "ProcessJobFactory" and
> run instances of the job using run-job.sh or using JobRunner
> directly.
>
> I don't think this makes sense from what I can see for a few
> reasons:
>
>   *   JobRunner is concerned with stuff I don't *think* we
> need:
>
>      *   coordinatorSystemProducer|Consumer,
>      *   writing/reading the configuration to the
> coordinator streams
>
>   *   ProcessJobFactory hard-codes the ID to "0" so I don't
> think that will work for multiple instances.
>
>
>
> Approach 2. Configure ZkJobCoordinator, GroupByContainerIds,
> and invoke LocalApplicationRunner.runTask()
>
>
>
>     Q1: If I use LocalApplicationRunner, It does not use
> "ProcessJobFactory" (or any StreamJob or *Job classes)
> correct?
>
>     Q2: If I use LocalApplicationRunner, I will need to code
> myself the loading and rewriting of the Config that is
> currently handled by JobRunner, correct?
>
>     Q3: Do I need to also handle coordinator stream(s) and
> storing of config that is done in JobRunner (I don't think
> so as the ?
>
>     Q4: Where/How do I specify the Container ID for each
> instance? Is there a config setting that I can pass, (or
> pull from an env variable and add to the config) ? I am
> assuming it is my responsibility to ensure that each
> instance is started with a unique container ID..?
>
> I am getting started on the above (Approach 2.), and looking
> closer at the code so I may have my own answers to my
> questions, but figured I should go ahead and ask now anyway.
> Thanks!
>
> -Thunder
>
>
> From: Jagadish Venkatraman [mailto:jagadish1...@gmail.com]
> Sent: Thursday, March 15, 2018 1:41
> To: dev@samza.apache.org; Thunder Stumpges <tstump...@ntent.com>; t...@recursivedream.com
> Cc: yi...@linkedin.com; Yi Pan <nickpa...@gmail.com>
>
> Subject: Re: Old style "low level" Tasks with alternative
> deployment model(s)
>
> >> You are correct that this is focused on the higher-level API but
> >> doesn't preclude using the lower-level API. I was at the same point
> >> you were not long ago, in fact, and had a very productive
> >> conversation on the list
>
> Thanks Tom for linking the thread, and I'm glad that you
> were able to get Kubernetes integration working with Samza.
>
> >> If it is helpful for everyone, once I get the low-level API +
> >> ZkJobCoordinator + Docker + K8s working, I'd be glad to formulate an
> >> additional sample for hello-samza.
>
> @Thunder Stumpges:
> We'd be thrilled to receive your contribution. Examples, demos,
> tutorials, etc. contribute a great deal to improving the ease of use of
> Apache Samza. I'm happy to shepherd design discussions/code reviews in
> the open source, including answering any questions you may have.
>
>
> >> One thing I'm still curious about, is what are the drawbacks or
> >> complexities of leveraging the Kafka High-level consumer +
> >> PassthroughJobCoordinator in a stand-alone setup like this? We do
> >> have Zookeeper (because of kafka) so I think either would work. The
> >> Kafka High-level consumer comes with other nice tools for monitoring
> >> offsets, lag, etc
>
>
> @Thunder Stumpges:
>
> Samza uses a "Job-Coordinator" to assign your input-partitions among
> the different instances of your application such that they don't
> overlap. A typical way to solve this "partition distribution" problem
> is to have a single instance elected as a "leader" and have the leader
> assign partitions to the group.
> The ZkJobCoordinator uses Zk primitives to achieve this, while the
> YarnJC relies on Yarn's guarantee that there will be a singleton
> AppMaster.
>
> A key difference that separates the PassthroughJC from the
> Yarn/Zk variants is that it does _not_ attempt to solve the
> "partition distribution" problem. As a result, there's no
> leader-election involved. Instead, it pushes the problem of
> "partition distribution" to the underlying consumer.
>
> The PassthroughJC supports these two scenarios:
>
> 1. Consumer-managed partition distribution: When using the
> Kafka high-level consumer (or an AWS KinesisClientLibrary
> consumer) with Samza, the consumer manages partitions
> internally.
>
> 2. Static partition distribution: Alternatively, partitions can be
> managed statically using configuration. You can achieve static
> partition assignment by implementing a custom
> SystemStreamPartitionGrouper<https://samza.apache.org/learn/documentation/0.8/api/javadocs/org/apache/samza/container/grouper/stream/SystemStreamPartitionGrouper.html>
> and
> TaskNameGrouper<https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/container/grouper/task/TaskNameGrouper.java>.
> Solutions in this category will typically require you to distinguish
> the various processors in the group by providing an "id" for each.
> Once the "id"s are decided, you can then statically compute assignments
> using a function (eg: modulo N).
> You can rely on the following mechanisms to provide this id:
>  - Configure each instance differently to have its own id
>  - Obtain the id from the cluster-manager. For instance, Kubernetes
> will provide each pod a unique id in the range [0,N). AWS ECS should
> expose similar capabilities via a REST end-point.
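>
> To make the "modulo N" idea concrete, a bare-bones grouper could look
> roughly like the sketch below (Scala; every name other than the Samza
> interfaces is illustrative, and in practice you'd wire it in through
> the corresponding grouper factory and read the instance id/count from
> config or the environment):
>
> import java.util
> import scala.collection.JavaConverters._
> import org.apache.samza.container.TaskName
> import org.apache.samza.container.grouper.stream.SystemStreamPartitionGrouper
> import org.apache.samza.system.SystemStreamPartition
>
> // Hypothetical static grouper: instance `instanceId` of `instanceCount`
> // keeps only partitions where partitionId % instanceCount == instanceId,
> // producing one task per retained partition.
> class ModuloPartitionGrouper(instanceId: Int, instanceCount: Int)
>     extends SystemStreamPartitionGrouper {
>
>   override def group(ssps: util.Set[SystemStreamPartition])
>       : util.Map[TaskName, util.Set[SystemStreamPartition]] = {
>     val mine = ssps.asScala.filter(
>       _.getPartition.getPartitionId % instanceCount == instanceId)
>     mine.map { ssp =>
>       new TaskName(s"Partition ${ssp.getPartition.getPartitionId}") ->
>         Set(ssp).asJava
>     }.toMap.asJava
>   }
> }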
>
> >> One thing I'm still curious about, is what are the drawbacks or
> >> complexities of leveraging the Kafka High-level consumer +
> >> PassthroughJobCoordinator in a stand-alone setup like this?
>
> Leveraging the Kafka High-level consumer:
>
> The Kafka high-level consumer is not integrated into Samza just yet.
> Instead, Samza's integration with Kafka uses the low-level consumer
> because:
> i) It allows for greater control in fetching data from individual
> brokers. It is simple and performant, in terms of the threading model,
> to have one thread pull from each broker.
> ii) It is efficient in memory utilization since it does not do internal
> buffering of messages.
> iii) There's no overhead like Kafka-controller heart-beats that are
> driven by consumer.poll.
>
> Since there's no built-in integration, you will have to build a new
> SystemConsumer if you need to integrate with the Kafka high-level
> consumer. Further, there's a fair bit of complexity to manage in
> checkpointing.
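>
> Just to sketch the shape of that work, a skeleton might look like the
> following (Scala; the class name is a placeholder of mine, only the
> SystemConsumer interface is Samza's, and the bodies are left as
> comments since the real integration is the hard part):
>
> import java.util
> import org.apache.samza.system.{IncomingMessageEnvelope, SystemConsumer, SystemStreamPartition}
>
> // Skeleton of a SystemConsumer backed by the Kafka high-level consumer.
> class HighLevelKafkaSystemConsumer extends SystemConsumer {
>   override def start(): Unit = {
>     // create the Kafka consumer and subscribe here
>   }
>
>   override def stop(): Unit = {
>     // close the consumer
>   }
>
>   override def register(ssp: SystemStreamPartition, offset: String): Unit = {
>     // record which partitions/starting offsets Samza expects from this instance
>   }
>
>   override def poll(ssps: util.Set[SystemStreamPartition], timeout: Long)
>       : util.Map[SystemStreamPartition, util.List[IncomingMessageEnvelope]] = {
>     // fetch from the consumer and wrap each record in an
>     // IncomingMessageEnvelope, keyed by its SystemStreamPartition
>     new util.HashMap()
>   }
> }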
>
> >> The Kafka High-level consumer comes with other nice tools for
> >> monitoring offsets, lag, etc
>
> Samza exposes<https://github.com/apache/samza/blob/master/samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemConsumerMetrics.scala>
> the below metrics for lag-monitoring:
> - The current log-end offset for each partition
> - The last check-pointed offset for each partition
> - The number of messages behind the highwatermark of the partition
>
> Please let us know if you need help discovering these or
> integrating these with other systems/tools.
>
>
> Leveraging the Passthrough JobCoordinator:
>
> It's helpful to split this discussion on tradeoffs with
> PassthroughJC into 2 parts:
>
> 1. PassthroughJC + consumer managed partitions:
>
> - In this model, Samza has no control over
> partition-assignment since it's managed by the consumer.
> This means that stateful operations like joins that rely on
> partitions being co-located on the same task will not work.
> Simple stateless operations (eg: map, filter, remote
> lookups) are fine.
>
> - A key differentiator between Samza and other frameworks is our
> support for "host
> affinity<https://samza.apache.org/learn/documentation/0.14/yarn/yarn-host-affinity.html>".
> Samza achieves this by assigning partitions to hosts taking
> data-locality into account. If the consumer can arbitrarily shuffle
> partitions, it'd be hard to support this affinity/locality. Often this
> is a key optimization when dealing with large stateful jobs.
>
> 2. PassthroughJC + static partitions:
>
> - In this model, it is possible to make stateful processing
> (including host affinity) work by carefully choosing how
> "id"s are assigned and computed.
>
> Recommendation:
>
> - Owing to the above subtleties, I would recommend that we
> give the ZkJobCoordinator + the built-in low-level Kafka
> integration a try.
> - If we hit snags down this path, we can certainly explore
> the approach with PassthroughJC + static partitions.
> - Using the PassthroughJC + consumer-managed distribution
> would be least preferable owing to the subtleties I outlined
> above.
>
> Please let us know should you have more questions.
>
> Best,
> Jagdish
>
> On Wed, Mar 14, 2018 at 9:24 PM, Thunder Stumpges <tstump...@ntent.com> wrote:
> Wow, what great timing, and what a great thread! I
> definitely have some good starters to go off of here.
>
> If it is helpful for everyone, once I get the low-level API
> + ZkJobCoordinator + Docker + K8s working, I'd be glad to
> formulate an additional sample for hello-samza.
>
> One thing I'm still curious about, is what are the drawbacks
> or complexities of leveraging the Kafka High-level consumer
> + PassthroughJobCoordinator in a stand-alone setup like
> this? We do have Zookeeper (because of kafka) so I think
> either would work. The Kafka High-level consumer comes with
> other nice tools for monitoring offsets, lag, etc....
>
> Thanks guys!
> -Thunder
>
> -----Original Message-----
> From: Tom Davis [mailto:t...@recursivedream.com]
> Sent: Wednesday, March 14, 2018 17:50
> To: dev@samza.apache.org
> Subject: Re: Old style "low level" Tasks with alternative
> deployment model(s)
>
> Hey there!
>
> You are correct that this is focused on the higher-level API
> but doesn't preclude using the lower-level API. I was at the
> same point you were not long ago, in fact, and had a very
> productive conversation on the list:
> you should look for "Question about custom StreamJob/Factory" in the
> list archive from the past couple of months.
>
> I'll quote Jagadish Venkatraman from that thread:
>
> > For the section on the low-level API, can you use
> > LocalApplicationRunner#runTask()? It basically creates a new
> > StreamProcessor and runs it. Remember to provide task.class and set it
> > to your implementation of StreamTask or AsyncStreamTask. Please note
> > that this is an evolving API and hence, subject to change.
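>
> For what it's worth, a minimal low-level task plus the matching config
> would look roughly like this (Scala; the class and package names are
> mine, not from the thread -- only StreamTask and task.class come from
> Samza):
>
> import org.apache.samza.system.IncomingMessageEnvelope
> import org.apache.samza.task.{MessageCollector, StreamTask, TaskCoordinator}
>
> // Minimal StreamTask: handle one message at a time.
> class MyStreamTask extends StreamTask {
>   override def process(envelope: IncomingMessageEnvelope,
>                        collector: MessageCollector,
>                        coordinator: TaskCoordinator): Unit = {
>     // inspect envelope.getMessage and send downstream via collector.send(...)
>   }
> }
>
> # and in the job config:
> task.class=com.example.MyStreamTask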
>
> I ended up just switching to the high-level API because I don't have
> any existing Tasks and the Kubernetes story is a little more
> straightforward there (there's only one container/configuration to
> deploy).
>
> Best,
>
> Tom
>
> Thunder Stumpges <tstump...@ntent.com> writes:
>
> > Hi all,
> >
> > We are using Samza (0.12.0) in about 2 dozen jobs implementing
> > several processing pipelines. We have also begun a significant move
> > of other services within our company to Docker/Kubernetes. Right now
> > our Hadoop/Yarn cluster has a mix of stream and batch "Map Reduce"
> > jobs (many reporting and other batch processing jobs). We would
> > really like to move our stream processing off of Hadoop/Yarn and onto
> > Kubernetes.
> >
> > When I just read about some of the new progress in .13 and .14 I got
> > really excited! We would love to have our jobs run as simple
> > libraries in our own JVM, and use the Kafka High-Level-Consumer for
> > partition distribution and such. This would let us "dockerfy" our
> > application and run/scale in kubernetes.
> >
> > However as I read it, this new deployment model is ONLY for the
> > new(er) High Level API, correct? Is there a plan and/or resources for
> > adapting this back to existing low-level tasks? How complicated of a
> > task is that? Do I have any other options to make this transition
> > easier?
> >
> > Thanks in advance.
> > Thunder
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>
>
>
