Re: Producer stopped during leader switch

2016-11-01 Thread David Yu
mproved resiliency and if retries fail, the exception will be > propagated to your *StreamTask*, and will end up failing the container (if > you don't swallow it). > > Thanks, > Jagadish > > > > On Fri, Oct 28, 2016 at 10:12 AM, David Yu > wrote: > > > Hi, &g

Producer stopped during leader switch

2016-10-28 Thread David Yu
Hi, We recently experienced a Kafka broker crash. When a new broker was brought up, we started seeing the following errors in Samza (0.10.1): WARN o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 5199601 on topic-partition session_key_partitioned_sessions-39, re

Re: RecordTooLargeException recovery

2016-10-06 Thread David Yu
e max.requst.size for the producer. In > this case you need to make sure the broker also set up the corresponding > max message size. Last, you might also try to split the key into subkeys so > the value will be smaller. > > Thanks, > Xinyu > > On Thu, Oct 6, 2016 at 9:30

RecordTooLargeException recovery

2016-10-06 Thread David Yu
Hi, Our Samza job (0.10.1) throws RecordTooLargeExceptions when flushing the KV store change to the changelog topic, as well as sending outputs to Kafka. We have two questions to this problem: 1. It seems that after the affected containers failed multiple times, the job was able to recover and mo

Re: Job coordinator stream and job redeployment

2016-08-25 Thread David Yu
27;s a change > to the way Samza reads/writes coordinator stream messages. > > -Jake > > On Thu, Aug 25, 2016 at 8:44 AM, David Yu wrote: > > > After digging around a bit using kafka-console-consumer.sh, I'm able to > > peek into the coordinator stream and see

Re: Job coordinator stream and job redeployment

2016-08-25 Thread David Yu
with new configs. I was assuming that it would get incremented with every deployment/update. Thanks, David On Wed, Aug 24, 2016 at 5:39 PM David Yu wrote: > Hi, > > I'm trying to understand role of the coordinator stream during a job > redeployment. > > From the Samza

Job coordinator stream and job redeployment

2016-08-24 Thread David Yu
Hi, I'm trying to understand role of the coordinator stream during a job redeployment. >From the Samza documentation, I'm seeing the following about the coordinator stream: The Job Coordinator bootstraps configuration from the coordinator stream each time upon job start-up. It periodically catch

Re: Debug Samza consumer lag issue

2016-08-24 Thread David Yu
in the window or commit methods, which > could cause periodic spikes in utilization. > > > > On Wed, Aug 24, 2016 at 2:55 PM, David Yu wrote: > > > Interesting. > > > > To me, "event-loop-utilization" looks like a good indicator that shows us > > ho

Re: Debug Samza consumer lag issue

2016-08-24 Thread David Yu
ignificant because the timer runs even for no-ops > > Since you're on 10.1, there's another useful metric > "event-loop-utilization", which represents > (process-ns+window-ns+commit-ns)/event-loop-ns > (as you defined it). In other words, the proportion of time spend

Re: Debug Samza consumer lag issue

2016-08-24 Thread David Yu
> this behavior, see the following properties in the config table ( > > http://samza.apache.org/learn/documentation/0.10/jobs/ > > configuration-table.html) > > systems.system-name.samza.fetch.threshold > > task.poll.interval.ms > > > > > > > > On W

Re: Debug Samza consumer lag issue

2016-08-24 Thread David Yu
on: Is choose-ns the total number of ms used to choose a message from the input stream? What are some gating factors (e.g. serialization?) for this metric? Thanks, David On Wed, Aug 24, 2016 at 12:34 AM David Yu wrote: > Some metric updates: > 1. We started seeing some containers with a high

Re: Debug Samza consumer lag issue

2016-08-23 Thread David Yu
<https://www.dropbox.com/s/n1wxtngquv607nb/Screenshot%202016-08-24%2000.21.05.png?dl=0> . On Tue, Aug 23, 2016 at 5:56 PM David Yu wrote: > Hi, Jake, > > Thanks for your suggestions. Some of my answers inline: > > 1. > On Tue, Aug 23, 2016 at 11:53 AM Jacob Maes wrot

Re: Debug Samza consumer lag issue

2016-08-23 Thread David Yu
w time is high, but since it's only called once per minute, >it looks like it only represents 1% of the event loop utilization. So I >don't think that's a smoking gun. > > -Jake > > On Tue, Aug 23, 2016 at 9:18 AM, David Yu wrote: > > > Dear Samza guys,

Debug Samza consumer lag issue

2016-08-23 Thread David Yu
Dear Samza guys, We are here for some debugging suggestions on our Samza job (0.10.0), which lags behind on consumption after running for a couple of hours, regardless of the number of containers allocated (currently 5). Briefly, the job aggregates events into sessions (in Avro) during process()

Re: State store changelog format

2016-08-08 Thread David Yu
ll > need to make sure you're using Kafka 0.9 or higher on the Brokers. > > -Jake > > On Fri, Aug 5, 2016 at 10:57 PM, David Yu wrote: > > > I guess this might be the problem: > > > > 2016-08-06 05:23:23,622 [main ] WARN o.a.s.s.kafka.KafkaSystemFactory$ -

Re: Any way to monitor samza job?

2016-08-08 Thread David Yu
Samza has pretty good support for collectng metrics: https://samza.apache.org/learn/documentation/0.10/container/metrics.html With these metrics logged in Kafka, you can simply consume from this stream for monitoring. In our case, we pipe these metrics into OpenTSDB for visualization. For example

Re: State store changelog format

2016-08-05 Thread David Yu
our > Kafka team and they said that's not uncommon. You might want to try > gzip or some other compression. > > > On Fri, Aug 5, 2016 at 12:10 PM, David Yu wrote: > > > I'm reporting back my observations after enabling compression. > > > > Looks like co

Re: State store changelog format

2016-08-05 Thread David Yu
a.producer.compression.type=snappy Am I missing anything? Thanks, David On Wed, Aug 3, 2016 at 1:48 PM David Yu wrote: > Great. Thx. > > On Wed, Aug 3, 2016 at 1:42 PM Jacob Maes wrote: > >> Hey David, >> >> what gets written to the changelog topic >> >&g

Re: Understand KV store restoring

2016-08-05 Thread David Yu
itions owned by the tasks in the container. Restoration should > happen only once and not multiple times. > > What logs statements do you see that indicate that it is going through the > changelog multiple times? Can you please share that ? > > Thanks! > Navina > > On Fri, A

Understand KV store restoring

2016-08-05 Thread David Yu
Within a given container, does the restoration process go through the changelog topic once to restore ALL stores in that container? From the logs, I have a feeling that it is going through the changelog multiple times. Can anyone confirm? Thanks, David

Re: State store changelog format

2016-08-03 Thread David Yu
.producer.compression.type > > Hope this helps. > -Jake > > > > On Wed, Aug 3, 2016 at 11:16 AM, David Yu wrote: > > > I'm trying to understand what gets written to the changelog topic. Is it > > just the serialized value of the particular state store entry?

State store changelog format

2016-08-03 Thread David Yu
I'm trying to understand what gets written to the changelog topic. Is it just the serialized value of the particular state store entry? If I want to compress the changelog topic, do I enable that from the producer? The reason I'm asking is that, we are seeing producer throughput issues and suspect

Number of Kafka producers

2016-07-27 Thread David Yu
Is there a way to control the number of producers? Our Samza job writes a lot of data to the downstream Kafka topic. I was wondering if there is a way to optimize concurrency by creating more async producers. Thanks, David

Re: Sync Kafka producer by default?

2016-07-26 Thread David Yu
I'm using 0.10. Thanks for confirming. On Tue, Jul 26, 2016 at 1:12 PM Navina Ramesh wrote: > Hi David, > Which version of Samza are you using? Starting with Samza 0.8, I think we > always used async producer. > > Thanks! > Navina > > On Tue, Jul 26, 2016 a

Sync Kafka producer by default?

2016-07-26 Thread David Yu
Our session-roll-up Samza job is experience throughput issues. We would like to fine-tune our Kafka producer. Does Samza use "producer.type=sync" by default? From the JMX console, I'm seeing some producer batching metrics. I thought those are for async producers. Thanks, David

Re: No updates to some of the store changelog partitions

2016-06-14 Thread David Yu
containers than others (3), whereas now things are perfectly balanced. But I'm befuddled as to why that might cause the particular issue. Any insight is appreciated. Thanks, David On Mon, Jun 13, 2016 at 10:57 PM, David Yu wrote: > Hi, Yi, > > I couldn't find any errors in the

Re: No updates to some of the store changelog partitions

2016-06-13 Thread David Yu
state store from the changelog. You can use it to > recover the state store from changelog and compare w/ the local RocksDB to > verify if there is any discrepancies. > > Best. > > -Yi > > On Sun, Jun 12, 2016 at 12:49 PM, David Yu > wrote: > > > Jagadish, > &

Re: Update all values in RocksDB

2016-06-13 Thread David Yu
session's > sessionId is *alway* at the tail of your session table, and your session > expiration order is the same as the order determined by the sessionId, that > should work as well. > > -Yi > > On Mon, Jun 6, 2016 at 3:17 PM, David Yu wrote: > > > Hi, Yi, &g

Re: No updates to some of the store changelog partitions

2016-06-12 Thread David Yu
s older than the retention are purged. > 2. Compaction: Newer key-values will over-write older keys and only the > most recent value is retained. > > I'm not sure if offsets are always monotonically increasing in Kafka or > could change after a compaction/ a time based retenti

No updates to some of the store changelog partitions

2016-06-11 Thread David Yu
My understanding of store changelog is that, each task writes store changes to a particular changelog partition for that task. (Does that mean the changelog keys are task names?) One thing that confuses me is that, the last offsets of some changelog partitions do not move. I'm using the kafka GetO

Re: Location of the RocksDB Key-Value store

2016-06-11 Thread David Yu
Not sure if it is configurable. But you should be able to find an entry in your samza log like this: /data/b/yarn/nm/usercache/david/appcache/application_1464853403568_0010/container_e04_1464853403568_0010_01_02/state/session-store/Partition_14 On Fri, Jun 10, 2016 at 7:01 PM, Jack Huang wro

Re: Update all values in RocksDB

2016-06-06 Thread David Yu
rtant to > achieve fast and efficient operation. Please refer to this blog from > RocksDB team: > > https://github.com/facebook/rocksdb/wiki/Implement-Queue-Service-Using-RocksDB > > -Yi > > On Mon, Jun 6, 2016 at 2:25 PM, David Yu wrote: > > > We use Samza RocksDB

Update all values in RocksDB

2016-06-06 Thread David Yu
We use Samza RocksDB to keep track of our user event sessions. The task periodically calls window() to update all sessions in the store and purge all closed sessions. We do all of this in the same iterator loop. Here's how we are doing it: public void window(MessageCollector collector, TaskCoor

Re: Samza job killed by left orphaned on YARN

2016-05-19 Thread David Yu
IGTERM and a SIGKILL, though I can't find it at > the moment. > > -Jake > > On Wed, May 18, 2016 at 10:32 AM, David Yu > wrote: > > > From the NM log, I'm seeing: > > > > 2016-05-18 06:29:06,248 INFO > > > > > org.apache.hadoop.yarn.ser

Re: Samza job killed by left orphaned on YARN

2016-05-18 Thread David Yu
265 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application *application_1463512986427_0007* transitioned from RUNNING to FINISHING_CONTAINERS_WAIT (*Highlighted* is the particular samza application.) The status never transitioned from FINISHING_CONTAINERS_WAIT :( On Wed, May 18, 2016 at 10:21 AM, David Y

Re: Samza job killed by left orphaned on YARN

2016-05-18 Thread David Yu
nning when the NM shut down. This option has virtually eliminated > orphaned containers in our clusters. > > -Jake > > On Tue, May 17, 2016 at 11:54 PM, David Yu > wrote: > > > Samza version = 0.10.0 > > YARN version = Hadoop 2.6.0-cdh5.4.9 >

Samza job killed by left orphaned on YARN

2016-05-17 Thread David Yu
Samza version = 0.10.0 YARN version = Hadoop 2.6.0-cdh5.4.9 We are experience issues when killing a Samza job: $ yarn application -kill application_1463512986427_0007 Killing application application_1463512986427_0007 16/05/18 06:29:05 INFO impl.YarnClientImpl: Killed application application_14

Samza not consuming

2016-03-19 Thread David Yu
I'm trying to debug our samza job, which seem to be stuck from consuming from our Kafka stream. Every time I redeploy the job, only the same handful of events get consumed, and then no more events get processed. I manually checked to make sure the input stream is live and flowing. I also tried bot

Re: Samza not consuming

2016-03-19 Thread David Yu
stuck :( Any ideas that could help me debug this will be appreciated. On Wed, Mar 16, 2016 at 4:19 PM, David Yu wrote: > No, instead, I updated the checkpoint topic with the "upcoming" offsets. > (I should have done a check before that though). > > So a related question: if

Re: Samza not consuming

2016-03-19 Thread David Yu
e the debug line after > troubleshooting. Else you risk filling up your logs. > > Let me know if you have more questions. > > Thanks! > Navina > > On Wed, Mar 16, 2016 at 2:12 PM, David Yu wrote: > > > I'm trying to debug our samza job, which seem to be stuck fro

Re: Samza not consuming

2016-03-19 Thread David Yu
Looks like this has nothing to do with checkpointing. Our samza job has an issue communicating an external service, which left the particular process() call waiting indefinitely. And it doesn't look like samza has a way to timeout a processing cycle. On Thu, Mar 17, 2016 at 5:42 PM, Dav

Re: Samza not consuming

2016-03-19 Thread David Yu
om the correct offset in the stream :) > > Thanks! > Navina > > On Wed, Mar 16, 2016 at 3:16 PM, David Yu wrote: > > > Finally seeing events flowing again. > > > > Yes, the "systems.kafka.consumer.auto.offset.reset" option is probably > not > > a fact

Re: Samza not consuming

2016-03-19 Thread David Yu
Strangely, I was not able to get checkpoint value for one particular partition. Could this cause the job to be stuck? On Thu, Mar 17, 2016 at 5:23 PM, David Yu wrote: > Hi, I wanna resurface this thread because I'm still facing issues with our > samza not receiving events. > &

Re: No samza consumer group found

2016-03-16 Thread David Yu
nd then compare it > with the topic offset. > > it's not that straight forward comparing to existing tools such as yours, > and requires additional effort to integrate with dashboard tools such as > grafana. > > I think this group have better ways to do this ;-) >

Re: No samza consumer group found

2016-03-15 Thread David Yu
d = KafkaUtil.getClientId("samza-consumer", config) > > KafkaUtil.getClientId has this logic. > > I'm curious about your usecase as to why you are interested in inspecting > this information? > > Thanks, > Jagadish > > On Tuesday, March 15, 2016, David

No samza consumer group found

2016-03-15 Thread David Yu
Our samza job is consuming from a Kafka topic. AFAIU, samza will auto assign the job a consumer group id and client id. However, I'm not able to see that showing up under zookeeper. Am I missing something?

Re: Understand Samza default metrics

2016-02-24 Thread David Yu
basic metrics by googling the following > > classes: > > SamzaContainerMetrics, TaskInstanceMetrics, SystemConsumersMetrics and > > SystemProducersMetrics. For your example, you can use the > > "process-calls" > > in SamzaContainerMetrics to get

Understand Samza default metrics

2016-02-23 Thread David Yu
Hi, Where can I find the detailed descriptions of the out of the box metrics provided by MetricsSnapshotReporterFactory and JmxReporterFactory? I'm interested in seeing the basic metrics of the my samza job (e.g. messages_processed_per_sec). But it's hard to ping point to the specific metric that