Re: How to implement broadcasted KeyValueStorage with 0.9

2016-05-03 Thread Bae, Jae Hyeon
e you referring to broadcast streams? > > > > On Mon, May 2, 2016 at 7:58 PM, Bae, Jae Hyeon wrote: > > > Hi Samza devs and users > > > > Is it possible to implement broadcasted KeyValueStorage with 0.9? I know > > 0.10 supports broadcast but we're not

How to implement broadcasted KeyValueStorage with 0.9

2016-05-02 Thread Bae, Jae Hyeon
Hi Samza devs and users, Is it possible to implement a broadcasted KeyValueStorage with 0.9? I know 0.10 supports broadcast, but we're not yet on 0.10. Thank you. Best, Jae

Question on zero downtime deployment of Samza jobs

2016-01-06 Thread Bae, Jae Hyeon
Hi Samza devs and users, I know this will be tricky in Samza because the Samza Kafka consumer is not coordinated externally, but do you have any idea how to deploy Samza jobs with zero downtime? Thank you. Best, Jae

Re: Samza hdfs

2015-11-10 Thread Bae, Jae Hyeon
Hi Aram May I know why you're moving off from Spark streaming to Samza? On Tue, Nov 10, 2015 at 11:04 PM, Aram Mkrtchyan < aram.mkrtch...@picsart.com.invalid> wrote: > Hi Yi, > Many thanks! :) > It's wonderful ! We really need samza-hdfs module. > Thanks to samza being very simple we have been a

Re: KafkaCheckPointManager is too slow

2015-11-04 Thread Bae, Jae Hyeon
be log-compacted. Then, the next question is: what's the size of the > data in the checkpoint topic and how many input topic partitions you have > in the job? > > It would also be helpful if you can share which version of samza you are > using. > > Thanks! > > -Yi >

Re: KafkaCheckPointManager is too slow

2015-11-04 Thread Bae, Jae Hyeon
Also, I am using Samza 0.9.1. On Wed, Nov 4, 2015 at 12:18 AM, Bae, Jae Hyeon wrote: > Hi Yi > > There are 8 partitions in the input topic. The size of checkpoint topic is > 26 MB. > > segment.bytes: 26214400, cleanup.policy: compact > > On Tue, Nov 3, 2015 at 11:12 PM, Yi

Re: KafkaCheckPointManager is too slow

2015-11-03 Thread Bae, Jae Hyeon
act: > 1. How many system stream partitions you have as the input? And how many > tasks are there? > 2. Did you set your checkpoint topic as log-compact topic in Kafka? The > topic size would be much smaller if log compaction is turned on. > > Regards > > -Yi > > On Tu

Re: KafkaCheckPointManager is too slow

2015-11-03 Thread Bae, Jae Hyeon
log-compact topic in Kafka? The > topic size would be much smaller if log compaction is turned on. > > Regards > > -Yi > > On Tue, Nov 3, 2015 at 3:59 PM, Bae, Jae Hyeon wrote: > > > Hi Samza Dev > > > > Do you know why the following job is taking too long? >

KafkaCheckPointManager is too slow

2015-11-03 Thread Bae, Jae Hyeon
Hi Samza Dev Do you know why the following job is taking too long? 2015-11-03 23:58:17 KafkaCheckpointManager [INFO] Get latest offset 3386930 for topic __samza_checkpoint_ver_1_for_xxx_1 and partition 0. This is seriously slowing down development. How can I fix this problem? Thank you Best, Ja

Re: How to synchronize KeyValueStore and Kafka cleanup

2015-10-02 Thread Bae, Jae Hyeon
lization (again, assuming if you > have changelog configured) > > > implement the StreamTask job to consume a Kafka topic and call add() > method? > Why wouldn't you want to use a changelog ? > > > On Fri, Oct 2, 2015 at 3:09 PM, Bae, Jae Hyeon wrote: > > > T

Re: How to synchronize KeyValueStore and Kafka cleanup

2015-10-02 Thread Bae, Jae Hyeon
Thanks Yi Pan, I have one more question. Does KV-store consume automatically from a Kafka topic? Does it consume only on restore()? If so, do I have to implement the StreamTask job to consume a Kafka topic and call add() method? On Fri, Oct 2, 2015 at 2:01 PM, Yi Pan wrote: > Hi, Jae Hyeon, > >

Re: How to synchronize KeyValueStore and Kafka cleanup

2015-10-02 Thread Bae, Jae Hyeon
n Fri, Oct 2, 2015 at 1:43 PM, Bae, Jae Hyeon wrote: > Hi Samza devs and users > > This is my first try with KeyValueStore and I am really excited! > > I glanced through TaskStorageManager source code, it looks creates > consumers for stores and I am wondering that how kafka cle

How to synchronize KeyValueStore and Kafka cleanup

2015-10-02 Thread Bae, Jae Hyeon
Hi Samza devs and users, This is my first try with KeyValueStore and I am really excited! I glanced through the TaskStorageManager source code; it looks like it creates consumers for stores, and I am wondering how Kafka cleanup will be propagated to KeyValueStore. My KeyValueStore usage is a little bit d

Re: Another checkpoint tool question

2015-07-14 Thread Bae, Jae Hyeon
< > https://cwiki.apache.org/confluence/display/KAFKA/System+Tools#SystemTools-ConsumerOffsetChecker > > > > Thanks, > > Fang, Yan > yanfang...@gmail.com > > On Tue, Jul 14, 2015 at 10:33 AM, Bae, Jae Hyeon > wrote: > > > Hi Yan > > > > Thanks for

Re: Another checkpoint tool question

2015-07-14 Thread Bae, Jae Hyeon
e, you can use > > systems.system-name.samza.offset.default=upcoming > systems.system-name.streams.stream-name.samza.reset.offset=true > > What is the reason that you do not want to use > systems.system-name.streams.stream-name.samza.reset.offset > ? > > Thanks, > > Fang, Yan >

Another checkpoint tool question

2015-07-14 Thread Bae, Jae Hyeon
Hi I want to reset the offset for the job to the latest one, which means I want to ignore them without using systems.system-name.streams.stream-name.samza.reset.offset option. If I use checkpoint tool and reset the offset as -1 or Long.MAX_VALUE, in my theory, kafka consumer will throw an excepti

Re: Question on newBlockingQueue in BlockingEnvelopeMap

2015-07-10 Thread Bae, Jae Hyeon
essages in the queue in one partition. > > Thanks, > > Fang, Yan > yanfang...@gmail.com > > On Fri, Jul 10, 2015 at 12:36 PM, Bae, Jae Hyeon > wrote: > > > Hi Samza devs and users > > > > I wrote customized Samza S3 consumer which downloads files from S3

Question on newBlockingQueue in BlockingEnvelopeMap

2015-07-10 Thread Bae, Jae Hyeon
Hi Samza devs and users, I wrote a customized Samza S3 consumer which downloads files from S3 and puts messages in BlockingEnvelopeMap. It was straightforward because there's a nice example, filereader. I tried to optimize it a little with the newBlockingQueue() method because I guess that single queue shared

Re: How to pass quoted string in the configuration

2015-03-19 Thread Bae, Jae Hyeon
> the hello-samza, it fails with same error you have. I have opened a bug to > track this here: > https://issues.apache.org/jira/browse/SAMZA-609 > > Thanks, > Naveen > > On Mar 19, 2015, at 3:44 PM, Bae, Jae Hyeon metac...@gmail.com>> wrote: > > Double quote characte

Re: How to pass quoted string in the configuration

2015-03-19 Thread Bae, Jae Hyeon
"uiStartup.ended\\") or xpath(\\"category\\") = \\"uiIntent\\"\",\" job.name\":\"clevent-kafka\", Anyway, I will use URL encode/decode to make JSON parsing more comfortable. On Thu, Mar 19, 2015 at 3:21 PM, Bae, Jae Hyeon wrote: > The problem i
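
The URL encode/decode workaround mentioned in this reply can be sketched roughly as below. This is only an illustration under assumptions: the class and method names (QuotedConfigValue, encode, decode) are hypothetical and not part of Samza, and the sketch relies only on the JDK's java.net.URLEncoder and java.net.URLDecoder. The idea is that the encoded value contains no double quotes or spaces, so it should pass through JSON and shell layers unchanged and can be decoded inside the task.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper (not a Samza API): makes a quoted config value
// safe to embed in JSON or on a command line, and restores it later.
class QuotedConfigValue {

    // Percent-encodes quotes, parentheses, commas, etc.; spaces become '+'.
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encode(); call this in the task before using the value.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The filter expression from the original question.
        String filter = "xpath(\"name\") in (\"uiBrowseStartup.ended\", "
                + "\"subscription.ended\", \"uiStartup.ended\") "
                + "or xpath(\"category\") = \"uiIntent\"";
        String encoded = encode(filter);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(filter));
    }
}
```

One caveat of this approach: a literal '+' in the raw value is encoded as %2B, so the round trip stays lossless, but anything that reads the encoded value must decode it with the same scheme.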

Re: How to pass quoted string in the configuration

2015-03-19 Thread Bae, Jae Hyeon
'll have to check your AM container > logs. Can you find those? They're usually linked to from the YARN RM UI. > > Cheers, > Chris > > On Thu, Mar 19, 2015 at 2:32 PM, Bae, Jae Hyeon > wrote: > > > Hi Samza Devs > > > > I want to pass the quoted string

How to pass quoted string in the configuration

2015-03-19 Thread Bae, Jae Hyeon
Hi Samza Devs I want to pass the quoted string like filter.map.filter1.property=xpath("name") in ("uiBrowseStartup.ended", "subscription.ended", "uiStartup.ended") or xpath("category") = "uiIntent" through the configuration to the container but AM keeps failing Application application_142309072

Re: How to get Container ID from configuration

2015-03-05 Thread Bae, Jae Hyeon
ENV_CONTAINER_ID does not appear to exist in 0.8.0. I will try ENV_CONTAINER_NAME. On Thu, Mar 5, 2015 at 5:09 PM, Chris Riccomini wrote: > Hey Jae, > > Also, this should work: > > System.getenv(ShellCommandConfig.ENV_CONTAINER_ID).toInt > > This is actually how SamzaContainer gets its container
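
Since the environment variable may be missing depending on the Samza version, a defensive lookup can help. The sketch below is an assumption-laden illustration, not Samza code: ContainerIdLookup and containerId are hypothetical names, and the variable name "SAMZA_CONTAINER_ID" is assumed to be the value behind ShellCommandConfig.ENV_CONTAINER_ID, which should be checked against the version in use.

```java
import java.util.Map;

// Hypothetical helper (not a Samza API): reads a numeric container id
// from an environment map and falls back when it is absent or malformed.
class ContainerIdLookup {

    static int containerId(Map<String, String> env, String varName, int fallback) {
        String value = env.get(varName);
        if (value == null) {
            return fallback; // variable not set, e.g. on an older version
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return fallback; // set, but not a plain integer
        }
    }

    public static void main(String[] args) {
        // "SAMZA_CONTAINER_ID" is an assumed variable name; verify it
        // against ShellCommandConfig in the Samza version you run.
        int id = containerId(System.getenv(), "SAMZA_CONTAINER_ID", -1);
        System.out.println("container id: " + id);
    }
}
```

Taking the environment as a Map keeps the lookup testable without having to set real environment variables.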

How to get Container ID from configuration

2015-03-05 Thread Bae, Jae Hyeon
Hi Samza devs How can I get container id such as 0, 1, 2, ... up to yarn.container.count? I tried to get some environment variable but it didn't work well. If it's not easy to get the container id, I have to use process id to distinguish containers from the same job running in the same machine b

Re: How to create a single task per container

2015-02-08 Thread Bae, Jae Hyeon
I am reading JobCoordinator and now I can understand why multiple containers were not launched. I need to create multiple tasks, which are grouped again based on containerCount. On Fri, Feb 6, 2015 at 1:26 PM, Bae, Jae Hyeon wrote: > Our current main purpose of samza is for data pipeline, so

How to create a single task per container

2015-02-08 Thread Bae, Jae Hyeon
Our current main purpose of samza is for data pipeline, so we don't want to create multiple tasks in the single SamzaContainer. As I read Samza implementation, it will create as many tasks as the number of partitions assigned in the container, right? The problem of that approach is, each task will

Re: Question on nullEnvelop

2015-02-06 Thread Bae, Jae Hyeon
> longer. The longer the container sleeps, the more latency that's introduced > when there *is* a message available. 10ms is what we use by default. > > Cheers, > Chris > > On Fri, Feb 6, 2015 at 11:11 AM, Bae, Jae Hyeon > wrote: > > > Could you explain why consumerMult
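
The trade-off described in this reply (a null result when nothing is available, and a bounded wait before retrying) can be illustrated with a plain JDK queue. This is a sketch only and not how Samza implements its consumer multiplexer; PollWithTimeout and choose are hypothetical names, and the 10 ms figure is taken from the reply above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration (not Samza code): a chooser that returns null
// when no message arrives within the wait window.
class PollWithTimeout {

    // Waits up to timeoutMs for a message; null means "nothing available",
    // not an error. A longer timeout means fewer empty polls but more
    // latency before the caller can do other work.
    static <T> T choose(BlockingQueue<T> queue, long timeoutMs) {
        try {
            return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return null;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        System.out.println(choose(queue, 10)); // empty queue: null after ~10 ms
        queue.offer("message-1");
        System.out.println(choose(queue, 10)); // message available: returned at once
    }
}
```

With an idle input and a 10 ms wait, a loop around such a call would see on the order of tens of null results per second, which is consistent with the frequency reported in the question.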

Question on nullEnvelop

2015-02-06 Thread Bae, Jae Hyeon
Could you explain why consumerMultiplexer.choose returns null? Can it happen when there's no message in the Kafka topic? If my theory is correct, its frequency is too high: in my testing environment, it's more than 50 per second. Thank you. Best, Jae

Re: SystemProducer wants to call commit after flush

2015-01-29 Thread Bae, Jae Hyeon
will file a jira and submit PR. Let me know. If this is not acceptable, I should switch my SystemProducer implementation to StreamTask again. On Thu, Jan 29, 2015 at 9:29 AM, Bae, Jae Hyeon wrote: > Hi Chris > > > > On Thu, Jan 29, 2015 at 9:10 AM, Chris Riccomini > w

Re: SystemProducer wants to call commit after flush

2015-01-29 Thread Bae, Jae Hyeon
itched it to SystemProducer as Metamx Druid Tranquility did. But I overlook that duplicate data which can be caused by flush & commit mismatch. Do you have any idea? > > Could you elaborate on this? > > Cheers, > Chris > > On Thu, Jan 29, 2015 at 12:27 AM, Bae, Jae

Re: SystemProducer wants to call commit after flush

2015-01-29 Thread Bae, Jae Hyeon
Never mind. I found a solution. Flush should be synced with commit. On Thu, Jan 29, 2015 at 12:15 AM, Bae, Jae Hyeon wrote: > Hi Samza Devs > > StreamTask can control SamzaContainer.commit() through task coordinator. > Can we make SystemProducer control commit after flush? With

SystemProducer wants to call commit after flush

2015-01-29 Thread Bae, Jae Hyeon
Hi Samza Devs StreamTask can control SamzaContainer.commit() through task coordinator. Can we make SystemProducer control commit after flush? With this feature, we can prevent any duplicate data on SamzaContainer failure. For example, if we set commit interval as 2 minutes, before commit time int

Re: Testing dev@samza.apache.org

2015-01-23 Thread Bae, Jae Hyeon
got it. On Fri, Jan 23, 2015 at 9:58 AM, TJ Giuli wrote: > Echo > —T > > On Jan 23, 2015, at 9:49 AM, Chris Riccomini > wrote: > > > > Hey all, > > > > Could you please confirm that you're seeing this? I'm trying to verify > the > > TLP migration for: > > > > https://issues.apache.org/jira/bro