Re: samza-hello-samza build cannot find samza 0.10.0-SNAPSHOT artifacts on maven

2015-12-17 Thread Kishore N C
The snapshot versions are also found on Apache's public repo:

  <repository>
    <id>apache-repo</id>
    <name>apache-repo</name>
    <url>https://repository.apache.org/content/groups/public</url>
  </repository>

Cheers,
KN.

On Fri, Aug 28, 2015 at 11:36 PM, Yan Fang wrote:
> run ./gradlew publishToMavenLocal ?
>
> Fang, Yan
> yanfang...@gmail.com
>
>

Re: Configuring RocksDB SST file size

2015-12-17 Thread Kishore N C
T file creation in my case, it is triggered by the
> compaction. This may be the reason why you see many small SST files.
>
> HTH,
> -Tao
>
> On Wed, Dec 16, 2015 at 5:06 AM, Kishore N C wrote:
>
> > Hi,
> >
> > During a catch-up job that might require reproce

Configuring RocksDB SST file size

2015-12-16 Thread Kishore N C
Hi, During a catch-up job that might require reprocessing of 100s of millions of records, I wanted to tweak RocksDB configuration to ensure that it's optimized for bulk writes. According to the documentation here

Re: Random connection errors

2015-12-15 Thread Kishore N C
78378ce2203/raw/8537d3d3644eea7cdd9efc9fa8749c0840092f3c/gistfile1.txt
> >
> > Thanks,
> >
> > Kishore.
> >
> > On Tue, Dec 15, 2015 at 4:30 PM, Kishore N C wrote:
> >
> > > Hi Yi Pan,
> > >
> > > I'm using Samza 0.

Re: Random connection errors

2015-12-15 Thread Kishore N C
PM, Kishore N C wrote:
> Hi Yi Pan,
>
> I'm using Samza 0.9.1 and Kafka 0.8.2.1. Here's an example of a full task
> log:
>
>
> https://gist.githubusercontent.com/kishorenc/5d65f114a50b9ef6a6b3/raw/5b9ecffdd1af831f713e8b41e5b77e5b881e8173/

Re: Random connection errors

2015-12-15 Thread Kishore N C
sume the log
> you attached here is a container log?), it would be greatly helpful.
>
> Thanks a lot!
>
> -Yi
>
> On Mon, Dec 14, 2015 at 5:07 AM, Kishore N C wrote:
>
> > Hi,
> >
> > I have a 25 node Samza cluster and I am running a job on a dataset of a
>

Random connection errors

2015-12-14 Thread Kishore N C
Hi, I have a 25 node Samza cluster and I am running a job on a dataset of a billion records that is backed by a 7 node Kafka cluster. Some of the tasks on some of the Samza nodes don't seem to start at all (while other tasks run fine on other nodes). The specific error message I see is in the tas

Re: Detecting "done" on a bounded input dataset

2015-10-14 Thread Kishore N C
o you
> need to unsubscribe from the stream? Or you are still OK receiving more
> messages from the stream?
> 2) Scenario 2: you want the Samza jobs to shutdown when detecting the end
> of a certain stream.
>
> Which scenario are you targeting?
>
> Thanks!
>
> -Yi

Detecting "done" on a bounded input dataset

2015-10-14 Thread Kishore N C
Hi, Our data processing pipeline consists of a set of Samza jobs that form a DAG. Sometimes, we have to throw finite datasets into the Kafka topic that acts as the entry point to the pipeline. Given that different Samza jobs in the DAG could have varying latencies in terms of processing the recor

Stream-Stream joins - restricting the size of the KV store

2015-03-03 Thread Kishore N C
Hi all, I just read through this post about implementing streaming joins in Samza using a KV store that Samza provides. Can someone tell me how I can ensure that this KV storage does not grow too