Re: Periodic flush sink?

2017-04-30 Thread Kamil Dziublinski
Hi Niels, This sounds to me like a great use case for using window functions. You could partition your data (use keyby) based on website and then hold your window for certain amount of time. After that you could give your sink already batched object and store it directly. On top of that if you are

Re: Fault tolerance & idempotency on window functions

2017-04-29 Thread Kamil Dziublinski
with an even larger > timestamp arrives late that was not considered when doing the first > processing that failed. > > Best, > Aljoscha > > On 25. Apr 2017, at 19:54, Kamil Dziublinski < > kamil.dziublin...@gmail.com> wrote: > > > > Hi guys, > > > &

Fault tolerance & idempotency on window functions

2017-04-25 Thread Kamil Dziublinski
Hi guys, I have a flink streaming job that reads from kafka, creates some statistics increments and stores this in hbase (using normal puts). I'm using fold function here of with window of few seconds. My tests showed me that restoring state with window functions is not exactly working how I expe

Re: Key by Task number

2017-04-18 Thread Kamil Dziublinski
I am not sure if you really need a keyby, your load will be distributed among your map function without it. But could you explain a bit what is your sink doing? As for setting parallelism on the consumer remember that you wont have higher parallelism than number of partitions in your topic. If y

Re: Submit Flink job programatically

2017-04-07 Thread Kamil Dziublinski
Hey, I had a similar problem when I tried to list the jobs and kill one by name in yarn cluster. Initially I also tried to set YARN_CONF_DIR but it didn't work. What helped tho was passing hadoop conf dir to my application when starting it. Like that: java -cp application.jar:/etc/hadoop/conf Rea

Re: PartitionNotFoundException on deploying streaming job

2017-04-05 Thread Kamil Dziublinski
s > delay before register task in NetworkEnvironment. You can debug the > specific status in upstream when response the PartitionNotFound to track > the reason. Wish your further findings! > > Cheers, > Zhijiang > > ------ &

PartitionNotFoundException on deploying streaming job

2017-04-04 Thread Kamil Dziublinski
Hi guys, When I run my streaming job I almost always have initially PartitionNotFoundException. Job fails, after that restarts and it runs ok. I wonder what is causing that and if I can adjust some parameters to not have this initial failure. I have flink session on yarn with 55 task managers. 4

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-31 Thread Kamil Dziublinski
yep I meant 120 per second :) On Fri, Mar 31, 2017 at 11:19 AM, Ted Yu wrote: > The 1,2million seems to be European notation. > > You meant 1.2 million, right ? > > On Mar 31, 2017, at 1:19 AM, Kamil Dziublinski < > kamil.dziublin...@gmail.com> wrote: > > Hi,

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-31 Thread Kamil Dziublinski
e consumer; that > will directly configure the internal Kafka clients. > Generally, all Kafka settings are applicable through the provided config > properties, so you can perhaps take a look at the Kafka docs to see what > else there is to tune for the clients. > > On March 30, 2017

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-30 Thread Kamil Dziublinski
izations > which would boost write performance further. > > FYI > > On Mar 30, 2017, at 1:07 AM, Kamil Dziublinski < > kamil.dziublin...@gmail.com> wrote: > > Hey guys, > > Sorry for confusion it turned out that I had a bug in my code, when I was > not clearing thi

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-30 Thread Kamil Dziublinski
/ > projects/flink/flink-docs-release-1.2/dev/stream/state. > html#using-managed-keyed-state > > I hope that helps. > > Regards, > Timo > > Am 29/03/17 um 11:27 schrieb Kamil Dziublinski: > > Hi guys, > > I’m using flink on production in Mapp. We recently swapp

20 times higher throughput with Window function vs fold function, intended?

2017-03-29 Thread Kamil Dziublinski
Hi guys, I’m using flink on production in Mapp. We recently swapped from storm. Before I have put this live I was doing performance tests and I found something that “feels” a bit off. I have a simple streaming job reading from kafka, doing window for 3 seconds and then storing into hbase. Initial