Hi Niels,
This sounds to me like a great use case for window functions. You
could partition your data (using keyBy) by website and then hold your
window open for a certain amount of time. After that you could hand your sink
an already-batched object and store it directly. On top of that, if you are
> with an even larger
> timestamp arrives late that was not considered when doing the first
> processing that failed.
>
> Best,
> Aljoscha
> > On 25. Apr 2017, at 19:54, Kamil Dziublinski <
> kamil.dziublin...@gmail.com> wrote:
> >
> > Hi guys,
> >
> &
Hi guys,
I have a Flink streaming job that reads from Kafka, creates some statistics
increments and stores them in HBase (using normal puts).
I'm using a fold function here with a window of a few seconds.
My tests showed me that restoring state with window functions is not
exactly working how I expected
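The fold-over-a-window pattern mentioned above can be sketched, without Flink, as a running accumulation over a window's elements (names are illustrative):

```java
import java.util.List;
import java.util.function.BiFunction;

// Conceptual fold over a window (not the Flink API): start from an
// initial accumulator and fold each buffered event into it.
class WindowFold {
    static <T, ACC> ACC fold(ACC initial, List<T> windowElements,
                             BiFunction<ACC, T, ACC> folder) {
        ACC acc = initial;
        for (T element : windowElements) {
            acc = folder.apply(acc, element);
        }
        return acc;
    }
}
```

In Flink the accumulator for a windowed fold is part of the window state, which is why restore behavior on failure matters here.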
I am not sure you really need a keyBy; your load will be distributed
among your map functions without it. But could you explain a bit what
your sink is doing?
As for setting parallelism on the consumer, remember that you won't have
higher parallelism than the number of partitions in your topic.
If y
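The partition-count cap mentioned above amounts to this: Kafka source subtasks beyond the partition count sit idle, so the effective source parallelism is the smaller of the two numbers. A tiny illustration:

```java
// Effective Kafka source parallelism is bounded by the topic's
// partition count; extra subtasks receive no partitions and idle.
class SourceParallelism {
    static int effective(int configuredParallelism, int partitions) {
        return Math.min(configuredParallelism, partitions);
    }
}
```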
Hey,
I had a similar problem when I tried to list the jobs and kill one by name
in a YARN cluster. Initially I also tried to set YARN_CONF_DIR but it didn't
work.
What helped, though, was passing the Hadoop conf dir to my application when
starting it, like this:
java -cp application.jar:/etc/hadoop/conf
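Spelled out a little more, the trick is simply putting the Hadoop conf directory on the classpath so the Hadoop client libraries pick up the cluster configuration. Jar name, conf path, and main class below are placeholders; adjust to your deployment:

```shell
# Placeholder jar name and conf path; adjust to your deployment.
APP_JAR=application.jar
HADOOP_CONF=/etc/hadoop/conf
CLASSPATH="$APP_JAR:$HADOOP_CONF"
# java -cp "$CLASSPATH" com.example.Main   # main class is a placeholder
echo "$CLASSPATH"
</imports> will not appear here; the classpath string is all the Hadoop client needs to locate core-site.xml and yarn-site.xml.
```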
> delay before registering the task in NetworkEnvironment. You can debug the
> specific status upstream when it responds with PartitionNotFound to track
> down the reason. Good luck with your further findings!
>
> Cheers,
> Zhijiang
>
> ------
Hi guys,
When I run my streaming job I almost always get a
PartitionNotFoundException initially. The job fails, then restarts and runs fine.
I wonder what is causing that, and whether I can adjust some parameters to
avoid this initial failure.
I have a Flink session on YARN with 55 task managers. 4
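For what it's worth, later Flink versions expose the partition-request retry window in the configuration. If the failure is just the consumer task asking for a result partition before the producer has registered it, raising the backoff can help. These keys exist in Flink 1.5+; verify the names and defaults against the docs for your version:

```yaml
# flink-conf.yaml (Flink 1.5+; check your version's documentation)
taskmanager.network.request-backoff.initial: 100
taskmanager.network.request-backoff.max: 20000
```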
yep I meant 120 per second :)
On Fri, Mar 31, 2017 at 11:19 AM, Ted Yu wrote:
> The 1,2million seems to be European notation.
>
> You meant 1.2 million, right ?
>
> On Mar 31, 2017, at 1:19 AM, Kamil Dziublinski <
> kamil.dziublin...@gmail.com> wrote:
>
> Hi,
e consumer; that
> will directly configure the internal Kafka clients.
> Generally, all Kafka settings are applicable through the provided config
> properties, so you can perhaps take a look at the Kafka docs to see what
> else there is to tune for the clients.
>
> On March 30, 2017
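Passing Kafka client settings through to the consumer, as described above, looks roughly like this. The broker address, group id, and tuning value are placeholders, and the consumer class name is version-specific:

```java
import java.util.Properties;

class KafkaConsumerConfig {
    // Properties handed to the Flink Kafka consumer are forwarded to the
    // underlying Kafka client, so standard Kafka settings apply here.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        props.setProperty("group.id", "my-flink-job");         // placeholder
        props.setProperty("fetch.min.bytes", "1048576");       // example knob
        return props;
        // Usage (version-specific, shown for orientation only):
        // new FlinkKafkaConsumer010<>("topic", schema, consumerProps());
    }
}
```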
izations
> which would boost write performance further.
>
> FYI
>
> On Mar 30, 2017, at 1:07 AM, Kamil Dziublinski <
> kamil.dziublin...@gmail.com> wrote:
>
> Hey guys,
>
> Sorry for the confusion; it turned out that I had a bug in my code, where I
> was not clearing thi
> /projects/flink/flink-docs-release-1.2/dev/stream/state.html#using-managed-keyed-state
>
> I hope that helps.
>
> Regards,
> Timo
>
> Am 29/03/17 um 11:27 schrieb Kamil Dziublinski:
>
> Hi guys,
>
> I’m using flink on production in Mapp. We recently swapped from Storm.
Hi guys,
I’m using Flink in production at Mapp. We recently swapped from Storm.
Before I put this live I ran performance tests, and I found
something that “feels” a bit off.
I have a simple streaming job reading from Kafka, windowing for 3
seconds and then storing into HBase.
Initial