Let me give you a specific example: say event1 on stream1 happened within your
0-5 min window with key1, and event2 on stream2 with key2, which could have
matched key1, happened at 5:01, outside the join window. Now you will have
to correlate event2 on stream2 with event1 on stream1, which
Thanks Balaji. Do you mean you spill the non-matching records into Redis
after 5 minutes? Does Flink give you control over which records did not match
in the current window, so that you can copy them into long-term storage?
On Wed, Apr 13, 2016 at 11:20 PM, Balaji Rajagopalan <
balaji.rajagopa..
You can implement a join in Flink (which is an inner join) with the below-mentioned
pseudo code. The join below is for a 5-minute interval; yes, there will be some
corner cases where data arriving after 5 minutes is missed by the join window.
I actually solved this problem by storing some data i
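For reference (not the pseudo code from the original mail, which was cut off), a minimal sketch of such a windowed inner join with the DataStream Scala API could look roughly like this; the event types and key field are made up, and the exact window-assigner class names differ a bit between Flink versions:

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

case class EventA(id: String, payload: String)   // hypothetical event types
case class EventB(id: String, payload: String)

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

// With a real source you also need assignTimestampsAndWatermarks(...),
// otherwise event-time windows never fire.
val streamA: DataStream[EventA] = env.fromElements(EventA("key1", "a1"))
val streamB: DataStream[EventB] = env.fromElements(EventB("key1", "b1"))

// Inner join on the key over a 5-minute event-time window; anything that
// arrives after the window has closed is simply dropped from this join.
val joined = streamA.join(streamB)
  .where(_.id)
  .equalTo(_.id)
  .window(TumblingEventTimeWindows.of(Time.minutes(5)))
  .apply { (a, b) => (a, b) }

joined.print()
env.execute("windowed-join-sketch")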
Hi,
We are evaluating different streaming platforms. For a typical join
between two streams
SELECT a.*, b.*
FROM a JOIN b
ON a.id = b.id
How does Flink implement the join? The matching record from either stream
can come late; we consider it a valid join as long as the event time for
record a an
On Wed, Apr 13, 2016 at 2:10 AM, Stephan Ewen wrote:
> If you want to use Flink's internal key/value state, however, you need to
> let Flink re-partition the data by using "keyBy()". That is because Flink's
> internal sharding of state (including the re-sharding to adjust parallelism
> we are cur
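For what it is worth, keying a stream and keeping a per-key value in Flink's partitioned state could look roughly like this (untested sketch; the running count is only an example, and the ValueStateDescriptor constructor with a default value follows the 1.x-era API):

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

// Counts events per key, with the counter held in Flink's key/value state.
class CountPerKey extends RichFlatMapFunction[(String, Long), (String, Long)] {
  private var count: ValueState[Long] = _

  override def open(parameters: Configuration): Unit = {
    count = getRuntimeContext.getState(
      new ValueStateDescriptor[Long]("count", createTypeInformation[Long], 0L))
  }

  override def flatMap(in: (String, Long), out: Collector[(String, Long)]): Unit = {
    val updated = count.value() + 1
    count.update(updated)
    out.collect((in._1, updated))
  }
}

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.fromElements(("a", 1L), ("a", 2L), ("b", 3L))
  .keyBy(_._1)               // keyBy is what lets Flink shard this state
  .flatMap(new CountPerKey)
  .print()
env.execute("keyed-state-sketch")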
Hi Aljoscha,
thanks for your answers. I am currently not in the office, so I can not
run any further analysis until Monday. Just some quick answers to your
questions.
We are using the partitioned state abstraction, most of the state should
correspond to buffered events in windows. Parallelism is
Hi,
calling DataStream.partitionCustom() with the respective arguments
before the sink should do the trick, I think.
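Something along these lines, roughly (untested sketch; the Event type and the
modulo-based partitioner are just placeholders for your own tuple and routing
logic):

import org.apache.flink.api.common.functions.Partitioner
import org.apache.flink.streaming.api.scala._

case class Event(userId: Long, payload: String)   // stand-in for your tuple

// Sends all events of one user to the same parallel sink instance.
class UserPartitioner extends Partitioner[Long] {
  override def partition(key: Long, numPartitions: Int): Int =
    Math.floorMod(key, numPartitions.toLong).toInt
}

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.fromElements(Event(1L, "a"), Event(2L, "b"))
  .partitionCustom(new UserPartitioner, _.userId)
  .print()   // replace with your ElasticsearchSink
env.execute("partition-custom-sketch")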
Cheers,
Konstantin
On 14.04.2016 01:22, neo21 zerro wrote:
> Hello everybody,
>
> I have an elasticsearch sink in my flink topology.
> My requirement is to write the data in a p
You could simulate the Samza approach by having a RichFlatMapFunction over
cogrouped streams that maintains the sliding window in its ListState. As I
understand it, the drawback is that the list state is not kept in Flink's
managed memory.
I'm interested to hear about the right way to implement this.
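As a rough illustration of that idea (an untested sketch: it assumes a Flink version where keyed ListState is exposed on the RuntimeContext, and it leaves out evicting entries that fall outside the 24h window):

import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector
import scala.collection.JavaConverters._

// Hypothetical layout: (key, x, y) tuples on one stream, (key, z) on the other.
class BufferAndMatch
    extends RichCoFlatMapFunction[(String, Double, Double), (String, Double), (Double, Double, Double)] {

  private var buffer: ListState[(Double, Double)] = _

  override def open(parameters: Configuration): Unit = {
    buffer = getRuntimeContext.getListState(
      new ListStateDescriptor("xy-buffer", createTypeInformation[(Double, Double)]))
  }

  // First input: remember (x, y). A real implementation would also evict
  // entries older than the window, e.g. via a timer or periodic cleanup.
  override def flatMap1(in: (String, Double, Double), out: Collector[(Double, Double, Double)]): Unit =
    buffer.add((in._2, in._3))

  // Second input: join z against everything currently buffered for this key.
  override def flatMap2(in: (String, Double), out: Collector[(Double, Double, Double)]): Unit =
    buffer.get().asScala.foreach { case (x, y) => out.collect((x, y, in._2)) }
}

val env = StreamExecutionEnvironment.getExecutionEnvironment
val xy: DataStream[(String, Double, Double)] = env.fromElements(("k", 1.0, 2.0))
val z: DataStream[(String, Double)]          = env.fromElements(("k", 3.0))

xy.connect(z)
  .keyBy(_._1, _._1)   // both inputs keyed the same way so they share state
  .flatMap(new BufferAndMatch)
  .print()
env.execute("cogroup-buffer-sketch")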
Hello everybody,
I have an elasticsearch sink in my flink topology.
My requirement is to write the data in a partitioned fashion to my Sink.
For example, I have a tuple which contains a user id. I want to group all events
by user id and partition all events for one particular user to the same Es
Hello everybody,
I have deployed the latest Flink Version 1.0.1 on Yarn 2.5.0-cdh5.3.0.
When I push the WordCount example shipped with the Flink distribution, I can
see metrics (bytes received) in the Flink UI on the corresponding operator.
However, I used the Flink Kafka connector and when I r
I am wondering how you would implement the following function in Flink.
The function takes two streams as input. One stream can be viewed as a
tuple with two values *(x, y)*; the second stream is a stream of individual
values *z*. The function keeps a time-based window on the first input
(say, 24 h
Does this problem persist? (It might have been caused by maven caches with
bad artifacts).
The many transitive dependencies you see often come from the connectors -
that is also why we do not put the connectors into the lib folder directly,
so that these libraries are not always part of every Flin
You can reduce Flink's internal network buffering by adjusting the total
number of network buffers.
See
https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#configuring-the-network-buffers
That should also make a difference.
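For example, in flink-conf.yaml (the numbers are only illustrative; see the page above for how to size them):

# total number of network buffers per TaskManager
taskmanager.network.numberOfBuffers: 2048
# size of one buffer / memory segment in bytes (32 KB by default)
taskmanager.memory.segment-size: 32768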
On Wed, Apr 13, 2016 at 3:57 PM, Andrew Ge Wu
Hi everyone,
I'm using Flink 0.10.2 for some benchmarks and had to add some small
changes to Flink, which led me to compiling and running it myself. This is
when I noticed a performance difference in the pre-packaged Flink version
that I downloaded from the web (
http://archive.apache.org/dist/fli
Thanks guys for the explanation, I will give it a try.
After all buffers are filled, the back pressure did its job; it works well so
far, but I will definitely give controlling the latency a try.
Thanks again!
Andrew
> On 11 Apr 2016, at 18:19, Stephan Ewen wrote:
>
> Hi!
>
> Ufuk's su
That's interesting to hear. If you want we can also collaborate on that
one. Using the Flink managed memory for that purpose would require some
changes to lower layers of Flink.
On Wed, 13 Apr 2016 at 13:11 Shannon Carey wrote:
> This is something that my team and I have discussed building, so i
This is something that my team and I have discussed building, so it's great to
know that it's already on the radar. If we beat you to it, I'll definitely try
to make it a contribution.
Shannon
From: Aljoscha Krettek <aljos...@apache.org>
Date: Wednesday, April 13, 2016 at 1:46 PM
To: mai
Now I got this working in the cloud (not locally, but that's ok), so thanks a lot.
The next problem is how to read these written files back in and add them to the
ALS.
I guess it is something like
val als = ALS()
als.factorsOption = Some((users, items))
but I don't get how I could read in the data I hav
No problem ;-)
On Wed, Apr 13, 2016 at 11:38 AM, Stefano Bortoli
wrote:
> Sounds like you are damn right! Thanks for the insight; dumb of us for not
> checking this before.
>
> saluti,
> Stefano
>
> 2016-04-13 11:05 GMT+02:00 Stephan Ewen :
>
>> Sounds actually not like a Flink issue. I would look in
Sounds like you are damn right! Thanks for the insight; dumb of us for not
checking this before.
saluti,
Stefano
2016-04-13 11:05 GMT+02:00 Stephan Ewen :
> Sounds actually not like a Flink issue. I would look into the commons pool
> docs.
> Maybe they size their pools by default with the number of c
Sounds actually not like a Flink issue. I would look into the commons pool
docs.
Maybe they size their pools by default with the number of cores, so the
pool has only 8 threads, and other requests are queued?
On Wed, Apr 13, 2016 at 10:29 AM, Flavio Pompermaier
wrote:
> Any feedback about our JD
Hi!
You can exploit that, yes. If you read data from Kafka in Flink, a Kafka
partition is "sticky" to a Flink source subtask. If you have (kafka-source
=> mapFunction) for example, you can be sure that all values with the same
key go through the same parallel mapFunction subtask. If you maintain a
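In code, such a pipeline might look roughly like this (untested sketch; the
Kafka connector class and schema depend on your Kafka and Flink versions, and
the topic and broker settings are placeholders):

import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
props.setProperty("zookeeper.connect", "localhost:2181")
props.setProperty("group.id", "example-group")

val env = StreamExecutionEnvironment.getExecutionEnvironment

// Each source subtask reads a fixed set of Kafka partitions, and the chained
// map keeps that partitioning: no shuffle happens between source and map.
env
  .addSource(new FlinkKafkaConsumer08[String]("my-topic", new SimpleStringSchema(), props))
  .map(record => record.toUpperCase)   // runs in the same subtask as the source
  .print()

// A keyBy() in between would instead re-partition by Flink's own hashing,
// regardless of how Kafka partitioned the data.
env.execute("kafka-sticky-partition-sketch")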
Any feedback about our JDBC InputFormat issue..?
On Thu, Apr 7, 2016 at 12:37 PM, Flavio Pompermaier
wrote:
> We've finally created a running example (for Flink 0.10.2) of our improved
> JDBC InputFormat that you can run from an IDE (it creates an in-memory
> derby database with 1000 rows and ba
Hi,
there are two cases where a FoldFunction can be used in the streaming API:
KeyedStream.fold() and WindowedStream.fold()/apply(). In both cases we
internally use the partitioned state abstraction of Flink to keep the
state. So yes, the accumulator value is consistently maintained and will
surviv
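A tiny example of the KeyedStream.fold() case (untested sketch; later Flink
versions deprecate fold in favor of aggregate, so details vary by version):

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment

// Per-key running sum: the accumulator (here a Long) is kept in Flink's
// partitioned state per key, so it is checkpointed and restored on failure.
env.fromElements(("a", 1L), ("b", 2L), ("a", 3L))
  .keyBy(_._1)
  .fold(0L) { (acc, element) => acc + element._2 }
  .print()

env.execute("keyed-fold-sketch")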
Hi Maxim,
yes the plan is to have a cache of hot values that uses the managed memory
abstraction of Flink so that we can make sure that we stay within memory
bounds and don't run into OOM exceptions.
On Tue, 12 Apr 2016 at 23:37 Maxim wrote:
> Is it possible to add an option to store the state i
Hi Stephan,
If we were to do that, would Flink leverage the fact that Kafka has already
partitioned the data by the key, or would Flink attempt to shuffle the data
again into its own partitions, potentially shuffling data between machines
for no gain?
Thanks,
Andy
On Sun, 10 Apr 2016, 13:22 Ste