Hi Fabian, Thank you for your help.
I want my Flink application to be distributed, and I also want the
ability to update a variable [the coefficients of LinearRegression].
How would you do it in that case?
The problem with iteration is that it expects a DataSet of the same type to be
fed back.
Hi,
yes, I was talking about a Flink bug. I forgot to mention the work-around
that Stephan mentioned.
On Thu, 9 Jun 2016 at 20:38 Stephan Ewen wrote:
> You can also make the KeySelector a static inner class. That should work
> as well.
>
> On Thu, Jun 9, 2016 at 7:00 PM, Tarandeep Singh
> wrote
Chesnay: I have two simple questions, related to the previous ones about
encapsulation of transformations.
Question 1. I have tried to extend my code using your suggestions and come up
with a small concern. First, your code:
public static void main(String[] args) throws Exception
{
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
Yes, thanks a lot. The fact that I was using ParallelSourceFunction was
also problematic. So, as suggested by Fabian and Robert, I used
SourceFunction instead, and then in the Flink job I set the parallelism
of the map output to 4 to get the desired result.
Thanks again.
You can also make the KeySelector a static inner class. That should work as
well.
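A minimal sketch of that work-around (the String record type and the key
extraction are illustrative, not from the thread):

import org.apache.flink.api.java.functions.KeySelector;

public class RecordsFilterer {

    // A static nested class holds no implicit reference to the enclosing
    // RecordsFilterer instance, so Flink can serialize the selector on its own.
    public static final class RecordKeySelector
            implements KeySelector<String, String> {
        @Override
        public String getKey(String record) {
            // illustrative: key on the first comma-separated field
            return record.split(",")[0];
        }
    }
}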
On Thu, Jun 9, 2016 at 7:00 PM, Tarandeep Singh wrote:
> Thank you Aljoscha and Fabian for your replies.
>
> @Aljoscha: when you said "cleaner is not used in CoGroup.where(). I'm
> afraid this is a bug", I am assuming you are referring to the Flink
> engine itself.
Ok, thanks Aljoscha.
As an alternative to using Flink to maintain the schedule state, I could
take the (e, t2) stream and write to an external key-value store with a
bucket for each minute. Then have a separate service which polls the
key-value store every minute and retrieves the current bucket, a
Hello all,
At first, I have a question posted on
http://stackoverflow.com/questions/37732978/join-two-streams-using-a-count-based-window
. I am re-posting this on the mailing list in case some of you are not on
SO.
In addition, I would like to know what is the difference between Flink and
other S
Thank you Aljoscha and Fabian for your replies.
@Aljoscha: when you said "cleaner is not used in CoGroup.where(). I'm
afraid this is a bug", I am assuming you are referring to the Flink engine
itself.
@Fabian: thanks for the optimization tip.
This is how I have got it working (with a hack): In my da
Understood!
I have created a WindowStream and now it is working. Thanks!
On Thursday, June 9, 2016, Fabian Hueske wrote:
> Hi,
>
> you are computing a running aggregate, i.e., you're getting one output
> record for each input record and the output record is the record with the
> largest value observed so far.
OK, this indicates that the operator following the source is a bottleneck.
If that's the WindowOperator, it makes sense to try the refactoring of the
WindowFunction.
Alternatively, you can try to run that operator with a higher parallelism.
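A minimal sketch of that alternative (socket source and map function are
illustrative): the source keeps its own parallelism while setParallelism()
scales out only the operator that follows it.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ScaleOneOperator {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        env.socketTextStream("localhost", 9999)   // non-parallel source
           .map(new MapFunction<String, Integer>() {
               @Override
               public Integer map(String line) {
                   return line.length();
               }
           })
           .setParallelism(8)  // applies only to the preceding map operator
           .print();

        env.execute("scale one operator");
    }
}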
2016-06-09 17:39 GMT+02:00 Christophe Salperwyck <
chris
Great, thank you!
2016-06-09 17:38 GMT+02:00 Elias Levy :
>
> On Thu, Jun 9, 2016 at 5:16 AM, Fabian Hueske wrote:
>
>> thanks for your feedback. I think those are good observations and
>> suggestions to improve the Kafka producers.
>> The best place to discuss such improvements is the dev mailing list.
Hi Fabian,
Thanks for the help, I will try that. The backpressure was on the source
(HBase).
Christophe
2016-06-09 16:38 GMT+02:00 Fabian Hueske :
> Hi Christophe,
>
> where does the backpressure appear? In front of the sink operator or
> before the window operator?
>
> In any case, I think you can improve your WindowFunction if you convert
> parts of it into a FoldFunction.
On Thu, Jun 9, 2016 at 5:16 AM, Fabian Hueske wrote:
> thanks for your feedback. I think those are good observations and
> suggestions to improve the Kafka producers.
> The best place to discuss such improvements is the dev mailing list.
>
> Would you like to repost your mail there or open JIRAs where the discussion
> about these changes can continue?
We solved this problem yesterday at the Flink Hackathon.
The issue was that the source function was started with parallelism 4 and
each function read the whole file.
Cheers, Fabian
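For reference, a minimal sketch of such a non-parallel source (file name
illustrative). Because it implements SourceFunction rather than
ParallelSourceFunction, Flink runs exactly one instance, so the file is
read once:

import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class FileLineSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        try (BufferedReader reader =
                 new BufferedReader(new FileReader("input.csv"))) {
            String line;
            while (running && (line = reader.readLine()) != null) {
                ctx.collect(line);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}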
2016-06-06 16:53 GMT+02:00 Biplob Biswas :
> Hi,
>
> I tried streaming the source data 2 ways
>
> 1. Is a simple st
Hi,
1) Yes, that is correct. If you set the parallelism of an operator to 1, it
is only executed on a single node. It depends on your application whether
you need a global state or whether multiple local states are OK.
2) Flink programs follow the concept of a data flow. There is no communication
between
Hi Christophe,
where does the backpressure appear? In front of the sink operator or before
the window operator?
In any case, I think you can improve your WindowFunction if you convert
parts of it into a FoldFunction.
The FoldFunction would take care of the statistics computation and the
WindowFunction would only emit the final result.
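A sketch of that refactoring, assuming readings is a
DataStream<Tuple2<String, Double>> of (seriesId, value) tuples; the exact
apply()/fold() signature varies across Flink versions, so treat this as an
outline rather than the thread's actual code:

import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// accumulator: (sum, count); output: (average, count)
DataStream<Tuple2<Double, Long>> dailyStats = readings
    .keyBy(0)
    .timeWindow(Time.days(1))
    .apply(
        Tuple2.of(0.0, 0L),
        new FoldFunction<Tuple2<String, Double>, Tuple2<Double, Long>>() {
            @Override
            public Tuple2<Double, Long> fold(Tuple2<Double, Long> acc,
                                             Tuple2<String, Double> in) {
                // incremental part: runs once per element as it arrives
                return Tuple2.of(acc.f0 + in.f1, acc.f1 + 1);
            }
        },
        new WindowFunction<Tuple2<Double, Long>, Tuple2<Double, Long>,
                           Tuple, TimeWindow>() {
            @Override
            public void apply(Tuple key, TimeWindow window,
                              Iterable<Tuple2<Double, Long>> folded,
                              Collector<Tuple2<Double, Long>> out) {
                // only the single pre-folded accumulator arrives here
                Tuple2<Double, Long> acc = folded.iterator().next();
                out.collect(Tuple2.of(acc.f0 / acc.f1, acc.f1));
            }
        });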
Hi,
you are computing a running aggregate, i.e., you're getting one output
record for each input record and the output record is the record with the
largest value observed so far.
If the record with the largest value is the first, the record is sent out
another time. This is what happened with Mat
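A short sketch of the contrast, assuming stream is a keyed-by-able
DataStream of (key, value) tuples:

// Running aggregate: one output record per input record; the current
// maximum is (re-)emitted even when it does not change.
stream.keyBy(0).maxBy(1);

// Windowed aggregate: one output record per key and window.
stream.keyBy(0).countWindow(2).maxBy(1);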
Hi,
the problem is that the KeySelector is an anonymous inner class and as such
has a reference to the outer RecordFilterer object. Normally, this would be
rectified by the closure cleaner, but the cleaner is not used in
CoGroup.where(). I'm afraid this is a bug.
Best,
Aljoscha
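A sketch of the problematic pattern (types illustrative): the anonymous
class carries a hidden reference to RecordFilterer.this, so serializing the
selector also tries to serialize the outer object:

import org.apache.flink.api.java.functions.KeySelector;

public class RecordFilterer {

    public KeySelector<String, String> makeKeySelector() {
        // anonymous inner class: implicitly captures RecordFilterer.this
        return new KeySelector<String, String>() {
            @Override
            public String getKey(String value) {
                return value.split(",")[0];
            }
        };
    }
}

Making the selector a static nested class, as suggested above, removes the
hidden reference.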
On Thu, 9 Jun 2016
Hi Yukun,
the problem is that the KeySelector is internally invoked multiple times.
Hence it must be deterministic, i.e., it must extract the same key for the
same object if invoked multiple times.
The documentation does not discuss this aspect and should be extended.
Thanks for pointing out this!
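To illustrate the requirement (String records, illustrative keys):

import java.util.Random;
import org.apache.flink.api.java.functions.KeySelector;

// Non-deterministic: may return a different key when re-invoked on the
// same record, which breaks keyed operations.
KeySelector<String, Integer> broken = new KeySelector<String, Integer>() {
    private final Random rnd = new Random();
    @Override
    public Integer getKey(String value) {
        return rnd.nextInt(10);
    }
};

// Deterministic: the key is a pure function of the record.
KeySelector<String, Integer> correct = new KeySelector<String, Integer>() {
    @Override
    public Integer getKey(String value) {
        return Math.floorMod(value.hashCode(), 10);
    }
};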
You should have a look at this project : https://github.com/addthis/stream-lib
You can use it within Flink, storing intermediate values in a local state.
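For instance, stream-lib's CountMinSketch could be held as per-instance
local state inside a rich function (a sketch, assuming stream-lib's
CountMinSketch API; note that this state is local to each parallel instance
and is not checkpointed by Flink):

import com.clearspring.analytics.stream.frequency.CountMinSketch;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class SketchingCounter
        extends RichFlatMapFunction<String, Tuple2<String, Long>> {

    private transient CountMinSketch sketch;

    @Override
    public void open(Configuration parameters) {
        // eps = 0.001, confidence = 0.99, seed = 1 (illustrative values)
        sketch = new CountMinSketch(0.001, 0.99, 1);
    }

    @Override
    public void flatMap(String word, Collector<Tuple2<String, Long>> out) {
        sketch.add(word, 1);
        out.collect(Tuple2.of(word, sketch.estimateCount(word)));
    }
}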
> On 9 June 2016, at 15:29, Yukun Guo wrote:
>
> Thank you very much for the detailed answer. Now I understand a DataStream
> can be repartitioned or “joined” (don’t know the exact terminology) with
> keyBy.
Hi,
There are some implementations to do that with a low memory footprint. Have a
look at the count-min sketch, for example. There are some Java
implementations.
Christophe
2016-06-09 15:29 GMT+02:00 Yukun Guo :
> Thank you very much for the detailed answer. Now I understand a DataStream
> can be repartitioned or “joined” (don’t know the exact terminology) with
> keyBy.
Thank you very much for the detailed answer. Now I understand a DataStream
can be repartitioned or “joined” (don’t know the exact terminology) with
keyBy.
But another question:
Despite the non-existence of an incremental top-k algorithm, I’d like to
incrementally compute the local word count during o
Hi Josh,
I'll have to think a bit about that one. Once I have something I'll get
back to you.
Best,
Aljoscha
On Wed, 8 Jun 2016 at 21:47 Josh wrote:
> This is just a question about a potential use case for Flink:
>
> I have a Flink job which receives tuples with an event id and a timestamp
> (e, t2).
Hi Fabian, Thank you for your answers,
1) If there is only a single instance of that function, doesn't that defeat
the purpose of being distributed? Correct me if I am wrong: if I run with
parallelism 1 on a cluster, does that mean it will execute on only one
node?
2) I mean to say, when a map operator re
Hi,
I am writing a program to read time series from HBase and do some daily
aggregations (Flink streaming). For now I am just computing some averages,
so not very demanding, but my HBase reads get slower and slower (I have a
few billion points to read). The back pressure is almost all the time close
Hi Till,
thanks for the clarification. It all makes sense now.
So the keyBy call is more of a partitioning scheme and less of an operator,
similar to Storm's field grouping and Flink's other schemes such as forward
and broadcast. The difference is that it produces KeyedStreams, which are a
prere
Hi again!
I am working with two DataStreams and I want to get the maximum value from
each pair of them, for example:
//Informacion (matchName, LocalOdd, AwayOdd)
Informacion info1= new Informacion("Match1", 1.10, 3.22);
Informacion info2= new Informacion("Match2", 2.11, 1.10);
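One possible shape for this (a sketch: union the two streams, key by match
name, and reduce over pairs; it assumes Informacion has
getMatchName()/getLocalOdd() accessors and that each match arrives exactly
once per stream):

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;

DataStream<Informacion> both = stream1.union(stream2);

DataStream<Informacion> maxPerMatch = both
    .keyBy(new KeySelector<Informacion, String>() {
        @Override
        public String getKey(Informacion info) {
            return info.getMatchName();
        }
    })
    .countWindow(2)  // waits for one element from each stream
    .reduce(new ReduceFunction<Informacion>() {
        @Override
        public Informacion reduce(Informacion a, Informacion b) {
            return a.getLocalOdd() >= b.getLocalOdd() ? a : b;
        }
    });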
Hi Ravikumar,
I'll try to answer your questions:
1) If you set the parallelism of a map function to 1, there will be only a
single instance of that function regardless of whether it is executed
locally or remotely in a cluster.
2) Flink also supports aggregations (reduce, groupReduce, combine,
Hi Elias,
thanks for your feedback. I think those are good observations and
suggestions to improve the Kafka producers.
The best place to discuss such improvements is the dev mailing list.
Would you like to repost your mail there or open JIRAs where the discussion
about these changes can continue?
T
Hi Tarandeep,
the exception suggests that Flink tries to serialize RecordsFilterer as a
user function (this happens via Java Serialization).
I said "suggests" because the code that uses RecordsFilterer is not included.
To me it looks like RecordsFilterer should not be used as a user function.
It is
Hi Ahmed,
I tried setting up your use case and for me it all seems to work. However,
I didn't use the Spring framework and executed the program in a local Flink
cluster.
Maybe you can compile a self-contained example (including example data) to
reproduce your problem and send it to us.
Cheers,
I’m playing with the (Window)WordCount example from Flink QuickStart. I
generate a DataStream consisting of 1000 Strings of random digits, which is
windowed with a tumbling count window of 50 elements:
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.fu
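Reconstructed as a sketch, since the snippet is cut off (env, the
randomDigitStrings collection, and the per-character tokenization are
assumptions):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.util.Collector;

DataStream<String> digits = env.fromCollection(randomDigitStrings);

DataStream<Tuple2<String, Integer>> counts = digits
    .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
        @Override
        public void flatMap(String value,
                            Collector<Tuple2<String, Integer>> out) {
            for (char c : value.toCharArray()) {
                out.collect(Tuple2.of(String.valueOf(c), 1));
            }
        }
    })
    .keyBy(0)
    .countWindow(50)  // tumbling count window of 50 elements per key
    .sum(1);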
Hi Aljoscha,
I went to the Flink hackathon by Buzzwords yesterday, where Fabian and Robert
helped me with this issue. Apparently I was assuming that the file would be
handled in a single thread, but I was using ParallelSourceFunction and it was
creating 4 different threads and thus reading the same file.
Hi Till, Thank you for your answer, I have couple of questions
1) Setting the parallelism of a single map function locally is fine, but
will it work the same way in a distributed execution?
2) Is there any other way apart from setting parallelism? Like Spark's
aggregate function?
3) Is it necessary that aft
Hi James,
the TypeInformation must be available at the call site, not in the case
class definition. In your WindowFunction you are using a TestGen[String] so
it should suffice to add this line at some point before the call to apply():
implicit val testGenType = createTypeInformation[TestGen[String]]
Hi,
could you try pulling the problem apart, i.e. determine at which point in
the pipeline you have duplicate data. Is it after the sources or in the
CoFlatMap or the Map after the reduce, for example?
Cheers,
Aljoscha
On Wed, 1 Jun 2016 at 17:11 Biplob Biswas wrote:
> Hi,
>
> Before giving the
Great to hear :-)
On Wed, Jun 8, 2016 at 7:45 PM, Josh wrote:
> Thanks Till, your suggestion worked!
>
> I actually just created a new SpecificData for each
> AvroDeserializationSchema instance, so I think it's still just as efficient.
>
> Josh
>
> On Wed, Jun 8, 2016 at 4:41 PM, Till Rohrmann