In preparation for putting a flink project in production we wanted to find the
CVE entries for flink applicable to the last few releases, but they appear to
have all Apache projects in one big list. Any help in narrowing it down to
just Flink entries would be appreciated.
Michael
I have a simple test for looking at Flink SQL and hit an exception reported as
a bug. I wonder though if it is a missing dependency.
Michael
Error in test
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at
org.apache.flink.runtime.jobmaster.JobResult.t
I am trying to get a minimal Flink SQL program going to test it out. I have a
StreamExecutionEnvironment and from that created a StreamTableEnvironment. The
docs indicate there should be a fromDataStream method to create a Table, but
none appears to exist according to Eclipse. The method regi
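In Flink 1.x the fromDataStream method lives on the TableEnvironment classes in the flink-table module, so Eclipse not seeing it usually means the table dependency is missing from the classpath rather than the method not existing. Something along these lines, with the artifact suffix and version adjusted to your Flink/Scala setup (values below are illustrative):

```xml
<!-- Artifact name and version are illustrative; match them to your Flink/Scala versions -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table_2.11</artifactId>
    <version>1.4.2</version>
</dependency>
```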
In running tests of flink jobs we are seeing some that yield really good
performance (2.5M records in minutes) and others that are struggling to get
past 200k records processed. In the latter case there are a large number of
keys, and each key gets state in the form of 3 value states. One hold
It looks like it is some issue with backpressure as the same behavior happens
with the client library as a custom source.
Michael
> On Aug 16, 2018, at 6:59 PM, TechnoMage wrote:
>
> I have seen this in the past and running into it again.
>
> I have a kafka consumer that is
I have seen this in the past and running into it again.
I have a kafka consumer that is not getting all the records from the topic.
Kafka confirms there are 300k messages in each partition, and flink only sees a
total of 8000 records in the source.
Kafka is 2.0, flink is 1.4.2 connector is Fli
Has any work been done on support for Windows in 1.5? I tried the scripts in
1.4 on Windows 10 with no luck.
Michael
6.0.
>>>>
>>>> Before that, you need to manually handle the state migration in your
>>>> operator’s open method. Lets assume that your OperatorV1 has a state field
>>>> “stateV1”. Your OperatorV2 defines field “stateV2”, which is incompatible
>>&
We are still pretty new to Flink and I have a conceptual / DevOps question.
When a job is modified and we want to deploy the new version, what is the
preferred method? Our jobs have a lot of keyed state.
If we use snapshots we have old state that may no longer apply to the new
pipeline.
If we
CoRichFlatMap or union will work. If you need to know which is historical the
flatmap will be better as you can tell which stream it came from. But, be
careful about reading historical data and trying to process it all before
processing the new data. That can lead to buffering a lot of incomin
If you use a KeyedStream you can group records by key (city) and then use a
RichFlatMap to aggregate state in a MapState or ListState per key. You can
then have that operator publish the updated results as a new aggregated record,
or send it to a database or such as you see fit.
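As a rough Flink-free sketch of that per-key aggregation pattern (in the real job the map below would be a MapState or ValueState inside the RichFlatMap, scoped automatically to the current key; class and method names here are mine):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: count records per key (e.g. city), the way a RichFlatMap on a
// KeyedStream would. In Flink the HashMap would instead be keyed state
// (MapState/ValueState) and each update would be collector.collect()-ed.
public class PerKeyCount {
    public static Map<String, Integer> aggregate(List<String> cities) {
        Map<String, Integer> counts = new HashMap<>();
        for (String city : cities) {
            counts.merge(city, 1, Integer::sum);  // update per-key state
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(aggregate(List.of("Denver", "Boulder", "Denver")));
    }
}
```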
Michael
> On
If the external web service call does not modify the state of that external
system all the approaches you list are probably ok. If there is external state
modification then you want to ensure on restart the Flink job does not resend
requests to that service or that it can handle duplicate reque
Any iterable of Tuples will work for a for loop: List, Set, etc.
Michael
> On Apr 27, 2018, at 10:47 AM, Soheil Pourbafrani
> wrote:
>
> Thanks, what did you consider the return type of parse method? Arraylist of
> tuples?
>
> On Friday, April 27, 2018, Te
it would look more like:
for (Tuple2<?, ?> t2 : parse(t.f3)) {
collector.collect(t2);
}
Michael
> On Apr 27, 2018, at 9:08 AM, Soheil Pourbafrani wrote:
>
> Hi, I want to use flatMap to pass to function namely 'parse' a tuple and it
> will return multiple tuple, that each should be a recor
Kafka for both?
>
> --
> Dhruv Kumar
> PhD Candidate
> Department of Computer Science and Engineering
> University of Minnesota
> www.dhruvkumar.me <http://www.dhruvkumar.me/>
>
>> On Apr 26, 2018, at 16:37, TechnoMage > <mailto:mla...@technomage.com>> wr
e real world
> streaming systems.
>
> --
> Dhruv Kumar
> PhD Candidate
> Department of Computer Science and Engineering
> University of Minnesota
> www.dhruvkumar.me <http://www.dhruvkumar.me/>
>
>> On Apr 26, 2018, at 13:26, Techno
In a single machine system this may work ok. In a multi-machine system this is
not as reliable as the time skew from one machine (source) to another (sink)
can impact the measurements. This also does not account for back pressure on
the source. We are using an external process to in parallel r
icious the par = 8
>
> Jps? Meaning?
>
> Oh I should mention that the JobManager node is also a TaskManager.
>
> Best,
> Max
>
>> On 27 Apr 2018, at 01:39, TechnoMage > <mailto:mla...@technomage.com>> wrote:
>>
>> Check that you have slaves
> I have set numOfSlotsPerTaskManager = 8. Which is reasonable as each has 8
> cpus.
>
> Best,
> Makis
>
> On Fri, 27 Apr 2018, 01:26 TechnoMage, <mailto:mla...@technomage.com>> wrote:
> You need to verify your configs are correct. Check that the local machine
> sees all the
You need to verify your configs are correct. Check that the local machine sees
all the task managers, that is the most likely reason it will reject a higher
parallelism. I use a java program to submit to a 3 node 18 slot cluster
without issue on a job with 18 parallelism. I have not used the
Go to the web UI and verify all 136 TaskManagers are visible in the machine you
are submitting the job from. I have encountered issues where not all
TaskManagers start, or you may not have all 17 configured properly to be one
cluster vs 17 clusters of 8.
Michael
> On Apr 26, 2018, at 10:48 AM
If you are using keyed messages in Kafka, or keyed streams in flink, then only
partitions that get hashed to the proper value will get data. If not keyed
messages, then yes they should all get data.
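As a rough illustration of why keyed writes can starve partitions (Kafka's real default partitioner hashes the key bytes with murmur2; plain hashCode here is just to show the idea):

```java
// Illustrative only: records with the same key always map to the same
// partition, so with few distinct keys some partitions may receive no
// data at all. Kafka's actual default partitioner uses murmur2.
public class KeyPartitioning {
    public static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        for (String k : new String[] {"denver", "boulder", "denver"}) {
            System.out.println(k + " -> partition " + partitionFor(k, 4));
        }
    }
}
```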
Michael
> On Apr 25, 2018, at 8:25 PM, 潘 功森 wrote:
>
> The event is running all the time in o
I agree in the general case you need to operate on the stream data based on the
metadata you have. The side input feature coming some day may help you, in
that it would give you a means to receive inputs out of band. But, given
changing metadata and changing stream data I am not sure this is a
Using a flat map function, you can always buffer the non-meta data stream in
the operator state until the metadata is aggregated, and then process any
collected data. It would require a RichFlatMap to hold data.
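A Flink-free sketch of that buffering pattern (in the real job the buffer would sit in operator or keyed state inside the RichFlatMap, and the two on* methods correspond to the two flatMap inputs of a CoFlatMap; names here are mine):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: park data records until the metadata side is ready, then flush
// the backlog and process subsequent records immediately.
public class BufferUntilReady {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> emitted = new ArrayList<>();
    private boolean metadataReady = false;

    public void onData(String record) {
        if (metadataReady) emitted.add(record);  // process immediately
        else buffer.add(record);                 // park until metadata arrives
    }

    public void onMetadata(String meta) {
        metadataReady = true;
        emitted.addAll(buffer);                  // drain everything parked so far
        buffer.clear();
    }

    public List<String> emitted() { return emitted; }

    public static void main(String[] args) {
        BufferUntilReady op = new BufferUntilReady();
        op.onData("a");
        op.onMetadata("m");
        op.onData("b");
        System.out.println(op.emitted());  // [a, b]
    }
}
```

Note the caution above applies here too: an unbounded backlog of buffered records can consume a lot of memory if the metadata is slow to arrive.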
Michael
> On Apr 25, 2018, at 1:20 PM, Ken Krugler wrote:
>
> Hi Fabian,
>
>> O
019/flink-kafka-consumer-groupid-not-working
>
> <https://stackoverflow.com/questions/38639019/flink-kafka-consumer-groupid-not-working>
>
> From: TechnoMage [mailto:mla...@technomage.com]
> Sent: Wednesday, April 25, 2018 8:52 AM
> To: Tzu-Li (Gordon) Tai
> Cc: user
Just in case it is a metrics bug, I will add a step to do my own counting in
the Flink job.
Michael
> On Apr 25, 2018, at 9:52 AM, TechnoMage wrote:
>
> I have another java program reading the topic to monitor the test. It
> receives 60,000 records on the “travel” topic, whi
of your
> test are incorrect?
> Or are you assuming that it is incorrect based on the weird metric numbers
> shown on the web ui?
>
> Cheers,
> Gordon
>
> On 25 April 2018 at 6:13:07 AM, TechnoMage (mla...@technomage.com
> <mailto:mla...@technomage.com>) wrot
I have been using the kafka connector successfully for a while now. But, am
getting weird results in one case.
I have a test that submits 3 streams to kafka topics, and monitors them on a
separate process. The flink job has a source for each topic, and one such is
fed to 3 separate map functio
The different versions of the connector correspond to different versions of
Kafka. If you are using Kafka 0.8 use 0.8 connector, etc. Versions of the
connector after 0.10 support exactly once delivery, versions prior to that only
offer at least once delivery.
Kafka supports distributed proces
the kafka messages in
about 34 min while the flink job has yet to read all the kafka messages
1hr40min later.
Michael
> On Apr 17, 2018, at 12:58 PM, TechnoMage wrote:
>
> Also, I note some messages in the log about my java class not being a valid
> POJO because it is missing acc
Also, I note some messages in the log about my java class not being a valid
POJO because it is missing accessors for a field. Would this impact
performance significantly?
Michael
> On Apr 17, 2018, at 12:54 PM, TechnoMage wrote:
>
> No checkpoints are active.
> I will try t
eap and offheap (rocksdb).
> - Do you have some expensive types (JSON, etc)? Try activating object reuse
> (which avoids some extra defensive copies)
>
> On Tue, Apr 17, 2018 at 5:50 PM, TechnoMage <mailto:mla...@technomage.com>> wrote:
> Memory use is steady through
>> to disk like regular Linux, then that might be triggered if your JVM heap is
>> bigger than can be handled within the available RAM.
>>
>> On Tue, Apr 17, 2018 at 9:26 AM, TechnoMage > <mailto:mla...@technomage.com>> wrote:
>> I am doing a short Proo
If I use defaults for the most part but configure flink to have parallelism 5
and kafka to have 5 brokers (one of each on 5 nodes) will the connector and
kafka be smart enough to use the kafka partition on the same node as the flink
task manager for the 5 partitions? Do I need to explicitly ass
I am doing a short Proof of Concept for using Flink and Kafka in our product.
On my laptop I can process 10M inputs in about 90 min. On 2 different EC2
instances (m4.xlarge and m5.xlarge both 4core 16GB ram and ssd storage) I see
the process hit a wall around 50min into the test and short of 7
ested the behavior on Flink 1.4.2 and setting the parallelism in the
> flink-conf.yaml of the client was working correctly in a simple local setup.
>
> If this doesn't solve your problem, we'd need a bit more information about
> the job submission and setup.
>
> Bes
ür the parallelism of a job or
>> operator. The scheduler only cares about the number of slots.
>>
>> How did you set the default parallelism? In the config or in the program /
>> StreamExecutionEnvironment?
>>
>> Best, Fabian
>>
>>
>> T
If you look at the web UI for flink it will tell you the bytes received and
sent for each stage of a job. I have not seen any similar metric for persisted
state per stage, which would be nice to have as well.
Michael
> On Apr 13, 2018, at 6:37 AM, Darshan Singh wrote:
>
> Hi
>
> Is there an
I am pretty new to flink. I have a flink job that has 10 transforms (mostly
CoFlatMap, with some simple filters and key extractors as well). I have the
config set for 6 slots and default parallelism of 6, but all my stages show
parallelism of 1. Is that because there is only one task manager? S
ter/ops/state/state_backends.html#configuring-a-state-backend
>
> <https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html#configuring-a-state-backend>
>
> On Wed, Apr 11, 2018 at 11:04 PM, TechnoMage <mailto:mla...@technomage.com>> wrote:
> I a
e code hard to read. I’m thinking about
> watermark, but not sure how to do this.
>
>
> --
> Thanks
> Ivan
> From: TechnoMage
> Date: Thursday, 12 April 2018 at 3:21 AM
> To: Ivan Wang
> Cc: "user@flink.apache.org"
> Subject: Re: Is Flink able to do
I am pretty new to flink and have an initial streaming job working both locally
and remotely. But, both ways if the data volume is too high it runs out of
heap. I am using RichMapFunction to process multiple streams of data. I
assumed Flink would manage keeping state in ram when possible, and
I am new to Flink so others may have more complete answer or correct me.
If you are counting the events in a tumbling window you will get output at the
end of each tumbling window, so a running count of events/window. It sounds
like you want to compare the raw data to the smoothed data? You ca
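A minimal Flink-free sketch of what a tumbling count produces: each event falls into exactly one window based on its timestamp, and one count is emitted per window when it closes (window size and names below are my own):

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: tumbling-window counting. Each timestamp belongs to exactly
// one window [n*size, (n+1)*size); the per-window count is what Flink
// would emit when that window fires.
public class TumblingCount {
    public static Map<Long, Integer> countPerWindow(long[] timestamps, long windowSize) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestamps) {
            long windowStart = (ts / windowSize) * windowSize;  // window the event falls in
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // windows of size 5: [0,5) holds ts 1 and 4; [5,10) holds 5 and 9; [10,15) holds 12
        System.out.println(countPerWindow(new long[] {1, 4, 5, 9, 12}, 5));
    }
}
```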
I have seen this when my task manager ran out of RAM. Increase the heap size.
flink-conf.yaml:
taskmanager.heap.mb
jobmanager.heap.mb
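For example (values are placeholders only; size them to the host's RAM):

```yaml
# flink-conf.yaml -- example values, adjust to available memory
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 4096
```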
Michael
> On Apr 8, 2018, at 2:36 AM, 王凯 wrote:
>
>
> hi all, recently i found a problem: it runs well when started, but after a long
> run the exception displa
I have a use case that I wonder if Flink handles well:
1) 5M+ keys in a KeyedStream
2) Using RichFlatMap to track data on each key
Will Flink spread one operator’s partitions over multiple
machines/taskmanager/jobmanager?
Michael
I am new to Flink and trying to understand the keyBy and KeyedStream. From the
short doc description I expected it to partition the data such that the
following flatMap would only see elements with the same key. That events with
different keys would be presented to different instances of FlatM