Re: Flink performance

2024-03-12 Thread Robin Moffatt via user
Principal DevEx Engineer, Decodable On Tue, 12 Mar 2024 at 06:59, Kamal Mittal via user wrote: > Hello Community, > > > > Please share info. for below query. > > > > Rgds, > > Kamal > > > > *From:* Kamal Mittal via user > *Sent:* Monday, March 11

RE: Flink performance

2024-03-11 Thread Kamal Mittal via user
Hello Community, Please share info. for below query. Rgds, Kamal From: Kamal Mittal via user Sent: Monday, March 11, 2024 1:18 PM To: user@flink.apache.org Subject: Flink performance Hello, Can you please point me to documentation if any such available where flink talks about or documented

Flink performance

2024-03-11 Thread Kamal Mittal via user
Hello, Can you please point me to documentation, if any is available, where Flink documents performance numbers for certain use cases? Rgds, Kamal

Re: Flink Performance Issue

2021-09-27 Thread Arvid Heise
Hi Kamaal, I did a quick test with a local Kafka in Docker. With parallelism 1, I can process 20k messages of size 4KB in about 1 min. So if you use a parallelism of 15, I'd expect it to take below 10s even with bigger data skew. What I recommend you do is start from scratch and just work
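The estimate in this message can be checked with a little arithmetic. The following is a minimal sketch, assuming throughput scales linearly with parallelism (real jobs only approximate this); the class and method names are illustrative and not part of Flink.

```java
// Back-of-the-envelope scaling estimate. Assumption: throughput scales
// linearly with parallelism, which real jobs only approximate.
class ScalingEstimate {
    /** Records per second observed in a single-parallelism test run. */
    static double recordsPerSecond(long records, double seconds) {
        return records / seconds;
    }

    /** Ideal wall-clock time if every parallel subtask gets an equal share. */
    static double idealSeconds(long records, double perSubtaskRate, int parallelism) {
        return records / (perSubtaskRate * parallelism);
    }

    public static void main(String[] args) {
        // Figures quoted in the message: 20k messages in about 1 minute at parallelism 1.
        double rate = recordsPerSecond(20_000, 60.0);
        double ideal = idealSeconds(20_000, rate, 15);
        System.out.printf("per-subtask rate: %.1f records/s%n", rate);
        System.out.printf("ideal time at parallelism 15: %.1f s%n", ideal);
    }
}
```

Under that linear assumption the ideal time is about 4 s, which leaves headroom for skew before reaching the 10 s bound mentioned in the message.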

Re: Flink Performance Issue

2021-09-27 Thread Mohammed Kamaal
Hi Robert, I have removed all the business logic (keyBy and window) operator code and just had a source and sink to test it. The throughput is 20K messages in 2 minutes. It is a simple read from source (kafka topic) and write to sink (kafka topic). Don't you think 2 minutes is also not a better

Re: Flink Performance Issue

2021-09-22 Thread Robert Metzger
Hi Kamaal, I would first suggest understanding the performance bottleneck before applying any optimizations. Idea 1: Are your CPUs fully utilized? If yes, good: scaling up will probably help. If not, then there's another inefficiency. Idea 2: How fast can you get the data into your job, with

Re: Flink Performance Issue

2021-09-22 Thread Mohammed Kamaal
Hi Arvid, The throughput has decreased further after I removed all the rebalance(). The performance has decreased from 14 minutes for 20K messages to 20 minutes for 20K messages. Below are the tasks that the flink application is performing. I am using keyBy and Window operation. Do you think a

Re: Flink Performance Issue

2021-09-06 Thread Arvid Heise
Hi Mohammed, something is definitely wrong in your setup. You can safely say that you can process 1k records per second per core with Kafka and light processing, so you shouldn't even need to go distributed in your case. Do you perform any heavy computation? What is your flatMap doing? Are you em

Re: Flink Performance Issue

2021-09-02 Thread Mohammed Kamaal
Hi Fabian, Just an update. Problem 2 (Caused by: org.apache.kafka.common.errors.NetworkException) is resolved. It occurred because we exceeded the number of allowed partitions for the Kafka cluster (AWS MSK cluster). We deleted unused topics and partitions to resolve the issue.

Re: Flink performance with multiple operators reshuffling data

2021-08-31 Thread JING ZHANG
Hi Jason, > In our case, our input/output ratio of these Flink operators are all 1 to 1, so I guess it doesn't matter that much.. Yes > But I think the keys we are using in general are pretty uniform. Cool. You could run for a period of time to see if there is data skew. If there is indeed a data sk

Re: Flink performance with multiple operators reshuffling data

2021-08-31 Thread Jason Liu
Thanks for the help guys! Yea we can potentially append random strings to the keys and duplicate data across them to avoid skewness, if necessary. But I think the keys we are using in general are pretty uniform. The lowest-selectivity-up-front method is really interesting though. In our cas
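The idea of appending random strings to keys is often called key salting. The following is a minimal sketch of it in plain Java, not Flink API; the `#` separator, the bucket count, and all names are assumptions made for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

// Key salting sketch: spreads one hot key over several sub-keys.
// The '#' separator and the bucket count are illustrative assumptions.
class KeySalting {
    /** Builds a salted key from an explicit bucket index (deterministic, testable). */
    static String saltedKey(String key, int bucket) {
        return key + "#" + bucket;
    }

    /** Picks a random bucket in [0, buckets) and appends it to the key. */
    static String randomSaltedKey(String key, int buckets) {
        return saltedKey(key, ThreadLocalRandom.current().nextInt(buckets));
    }

    /** Recovers the original key, e.g. when merging partial results downstream. */
    static String originalKey(String salted) {
        return salted.substring(0, salted.lastIndexOf('#'));
    }

    public static void main(String[] args) {
        String salted = randomSaltedKey("user-42", 8);
        System.out.println(salted + " -> " + originalKey(salted));
    }
}
```

Salting changes which subtask owns the state for a given record, so any keyed state that must be visible for every record of the original key has to be duplicated across the salted keys or merged in a later step, as the message suggests.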

Re: Flink performance with multiple operators reshuffling data

2021-08-30 Thread JING ZHANG
Hi Jason, A job that reshuffles data multiple times can scale under normal circumstances, but we should carefully avoid data skew, because if the input stream is skewed, adding more resources will not help. Besides that, if we could adjust the order of the functions, we could put the keyed process f

Re: Flink performance with multiple operators reshuffling data

2021-08-30 Thread Caizhi Weng
Hi! Key-by operations scale with parallelism. Flink shuffles each record to a sub-task according to the hash value of its key modulo the parallelism, so the more parallelism you have, the faster Flink can process data, unless there is data skew. Jason Liu on Tue, Aug 31, 2021, AM
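The routing rule described here can be modelled in a few lines. This is a simplified sketch in plain Java, assuming subtask = hash(key) mod parallelism exactly as the message states; Flink itself routes keys through key groups, so real subtask indices will differ. All names are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of keyed partitioning: subtask = hash(key) mod parallelism.
// Flink itself routes keys through key groups, so real indices will differ.
class KeyedPartitioning {
    static int subtaskFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** Counts how many records land on each subtask for a given key stream. */
    static Map<Integer, Integer> load(List<String> keys, int parallelism) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(subtaskFor(key, parallelism), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A skewed stream: one hot key dominates, so one subtask does most of the work.
        List<String> skewed = List.of("hot", "hot", "hot", "hot", "a", "b");
        System.out.println(load(skewed, 4));
    }
}
```

Because every record with the same key maps to the same subtask, raising the parallelism does nothing for a single hot key, which is the data-skew caveat in the message.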

Flink performance with multiple operators reshuffling data

2021-08-30 Thread Jason Liu
Hi there, We have a use case where we need multiple keyBy operators, each with its own MapState, all with different keys, in a single Flink app. This obviously means we'll be reshuffling our data a lot. Our TPS is around 1-2k, with ~2kb per event and we use Kinesis Data Analytics as

Re: Flink Performance Issue

2021-08-24 Thread Fabian Paul
Hi Mohammed, 200 records should definitely be doable. The first thing you can do is remove the print sinks, because they increase the load on your cluster through the additional I/O and also prevent Flink from fusing operators. I am interested to see the updated job graph after

Re: Flink Performance Issue

2021-08-24 Thread Fabian Paul
Hi Mohammed, Without diving too much into your business logic, one thing that catches my eye is the partitioning you are using. In general, all calls to `keyBy` or `rebalance` are very expensive because all the data is shuffled across downstream tasks. Flink tries to fuse operators with the same keyG

Flink Performance Issue

2021-08-24 Thread Mohammed Kamaal
Hi, Apologies for the long message; I want to explain the issue in detail. We have a Flink (version 1.8) application running on AWS Kinesis Analytics. The application has a source which is a Kafka topic with 15 partitions (AWS Managed Streaming Kafka) and the sink is again a Kafka topic with 15 partiti

Re: Flink performance testing

2020-09-17 Thread Piotr Nowojski
Hi, But what are you asking for? Is it possible to do such benchmarks? Yes, it is possible. People are doing it all the time. Start a cluster, feed the data, measure the throughput (either via custom diagnostic operators, or via metrics [1]). Is there some framework to do it? Not that I know of.
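As an illustration of the "custom diagnostic operator" approach, the core of such an operator is just a counter and a clock. The sketch below is plain Java rather than Flink API; the class name and the choice to pass timestamps in explicitly are assumptions made to keep it self-contained.

```java
// Minimal throughput meter of the kind a custom diagnostic operator might wrap.
// Time is passed in explicitly so the class stays deterministic and testable.
class ThroughputMeter {
    private final long startNanos;
    private long records;

    ThroughputMeter(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Call once per record passing through the measured point. */
    void mark() {
        records++;
    }

    /** Average records per second between the start time and nowNanos. */
    double recordsPerSecond(long nowNanos) {
        double elapsedSeconds = (nowNanos - startNanos) / 1_000_000_000.0;
        return elapsedSeconds <= 0 ? 0.0 : records / elapsedSeconds;
    }

    public static void main(String[] args) {
        ThroughputMeter meter = new ThroughputMeter(System.nanoTime());
        for (int i = 0; i < 1_000_000; i++) {
            meter.mark();
        }
        System.out.printf("%.0f records/s%n", meter.recordsPerSecond(System.nanoTime()));
    }
}
```

In a real job the `mark()` call would sit inside a pass-through function placed at the point of interest; Flink's built-in metrics, which the message also mentions, avoid having to write this by hand.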

Re: Flink performance testing

2020-09-16 Thread mahesh salunkhe
I would like to performance-test my Flink job, especially with respect to volume: how does the job perform when more streaming data arrives at my source connectors, and how can I benchmark the various operators? On Wed, 16 Sep 2020 at 12:03, Piotr Nowojski wrote: > Hi, > > I'm not sure what you a

Re: Flink performance testing

2020-09-16 Thread Piotr Nowojski
Hi, I'm not sure what you are asking for. We do not provide benchmarks for all of the operators. We currently have a couple of micro benchmarks [1] for some of the operators, and we are also setting up some ad hoc benchmarks when implementing various features. If you want to benchmark something bes

Flink performance testing

2020-09-16 Thread mahesh salunkhe
Team, Which framework should I use for Flink end-to-end performance testing? I would like to test the performance of each Flink operator, back pressure, etc.

Re: Flink performance tuning on operators

2020-05-18 Thread Arvid Heise
Hi Ivan, Just to add to the point on chaining: when splitting the map into two parts, objects need to be copied from one operator to the chained operator. Since your objects are very heavy, that can take quite long, especially if you don't have a specific serializer configured but rely on Kryo. You can avoi

Re: Flink performance tuning on operators

2020-05-15 Thread Chesnay Schepler
Generally there should be no difference. Can you check whether the maps are running as a chain (as a single task)? If they are running in a chain, then I would suspect that /something/ else is skewing your results. If not, then the added network/serialization pressure would explain it. I will a

Flink performance tuning on operators

2020-05-14 Thread Ivan Yang
Hi, We have a Flink job that reads data from an input stream, converts each event from a JSON string to an Avro object, and finally writes to Parquet files using StreamingFileSink with an OnCheckpointRollingPolicy of 5 mins. Basically a stateless job. Initially, we use one map operator to convert Json st

Re: Flink Performance

2020-01-21 Thread Dharani Sudharsan
Thanks David. But I don’t see any solutions provided for the same. On Jan 21, 2020, at 7:13 PM, David Magalhães wrote: I've found this ( https://stackoverflow.com/questions/50580756/flink-window-dragged-stream-performance ) post on StackOverflow, where someone c

Re: Flink Performance

2020-01-21 Thread David Magalhães
I've found this ( https://stackoverflow.com/questions/50580756/flink-window-dragged-stream-performance ) post on StackOverflow, where someone complains about performance drop in KeyBy. On Tue, Jan 21, 2020 at 1:24 PM Dharani Sudharsan < dharani.sudhar...@outlook.in> wrote: > Hi All, > > Currently

Flink Performance

2020-01-21 Thread Dharani Sudharsan
Hi All, Currently, I’m running a flink streaming application, the configuration below. Task slots: 45 Task Managers: 3 Job Manager: 1 Cpu : 20 per machine My sample code below: Process Stream: datastream.flatmap().map().process().addsink Data size: 330GB approx. Raw Stream: datastream.ke

Re: Flink performance drops when async checkpoint is slow

2019-03-20 Thread Stephan Ewen
we might find something if seeing which operation > delays the task to cause the backpressure, and this operation might be > involved with HDFS. :) > > Best, > Zhijiang > > -- > From: Paul Lam > Send Time: February 2019

Re: Flink performance drops when async checkpoint is slow

2019-02-28 Thread zhijiang
which operation delays the task to cause the backpressure, and this operation might be involved with HDFS. :) Best, Zhijiang -- From: Paul Lam Send Time: Thursday, 28 February 2019, 19:17 To: zhijiang Cc: user Subject: Re: Flink performance drops

Re: Flink performance drops when async checkpoint is slow

2019-02-28 Thread Paul Lam
Hi Zhijiang, Thanks a lot for your reasoning! I tried to set the checkpoint to at-least-once as you suggested, but unfortunately the problem remains the same :( IMHO, if it’s caused by barrier alignment, the state size (mainly buffers during alignment) would be big, right? But actually it’s not,

Re: Flink performance drops when async checkpoint is slow

2019-02-28 Thread zhijiang
Hi Paul, I am not sure whether the task thread is involved in some work while snapshotting state for FsStateBackend. But I have had another experience which might also explain your problem. From your description below, the last task is blocked by `SingleInputGate.getNextBufferOrEvent`, which means the

Flink performance drops when async checkpoint is slow

2019-02-27 Thread Paul Lam
Hi, I have a Flink job (version 1.5.3) that consumes from a Kafka topic, does some transformations and aggregations, and writes to two Kafka topics respectively. Meanwhile, there’s a custom source that pulls configurations for the transformations periodically. The generic job graph is as below. T

Re: Improving Flink Performance

2017-02-06 Thread Fabian Hueske
ent one, the performance problems are gone. > > > > -- > View this message in context: http://apache-flink-user- > mailing-list-archive.2336050.n4.nabble.com/Improving-Flink- > Performance-tp11248p11447.html > Sent from the Apache Flink User Mailing List archive. mailing list archive > at Nabble.com. >

Re: Improving Flink Performance

2017-02-05 Thread Jonas

Re: Improving Flink Performance

2017-01-26 Thread Stephan Ewen
@jonas Flink's Fork-Join Pool drives only the actors, which are doing coordination. Unless your job is permanently failing/recovering, they don't do much. On Thu, Jan 26, 2017 at 2:56 PM, Robert Metzger wrote: > Hi Jonas, > > The good news is that your job is completely parallelizable. So if yo

Re: Improving Flink Performance

2017-01-26 Thread Robert Metzger
1:23 PM, Jonas wrote: > JProfiler

Re: Improving Flink Performance

2017-01-26 Thread Jonas
JProfiler

Re: Improving Flink Performance

2017-01-26 Thread dromitlabs

Re: Improving Flink Performance

2017-01-25 Thread Jonas

Re: Improving Flink Performance

2017-01-25 Thread Jonas
che-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11305/Tv6KnR6.png> *Any ideas?*

Re: Improving Flink Performance

2017-01-25 Thread Jonas
I tried and it added a little performance (~10%) but nothing outstanding.

Re: Improving Flink Performance

2017-01-25 Thread Stephan Ewen

Re: Improving Flink Performance

2017-01-24 Thread Jonas
know how to improve that? Might setting the buffer size / timeout be worth exploring?

Re: Improving Flink Performance

2017-01-24 Thread Stephan Ewen
a way to make this >> faster?* >> >> *Measurements were taken with def writeToSocket[?](d: DataStream[?], >> port: Int): Unit = { d.writeToSocket("localhost", port, new >> SerializationSchema[?] { override def serialize(element: ?): Array[Byte] = >> { "

Re: Improving Flink Performance

2017-01-24 Thread Aljoscha Krettek
riteToSocket("localhost", port, new > SerializationSchema[?] { override def serialize(element: ?): Array[Byte] = > { "\n".getBytes(CharsetUtil.UTF_8) } }) } and nc -lk PORT | pv --line-mode > --rate --average-rate --format "Current: %r, Avg:%a, Total: %b" > >

Improving Flink Performance

2017-01-24 Thread Jonas

Re: Improving Flink performance

2017-01-24 Thread Jonas
I don't even have images in there :O Will delete this thread and create a new one.

Re: Improving Flink performance

2017-01-23 Thread Ted Yu

Re: Improving Flink performance

2017-01-23 Thread Jonas
I received it well-formatted. Could the issue be your mail reader?

Re: Improving Flink performance

2017-01-23 Thread Greg Hogan

Improving Flink performance

2017-01-23 Thread Jonas
ay to make this faster?* /Measurements were taken with and /

Re: Flink performance tuning

2016-05-17 Thread Robert Metzger
Task Managers > > 21 > > Task Slots > > 20 > > Available Task Slots > > > > > > Best regards, > > Serhiy. > > > > *From:* Robert Metzger [mailto:rmetz...@apache.org] > *Sent:* 13 May 2016 15:26 > *To:* user@flink.apache.org > *Su

RE: Flink performance tuning

2016-05-17 Thread Serhiy Boychenko
eing occupied. I am doing something wrong. 3 Task Managers, 21 Task Slots, 20 Available Task Slots. Best regards, Serhiy. From: Robert Metzger [mailto:rmetz...@apache.org] Sent: 13 May 2016 15:26 To: user@flink.apache.org Subject: Re: Flink performance tuning Hi, Can you try running the job with

Re: How to measure Flink performance

2016-05-13 Thread Ken Krugler
> >> Cheers, >> >> Konstantin >> >> On 12.05.2016 18:57, prateekarora wrote: >>> Hi >>> >>> How can i measure throughput and latency of my application in flink 1.0.2 >>> ? >>> >>> Regards >>> Prate

Re: Flink performance tuning

2016-05-13 Thread Stephan Ewen
One issue may be that the selection of YARN containers is not HDFS locality aware here. Hence, Flink may read more splits remotely, where MR reads more splits locally. On Fri, May 13, 2016 at 3:25 PM, Robert Metzger wrote: > Hi, > > Can you try running the job with 8 slots, 7 GB (maybe you need

Re: Flink performance tuning

2016-05-13 Thread Robert Metzger
Hi, Can you try running the job with 8 slots, 7 GB (maybe you need to go down to 6 GB) and only three TaskManagers (-n 3) ? I'm suggesting this, because you have many small JVMs running on your machines. On such small machines you can probably get much more use out of your available memory by run

Flink performance tuning

2016-05-13 Thread Serhiy Boychenko
Hey, I have successfully integrated Flink into our very small test cluster (3 machines with 8 cores, 8 GB of memory and 2x1TB disks). Basically, I started the session using YARN as the resource manager, and the data is read from HDFS. /yarn-session.sh -n 21 -s 1 -jm 1024 -tm 1024 My code is very simpl

Re: How to measure Flink performance

2016-05-12 Thread Dhruv Gohil

Re: How to measure Flink performance

2016-05-12 Thread Konstantin Knauf

Re: How to measure Flink performance

2016-05-12 Thread prateekarora
Hi, How can I measure the throughput and latency of my application in Flink 1.0.2? Regards, Prateek

Re: How to measure Flink performance

2016-05-09 Thread prateek arora
Hi, Thanks for the answer. Then how can I measure the performance of Flink? I want to run my application with both Spark and Flink and measure the performance, so I can check how fast Flink processes my data compared to Spark. Regards, Prateek On Mon, May 9, 2016 at 2:17 AM, Ufuk Cele

Re: How to measure Flink performance

2016-05-09 Thread Ufuk Celebi
Hey Prateek, On Fri, May 6, 2016 at 6:40 PM, prateekarora wrote: > I have below information from spark . do i can get similar information from > Flink also ? if yes then how can i get that. You can get GC time via the task manager overview. The other metrics don't necessarily translate to Flink

How to measure Flink performance

2016-05-06 Thread prateekarora

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Schmidtke
You're obviously right, the configs were different. In the downloaded version I had set off heap memory to true, whereas in the version I compiled myself this one-time change to flink-conf.yaml was overwritten by recompiling. I have fixed it now and performance is the same. For the record, I had 3

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Ovidiu-Cristian MARCU
Hi, Your assumption about the TeraSort use case with eastcirclek's implementation may be incorrect. How many times did you run your program? It would be helpful to give more details about your experiment, in terms of configuration and dataset size. Best, Ovidiu > On 14 Apr 2016, at 17:14, Rob

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Schmidtke
I have tried multiple Maven and Scala Versions, but to no avail. I can't seem to achieve performance of the downloaded archive. I am stumped by this and will need to do more experiments when I have more time. Robert On Thu, Apr 14, 2016 at 1:13 PM, Robert Schmidtke wrote: > Hi Robert, > > thank

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Schmidtke
Hi Robert, thanks for the hint! Looks like something I could have figured out myself -.-" I'll let you know if I find something. Robert On Thu, Apr 14, 2016 at 1:06 PM, Robert Metzger wrote: > Hi Robert, > > check out the tools/create_release_files.sh file in the source tree. There > you can s

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Metzger
Hi Robert, check out the tools/create_release_files.sh file in the source tree. There you can see how we are building the release binaries. It would be quite interesting to find out what caused the performance difference. On Wed, Apr 13, 2016 at 5:03 PM, Robert Schmidtke wrote: > Hi everyone, >

Flink performance pre-packaged vs. self-compiled

2016-04-13 Thread Robert Schmidtke
Hi everyone, I'm using Flink 0.10.2 for some benchmarks and had to add some small changes to Flink, which led me to compiling and running it myself. This is when I noticed a performance difference in the pre-packaged Flink version that I downloaded from the web ( http://archive.apache.org/dist/fli