Re:Re: Re: Flink SQL Count Distinct performance optimization

2020-01-08 Thread sunfulin
Hi, thanks for the reply. I am using the default FsStateBackend rather than RocksDB, with checkpointing off, so I cannot see any state info from the dashboard. I will look into the details and see whether an alternative can be optimized. At 2020-01-08 19:07:08, "Benchao Li" wrote: >hi sun

Re: Re: Flink SQL Count Distinct performance optimization

2020-01-08 Thread Benchao Li
Hi sunfulin, as Kurt pointed out, if you use the RocksDB state backend, your job may be bound by slow disk IO. You can check the WindowOperator's latency metric to see how long it takes to process an element. Hope this helps. sunfulin wrote on Wed, Jan 8, 2020 at 4:04 PM: > Ah, I had checked resource usage and GC from fl

Re:Re: Flink SQL Count Distinct performance optimization

2020-01-08 Thread sunfulin
Ah, I had checked resource usage and GC from the Flink dashboard. It seems that the cause is not a CPU or memory issue: task heap memory usage is less than 30%. Could you kindly tell me how I can see more metrics to help locate the bottleneck? I would really appreciate that. At 2020-01-08 15:59:17, "

Re:Re: Flink SQL Count Distinct performance optimization

2020-01-08 Thread sunfulin
Hi godfreyhe, as far as I can see, I have already rewritten the running SQL from a single-level count distinct into a two-level aggregation, just as the table.optimizer.distinct-agg.split.enabled parameter would do. Correct me if I have misunderstood. However, the rewritten SQL does not improve throughput. For n
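
The two-level rewrite mentioned in this message can be modeled in plain Python. This is only a conceptual sketch, not Flink's implementation: the grouping key (a day string), the device IDs, the bucket count `NUM_BUCKETS`, and the helper names are all hypothetical. The idea is that the first level counts distinct IDs per (key, bucket), where the bucket is a hash of the distinct value, and the second level sums those partial counts per key. Because each device ID always lands in exactly one bucket, the sum equals the single-level distinct count.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for illustration


def bucket_of(device_id: str) -> int:
    """Map a device ID to a bucket with a deterministic hash."""
    return zlib.crc32(device_id.encode("utf-8")) % NUM_BUCKETS


def count_distinct_one_level(rows):
    """Single-level aggregation: one set of device IDs per grouping key."""
    seen = defaultdict(set)
    for key, device_id in rows:
        seen[key].add(device_id)
    return {key: len(ids) for key, ids in seen.items()}


def count_distinct_two_level(rows):
    """Two-level aggregation: distinct per (key, bucket), then sum per key."""
    # Level 1: a hot key is spread over up to NUM_BUCKETS smaller groups.
    partial = defaultdict(set)
    for key, device_id in rows:
        partial[(key, bucket_of(device_id))].add(device_id)
    # Level 2: add up the per-bucket distinct counts for each key.
    totals = defaultdict(int)
    for (key, _bucket), ids in partial.items():
        totals[key] += len(ids)
    return dict(totals)


# Hypothetical input: one heavily loaded key and one small key.
rows = [("2020-01-08", f"dev-{i % 1000}") for i in range(5000)]
rows += [("2020-01-07", "dev-1"), ("2020-01-07", "dev-1"), ("2020-01-07", "dev-2")]

print(count_distinct_one_level(rows))
print(count_distinct_two_level(rows))
```

Both functions return the same counts. The split only changes how the work for a skewed key is distributed, so it is unlikely to help when the bottleneck lies elsewhere, such as state access or the sink.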

Re: Flink SQL Count Distinct performance optimization

2020-01-07 Thread Kurt Young
Hi, could you try to find out what the bottleneck of your current job is? Different bottlenecks lead to different optimizations, for example whether the job is CPU bound, or whether the local state is so big that the job is stuck on too many slow IOs. Best, Kurt On Wed, Jan 8, 2020 at 3:53 PM 贺小令 wrote: > hi sunfulin, > you ca

Re: Flink SQL Count Distinct performance optimization

2020-01-07 Thread 贺小令
Hi sunfulin, you can try the blink planner (available since 1.9), which optimizes distinct aggregation. You can also try enabling *table.optimizer.distinct-agg.split.enabled* if the data is skewed. Best, godfreyhe sunfulin wrote on Wed, Jan 8, 2020 at 3:39 PM: > Hi, community, > I'm using Apache Flink SQL to build so

Flink SQL Count Distinct performance optimization

2020-01-07 Thread sunfulin
Hi, community, I'm using Apache Flink SQL to build some of my realtime streaming apps. In one scenario I'm trying to compute count(distinct deviceID) over a data set of about 100GB in real time, and sink the aggregated results to an ElasticSearch index. I hit a severe performance issue when running my flink jo