Re: Network PartitionNotFoundException when run on multi nodes

2018-07-22 Thread zhangminglei
Hi, Steffen You can take a look on this https://github.com/apache/flink/pull/6103 . Hopes can help! Cheers Minglei > 在 2018年7月22日,下午10:22,Steffen Wohlers 写道: > > Hi all, > > I have some problems when running my application on more than one Task > M

Re: Serialization questions

2018-07-18 Thread zhangminglei
Hi, Flavio > addDefaultKryoSerializer differs from registerTypeWithKryoSerializer because > addDefaultKryoSerializer use the passed serializer also for subclasses of the > configured class. Am I right? This is not very clear in the method's Javadoc… I think it is not exactly a problem with flin

Re: Let BucketingSink roll file on each checkpoint

2018-07-09 Thread zhangminglei
Hi, Xilang You can watch the jira what you referred to. I will work on this in the next couple of days. Cheers Minglei > 在 2018年7月9日,上午9:50,XilangYan 写道: > > Hi Febian, > > With watermark, I understand it could only write those that are smaller than > the received watermark, but could I know

Re: compiling flink job with maven is failing with error: object flink is not a member of package org.apache

2018-07-01 Thread zhangminglei
cycle.LifecycleExecutionException: Failed to execute >> goal on project maven-compiler-plugin: Could not resolve dependencies for >> project flink:maven-compiler-plugin:jar:1.5: The following artifacts could >> not be re >> solved: org.apache.flink:flink-java:jar:1.5, &

Re: compiling flink job with maven is failing with error: object flink is not a member of package org.apache

2018-07-01 Thread zhangminglei
cala/myPackage/md_streaming.scala:10: > error: object flink is not a member of package org.apache > [INFO] import org.apache.flink.streaming.util.serialization.SimpleStringSchema > [INFO] ^ > [ERROR] > /home/hduser/dba/bin/flink/md_streaming/src/main/scala/myPack

Re: compiling flink job with maven is failing with error: object flink is not a member of package org.apache

2018-07-01 Thread zhangminglei
Hi, Mich. > Is there a basic MVN pom file for flink? The default one from GitHub does not > seem to be working! Please take a look on https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml

Re: Displaying topic data with Flink streaming

2018-06-30 Thread zhangminglei
Please try new FlinkKafkaConsumer09[String]("md", new SimpleStringSchema(), properties).setStartFromEarliest() and try again. Cheers Minglei. > 在 2018年6月30日,下午10:08,Mich Talebzadeh 写道: > > > Hi, > > I have a streaming topic called "md" that displays test market data. > > I have written a s

Re: The difference between legacy mode and new mode

2018-06-30 Thread zhangminglei
From my point of view, you can choose one of each. But prefer the new mode. And in the future, there is a plan to remove the legacy mode. Please see more about new mode [1] Cheers Minglei [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077

Re: Let BucketingSink roll file on each checkpoint

2018-06-29 Thread zhangminglei
By the way, I do not think below is a correct way. As @ Fabian said. The BucketingSink closes files once they reached a certain size (BatchSize) or have not been written to for a certain amount of time (InactiveBucketThreshold). > . If we can close > file during checkpoint, then the result is a

Re: Let BucketingSink roll file on each checkpoint

2018-06-29 Thread zhangminglei
Hi Xilang I think you are doing a together work with the offline team. Also what you said ETL, ETL team want to use the data in HDFS. I would like to confirm one question from you. What is their scheduling time for every job ? 5mins or 10 mins ? > My user case is we read data from message que

Re: Streaming

2018-06-27 Thread zhangminglei
Hi, Sihua & Aitozi I would like add more here, As @Sihua said, we need to query the state frequently. Assume if you use redis to store these states, it will consume a lot of your redis resources. So, you can use a bloomfilter before access to redis. If a pv is told to exist by bloomfilter, th

Re: Streaming

2018-06-27 Thread zhangminglei
it ends up with too many timers in the java heap which might leads to OOM. Cheers Shimin > 在 2018年6月27日,下午5:34,zhangminglei <18717838...@163.com> 写道: > > Aitozi > > From my side, I do not think distinct is very easy to deal with. Even though > together work with ka

Re: Streaming

2018-06-27 Thread zhangminglei
me. > > However, the time resolution of this operator is 1 millisecond, so it ends up > with too many timers in the java heap which might leads to OOM. > > Cheers > Shimin > > 2018-06-27 17:34 GMT+08:00 zhangminglei <18717838...@163.com > <mailto:18717838...

Re: Streaming

2018-06-27 Thread zhangminglei
Aitozi From my side, I do not think distinct is very easy to deal with. Even though together work with kafka support exactly-once. For uv, we can use a bloomfilter to filter pv for geting uv in the end. Window is usually used in an aggregate operation, so I think all should be realized by win

Re: env.setStateBackend deprecated in 1.5 for FsStateBackend

2018-06-26 Thread zhangminglei
At the moment, it seems you can not. Because FsStateBackend extends AbstructFileStateBackend then extend AbstructStateBackend which is deprecated in setStateBackend parameter.. I think you can do what you want like below now but it is very bad. env.setStateBackend(new StateBackend() { @Overrid

Re: Measure Latency from source to sink

2018-06-26 Thread zhangminglei
feedback, > > Well for now I just Want to measure the time that takes form Source to Sink > each transaction add the start and end time in mills > > > > El mar., 26 jun. 2018 a las 5:19, zhangminglei (<18717838...@163.com > <mailto:18717838...@163.com>&

Re: Few question about upgrade from 1.4 to 1.5 flink ( some very basic )

2018-06-26 Thread zhangminglei
By the way, in HA set up. > 在 2018年6月26日,下午5:39,zhangminglei <18717838...@163.com> 写道: > > Hi, Gary Yao > > Once I discovered that there was a change in the ip address[ > jobmanager.rpc.address ]. From 10.208.73.129 to localhost. I think that will > cause t

Re: Few question about upgrade from 1.4 to 1.5 flink ( some very basic )

2018-06-26 Thread zhangminglei
Hi, Gary Yao Once I discovered that there was a change in the ip address[ jobmanager.rpc.address ]. From 10.208.73.129 to localhost. I think that will cause the issue. What do you think ? Cheers Minglei > 在 2018年6月26日,下午4:53,Gary Yao 写道: > > Hi Vishal, > > Could it be that you are not using

Re: Measure Latency from source to sink

2018-06-26 Thread zhangminglei
Hi,Antonio Usually, the measurement of delay is for specific business I think it is more reasonable. What I understand of latency from my experience is data preparation time plus query calculation time. It is like an end to end latency test. Hopes this can help you. Not point to the latency of

Re: Multiple kafka consumers

2018-06-25 Thread zhangminglei
80 task running, a task here is a consumer operator] for 80 number of partitions if you set the kafka partition number is 80. DataStream dataStream = env.addSource(kafkaConsumer08).setParallelism(80); Cheers Minglei > 在 2018年6月25日,下午6:02,Amol S - iProgrammer 写道: > > Thanks zha

Re: Multiple kafka consumers

2018-06-25 Thread zhangminglei
Hi, Amol As @Sihua said. Also in my case, if the kafka partition is 80. I will also set the job source operator parallelism to 80 as well. Cheers Minglei > 在 2018年6月25日,下午5:39,sihua zhou 写道: > > Hi Amol, > > I think If you set the parallelism of the source node equal to the number of > the p

Re: [DISCUSS] Flink 1.6 features

2018-06-24 Thread zhangminglei
Hi, Community By the way, there is a very important feature I think it should be. Currently, the BucketingSink does not support when a bucket is ready for user use. This situation will be very obvious when flink work with offline end. We called that real time/offline integration in business. In

Re: Some doubts related to Rocksdb state backed and checkpointing!

2018-06-24 Thread zhangminglei
Hi,Ashwin > What is the exact difference between checkpoint and state backend? Ans: I can answer the first question you asked. Checkpoint is a mechanism that can make your program fault tolerant. Flink uses distributed snapshots implements checkpoint. But here is the question, where do I to s

Re: Is it reasonable to use Flink in a local machine

2018-06-24 Thread zhangminglei
Hi, Soheil You can set up several taskmanager processes in one node. I think it is reasonable to use Flink like this if you do not have enough machines. Generate data we can think it is source operator and you can set the parallelism for this operator. Process to those data we can think, like

Re: [BucketingSink] incorrect indexing of part files, when part suffix is specified

2018-06-23 Thread zhangminglei
Hi, Rinat I tried this situation you said and it works fine for me. The partCounter incremented as we hope. When the new part file is created, I did not see any same part index. Here is my code for that, you can take a look. In my case, the max index of part file is part-0-683PartSuffix, other t

Re: [Flink-9407] Question about proposed ORC Sink !

2018-06-22 Thread zhangminglei
Yes, it should be exit. Thanks to Ted Yu. Very exactly! Cheers Zhangminglei > 在 2018年6月23日,下午12:40,Ted Yu 写道: > > For #1, the word exist should be exit, right ? > Thanks > > Original message > From: zhangminglei <18717838...@163.com> > Dat

Re: [Flink-9407] Question about proposed ORC Sink !

2018-06-22 Thread zhangminglei
se and when time the data for the specify bucket is ready. So, you can take a look on https://issues.apache.org/jira/browse/FLINK-9609 <https://issues.apache.org/jira/browse/FLINK-9609>. Cheers Zhangminglei > 在 2018年6月23日,上午8:23,sagar loke 写道: > > Hi Zhangminglei, >

Re: DataStreamCalcRule$1802" grows beyond 64 KB when execute long sql.

2018-06-19 Thread zhangminglei
r Flink 1.5.0. > If you can't upgrade yet, you can also implement a user-defined function that > evaluates the big CASE WHEN statement. > > Best, Fabian > > 2018-06-19 16:27 GMT+02:00 zhangminglei <18717838...@163.com > <mailto:18717838...@163.com>>: > Hi

DataStreamCalcRule$1802" grows beyond 64 KB when execute long sql.

2018-06-19 Thread zhangminglei
0.219.252','116.31.114.202','116.31.114.204',\ '116.31.114.206','116.31.114.208') \ then '佛山力通电信_GSLB' \ when host in ('mapi.appvipshop.com') and mapi_ip in ('183.232.169.11','183.232.169.12','183.232.169.13','183.232.169.14','183.232.169.15','183.232.169.16',\ '183.232.169.17','183.232.169.18') \ then '佛山力通移动_GSLB' \ when host in ('mapi.appvipshop.com') and mapi_ip in ('112.93.112.11','112.93.112.12','112.93.112.13','112.93.112.14','112.93.112.15','112.93.112.16','112.93.112.17','112.93.112.18') \ then '佛山力通联通_GSLB' \ when host in ('mapi.appvipshop.com') and mapi_ip in ('114.67.56.79','114.67.56.80','114.67.56.83','114.67.56.84','114.67.56.87','114.67.56.88','114.67.56.112',\ '114.67.56.113','114.67.56.116','114.67.56.117','114.67.60.214','114.67.60.215','114.67.54.111') \ then '佛山力通BGP_GSLB' \ when host in ('114.67.54.112','114.67.56.95','114.67.56.96','114.67.54.12','114.67.54.13','114.67.56.93','114.67.56.94','114.67.56.102','114.67.56.103','114.67.56.106',\ '114.67.56.107','183.60.220.231','183.60.220.232','183.60.219.247','183.60.219.248','114.67.60.201','114.67.60.203','114.67.60.205','114.67.60.207') \ then '佛山力通BGP_GSLB' \ when host in ('mapi.appvipshop.com') and mapi_ip in ('183.240.167.24','183.240.167.25','183.240.167.26','183.240.167.27','183.240.167.28','183.240.167.29',\ '183.240.167.30','183.240.167.31') \ then '佛山互联移动_GSLB' \ when host in ('mapi.appvipshop.com') and mapi_ip in ('43.255.228.11','43.255.228.12','43.255.228.13','43.255.228.14','43.255.228.15','43.255.228.16',\ '43.255.228.17') \ then '佛山互联BGP_GSLB' \ when host in ('mapi.appvipshop.com') and mapi_ip in ('43.255.228.18','43.255.228.19','43.255.228.20') \ then '佛山互联BGP_GSLB' \ when host in ('mapi.appvipshop.com') and mapi_ip in ('43.255.228.21') \ then '佛山互联BGP_GSLB' else '其它' end as access_type from dw_log_app_api_monitor_ds Thanks Zhangminglei

Re: [DISCUSS] Flink 1.6 features

2018-06-18 Thread zhangminglei
more unit tests in there. > 3. Are there plans to add support for other data types ? Ans: Yes. Since I have been busy these days. After a couple of days, I will add the rest data type. And give more tests for that. Cheers Zhangminglei > 在 2018年6月19日,上午9:10,sagar loke 写道: > > Tha

Re: [DISCUSS] Flink 1.6 features

2018-06-17 Thread zhangminglei
sink > 2. Flink ML on stream > > >> On Jun 17, 2018, at 8:34 AM, zhangminglei <18717838...@163.com> wrote: >> >> Actually, I have been an idea, how about support hive on flink ? Since lots >> of business are written by hive sql. And users wants to tra

Re: [DISCUSS] Flink 1.6 features

2018-06-17 Thread zhangminglei
Actually, I have been an idea, how about support hive on flink ? Since lots of business are written by hive sql. And users wants to transform map reduce to fink without changing the sql. Zhangminglei > 在 2018年6月17日,下午8:11,zhangminglei <18717838...@163.com> 写道: > > Hi, S

Re: [DISCUSS] Flink 1.6 features

2018-06-17 Thread zhangminglei
wse/FLINK-9411> For ORC format, Currently only support basic data types, such as Long, Boolean, Short, Integer, Float, Double, String. Best Zhangminglei > 在 2018年6月17日,上午11:11,sagar loke 写道: > > We are eagerly waiting for > > - Extends Streaming Sinks: > - Bu