Re: [DISCUSS] Drop Savepoint Compatibility with Flink 1.2

2020-02-20 Thread vino yang
+1 for dropping Savepoint compatibility with Flink 1.2 Flink 1.2 is quite far away from the latest 1.10. Especially after the release of Flink 1.9 and 1.10, the code and architecture have undergone major changes. Currently, I am updating state migration tests for Flink 1.10. I can still see some

Re: Are there pipeline API's for ETL?

2020-01-10 Thread vino yang
Hi kant, Can you provide more context about your question? What do you mean about "pipeline API"? IMO, you can build an ETL pipeline via composing several Flink transform APIs. About choosing which transform APIs, it depends on your business logic. Here are the generic APIs list.[1] Best, Vino

Re: Checkpoints issue and job failing

2020-01-05 Thread vino yang
Regarding flink 1.9, we haven't migrated yet but we are planning to do. > Since we have to test it might take sometime. > > Thanks > > On Fri, Jan 3, 2020 at 2:14 AM Congxian Qiu > wrote: > >> Hi >> >> Do you have ever check that this problem exists on F

Re: Flink logging issue with logback

2020-01-05 Thread vino yang
Hi Bajaj, >> Logs from main method(outside of job graph) do not show up in jobmanager logs. IMO, it's normal phenomena. Other ideas, please check the JVM options mentioned by Yang. Best, Vino Yang Wang 于2020年1月6日周一 上午11:18写道: > Hi Bajaj, Abhinav, > > Could you share

Re: Checkpoints issue and job failing

2020-01-02 Thread vino yang
Hi Navneeth, Did you check if the path contains in the exception is really can not be found? Best, Vino Navneeth Krishnan 于2020年1月3日周五 上午8:23写道: > Hi All, > > We are running into checkpoint timeout issue more frequently in production > and we also see the below exception. We are running flink

Re: Session Window with dynamic gap

2020-01-02 Thread vino yang
Hi KristoffSC, >> Are there any plans to add support of Flink State into SessionWindowTimeGapExtractor? As I said, `SessionWindowTimeGapExtractor` is neither a general UDF nor an operator. But I cannot give a clear answer. Let me ping @Aljoscha Krettek to give the answer. Best, Vino Kristoff

Re: Session Window with dynamic gap

2020-01-02 Thread vino yang
Hi KristoffSC, Firstly, IMO, you can implement this feature by customizing the `SessionWindowTimeGapExtractor`. Additionally, let me clearify a concept. A component that implements the `SessionWindowTimeGapExtractor` interface should not be an operator in Flink. In Flink's concepts, Window is an

Re: Sub-user

2020-01-02 Thread vino yang
Hi Jary, All the Flink's mailing list information can be found here[1]. [1]: https://flink.apache.org/community.html#mailing-lists Best, Vino Benchao Li 于2020年1月2日周四 下午4:56写道: > Hi Jary, > > You need to send a email to *user-subscr...@flink.apache.org > * to subscribe, not user@flink.apache.o

Re: An issue with low-throughput on Flink 1.8.3 running Yahoo streaming benchmarks

2019-12-30 Thread vino yang
Hi Shinhyung, Can you compare the performance of the different Flink versions based on the same environment (Or at least the same configuration of the node and framework)? I see there are some different configurations of both clusters and frameworks. It would be better to comparison in the same e

Re: Flink TaskManager Memory

2019-12-26 Thread vino yang
Hi Tim, Reference a blog comes from Ververica: "When you choose RocksDB as your state backend, your state lives as a serialized byte-string in either the off-heap memory or the local disk." It also contains many tune config options you can consider.[1] Best, Vino [1]: https://www.ververica.com

Re: Flink Dataset to ParquetOutputFormat

2019-12-26 Thread vino yang
; > df.output(parquetFormat); > env.execute(); > > > Please suggest. > > Thanks, > Anuj > > On Mon, Dec 23, 2019 at 12:59 PM vino yang wrote: > >> Hi Anuj, >> >> After searching in Github, I found a demo repository about how to use >>

Re: Rewind offset to a previous position and ensure certainty.

2019-12-25 Thread vino yang
Hi Ruibin, Are you finding how to generate watermark pre Kafka partition? Flink provides Kafka-partition-aware watermark generation. [1] Best, Vino [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/event_timestamps_watermarks.html#timestamps-per-kafka-partition 邢瑞斌 于2019年12月25日周三

Re: Apache Flink - Flink Metrics collection using Prometheus on EMR from streaming mode

2019-12-24 Thread vino yang
Hi Mans, IMO, the mechanism of metrics reporter does not depend on any deployment mode. >> is there any Prometheus configuration or service discovery option available that will dynamically pick up the metrics from the Filnk job and task managers running in cluster ? Can you share more informatio

Re: Flink Dataset to ParquetOutputFormat

2019-12-22 Thread vino yang
Hi Anuj, After searching in Github, I found a demo repository about how to use parquet in Flink.[1] You can have a look. I can not make sure whether it is helpful or not. [1]: https://github.com/FelixNeutatz/parquet-flinktacular Best, Vino aj 于2019年12月21日周六 下午7:03写道: > Hello All, > > I am ge

Re: Flink On K8s, build docker image very slowly, is there some way to make it faster?

2019-12-22 Thread vino yang
Hi Lake, Can you clearly count or identify which steps are taking a long time? Best, Vino LakeShen 于2019年12月23日周一 下午2:46写道: > Hi community , when I run the flink task on k8s , the first thing is that > to build the flink task jar to > Docker Image . I find that It would spend much time to buil

Re: Flink Prometheus metric doubt

2019-12-19 Thread vino yang
Hi Jesus, IMHO, maybe @Chesnay Schepler can provide more information. Best, Vino Jesús Vásquez 于2019年12月19日周四 下午6:57写道: > Hi all, i'm monitoring Flink jobs using prometheus. > I have been trying to use the metrics flink_jobmanager_job_uptime/downtime > in order to create an alert, that fires

Re: Unit testing filter function in flink

2019-12-19 Thread vino yang
Hi Vishwas, Apache Flink provides some test harness to test your application code on multiple levels of the testing pyramid. You can use them to test your UDF. Please see more examples offered by the official documentation[1]. Best, Vino [1]: https://ci.apache.org/projects/flink/flink-docs-stab

Re: Can trigger fire early brefore specific element get into ProcessingTimeSessionWindow

2019-12-19 Thread vino yang
Hi Utopia, Flink provides a high scalability window mechanism.[1] For your scene, you can customize your window assigner and trigger. [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html Best, Vino Utopia 于2019年12月19日周四 下午5:56写道: > Hi, > > I want to f

Re: DataStream API min max aggregation on other fields

2019-12-19 Thread vino yang
Hi weizheng, IMHO, I do not know where is not clear to you? Is the result not correct? Can you share the correct result based on your understanding? The "keyBy" specifies group field and min/max do the aggregation in the other field based on the position you specified. Best, Vino Lu Weizheng 于

Re: Apache Flink - Flink Metrics - How to distinguish b/w metrics for two job manager on the same host

2019-12-18 Thread vino yang
Hi Mans, IMO, one job manager represents one Flink cluster and one Flink cluster has a suite of Flink configuration e.g. metrics reporter. Some metrics reporters support tag feature, you can specify it to distinguish different Flink cluster.[1] [1]: https://ci.apache.org/projects/flink/flink-doc

Re: Cannot get value from ValueState in ProcessingTimeTrigger

2019-12-18 Thread vino yang
he ValueState before > update? > > before update value : 3 >> >> after update value: 4 >> >> > What’s more, How can I stored the previous value so that I can get the > value when next element come in and invoke the onElement method? > > > > Best r

Re: Cannot get value from ValueState in ProcessingTimeTrigger

2019-12-18 Thread vino yang
Hi Utopia, The behavior may be correct. First, the default value is null. It's the correct value. `ValueStateDescriptor` has multiple constructors, some of them can let you specify a default value. However, these constructors are deprecated. And the doc does not recommend them.[1] For the other c

Re: [Question] How to use different filesystem between checkpoint data and user data sink

2019-12-18 Thread vino yang
Hi ouywl, *>>Thread.currentThread().getContextClassLoader();* What does this statement mean in your program? In addition, can you share your implementation of the customized file system plugin and the related exception? Best, Vino ouywl 于2019年12月18日周三 下午4:59写道: > Hi all, > We have im

Re: RichAsyncFunction Timeout

2019-12-17 Thread vino yang
Hi Polarisary, IMO, firstly, it would be better to monitor the OS and Flink/HBase metrics. For example: - Flink and HBase cluster Network I/O metrics; - Flink TM CPU/Memory/Backpressure metrics and so on; You can view these metrics to find some potential reasons. If you can not figure it

Re: Questions about taskmanager.memory.off-heap and taskmanager.memory.preallocate

2019-12-17 Thread vino yang
Hi Ethan, Share two things: - I have found "taskmanager.memory.preallocate" config option has been removed in the master codebase. - After researching git history, I found the description of " taskmanager.memory.preallocate" was written by @Chesnay Schepler (from 1.8 branch). So

Re: Fw: Metrics based on data filtered from DataStreamSource

2019-12-16 Thread vino yang
r* */* Data Platform Developer > M: +972.528197720 */* Skype: sidney.feiner.startapp > > [image: emailsignature] > > -- > *From:* vino yang > *Sent:* Monday, December 16, 2019 7:56 AM > *To:* Sidney Feiner > *Cc:* user@flink.apache.org > *

Re: Documentation tasks for release-1.10

2019-12-16 Thread vino yang
+1 for centralizing all the documentation issues so that the community can take more effective to fix them. Best, Vino Xintong Song 于2019年12月16日周一 下午6:02写道: > Thank you Kostas. > Big +1 for keeping all the documentation related issues at one place. > > I've added the documentation task for reso

Re: Questions about taskmanager.memory.off-heap and taskmanager.memory.preallocate

2019-12-15 Thread vino yang
Hi Ethan, For now, my suggestion is that you can set "preallocate" to false. The description(the link provided by you) of "taskmanager.memory.preallocate" says: "When taskmanager.memory.off-heap is set to true, then it is advised that this configuration is also set to true." Best, Vino Ethan Li

Re: Fw: Metrics based on data filtered from DataStreamSource

2019-12-15 Thread vino yang
Hi Sidney, Firstly, the `open` method of UDF's instance is always invoked when the task thread starts to run. >From the second code snippet image that you provided, I guess you are trying to get a dynamic handler with reflection technology, is that correct? I also guess that you want to get a dyn

Re: TypeInformation problem

2019-12-15 Thread vino yang
Hi Nick, >From StackOverflow, I see a similar issue which answered by @Till Rohrmann . [1] FYI. Best, Vino [1]: https://stackoverflow.com/questions/38214958/flink-error-specifying-keys-via-field-positions-is-only-valid-for-tuple-data-ty Nicholas Walton 于2019年12月14日周六 上午12:01写道: > I was refac

Re: How to understand create watermark for Kafka partitions

2019-12-13 Thread vino yang
Hi Alex, >> But why also say created watermark for each Kafka topic partitions ? IMO, the official documentation has explained the reason. Just copied here: When using Apache Kafka as a data source, each Kafka par

Re: State Processor API: StateMigrationException for keyed state

2019-12-12 Thread vino yang
Hi pwestermann, Can you share the relevant detailed exception message? Best, Vino pwestermann 于2019年12月13日周五 上午2:00写道: > I am trying to get the new State Processor API but I am having trouble with > keyed state (this is for Flink 1.9.1 with RocksDB on S3 as the backend). > I can read keyed sta

Re: Flink client trying to submit jobs to old session cluster (which was killed)

2019-12-12 Thread vino yang
Hi Pankaj, Can you tell us what's Flink version do you use? And can you share the Flink client and job manager log with us? This information would help us to locate your problem. Best, Vino Pankaj Chand 于2019年12月12日周四 下午7:08写道: > Hello, > > When using Flink on YARN in session mode, each Flin

Re: Processing Events by custom rules kept in Broadcast State

2019-12-10 Thread vino yang
Hi KristoffSC, It seems the main differences are when to parse your rules and what could be put into the broadcast state. IMO, multiple solutions all can take effect. I prefer option 3. I'd like to parse the rules ASAP and let them be real rule event stream (not ruleset stream) in the source. The

Re: Thread access and broadcast state initialization in BroadcastProcessFunction

2019-12-10 Thread vino yang
Hi kristoffSC, >> I've noticed that all methods are called by the same thread. Would it be always the case, or could those methods be called by different threads? No, open/processXXX/close methods are called in the different stages of a task thread's life cycle. The framework must keep the call o

Re: Flink ML feature

2019-12-10 Thread vino yang
t; > > On Tue, Dec 10, 2019 at 7:11 AM vino yang wrote: > >> Hi Chandu, >> >> AFAIK, there is a project named Alink[1] which is the Machine Learning >> algorithm platform based on Flink, developed by the PAI team of Alibaba >> computing platform. FYI >>

Re: Flink ML feature

2019-12-09 Thread vino yang
Hi Chandu, AFAIK, there is a project named Alink[1] which is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform. FYI Best, Vino [1]: https://github.com/alibaba/Alink Tom Blackwood 于2019年12月10日周二 下午2:07写道: > You may try Spark ML, whi

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-09 Thread vino yang
Hi Li, A potential reason could be conflicting logging frameworks. Can you share the log in your .out file and let us know if the print format of the log is the same as the configuration file you gave. Best, Vino Li Peng 于2019年12月10日周二 上午10:09写道: > Hey folks, I noticed that my kubernetes flink

Re: KeyBy/Rebalance overhead?

2019-12-09 Thread vino yang
you @vino yang for the reply. I suspect > keyBy will beneficial in those cases where my subsequent operators are > computationally intensive. Their computation time being > than network > reshuffling cost. > > Regards, > Komal > > On Mon, 9 Dec 2019 at 15:23, vino yang wro

Re: Sample Code for querying Flink's default metrics

2019-12-09 Thread vino yang
Hi Pankaj, > Is there any sample code for how to read such default metrics? Is there any way to query the default metrics, such as CPU usage and Memory, without using REST API or Reporters? What's your real requirement? Can you use code to call REST API? Why does it not match your requirements?

Re: Need help using AggregateFunction instead of FoldFunction

2019-12-08 Thread vino yang
Hi dev, The time of the window may have different semantics. In the session window, it's only a time gap, the size of the window is driven via activity events. In the tumbling or sliding window, it means the size of the window. For more details, please see the official documentation.[1] Best, Vi

Re: KeyBy/Rebalance overhead?

2019-12-08 Thread vino yang
Hi Komal, KeyBy(Hash Partition, logically partition) and rebalance(physical partition) are both one of the partitions been supported by Flink.[1] Generally speaking, partitioning may cause network communication(network shuffles) costs which may cause more time cost. The example provided by you ma

Re: Need help using AggregateFunction instead of FoldFunction

2019-12-04 Thread vino yang
Hi devinbost, Sharing two example links with you : - the example code of official documentation[1]; - a StackOverflow answer of a similar question[2]; [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/windows.html#aggregatefunction [2]: https://stackove

Re: [DISCUSS] Drop Kafka 0.8/0.9

2019-12-04 Thread vino yang
+1 jincheng sun 于2019年12月5日周四 上午10:26写道: > +1 for drop it, and Thanks for bring up this discussion Chesnay! > > Best, > Jincheng > > Jark Wu 于2019年12月5日周四 上午10:19写道: > >> +1 for dropping, also cc'ed user mailing list. >> >> >> Best, >> Jark >> >> On Thu, 5 Dec 2019 at 03:39, Konstantin Knauf

Re: Building with Hadoop 3

2019-12-04 Thread vino yang
Hi Marton, Thanks for your explanation. Personally, I look forward to your contribution! Best, Vino Márton Balassi 于2019年12月4日周三 下午5:15写道: > Wearing my Cloudera hat I can tell you that we have done this exercise for > our distros of the 3.0 and 3.1 Hadoop versions. We have not contributed > t

Re: Auto Scaling in Flink

2019-12-03 Thread vino yang
ure, when the new scheduling architecture is implemented >> https://issues.apache.org/jira/browse/FLINK-10407 . >> >> You can do it externally by cancel the job with a savepoint, update the >> parallelism, and restart the job, according to the rate of data. like what >> prave

Re: Building with Hadoop 3

2019-12-03 Thread vino yang
cc @Chesnay Schepler to answer this question. Foster, Craig 于2019年12月4日周三 上午1:22写道: > Hi: > > I don’t see a JIRA for Hadoop 3 support. I see a comment on a JIRA here > from a year ago that no one is looking into Hadoop 3 support [1]. Is there > a document or JIRA that now exists which would poi

Re: [DISCUSS] Drop RequiredParameters and OptionType

2019-12-03 Thread vino yang
+1, One concern: these two classes are marked with `@publicEvolving` annotation. Shall we mark them with `@Deprecated` annotation firstly? Best, Vino Dian Fu 于2019年12月3日周二 下午8:56写道: > +1 to remove them. It seems that we should also drop the class Option as > it's currently only used in Require

Re: Access to CheckpointStatsCounts

2019-12-02 Thread vino yang
Hi min, If it is only for monitoring purposes, you can just use checkpoint REST API[1] to do this work. [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/rest_api.html#jobs-jobid-checkpoints Best, Vino 于2019年12月2日周一 下午5:01写道: > Hi, > > > > Just wonder how to access t

Re: Firing timers on ProcessWindowFunction

2019-12-02 Thread vino yang
Hi Avi, Firstly, let's clarify that the "timer" you said is the timer of the window? Or a timer you want to register to trigger some action? Best, Vino Avi Levi 于2019年12月2日周一 下午4:11写道: > Hi, > Is there a way to fire timer in a ProcessWindowFunction ? I would like to > mutate the global state

Re: Read multiline JSON/XML

2019-12-01 Thread vino yang
Also, say sorry to Flavio! Best, Vino vino yang 于2019年12月2日周一 上午10:29写道: > Hi Chesnay, > > Sorry, yes, I lost the "like" keyword. I mistakenly thought he wanted to > ask how to use Spark to accomplish this job. > > Best, > Vino > > Chesnay Schepler 于2

Re: Read multiline JSON/XML

2019-11-29 Thread vino yang
Hi Flavio, IMO, it would take more effect to ask this question in the Spark user mailing list. WDYT? Best, Vino Flavio Pompermaier 于2019年11月29日周五 下午7:09写道: > Hi to all, > is there any out-of-the-box option to read multiline JSON or XML like in > Spark? > It would be awesome to have something

Re: Auto Scaling in Flink

2019-11-28 Thread vino yang
Hi Akash, You can use Pravega connector to integrate with Flink, the source code is here[1]. In short, relying on its rescalable state feature[2] flink supports scalable streaming jobs. Currently, the mainstream solution about auto-scaling is Flink + K8S, I can share some resources with you[3].

Re: JobGraphs not cleaned up in HA mode

2019-11-28 Thread vino yang
Hi, Why do you not use HDFS directly? Best, Vino 曾祥才 于2019年11月28日周四 下午6:48写道: > > anyone have the same problem? pls help, thks > > > > -- 原始邮件 -- > *发件人:* "曾祥才"; > *发送时间:* 2019年11月28日(星期四) 下午2:46 > *收件人:* "Vijay Bhaskar"; > *抄送:* "User-Flink"; > *主题:* 回复: JobGra

Re: Flink 'Job Cluster' mode Ui Access

2019-11-28 Thread vino yang
09 CST"} > > # curl :4081/jobs > {"jobs":[{"id":"___job_Id_____","status":"RUNNING"}]} > > Which shows the state of the job as running. > > What else can we do ? > > Best regards, > Jatin > > On Thu, Nov 28, 2019

Re: Flink 'Job Cluster' mode Ui Access

2019-11-27 Thread vino yang
Hi Jatin, Flink web UI does not depend on any deployment mode. You should check if there are error logs in the log file and the job status is running state. Best, Vino Jatin Banger 于2019年11月28日周四 下午3:43写道: > Hi, > > It seems there is Web Ui for Flink Session cluster, But for Flink Job > Clust

Re: SQL Performance

2019-11-26 Thread vino yang
Hi Nick, Can you provide more details? Are you using JDBCOutputFormat? If yes, can `JDBCOutputFormatBuilder#setBatchInterval` help you? Best, Vino Nicholas Walton 于2019年11月26日周二 下午9:20写道: > I’m streaming records down to an Embedded Derby database, at a rate of > around 200 records per second.

Re: Pre-process data before it hits the Source

2019-11-26 Thread vino yang
; Best, > Felipe > *--* > *-- Felipe Gutierrez* > > *-- skype: felipe.o.gutierrez* > *--* *https://felipeogutierrez.blogspot.com > <https://felipeogutierrez.blogspot.com>* > > > On Tue, Nov 26, 2019 at 10:09 AM vino yang wrote: > >> Hi Felipe, >> &g

Re: Pre-process data before it hits the Source

2019-11-26 Thread vino yang
.blogspot.com>* > > > On Tue, Nov 26, 2019 at 2:57 AM vino yang wrote: > >> Hi Vijay, >> >> IMO, the semantics of the source is not changeless. It can contain >> integrate with third-party systems and consume events. However, it can also >> contain mo

Re: Flink behavior as a slow consumer - out of Heap MEM

2019-11-25 Thread vino yang
Hi Hanan, Sometimes, the behavior depends on your implementation. Since it's not a built-in connector, it would be better to share your customized source with the community so that the community would be better to help you figure out where is the problem. WDYT? Best, Vino Hanan Yehudai 于2019年

Re: Question about to modify operator state on Flink1.7 with State Processor API

2019-11-25 Thread vino yang
Hi Kaihao, Ping @Aljoscha Krettek @Tzu-Li (Gordon) Tai to give more professional suggestions. What's more, we may need to give a statement about if the state processor API can process the snapshots generated by the old version jobs. WDYT? Best, Vino Kaihao Zhao 于2019年11月25日周一 下午11:39写道: >

Re: Pre-process data before it hits the Source

2019-11-25 Thread vino yang
Hi Vijay, IMO, the semantics of the source is not changeless. It can contain integrate with third-party systems and consume events. However, it can also contain more business logic about your data pre-process after consuming events. Maybe it needs some customization. WDYT? Best, Vino Vijay Bala

Re: Flink Kudu Connector

2019-11-25 Thread vino yang
Hi Rahul, Only found some resources from the Internet you can consider.[1][2] Best, Vino [1]: https://bahir.apache.org/docs/flink/current/flink-streaming-kudu/ [2]: https://www.slideshare.net/0xnacho/apache-flink-kudu-a-connector-to-develop-kappa-architectures Rahul Jain 于2019年11月25日周一 下午6:32写

Re: Idiomatic way to split pipeline

2019-11-25 Thread vino yang
ue etc' ) . I thought there is more idiomatic but if this is > it, than I will go with that. > > On Mon, Nov 25, 2019 at 10:42 AM vino yang wrote: > >> *This Message originated outside your organization.* >> -- >> Hi Avi, >>

Re: Idiomatic way to split pipeline

2019-11-25 Thread vino yang
Hi Avi, As the doc of DataStream#split said, you can use the "side output" feature to replace it.[1] [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/side_output.html Best, Vino Avi Levi 于2019年11月25日周一 下午4:12写道: > Hi, > I want to split the output of one of the operators

Re: How to submit two jobs sequentially and view their outputs in .out file?

2019-11-25 Thread vino yang
; > Thank you! That's exactly what's happening. Is there any way to force it > write to a specific .out of a TaskManager? > > > Best Regards, > Komal > > > > On Mon, 25 Nov 2019 at 11:10, vino yang wrote: > >> Hi Komal, >> >> Since you us

Re: Side output from Flink Sink

2019-11-24 Thread vino yang
strategies? > > I know the Sink is usually the "last" portion of a data stream as its name > indicates, but I was wondering if for some reason something can't be sinked > (after retries, etc), what is the usual way to deal with such cases? > > Thanks again for your

Re: Side output from Flink Sink

2019-11-24 Thread vino yang
Hi Victor, Firstly, you can get your side output stream via OutputTag. Please refer to the official documentation[1]. Then, specify a sink for your side output stream. Of course, you can specify a Kafka sink. Best, Vino [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/side_

Re: flink session cluster ha on k8s

2019-11-24 Thread vino yang
Hi 祥才, You can refer to the reply of this old thread[1]. Best, Vino [1]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Re-how-to-setup-a-ha-flink-cluster-on-k8s-td31089.html 曾祥才 于2019年11月25日周一 上午9:28写道: > hi, is there any example about ha on k8s for flink sessi

Re: How to submit two jobs sequentially and view their outputs in .out file?

2019-11-24 Thread vino yang
Hi Komal, Since you use the Flink standalone deployment mode, the tasks of the jobs which print information to the STDOUT may randomly deploy in any task manager of the cluster. Did you check other Task Managers out file? Best, Vino Komal Mariam 于2019年11月22日周五 下午6:59写道: > Dear all, > > Thank y

Re: Flink on YARN: frequent "container released on a *lost* node"

2019-11-21 Thread vino yang
Hi Amran, Did you monitor or have a look at your memory metrics(e.g. full GC) of your TM. There is a similar thread that a user reported the same question due to full GC, the link is here[1]. Best, Vino [1]: http://mail-archives.apache.org/mod_mbox/flink-user/201709.mbox/%3cfa4068d3-2696-4f29-8

Re: Retrieving Flink job ID/YARN Id programmatically

2019-11-21 Thread vino yang
al job ID (from webUI): 1357d21be640b6a3b8a86a063f4bba8a > > > > > > > > On Thu, Nov 7, 2019 at 6:56 PM vino yang wrote: > > > > > > Hi Lei Nie, > > > > > > You can use > `StreamExecutionEnvironment#getStreamGraph#getJobGraph#getJobID`

Re: Dynamically creating new Task Managers in YARN

2019-11-20 Thread vino yang
er. > > Best, > Jingsong Lee > > > > On Thu, Nov 21, 2019 at 2:25 PM vino yang wrote: > >> Hi Piper, >> >> The understanding of two deploy modes For Flink on Yarn is right. >> >> AFAIK, The single job (job cluster) mode is more popular th

Re: Dynamically creating new Task Managers in YARN

2019-11-20 Thread vino yang
i.e. Flink will > have persistent TMs/containers and request YARN for more TMs/containers > when needed (or release TMs/containers back to YARN). > > Thank you, > > Piper > > On Wed, Nov 20, 2019 at 9:39 PM vino yang wrote: > >> Hi Piper, >> >> C

Re: Completed job wasn't saved to archive

2019-11-20 Thread vino yang
If everything is OK(your config options about archive dir and history server is correct), Flink should archive the completed job. You said you did not find any exceptions in the log about failing to archive. But any other exceptions? Can you share the logs about your scene? Best, Vino Pavel Pots

Re: Dynamically creating new Task Managers in YARN

2019-11-20 Thread vino yang
Hi Piper, Can you share more reason and details of your requirements. Best, Vino Piper Piper 于2019年11月21日周四 上午5:48写道: > Hi, > > How can I make Flink's Resource Manager request YARN to spin up new (or > destroy/reclaim existing) TaskManagers in YARN containers? > > Preferably at runtime (i.e. d

Re: [ANNOUNCE] Launch of flink-packages.org: A website to foster the Flink Ecosystem

2019-11-19 Thread vino yang
Hi Robert, Just added it under the "Tools" category[1]. [1]: https://flink-packages.org/packages/kylin-flink-cube-engine Best, Vino Robert Metzger 于2019年11月19日周二 下午4:33写道: > Thanks. > You can add Kylin whenever you think it is ready. > > On Tue, Nov 19, 2019 at 9:

Re: [ANNOUNCE] Launch of flink-packages.org: A website to foster the Flink Ecosystem

2019-11-19 Thread vino yang
Thanks Robert. Great job! The web site looks great. In the future, we can also add my Kylin Flink cube engine[1] to the ecosystem projects list. [1]: https://github.com/apache/kylin/tree/engine-flink Best, Vino Oytun Tez 于2019年11月19日周二 上午12:09写道: > Congratulations! This is exciting. > > > --

Re: Flink configuration at runtime

2019-11-18 Thread vino yang
Hi Amran, Change the config option at runtime? No, Flink does not support this feature currently. However, for Flink on Yarn job cluster mode, you can specify different config options for different jobs via program or flink-conf.yaml(copy a new flink binary package then change config file). Best

Re: how to setup a ha flink cluster on k8s?

2019-11-16 Thread vino yang
Hi Rock, I searched by Google and found a blog[1] talk about how to config JM HA for Flink on k8s. Do not know whether it suitable for you or not. Please feel free to refer to it. Best, Vino [1]: http://shzhangji.com/blog/2019/08/24/deploy-flink-job-cluster-on-kubernetes/ Rock 于2019年11月16日周六 上

Re: slow checkpoints

2019-11-15 Thread vino yang
Hi Yubraj, So the frequent job failure is the root reason, you need to fix it. Yes, when too many messages are squashed into the message system. If the messages can not be consumed normally, there would exist catchup consuming which will cause your streaming system more pressure than usual. Best,

Re: Flink on Yarn resource arrangement

2019-11-13 Thread vino yang
Hi Alex, Which Flink version are you using? AFAIK, since Flink 1.8+, the config option: "-yn" for Flink on YARN job cluster mode does not take effect(always 1 and would be overridden). So, the config option "-ys" and "-p" will decide the number of TM. The first example: -p(20)/-ys(3) should be

Re: Initialization of broadcast state before processing main stream

2019-11-13 Thread vino yang
Hi Vasily, Currently, Flink did not do the coordination between a general stream and broadcast stream, they are both streams. Your scene of using the broadcast state is a special one. In a more general scene, the states need to be broadcasted is an unbounded stream, the state events may be broadca

Re: Flink (Local) Environment Thread Leaks?

2019-11-13 Thread vino yang
Hi Theo, If you think there is a thread leakage problem. You can create a JIRA issue and write a detailed description. Ping @Gary Yao and @Zhu Zhu to help to locate and analyze this problem? Best, Vino Theo Diefenthal 于2019年11月14日周四 上午3:16写道: > I included a Solr End2End test in my project,

Re: Document an example pattern that makes sources and sinks pluggable in the production code for testing

2019-11-11 Thread vino yang
Hi Hung, Your suggestion is reasonable. Giving an example of a pluggable source and sink can make it more user-friendly, you can open a JIRA issue to see if there is anyone who wants to improve this. IMO, it's not very difficult to implement it. Because the source and sink in Flink has two unifie

Re: Job Distribution Strategy On Cluster.

2019-11-11 Thread vino yang
Hi srikanth, What's your job's parallelism? In some scenes, many operators are chained with each other. if it's parallelism is 1, it would just use a single slot. Best, Vino srikanth flink 于2019年11月6日周三 下午10:03写道: > Hi there, > > I'm running Flink with 3 node cluster. > While running my jobs(

Re: static table in flink

2019-11-09 Thread vino yang
Hi Jaqie, If I understand your question correctly, it seems you are finding a solution about the Stream table and Dim table(you called static table) join. There were many users who asked this question. Linked some reply here[1][2] to let you consider. Best, Vino [1]: http://apache-flink-mailing

Re: Retrieving Flink job ID/YARN Id programmatically

2019-11-07 Thread vino yang
Hi Lei Nie, You can use `StreamExecutionEnvironment#getStreamGraph#getJobGraph#getJobID` to get the job id. Best, Vino Lei Nie 于2019年11月8日周五 上午8:38写道: > Hello, > I am currently executing streaming jobs via StreamExecutionEnvironment. Is > it possible to retrieve the Flink job ID/YARN ID within

Re: flink's hard dependency on zookeeper for HA

2019-11-07 Thread vino yang
Hi Vishwas, In the standalone cluster HA mode, Flink heavily depends on ZooKeeper. Not only for leader election, but also for: - Checkpoint metadata info; - JobGraph store; - So you should make sure your ZooKeeper Cluster works normally. More details please see[1][2]. Best, Vino

Re: When using udaf, the startup job has a “Cannot determine simple type name 'com' ” exception(Flink version 1.7.2)

2019-11-06 Thread vino yang
Hi mailtolrl, Can you share more context about your program and UDAF. Best, Vino mailtolrl 于2019年11月7日周四 下午3:05写道: > My flink streaming job use a udaf, set 60 parallelisms,submit job in yarn > cluster mode,and then happens every time I start. > > > >

Re: What is the slot vs cpu ratio?

2019-11-06 Thread vino yang
Hi srikanth, Referred from the official document: "Each Flink TaskManager provides processing slots in the cluster. The number of slots is typically proportional to the number of available CPU cores of each TaskManager. As a general recommendation, the number of available CPU cores is a good defa

Re: RocksDB and local file system

2019-11-06 Thread vino yang
Hi Jaqie, For testing, you can use the local file system pattern (e.g. "file:///"). Technically speaking, it's OK to specify the string path provided by you. However, in the production environment, we do not recommend using the local file system. Because it does not provide high availability. Be

Re: Limit max cpu usage per TaskManager

2019-11-06 Thread vino yang
Hi Lu, When using Flink on YARN, it will rely on YARN's resource management capabilities, and Flink cannot currently limit CPU usage. Also, what version of Flink do you use? As far as I know, since Flink 1.8, the -yn parameter will not work. Best, Vino Lu Niu 于2019年11月6日周三 下午1:29写道: > Hi, > >

Re: Partitioning based on key flink kafka sink

2019-11-06 Thread vino yang
Hi Vishwas, You should pay attention to the other args. The constructor provided by you has a `KeyedSerializationSchema` arg, while the comments of the constructor which made you confused only has a `SerializationSchema` arg. That's their difference. Best, Vino Vishwas Siravara 于2019年11月6日周三 上

Re: Checkpoint in FlinkSQL

2019-11-04 Thread vino yang
Hi Simon, Absolutely, yes. Before using Flink SQL, you need to initialize a StreamExecutionEnvirnoment instance[1], then call StreamExecutionEnvirnoment#setStateBackend or StreamExecutionEnvirnoment#enableCheckpointing to specify the information what you want. [1]: https://ci.apache.org/projects/

Re: Async operator with a KeyedStream

2019-11-01 Thread vino yang
Hi Bastien, Your analysis of using KeyedStream in Async I/O is correct. It will not figure out the key. In your scene, the good practice about interacting with DB is async I/O + thread pool[1] + connection Pool. You can use a connection pool to reuse and limit the mysql connection. Best, Vino

Re: Stateful functions presentation code (UI part)

2019-10-31 Thread vino yang
Hi Flavio, Please see this link.[1] Best, Vino [1]: https://github.com/ververica/stateful-functions/tree/master/stateful-functions-examples/stateful-functions-ridesharing-example Flavio Pompermaier 于2019年10月31日周四 下午4:53写道: > Hi to all, > yould it be possible to provide also the source code of

Re: Sending custom statsd tags

2019-10-31 Thread vino yang
Hi Prakhar, You need to customize StatsDReporter[1] in the Flink source. If you want to flexibly get configurable tags from the configuration file[2], you can refer to the implementation of DatadogHttpReporter#open[3] (for reference only how to get the tag). Best, Vino [1]: https://github.com/a

Re: Flink S3 error

2019-10-30 Thread vino yang
Hi Harrison, So did you check whether the file exists or not? And what's your question? Best, Vino Harrison Xu 于2019年10月31日周四 上午5:24写道: > I'm seeing this exception with the S3 uploader - it claims a previously > part file was not found. Full jobmanager logs attached. (Flink 1.8) > > java.io.Fi

Re: flink run jar throw java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

2019-10-30 Thread vino yang
Hi Franke, >From the information provided by Alex: >> mvn build jar include com.mysql.jdbc.Driver. it seems he has packaged a fat jar? Best, Vino Jörn Franke 于2019年10月30日周三 下午2:47写道: > > > You can create a fat jar (also called Uber jar) that includes all > dependencies in your application ja

  1   2   3   4   5   >