Re: AllWindowedStream and RichReduceFunction

2020-07-27 Thread Aljoscha Krettek
I think that should work with an aggregate() instead of reduce(). Best, Aljoscha On 24.07.20 17:02, Flavio Pompermaier wrote: In my reduce function I want to compute some aggregation on the sub-results of a map-partition (that I tried to migrate from DataSet to DataStream without success). The
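
A minimal sketch of the aggregate() route Aljoscha suggests, assuming a DataStream<Long> and a processing-time tumbling window; the element type and window size are illustrative, not from the thread:

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class AggregateInsteadOfReduce {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> input = env.fromElements(1L, 2L, 3L, 4L);

        // aggregate() takes an accumulator-based function, which covers the
        // "aggregation over sub-results" use case without a RichReduceFunction
        input.windowAll(TumblingProcessingTimeWindows.of(Time.seconds(5)))
             .aggregate(new AggregateFunction<Long, Long, Long>() {
                 @Override public Long createAccumulator() { return 0L; }
                 @Override public Long add(Long value, Long acc) { return acc + value; }
                 @Override public Long getResult(Long acc) { return acc; }
                 @Override public Long merge(Long a, Long b) { return a + b; }
             })
             .print();

        env.execute("aggregate-instead-of-reduce");
    }
}
```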

Re: improve the performance of flink sql job which lookup 40+ table

2020-07-27 Thread Jark Wu
Hi, Yes, currently, multiple lookup joins are not parallel and execute one by one. Async lookup + cache is the suggested way to improve performance. If the lookup tables are not large, you can also implement an ALL cache for the LookupTableSource to cache all the data in the database, and reload peri
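
A hedged sketch of the ALL-cache idea: a lookup TableFunction (the kind a LookupableTableSource returns) that loads the whole dimension table into memory and reloads it after a configurable interval. The class name, the reload interval, and the hardcoded row standing in for a real database scan are all assumptions for illustration:

```java
import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AllCacheLookupFunction extends TableFunction<Row> {

    private final long reloadIntervalMs;
    private transient Map<String, List<Row>> cache;
    private transient long lastLoadTime;

    public AllCacheLookupFunction(long reloadIntervalMs) {
        this.reloadIntervalMs = reloadIntervalMs;
    }

    @Override
    public void open(FunctionContext context) throws Exception {
        reload();
    }

    // invoked by the planner with the join key of the lookup join
    public void eval(String key) throws Exception {
        if (System.currentTimeMillis() - lastLoadTime > reloadIntervalMs) {
            reload(); // periodic refresh of the full cache
        }
        for (Row row : cache.getOrDefault(key, Collections.emptyList())) {
            collect(row);
        }
    }

    private void reload() throws Exception {
        Map<String, List<Row>> fresh = new HashMap<>();
        // a real implementation would scan the whole dimension table here;
        // one hardcoded row keeps the sketch self-contained
        fresh.put("k1", Collections.singletonList(Row.of("k1", "value1")));
        cache = fresh;
        lastLoadTime = System.currentTimeMillis();
    }
}
```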

Re: Is there a way to use stream API with this program?

2020-07-27 Thread Piotr Nowojski
MAX_WATERMARK should be emitted automatically by the WatermarkAssignerOperator. Piotrek Mon., 27 Jul 2020 at 09:16 Flavio Pompermaier wrote: > Yes it could... where should I emit the MAX_WATERMARK and how do I detect > that the input reached its end? > > On Sat, Jul 25, 2020 at 8:08 PM David

Re: How to stream CSV from S3?

2020-07-27 Thread Jingsong Li
Hi John, Do you mean you want to read S3 CSV files using partition/bucket pruning? If just using the DataSet API, you can use CsvInputFormat to read CSV files. If you want to use the Table/SQL API: in 1.10, the Csv format in Table does not support partitioned tables, so the only way is to specify the partition/
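
For the DataSet route, a minimal sketch; the bucket path and the two-column schema are made up, and an S3 filesystem implementation (e.g. flink-s3-fs-hadoop) is assumed to be on the classpath and configured:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class ReadCsvFromS3 {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // readCsvFile() uses CsvInputFormat under the hood; an s3:// path works
        // once a Flink S3 filesystem is available
        DataSet<Tuple2<String, Integer>> csv = env
            .readCsvFile("s3://my-bucket/path/to/data.csv")
            .types(String.class, Integer.class);

        csv.print(); // print() triggers execution for the DataSet API
    }
}
```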

improve the performance of flink sql job which lookup 40+ table

2020-07-27 Thread snack white
Hi: My Flink version is 1.10, used in per-job mode; my SQL is like ``` select column1, t2.xx2, t3.xx3, t4.xx4 … t40.xx40 from main_table left join lookup_1 FOR SYSTEM_TIME AS OF t1.proc_time AS t2 on t1.xx= t2.xx left join lookup_2 FOR SYSTEM_TIME AS OF t1.proc_time AS t3 on t1.xx=

Re: Flink Session TM Logs

2020-07-27 Thread 范超
Hi Richard, Maybe you can try using the CLI “yarn logs --applicationId yourYarnAppId” to check your logs, or just find your app logs in the Yarn web UI. From: Richard Moorhead [mailto:richard.moorh...@gmail.com] Sent: Friday, July 24, 2020 23:55 To: user Subject: Flink Session TM Logs When running a flink session

Unable to submit high parallelism job in cluster

2020-07-27 Thread Annemarie Burger
Hi, I am running Flink on a cluster with 24 workers, each with 16 cores. Starting the cluster works fine and the Web interface confirms there are 384 slots working. Executing my code with parallelism 24 works fine, but when I try a higher parallelism, e.g. 384, the job never succeeds in submitting.

Re: Count of records coming into the Flink App from KDS and at each step through Execution Graph

2020-07-27 Thread Vijay Balakrishnan
Hi All, I am looking at the Custom User Metrics to count incoming records by an incoming attribute, event_name, and aggregate it over 5 secs. I looked at https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#reporter . I am trying to figure out which one to use, Counter or Mete

Re: Count of records coming into the Flink App from KDS and at each step through Execution Graph

2020-07-27 Thread Vijay Balakrishnan
Hi David, Thanks for your reply. I am already using the PrometheusReporter. I am trying to figure out how to dig into the application data and count grouped by an attribute called event_name in the incoming application data and report to Grafana via Prometheus. I see the following at a high level
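
A hedged sketch of one way to do this: register a Counter per distinct event_name value inside a rich function, so each event name becomes its own time series for the PrometheusReporter. The nested Event type is a hypothetical stand-in for the application's record type:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

import java.util.HashMap;
import java.util.Map;

public class EventNameCountingMap
        extends RichMapFunction<EventNameCountingMap.Event, EventNameCountingMap.Event> {

    // minimal stand-in for the application's record type
    public static class Event {
        public String eventName;
    }

    private transient Map<String, Counter> counters;

    @Override
    public void open(Configuration parameters) {
        counters = new HashMap<>();
    }

    @Override
    public Event map(Event event) {
        // a Counter exposes a monotonically increasing count; the rate a Meter
        // would report can also be derived in Prometheus via rate()
        counters.computeIfAbsent(event.eventName, name ->
                getRuntimeContext().getMetricGroup()
                        .addGroup("event_name", name)
                        .counter("records_in"))
                .inc();
        return event;
    }
}
```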

[ANNOUNCE] Weekly Community Update 2020/29-30

2020-07-27 Thread Konstantin Knauf
Dear community, happy to share an update for the last two weeks with the release of Apache Flink 1.11.1, planning for Flink 1.12, a proposal for better interoperability with Microsoft Azure services, a few blog posts and more. Flink Development == * [releases] Flink 1.11.1 was releas

JobManager refusing connections when running many jobs in parallel?

2020-07-27 Thread Hailu, Andreas
Hi team, We've observed that when we submit a decent number of jobs in parallel from a single Job Master, we encounter job failures due to Connection Refused exceptions. We've seen this behavior start at 30 jobs running in parallel. It's seemingly transient, however, as upon several retries t

Re: problem with build from source flink 1.11

2020-07-27 Thread Timo Walther
Great to hear. Thanks for letting us know. Regards, Timo On 27.07.20 17:58, Felipe Lolas wrote: Seems fixed! I was replacing only flink-dist.jar. Replacing all the compiled jars from flink-1.11.0-bin fixed the issue. Thanks! On July 27, 2020 at 4:28, Felipe Lolas wrote: Hi!! T

Re: problem with build from source flink 1.11

2020-07-27 Thread Felipe Lolas
Seems fixed! I was replacing only flink-dist.jar. Replacing all the compiled jars from flink-1.11.0-bin fixed the issue. Thanks! On July 27, 2020 at 4:28, Felipe Lolas wrote: Hi!! Timo and Chesnay: Thanks for helping!!! Here is the full stack trace: 2020-07-27 05:27:38,661 INFO

Re: problem with build from source flink 1.11

2020-07-27 Thread Felipe Lolas
Hi!! Timo and Chesnay: Thanks for helping!!! Here is the full stack trace: 2020-07-27 05:27:38,661 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job insert-into_default_catalog.default_database.print_table (ca40bd10a729f5cad56a7db6bef17a6f) switched from state FAILIN

How to stream CSV from S3?

2020-07-27 Thread John Smith
Hi, using Flink 1.10 1- How do we go about reading CSV files that are copied to s3 buckets? 2- Is there a source that can tail S3 and start reading a CSV when it is copied to S3? 3- Is that part of the table APIs?

Re: Flink 1.11.0 UDAF fails with "No match found for function signature fun()"

2020-07-27 Thread Timo Walther
Hi Dmytro, aggregate functions will support the new type system in Flink 1.12. Until then, they cannot be used with the new `call()` syntax as anonymous functions. In order to use the old type system, you need to register the function explicitly using SQL `CREATE FUNCTION a AS 'myFunc'` and t
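
A short sketch of that registration path, assuming Flink 1.11; 'com.example.MyAgg' and the function name are placeholders:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RegisterUdafViaDdl {
    public static void main(String[] args) {
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner().inBatchMode().build();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        // register by class name so the function goes through the old type system
        tEnv.executeSql("CREATE FUNCTION myAgg AS 'com.example.MyAgg'");

        // the function is then callable by name in SQL, e.g.:
        //   SELECT myAgg(f0) FROM someTable
    }
}
```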

Flink 1.11.0 UDAF fails with "No match found for function signature fun()"

2020-07-27 Thread Dmytro Dragan
Hi All, I see strange behavior of UDAF functions. Let's say we have a simple table: EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build(); TableEnvironment t = TableEnvironment.create(settings); Table table = t.fromValues(DataTypes.ROW(

Re: problem with build from source flink 1.11

2020-07-27 Thread Chesnay Schepler
@Timo Maven 3.2.5 is the recommended Maven version for building Flink. @Felipe Can you provide us the full stack trace? This could be a library issue with regard to JDK compatibility. On 27/07/2020 15:23, Timo Walther wrote: Hi Felipe, are you sure that Maven and the TaskManagers are using the

Re: problem with build from source flink 1.11

2020-07-27 Thread Timo Walther
Hi Felipe, are you sure that Maven and the TaskManagers are using the JDK version that you mentioned? Usually, a `mvn clean install` in the `.../flink/` directory should succeed without any problems. Also your Maven version seems pretty old. I'm using Apache Maven 3.6.3 for example. The No

Email from kandy.wang

2020-07-27 Thread kandy.wang

Re: Pyflink 1.10.0 issue on cluster

2020-07-27 Thread Xingbo Huang
Hi, You can use the `set_python_requirements` method to specify your requirements.txt; you can refer to the documentation [1] for details. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/dependency_management.html#python-dependency Best, Xingbo rookieCOder wrote on 2

Re: Pyflink 1.10.0 issue on cluster

2020-07-27 Thread rookieCOder
Hi, And I've got another question. If I use a user-defined function in pyflink which only depends on library A, and what the flink job does is use the udf in tables, does that mean I only need to install library A on the slaves? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.na

Re: Parquet batch table sink in Flink 1.11

2020-07-27 Thread Flavio Pompermaier
I think that's not true when you need to integrate Flink into an existing data lake. I think it should be very straightforward (in my opinion) to read/write Parquet data with objects serialized with avro/thrift/protobuf, or at least reuse hadoop input/output formats with the Table API. At the moment

Re: Pyflink 1.10.0 issue on cluster

2020-07-27 Thread Xingbo Huang
Yes. You are right. Best, Xingbo rookieCOder wrote on Mon, July 27, 2020 at 6:30 PM: > Hi, Xingbo > Thanks for your reply. > So the point is that simply linking the source or the sink to the master's > local file system will cause the error that the slaves cannot read the > source/sink files? Thus the simplest so

Re: How to get CLI parameters when deploying on a yarn cluster

2020-07-27 Thread 范超
Thanks Jake, I'll try it out. It worked! From: Jake [mailto:ft20...@qq.com] Sent: Monday, July 27, 2020 18:33 To: 范超 Cc: user (user@flink.apache.org) Subject: Re: How to get CLI parameters when deploying on a yarn cluster Hi fanchao You can pass params after the jar file. /usr/local/flink/bin/flink run -m yarn-cl

Re: How to get CLI parameters when deploying on a yarn cluster

2020-07-27 Thread Jake
Hi fanchao, You can pass params after the jar file. /usr/local/flink/bin/flink run -m yarn-cluster -d -p 3 ~/project/test/app/test.jar param1 param2 param3 https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/yarn_setup.html
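
A sketch of picking those trailing params up inside main(), using Flink's ParameterTool; the runat flag mirrors the flag in the original question (note that ParameterTool.fromArgs expects the "--key value" form rather than "-key=value"):

```java
import org.apache.flink.api.java.utils.ParameterTool;

public class ReadCliParams {
    public static void main(String[] args) {
        // everything after the jar on the CLI arrives here, e.g.
        //   flink run -m yarn-cluster -d -p 3 test.jar --runat test
        ParameterTool params = ParameterTool.fromArgs(args);
        String runAt = params.get("runat", "default");
        System.out.println("runat = " + runAt);
    }
}
```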

Re: Is outputting from components other than sinks or side outputs a no-no ?

2020-07-27 Thread Tom Fennelly
No, I think David answered the specific question that I asked i.e. is it okay (or not) for operators other than sinks and side outputs to do I/O. Purging DLQ entries is something we'll need to be able to do anyway (for some scenarios - aside from successful checkpoint retries) and I specifically wa

Re: Pyflink 1.10.0 issue on cluster

2020-07-27 Thread rookieCOder
Hi, Xingbo, Thanks for your reply. So the point is that simply linking the source or the sink to the master's local file system will cause the error that the slaves cannot read the source/sink files? Thus the simplest solution is to make sure that slaves have access to the master's local filesystem (by

How to get CLI parameters when deploying on a yarn cluster

2020-07-27 Thread 范超
Hi, Flink community, I'm a starter at Flink and don't know how to pass parameters to my jar file, where I want to start the job in detached mode on the yarn cluster. Here is my shell code: /usr/local/flink/bin/flink run -m yarn-cluster -d -p 3 ~/project/test/app/test.jar -runat=test 2>&1 In m

Re: Pyflink 1.10.0 issue on cluster

2020-07-27 Thread Xingbo Huang
Hi rookieCOder, You need to make sure that your files can be read by each slave, so an alternative solution is to put your files on HDFS. Best, Xingbo rookieCOder wrote on Mon, July 27, 2020 at 5:49 PM: > I'm coding with pyflink 1.10.0 and building a cluster with flink 1.10.0 > I define the source and the sink as

problem with build from source flink 1.11

2020-07-27 Thread Felipe Lolas
Hi, I'm Felipe, just started learning Flink a few weeks ago (moving Spark Streaming workloads). Now I am currently testing some changes in flink-yarn, but when using my built flink-dist.jar, the Job in the TaskManager fails because of: java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)

Pyflink 1.10.0 issue on cluster

2020-07-27 Thread rookieCOder
I'm coding with pyflink 1.10.0 and building a cluster with flink 1.10.0. I define the source and the sink as follows: When I run this code only on the master, it's OK. When I run this code o

Re: Kafka connector with PyFlink

2020-07-27 Thread Timo Walther
Hi Dian, we had this discussion in the past. Yes, it might help in certain cases. But on the other hand it also helps in finding version mismatches when people have misconfigured their dependencies. Different JVM versions should not result in incompatible classes, as the default serialVersionUID is stand

Re: Flink 1.11 Simple pipeline (data stream -> table with agg -> data stream) failed

2020-07-27 Thread Timo Walther
Hi Dmytro, one major difference between the legacy and Blink planners is that the Blink planner is not built on top of the DataStream API. It uses features of lower levels (StreamOperator, Transformation). In the mid-term we want to remove the check and make Table API and DataStream API 100% back and

Re: Is outputting from components other than sinks or side outputs a no-no ?

2020-07-27 Thread Stephen Connolly
I am not 100% certain that David is talking about the same pattern of usage that you are, Tom. David, the pattern Tom is talking about is something like this... try { do something with record } catch (SomeException e) { push record to DLQ } My concern is that if we have a different failure,
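
For reference, a hedged sketch of the side-output variant of this pattern (the route the thread title alludes to): failed records go to an OutputTag instead of being written to the DLQ from inside the operator, and a DLQ sink is attached separately. Types and the per-record logic are illustrative:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class DlqSideOutput {

    // anonymous subclass so the element type survives erasure
    static final OutputTag<String> DLQ = new OutputTag<String>("dlq") {};

    public static SingleOutputStreamOperator<String> withDlq(DataStream<String> input) {
        return input.process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String record, Context ctx, Collector<String> out) {
                try {
                    out.collect(doSomethingWith(record));
                } catch (Exception e) {
                    // instead of pushing to the DLQ here (I/O inside the operator),
                    // emit to a side output and attach a DLQ sink to it
                    ctx.output(DLQ, record);
                }
            }
        });
    }

    private static String doSomethingWith(String record) {
        return record.trim(); // placeholder for the real per-record logic
    }
}

// usage: withDlq(stream).getSideOutput(DlqSideOutput.DLQ).addSink(dlqSink);
```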

Re: Unable to deduce RocksDB api calls in streaming.

2020-07-27 Thread Timo Walther
Hi Aviral, as far as I know we are not calling RocksDB API to perform snapshots. As the Stackoverflow answer also indicates most of the snapshotting is done outside of RocksDB by just dealing with the SST files. Have you checked the available metrics in the web UI? https://ci.apache.org/proj

Re: Kafka connector with PyFlink

2020-07-27 Thread Timo Walther
Hi, the InvalidClassException indicates that you are using different versions of the same class. Are you sure you are using the same Flink minor version (including the Scala suffix) for all dependencies and Kubernetes? Regards, Timo On 27.07.20 09:51, Wojciech Korczyński wrote: Hi, when

Re: Flink Session TM Logs

2020-07-27 Thread Yang Wang
Just sharing another method for accessing the finished TaskManager logs on Yarn. Currently, the logs will be aggregated to HDFS only when a Yarn application is finished/failed/killed. That means if the Flink application is still running, you could still use the Yarn NodeManager web UI to access t

Re: Kafka connector with PyFlink

2020-07-27 Thread Wojciech Korczyński
Hi, when I try it locally it runs well. The problem is when I run it using Kubernetes. I don't know how to make Flink and Kubernetes go well together in that case. Best, Wojtek Fri., 24 Jul 2020 at 17:51 Xingbo Huang wrote: > Hi Wojciech, > In many cases, you can make sure that your code ca

Re: Is outputting from components other than sinks or side outputs a no-no ?

2020-07-27 Thread Tom Fennelly
Thank you David. In the case we have in mind it should only happen literally on the very rare Exception, i.e. in some cases if somehow an uncaught exception occurs, we want to send the record to a DLQ and handle the retry manually vs checkpointing and restarting. Regards, Tom. On Sun, Jul 26, 2

Re: Is there a way to use stream API with this program?

2020-07-27 Thread Flavio Pompermaier
Yes it could... where should I emit the MAX_WATERMARK and how do I detect that the input reached its end? On Sat, Jul 25, 2020 at 8:08 PM David Anderson wrote: > In this use case, couldn't the custom trigger register an event time timer > for MAX_WATERMARK, which would be triggered when the bounde
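
A sketch of what David's suggestion might look like, assuming a TimeWindow-based window; MAX_WATERMARK carries the timestamp Long.MAX_VALUE, so the timer fires exactly when the bounded input is exhausted:

```java
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class FireAtEndOfInputTrigger extends Trigger<Object, TimeWindow> {

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window,
                                   TriggerContext ctx) {
        // registering at Long.MAX_VALUE means: fire when the final watermark arrives
        ctx.registerEventTimeTimer(Long.MAX_VALUE);
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == Long.MAX_VALUE ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        ctx.deleteEventTimeTimer(Long.MAX_VALUE);
    }
}
```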