Re: Avro format in pyFlink

2020-08-10 Thread Xingbo Huang
Hi Rodrigo, Flink doesn't support an avro uber jar, so you need to add all dependency jars manually, such as avro, jackson-core-asl, jackson-mapper-asl and joda-time in release-1.11. However, I found that there was a JIRA[1] that provided a default version of avro uber jar a few days ago. [1] ht

Re: Avro format in pyFlink

2020-08-13 Thread Xingbo Huang
Hi Rodrigo, For the connectors, Pyflink just wraps the java implementation. And I am not an expert on Avro and corresponding connectors, but as far as I know, DataTypes really cannot declare the type of union you mentioned. Regarding the bytes encoding you mentioned, I actually have no good suggest

Re: How to write a customer sink partitioner when using flinksql kafka-connector

2020-08-18 Thread Xingbo Huang
Hi Lei, If you want to write your custom partitioner, I think you can refer to the built-in FlinkFixedPartitioner[1] [1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/partitioner/FlinkFixedPartitio
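The core logic of the `FlinkFixedPartitioner` referenced above can be sketched in plain Python — a hypothetical illustration of the algorithm only, not the Flink API (a real custom partitioner must extend the Java `FlinkKafkaPartitioner`):

```python
def fixed_partition(parallel_instance_id: int, partitions: list) -> int:
    """Pick a Kafka partition the way FlinkFixedPartitioner does:
    each Flink sink subtask writes to exactly one partition, chosen
    by the subtask index modulo the number of partitions."""
    if not partitions:
        raise ValueError("partitions must be non-empty")
    return partitions[parallel_instance_id % len(partitions)]

# Subtask 0 of 3 partitions -> partition 0; subtask 3 wraps around to 0.
print(fixed_partition(0, [0, 1, 2]))  # 0
print(fixed_partition(3, [0, 1, 2]))  # 0
```

A custom partitioner would replace the modulo rule with, e.g., a hash of a record key, but keep the same shape: map (subtask, record, partitions) to one partition id.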

Re: [ANNOUNCE] Apache Flink 1.10.2 released

2020-08-25 Thread Xingbo Huang
Thanks Zhu for the great work and everyone who contributed to this release! Best, Xingbo Guowei Ma wrote on Wed, Aug 26, 2020 at 12:43 PM: > Hi, > > Thanks a lot for being the release manager Zhu Zhu! > Thanks to everyone who contributed to this! > > Best, > Guowei > > > On Wed, Aug 26, 2020 at 11:18 AM Yun Tang wr

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Xingbo Huang
Congratulations Dian! Best, Xingbo jincheng sun wrote on Thu, Aug 27, 2020 at 5:24 PM: > Hi all, > > On behalf of the Flink PMC, I'm happy to announce that Dian Fu is now part > of the Apache Flink Project Management Committee (PMC). > > Dian Fu has been very active on the PyFlink component, working on various >

Re: PyFlink cluster runtime issue

2020-08-28 Thread Xingbo Huang
Hi Manas, I think you forgot to add kafka jar[1] dependency. You can use the argument -j of the command line[2] or the Python Table API to specify the jar. For details about the APIs of adding Java dependency, you can refer to the relevant documentation[3] [1] https://ci.apache.org/projects/flink

Re: PyFlink cluster runtime issue

2020-08-29 Thread Xingbo Huang
a > pyFlink job is through the command line right? Can I do that through the > GUI? > > On Fri, Aug 28, 2020 at 8:17 PM Xingbo Huang wrote: > >> Hi Manas, >> >> I think you forgot to add kafka jar[1] dependency. You can use the >> argument -j of the comm

Re: PyFlink - detect if environment is a StreamExecutionEnvironment

2020-09-01 Thread Xingbo Huang
Hi Manas, When running locally, you need `ten_sec_summaries.get_job_client().get_job_execution_result().result()` to wait for the job to finish. However, when you submit to the cluster, you need to delete this code. In my opinion, the current feasible solution is that you prepare two sets of codes for this

Re: PyFlink - detect if environment is a StreamExecutionEnvironment

2020-09-02 Thread Xingbo Huang
ossible to detect the environment programmatically. >> >> Regards, >> Manas >> >> On Wed, Sep 2, 2020 at 7:32 AM Xingbo Huang wrote: >> >>> Hi Manas, >>> >>> When running locally, you need >>> `ten_sec_summaries.get_job_clien

Re: [PyFlink] register udf functions with different versions of the same library in the same job

2020-10-11 Thread Xingbo Huang
Hi, I will do my best to provide pyflink related content, I hope it helps you. >>> each udf function is a separate process, that is managed by Beam (but I'm not sure I got it right). Strictly speaking, it is not true that every UDF is in a different python process. For example, the two python fu

Re: PyFlink 1.11.2 couldn’t configure [taskmanager.memory.task.off-heap.size] property when registering custom UDF function

2020-10-12 Thread Xingbo Huang
Hi, You can use the API to set the configuration: table_env.get_config().get_configuration().set_string("taskmanager.memory.task.off-heap.size", '80m') The flink-conf.yaml way only takes effect when the job is submitted through flink run; it will not take effect in minicluster mode (python xxx.py). Best, Xingb
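For reference, the flink-conf.yaml form of the same setting — which, as the answer notes, is only picked up when the job is submitted via `flink run` (the memory size here is the example value from the thread):

```yaml
# flink-conf.yaml — ignored by minicluster runs (python xxx.py)
taskmanager.memory.task.off-heap.size: 80m
```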

Re: PyFlink 1.11.2 couldn’t configure [taskmanager.memory.task.off-heap.size] property when registering custom UDF function

2020-10-13 Thread Xingbo Huang
ou think that it looks like > an issue ? > > Thx ! > > On Tue, Oct 13, 2020 at 05:35, Xingbo Huang wrote: > >> Hi, >> >> You can use api to set configuration: >> table_env.get_config().get_configuration().set_string("taskmanager.memory.task.off-heap.size", >

Re: why this pyflink code has no output?

2020-10-13 Thread Xingbo Huang
Hi, Which version of pyflink are you using? I think the API you are using is not the PyFlink API that has existed since Flink 1.9. For detailed usage of pyflink, you can refer to doc[1] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table_api_tutorial.html Best, Xingbo 大森林 wrote on Oct 13, 2020

Re: Failing to create Accumulator over multiple columns

2020-10-25 Thread Xingbo Huang
Hi, I think you can directly declare `def accumulate(acc: MembershipsIDsAcc, value1: Long, value2: Boolean)` Best, Xingbo Rex Fenley wrote on Mon, Oct 26, 2020 at 9:28 AM: > If I switch accumulate to the following: > def accumulate(acc: MembershipsIDsAcc, value: > org.apache.flink.api.java.tuple.Tuple2[java.
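The suggested Scala signature — one argument per input column rather than a Tuple2 — can be mirrored in a minimal plain-Python sketch (the accumulator class and field names here are hypothetical stand-ins for the `MembershipsIDsAcc` in the thread):

```python
class MembershipsAcc:
    """Toy accumulator collecting the IDs whose boolean flag is true."""
    def __init__(self):
        self.ids = []

def accumulate(acc, value1, value2):
    # Each grouped column arrives as a separate argument, mirroring
    # `def accumulate(acc: MembershipsIDsAcc, value1: Long, value2: Boolean)`.
    if value2:
        acc.ids.append(value1)

acc = MembershipsAcc()
accumulate(acc, 7, True)
accumulate(acc, 8, False)
print(acc.ids)  # [7]
```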

Re: Flink Kafka Table API for python with JAAS

2020-11-10 Thread Xingbo Huang
Hi, You can use the following API to add all the dependent jar packages you need: table_env.get_config().get_configuration().set_string("pipeline.jars", "file:///my/jar/path/connector.jar;file:///my/jar/path/json.jar") For more related content, you can refer to the pyflink doc[1] [1] https://ci.

Re: Concise example of how to deploy flink on Kubernetes

2020-11-20 Thread Xingbo Huang
Hi George, Have you referred to the official document[1]? [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html Best, Xingbo On Saturday, Nov 21, 2020, George Costea wrote: > Hi there, > > Is there an example of how to deploy a flink cluster on Kubernetes? > I'd lik

Re: Filesystem as a stream source in Table/SQL API

2020-11-22 Thread Xingbo Huang
Hi Kai, I took a look at the implementation of the filesystem connector. It decides which files to read at startup, and that set does not change while running. If you need this functionality, you may need to implement a custom connector. Best, Xingbo eef hhj wrote on Sat, Nov 21, 2020 at 2:38 PM: > Hi, > > I'm faci

Re: PyFlink Table API and UDF Limitations

2020-11-25 Thread Xingbo Huang
Hi Niklas, Regarding `Exception in thread "grpc-nio-worker-ELG-3-2" java.lang.NoClassDefFoundError: org/apache/beam/vendor/grpc/v1p26p0/io/netty/buffer/PoolArena$1`, it does not affect the correctness of the result. The reason is that some resources are released asynchronously when Grpc Server is

Re: PyFlink Table API and UDF Limitations

2020-11-26 Thread Xingbo Huang
> Information per Art. 13 GDPR B2B: > For communication with you we may process > your personal data. > All information on how your data is handled can be found at > https://www.uniberg.com/impressum.html. > > On 26. Nov 2020, at 02:59, Xingbo H

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-11-27 Thread Xingbo Huang
Hi Pierre, Sorry for the late reply. Your requirement is that your `Table` has a `field` in `Json` format whose number of keys has reached 100k, and you want to use such a `field` as the input/output of a `udf`, right? As to whether there is a limit on the number of nested keys, I am not quite clear.

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-11-30 Thread Xingbo Huang
ch-schema-to-a-flink-datastream-on-the-fly > [2] > https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-schema-registry.html > > On Sat, Nov 28, 2020 at 04:04, Xingbo Huang wrote: > >> Hi Pierre, >> >> Sorry for the late reply. >> Your req

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-12-01 Thread Xingbo Huang
.f2), source.f3) result.execute_insert("print_table") if __name__ == '__main__': test() Best, Xingbo Pierre Oberholzer wrote on Tue, Dec 1, 2020 at 6:10 PM: > Hi Xingbo, > > That would mean giving up on using Flink (table) features on the content > of the parsed J

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-12-02 Thread Xingbo Huang
> that could be used ? > > Thanks a lot ! > > Best, > > > On Wed, Dec 2, 2020 at 04:50, Xingbo Huang wrote: > >> Hi Pierre, >> >> I wrote a PyFlink implementation, you can see if it meets your needs: >> >> >> from pyflink.datastream impo

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-12-02 Thread Xingbo Huang
ding benchmarks > vs. other frameworks) are also highly welcome. > > Thanks ! > > Best > > > On Thu, Dec 3, 2020 at 02:53, Xingbo Huang wrote: > >> Hi Pierre, >> >> This example is written based on the syntax of release-1.12 that is about >> to be re

Re: Python UDF filter problem

2020-12-08 Thread Xingbo Huang
Hi, This problem has been fixed[1] in 1.12.0, 1.10.3 and 1.11.3, but release-1.11.3 and release-1.12.0 have not been released yet (the VOTE has passed). I ran your job in release-1.12, and the plan is correct. [1] https://issues.apache.org/jira/browse/FLINK-19675 Best, Xingbo László Ciople wrote on 2020

Re: how to register TableAggregateFunction?

2020-12-08 Thread Xingbo Huang
Hi, As far as I know, TableAggregateFunction is not supported yet in batch mode[1]. You can try to use it in stream mode. [1] https://issues.apache.org/jira/browse/FLINK-10978 Best, Xingbo Leonard Xu wrote on Tue, Dec 8, 2020 at 6:05 PM: > Hi, appleyuchi > > Sorry for the late reply, > but could you descr

Re: A group window expects a time attribute for grouping in a stream environment.THANKS for your help

2020-12-09 Thread Xingbo Huang
Hi, Your code does not show how the `Orders` table is created. For how to specify the time attribute in DDL, you can refer to the official document[1]. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/create.html Best, Xingbo Appleyuchi wrote on Wed, Dec 9, 2020
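A hedged sketch of such a DDL — table name from the thread, columns and connector hypothetical — declaring an event-time attribute that a group window can use:

```sql
CREATE TABLE Orders (
    order_id STRING,
    amount   DOUBLE,
    ts       TIMESTAMP(3),
    -- declare ts as the event-time attribute with a 5-second watermark
    WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
    'connector' = '...'
);
```

Without such a declared time attribute (event time via WATERMARK, or a processing-time computed column), a group window in a stream environment raises exactly the error in the subject line.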

Re: Is working with states supported in pyflink1.12?

2020-12-15 Thread Xingbo Huang
Hi, As Chesnay said, PyFlink has already supported Python DataStream stateless APIs so that users are able to perform some basic data transformations, but doesn't provide state access support yet in release-1.12. The proposal[1] of enhancing the API with state access has been made and related disc

Re: Pyflink UDF with ARRAY as input

2020-12-17 Thread Xingbo Huang
Hi Torben, It is indeed a bug, and I have created a JIRA[1]. The workaround is to use the index (written against release-1.12): env = StreamExecutionEnvironment.get_execution_environment() env.set_parallelism(1) t_env = StreamTableEnvironment.create(env, environment_set

Re: PyFlink UDF Permission Denied

2020-12-27 Thread Xingbo Huang
Hi Andrew, According to the error, you can try to check the file permission of "/test/python-dist-78654584-bda6-4c76-8ef7-87b6fd256e4f/python-files/site-packages/site-packages/pyflink/bin/pyflink-udf-runner.sh" Normally, the permission of this script would be -rwxr-xr-x Best, Xingbo Andrew Kram

Re: PyFlink UDF Permission Denied

2020-12-28 Thread Xingbo Huang
a python process that executes python udf code. If you don’t use python udf, the contents of the execution are all running on the JVM. Best, Xingbo Andrew Kramer wrote on Mon, Dec 28, 2020 at 8:29 PM: > Hi Xingbo, > > That file does not exist on the file system. > > Thanks, > Andrew > > On

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.2.2 released

2021-01-03 Thread Xingbo Huang
@Gordon Thanks a lot for the release and for being the release manager. And thanks to everyone who made this release possible! Best, Xingbo Till Rohrmann wrote on Sun, Jan 3, 2021 at 8:31 PM: > Great to hear! Thanks a lot to everyone who helped make this release > possible. > > Cheers, > Till > > On Sat, Jan

Re: Declaring and using ROW data type in PyFlink DataStream

2021-01-14 Thread Xingbo Huang
Hi meneldor, I guess Shuiqiang was not using pyflink 1.12.0 to develop the example. The signature of the `process_element` method has been changed in the new version[1]. In pyflink 1.12.0, you can use `collector.collect` to send out your results. [1] https://issues.apache.org/jira/browse/FLINK

Re: Declaring and using ROW data type in PyFlink DataStream

2021-01-18 Thread Xingbo Huang
Hi Shuiqiang, meneldor, 1. In fact, there is a problem with using Python `Named Row` as the return value of user-defined function in PyFlink. When serializing a Row data, the serializer of each field is consistent with the order of the Row fields. But the field order of Python `Named Ro

Re: Declaring and using ROW data type in PyFlink DataStream

2021-01-18 Thread Xingbo Huang
Context')? Can i access the value as in > process_element() in the ctx for example? > > Thank you! > > On Mon, Jan 18, 2021 at 4:18 PM Xingbo Huang wrote: > >> Hi Shuiqiang, meneldor, >> >> 1. In fact, there is a problem with using Python `Named Row` as the >> re

Re: [ANNOUNCE] Apache Flink 1.12.1 released

2021-01-18 Thread Xingbo Huang
Thanks Xintong for the great work! Best, Xingbo Peter Huang wrote on Tue, Jan 19, 2021 at 12:51 PM: > Thanks for the great effort to make this happen. It paves us from using > 1.12 soon. > > Best Regards > Peter Huang > > On Mon, Jan 18, 2021 at 8:16 PM Yang Wang wrote: > > > Thanks Xintong for the great wor

Re: Declaring and using ROW data type in PyFlink DataStream

2021-01-19 Thread Xingbo Huang
: > Thank you Xingbo! > > Do you plan to implement CoProcess functions too? Right now i cant find a > convenient method to connect and merge two streams? > > Regards > > On Tue, Jan 19, 2021 at 4:16 AM Xingbo Huang wrote: > >> Hi meneldor, >> >> 1. Yes. Al

Re: Format problem when using a udf function in pyflink to produce JSON strings sent to Kafka

2020-06-01 Thread Xingbo Huang
Hi, this is actually the effect of the CSV connector's optional quote_character parameter, whose default value is ", so every one of your fields gets wrapped like that. You can set this parameter to \0 to get the output you want. st_env.connect( Kafka() .version("0.11") .topic("logSink") .start_from_earliest() .property("zookeeper.connect", "localhost:2181")
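The quote_character effect described here can be illustrated with Python's standard csv module — not Flink itself, just a sketch of why a default quote character of `"` wraps every field and doubles embedded quotes:

```python
import csv
import io

row = ['{"a":1}', 'x']

# Default behaviour: the quote character " wraps fields and doubles
# embedded quotes — the wrapping the user saw in the Kafka output.
quoted = io.StringIO()
csv.writer(quoted, quoting=csv.QUOTE_ALL).writerow(row)
print(quoted.getvalue().strip())   # "{""a"":1}","x"

# With quoting disabled (analogous to setting quote_character to \0),
# the JSON string passes through untouched.
plain = io.StringIO()
csv.writer(plain, quoting=csv.QUOTE_NONE, quotechar=None).writerow(row)
print(plain.getvalue().strip())    # {"a":1},x
```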

Re: Format problem when using a udf function in pyflink to produce JSON strings sent to Kafka

2020-06-01 Thread Xingbo Huang
You're welcome — happy to learn from each other 😀 Best, Xingbo jack wrote on Mon, Jun 1, 2020 at 9:07 PM: > Thanks a lot for clearing that up. I've just started using pyflink and am not very familiar with it yet — please keep the advice coming. > > On 2020-06-01 20:50:53, "Xingbo Huang" wrote: > > Hi, > this is actually the effect of the CSV connector's optional > quote_character parameter, whose default value is ", so every field gets wrapped like that; you can set this parameter to \0

Re: Flink Kafka connector in Python

2020-06-29 Thread Xingbo Huang
Hi Manas, Since Flink 1.9, the entire architecture of PyFlink has been redesigned, so the method described in the link won't work. But you can use the more convenient DDL[1] or descriptor[2] to read kafka data. Besides, you can refer to the common questions about PyFlink[3] [1] https://ci.apache.org/
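A hedged sketch of the DDL approach for reading Kafka — topic, fields, and broker address here are hypothetical placeholders, following the Flink 1.11 Kafka SQL connector option names:

```sql
CREATE TABLE kafka_source (
    id      STRING,
    payload STRING
) WITH (
    'connector' = 'kafka',
    'topic' = 'my_topic',
    'properties.bootstrap.servers' = 'localhost:9092',
    'properties.group.id' = 'my_group',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
);
```

In PyFlink this string would be passed to `table_env.execute_sql(...)`, after which `kafka_source` is queryable like any other table.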

Re: Flink Kafka connector in Python

2020-07-02 Thread Xingbo Huang
5, in register_table_source > self._j_connect_table_descriptor.registerTableSource(name) > File > "/home/manas/anaconda3/envs/flink_env/lib/python3.7/site-packages/py4j/java_gateway.py", > line 1286, in __call__ > answer, self.gateway_client, self.target_id, self.na

Re: PyFlink Table API - "A group window expects a time attribute for grouping in a stream environment."

2020-07-13 Thread Xingbo Huang
Hi Manas, Maybe it is a bug in the Java Descriptor. You can try the DDL[1] way, which is the more recommended approach. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/ Best, Xingbo Khachatryan Roman wrote on Mon, Jul 13, 2020 at 7:23 PM: > Hi Manas, > > Do you have the same erro

Re: PyFlink Table API - "A group window expects a time attribute for grouping in a stream environment."

2020-07-14 Thread Xingbo Huang
do that. > @Xingbo Huang - okay, I didn't know DDL was the more > recommended way. > Please let me know if you confirm that this is a bug. > Thanks! > > On Mon, Jul 13, 2020 at 5:07 PM Xingbo Huang wrote: > >> Hi Manas, >> Maybe it is the bug of Java Descript

Re: pyFlink 1.11 streaming job example

2020-07-14 Thread Xingbo Huang
Hi Manas, I tested your code, but there are no errors. Because execute_sql is an asynchronous method, you need to wait on the TableResult; you can try the following code: from pyflink.datastream import StreamExecutionEnvironment, TimeCharacteristic from pyflink.table import StreamTableEnviron

Re: Pyflink JavaPackage Error

2020-07-14 Thread Xingbo Huang
Hi Jesse, For how to add jar packages, you can refer to the Common Questions doc[1] of PyFlink. PyFlink 1.10 and 1.11 have some differences in the way of adding jar packages which the document has a detailed introduction [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/p

Re: pyFlink 1.11 streaming job example

2020-07-14 Thread Xingbo Huang
he latest pyFlink 1.11 API? I couldn't find anything related to > awaiting asynchronous results. > > Thanks, > Manas > > On Tue, Jul 14, 2020 at 1:29 PM Xingbo Huang wrote: > >> Hi Manas, >> >> >> I tested your code, but there are no errors. Because

Re: Flink Kafka connector in Python

2020-07-14 Thread Xingbo Huang
", >> "properties": { >> "monitorId": { >> "type": "string" >> }, >> "deviceId": { >> "type": "string" >> }, >>

Re: pyFlink UDTF function registration

2020-07-15 Thread Xingbo Huang
Hi Manas, You need to join with the python udtf function. You can try the following sql: ddl_populate_temporary_table = f""" INSERT INTO {TEMPORARY_TABLE} SELECT * FROM ( SELECT monitorId, featureName, featureData, time_st FROM {INPUT_TABLE}, LATERAL TABLE(split(data)) as T(feature
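The core of the `split` UDTF used in that LATERAL TABLE join can be sketched as a plain Python generator — a hypothetical stand-in, since in PyFlink it would additionally be wrapped with `@udtf(...)` and registered before being usable in SQL:

```python
import json

def split(data: str):
    """Yield one (featureName, featureData) row per entry of a JSON
    object such as '{"temp": 1.0, "pressure": 2.5}'."""
    for name, value in json.loads(data).items():
        yield name, value

print(list(split('{"temp": 1.0, "pressure": 2.5}')))
# [('temp', 1.0), ('pressure', 2.5)]
```

Each yielded tuple becomes one row of `T(featureName, featureData)`, which the join then pairs with the source row's other columns.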

Re: Pyflink sink rowtime field

2020-07-16 Thread Xingbo Huang
Hi Jesse, I think the type of rowtime you declared in the source schema is DataTypes.Timestamp(), so you should also use DataTypes.Timestamp() in the sink schema. Best, Xingbo Jesse Lord wrote on Wed, Jul 15, 2020 at 11:41 PM: > I am trying to sink the rowtime field in pyflink 1.10. I get the following > error > >

Re: Kafka connector with PyFlink

2020-07-23 Thread Xingbo Huang
Hi Wojtek, you need to use the fat jar 'flink-sql-connector-kafka_2.11-1.11.0.jar' which you can download from the doc[1] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html Best, Xingbo Wojciech Korczyński wrote on Thu, Jul 23, 2020 at 4:57 PM: > Hello, > > I am tr

Re: Kafka connector with PyFlink

2020-07-23 Thread Xingbo Huang
ain(CliFrontend.java:992) > > Maybe the way the python program is written is incorrect. Can it be > deprecated taking into account that the installed flink version is 1.11? > > Best regards, > Wojtek > > On Thu, 23 Jul 2020 at 12:01, Xingbo Huang wrote: > >> Hi Wojt

Re: Kafka connector with PyFlink

2020-07-24 Thread Xingbo Huang
.create_temporary_table('mySink') > > > t_env.scan('mySource') \ > .select('"flink_job_" + value') \ > .insert_into('mySink') > > t_env.execute("tutorial_job") > > I have installed PyFlink 1.11 s

Re: Kafka connector with PyFlink

2020-07-24 Thread Xingbo Huang
l.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.r

Re: Pyflink 1.10.0 issue on cluster

2020-07-27 Thread Xingbo Huang
Hi rookieCOder, You need to make sure that your files can be read by each slave, so an alternative solution is to put your files on hdfs. Best, Xingbo rookieCOder wrote on Mon, Jul 27, 2020 at 5:49 PM: > 'm coding with pyflink 1.10.0 and building cluster with flink 1.10.0 > I define the source and the sink as

Re: Pyflink 1.10.0 issue on cluster

2020-07-27 Thread Xingbo Huang
Yes. You are right. Best, Xingbo rookieCOder wrote on Mon, Jul 27, 2020 at 6:30 PM: > Hi, Xingbo > Thanks for your reply. > So the point is that simply link the source or the sink to the master's > local file system will cause the error that the slaves cannot read the > source/sink files? Thus the simplest so

Re: Pyflink 1.10.0 issue on cluster

2020-07-27 Thread Xingbo Huang
Hi, You can use the `set_python_requirements` method to specify your requirements.txt; you can refer to the documentation[1] for details [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/dependency_management.html#python-dependency Best, Xingbo rookieCOder wrote on 2

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-07-30 Thread Xingbo Huang
Hi Jincheng, Thanks a lot for bringing up this discussion and the proposal. Big +1 for improving the structure of the PyFlink docs. It will be very helpful to give PyFlink users a unified entry point for learning the PyFlink documentation. Best, Xingbo Dian Fu wrote on Fri, Jul 31, 2020 at 11:00 AM: > Hi Jincheng, > > Thanks

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-05 Thread Xingbo Huang
ganized documentation will greatly improve the efficiency and >>>> > experience for developers. >>>> > >>>> > Best, >>>> > Shuiqiang >>>> > >>>> > Hequn Cheng wrote on Sat, Aug 1, 2020 at 8:42 AM: >

Re: Python StreamExecutionEnvironment from_collection Kafka example

2021-03-15 Thread Xingbo Huang
Hi, >From the error message, I think the problem is no python interpreter on your TaskManager machine. You need to install a python 3.5+ interpreter on the TM machine, and this python environment needs to install pyflink (pip install apache-flink). For details, you can refer to the document[1]. [

Re: Got “pyflink.util.exceptions.TableException: findAndCreateTableSource failed.” when running PyFlink example

2021-03-15 Thread Xingbo Huang
Hi, The problem is that the legacy DataSet you are using does not support the FileSystem connector you declared. You can use blink Planner to achieve your needs. >>> t_env = BatchTableEnvironment.create( environment_settings=EnvironmentSettings.new_instance() .in_batch_mode().

Re: Can I use PyFlink together with PyTorch/Tensorflow/PyTorch

2021-03-15 Thread Xingbo Huang
Hi Yik San, Thanks for the investigation of PyFlink together with all these ML libs. IMO, you could refer to the flink-ai-extended project that supports TensorFlow on Flink, PyTorch on Flink etc, whose repository url is https://github.com/alibaba/flink-ai-extended. Flink AI Extended is a proje

Re: PyFlink java.io.EOFException at java.io.DataInputStream.readInt

2021-03-19 Thread Xingbo Huang
Yes, you need to ensure that the key and value types of the Map are fixed. Best, Xingbo Yik San Chan wrote on Fri, Mar 19, 2021 at 3:41 PM: > I got why regarding the simplified question - the dummy parser should > return key(string)-value(string), otherwise it fails the result_type spec > > On Fri, Mar 19
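The point about fixed key/value types can be sketched with a dummy parser like the one the thread mentions (the input format and function name here are hypothetical): every entry must be string-to-string so that the result matches a declared result type of MAP&lt;STRING, STRING&gt;:

```python
def dummy_parse(raw: str) -> dict:
    """Parse 'k1=v1;k2=v2' into a map whose keys and values are all
    strings — e.g. matching DataTypes.MAP(DataTypes.STRING(), DataTypes.STRING())."""
    out = {}
    for pair in raw.split(";"):
        k, v = pair.split("=", 1)
        out[k] = v  # keep values as str; mixed types would break the declared type
    return out

m = dummy_parse("a=1;b=two")
print(m)  # {'a': '1', 'b': 'two'}
```

Returning, say, an int for one value and a str for another would violate the declared map type and fail at serialization time.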

Re: [ANNOUNCE] Release 1.13.0, release candidate #0

2021-04-01 Thread Xingbo Huang
Hi Dawid, Thanks a lot for the great work! Regarding the issue of flink-python, I have provided a quick fix and will try to fix it ASAP. Best, Xingbo Dawid Wysakowicz wrote on Fri, Apr 2, 2021 at 4:04 AM: > Hi everyone, > As promised I created a release candidate #0 for the version 1.13.0. I am > not star

Re: Extract/Interpret embedded byte data from a record

2021-04-14 Thread Xingbo Huang
Hi Sumeet, Python Row-based operations will be supported in release-1.13. I guess you are looking at the code of the master branch. Since you are using the Python Table API, you can use a python udf to parse your data. For the details of python UDFs, you can refer to the doc[1]. [1] https://ci.a

Re: How to tell between a local mode run vs. remote mode run?

2021-05-05 Thread Xingbo Huang
Hi Yik San, You can check whether the execution environment used is `LocalStreamEnvironment`; in PyFlink you can get the class of the underlying java object through py4j. You can take a look at the example I wrote below, I hope it will help you ``` from pyflink.table impo

Re: Access Row fields by attribute name rather than by index in PyFlink TableFunction

2021-05-19 Thread Xingbo Huang
Hi Sumeet, Due to the limitation of the original PyFlink serializers design, there is no way to pass attribute names to Row in row-based operations. In release-1.14, I am reconstructing the implementations of serializers[1]. After completion, accessing attribute names of `Row` in row-based operati

Re: [VOTE] Release 1.13.1, release candidate #1

2021-05-27 Thread Xingbo Huang
+1 (non-binding) - verified checksums and signatures - built from source code - checked apache-flink source/wheel package content - ran python udf job Best, Xingbo Dawid Wysakowicz wrote on Thu, May 27, 2021 at 9:45 PM: > +1 (binding) > >- verified signatures and checksums >- built from sources and run

Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Xingbo Huang
Congratulations, Dian. Well deserved! Best, Xingbo Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM: > Congrats Dian Fu! Well deserved! > > Best, > Wei > > On Jan 16, 2020, at 18:10, Hequn Cheng wrote: > > Congratulations, Dian. > Well deserved! > > Best, Hequn > > On Thu, Jan 16, 2020 at 6:08 PM Leonard Xu wrote: > >

Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Xingbo Huang
Hi Jincheng, Thanks for driving this. +1 for this proposal. Compared to building from source, downloading directly from PyPI will greatly save the development cost of Python users. Best, Xingbo Wei Zhong wrote on Tue, Feb 4, 2020 at 12:43 PM: > Hi Jincheng, > > Thanks for bringing up this discussion! > > +1

Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-12 Thread Xingbo Huang
+1 (non-binding) - Installed PyFlink by `pip install` [SUCCESS] - Ran word_count.py [SUCCESS] Thanks, Xingbo Becket Qin wrote on Wed, Feb 12, 2020 at 2:28 PM: > +1 (binding) > > - verified signature > - Ran word count example successfully. > > Thanks, > > Jiangjie (Becket) Qin > > On Wed, Feb 12, 2020 at 1

Re: [ANNOUNCE] Apache Flink Python API(PyFlink) 1.9.2 released

2020-02-20 Thread Xingbo Huang
Thanks a lot for the release. Great work, Jincheng! Also thanks to everyone who contributed to this release. Best, Xingbo Till Rohrmann wrote on Tue, Feb 18, 2020 at 11:40 PM: > Thanks for updating the 1.9.2 release wrt Flink's Python API Jincheng! > > Cheers, > Till > > On Thu, Feb 13, 2020 at 12:25 PM H

Re: [ANNOUNCE] Apache Flink Python API(PyFlink) 1.9.2 released

2020-02-20 Thread Xingbo Huang
ight documentation? > > Thanks! > -chad > > > On Thu, Feb 20, 2020 at 5:39 AM Xingbo Huang wrote: > >> Thanks a lot for the release. >> Great Work, Jincheng! >> Also thanks to participants who contribute to this release. >> >> Best, >> Xingbo

Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-20 Thread Xingbo Huang
Congratulations Jingsong! Well deserved. Best, Xingbo wenlong.lwl wrote on Fri, Feb 21, 2020 at 11:43 AM: > Congrats Jingsong! > > On Fri, 21 Feb 2020 at 11:41, Dian Fu wrote: > > > Congrats Jingsong! > > > > > On Feb 21, 2020, at 11:39 AM, Jark Wu wrote: > > > > > > Congratulations Jingsong! Well deserved. > > > > > >

Re: PyFlink performance and deployment issues

2021-07-07 Thread Xingbo Huang
Hi Wouter, Sorry for the late reply. I will try to answer your questions in detail. 1. >>> Performance problem. When running a udf job locally, beam will use a loopback way to connect back to the python process used by the compilation job, so the time of starting up the job will come faster than pyflin

Re: PyFlink performance and deployment issues

2021-07-08 Thread Xingbo Huang
Hi Wouter, In fact, our users have encountered the same problem. Whenever the `bundle size` or `bundle time` is reached, the data in the buffer needs to be sent from the jvm to the pvm, which then waits for the pvm to process it and send it back to the jvm to send all the results to the downstream op

Re: PyFlink performance and deployment issues

2021-07-08 Thread Xingbo Huang
ommendations with > regard to these two configuration values, to get somewhat reasonable > performance? > > Thanks a lot! > Wouter > > On Thu, 8 Jul 2021 at 10:26, Xingbo Huang wrote: > >> Hi Wouter, >> >> In fact, our users have encountered the same problem

Re: key_by problem in Pyflink

2021-07-12 Thread Xingbo Huang
Hi, I think your understanding is correct. The results seem a little weird. I'm looking into this and will let you know when there are any findings. Best, Xingbo 赵飞 wrote on Mon, Jul 12, 2021 at 4:48 PM: > Hi all, > I'm using pyflink to develop a module, whose main functionality is > processing user data bas

Re: key_by problem in Pyflink

2021-07-13 Thread Xingbo Huang
is not used in `process_element2` [1] https://issues.apache.org/jira/browse/FLINK-23368 Best, Xingbo 赵飞 wrote on Mon, Jul 12, 2021 at 10:00 PM: > Thanks. In addition, I run the program in a local mini cluster mode, not > sure if it would affect the results. > > Xingbo Huang wrote on Mon, Jul 12, 2021 at 9:0

Re: Datastream api implementation of a third party pyflink connector

2021-07-19 Thread Xingbo Huang
Hi Zhongle Wang, Your understanding is correct. Firstly, you need to provide an implementation of a java connector, then add this jar to the dependency[1], and finally add a python connector wrapper. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/python/faq/#adding-jar-

Re: Serving Machine Learning models

2022-01-11 Thread Xingbo Huang
Hi sonia, As far as I know, pyflink users prefer to use python udf[1][2] for model prediction. Load the model when the udf is initialized, and then predict each new piece of data [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/table/udfs/overview/ [2] https://nightl
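The load-once-then-predict pattern described here can be sketched in plain Python — the class name is hypothetical and the "model" is a trivial stub; in PyFlink this logic would typically live inside a `ScalarFunction` (loading in `open()`, predicting in `eval()`) wrapped with `@udf(...)`:

```python
class Predictor:
    """Load the model once, then reuse it for every record — the
    pattern suggested for model serving in a PyFlink udf."""
    def __init__(self):
        self._model = None

    def _load(self):
        # Stand-in for e.g. torch.load(...) or joblib.load(...)
        return lambda x: x * 2

    def predict(self, x):
        if self._model is None:    # lazy init on the first record
            self._model = self._load()
        return self._model(x)

p = Predictor()
print(p.predict(3))  # 6
print(p.predict(5))  # 10
```

Loading lazily (or in `open()`) matters because the udf object is serialized to the workers; the model itself should be loaded on the worker, not pickled along with the function.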

Re: pyflink object to java object

2022-02-28 Thread Xingbo Huang
Hi, With py4j, you can call any Java method. On how to create a Java Row, you can call the `createRowWithNamedPositions` method of `RowUtils`[1]. [1] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/types/RowUtils.java# Best, Xingbo Francis Conroy wrote on Feb 25, 2022

Re: [ANNOUNCE] Apache Flink 1.14.4 released

2022-03-15 Thread Xingbo Huang
Thanks a lot for being our release manager Konstantin and everyone who contributed. I have a question about pyflink. I see that there are no corresponding wheel packages uploaded on pypi, only the source package is uploaded. Is there something wrong with building the wheel packages? Best, Xingbo

Re: [ANNOUNCE] Apache Flink 1.14.4 released

2022-03-16 Thread Xingbo Huang
affected Flink > 1.13.6, the other release I was recently managing. I simply skipped a step > in the release guide. > > It should be fixed now. Could you double-check? > > Cheers, > > Konstantin > > On Wed, Mar 16, 2022 at 4:07 AM Xingbo Huang wrote: > > > Thank

Re: Pyflink elastic search connectors

2022-03-29 Thread Xingbo Huang
Hi, Are you using the datastream api or the table api? If you are using the table api, you can use the connector by executing sql[1]. If you are using the datastream api, there is really no es connector api provided; you need to write python wrapper code, but the wrapper code is very simple. The underlying

Re: Help installing apache-flink and numpy to run flink python examples

2022-06-01 Thread Xingbo Huang
Hi Kenneth, In flink 1.15, pyflink only guarantees support for python 3.6, 3.7 and 3.8[1]. In release-1.16, pyflink will provide support for python 3.9[2]. Coming back to your installation error: in flink 1.15, the version range of numpy that pyflink depends on is numpy>=1.14.3,<1.20. So when you exec
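The cited constraint (numpy>=1.14.3,<1.20 for PyFlink 1.15) can be expressed as a simple tuple comparison — just an illustration of which numpy versions satisfy the pin; pip resolves this automatically during installation:

```python
def in_range(version: str) -> bool:
    """True if `version` satisfies numpy>=1.14.3,<1.20,
    the range PyFlink 1.15 pins for numpy."""
    v = tuple(int(part) for part in version.split("."))
    return (1, 14, 3) <= v < (1, 20)

print(in_range("1.19.5"))  # True  - last 1.19.x line is acceptable
print(in_range("1.21.0"))  # False - too new for the pin
```

So an installation failure here usually means the environment (e.g. a newer Python on which old numpy wheels don't exist) cannot satisfy a numpy version inside this window.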

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread Xingbo Huang
Hi John, Could you provide the code snippet and the version of pyflink you used? Best, Xingbo John Tipper wrote on Thu, Jun 16, 2022 at 17:05: > Hi all, > > I'm trying to run a PyFlink unit test to test some PyFlink SQL and where > my code uses a Python UDF. I can't share my code but the test case is > si

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread Xingbo Huang
my code but Flink is 1.13. The main Flink code is > running inside Kinesis on AWS so I cannot change the version. > > Many thanks, > > John > > Sent from my iPhone > > On 16 Jun 2022, at 10:37, Xingbo Huang wrote: > >  > Hi John, > > Could you provid

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread Xingbo Huang
her than explicit > calls to the Table and DataStream APIs. > > Is this a good pattern or are there caveats I should be aware of please? > > Many thanks, > > John > > > -- > *From:* Xingbo Huang > *Sent:* 16 June 2022 12:34 >

[ANNOUNCE] Apache Flink 1.14.5 released

2022-06-21 Thread Xingbo Huang
The Apache Flink community is very happy to announce the release of Apache Flink 1.14.5, which is the fourth bugfix release for the Apache Flink 1.14 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming

Re: [ANNOUNCE] Apache Flink 1.15.1 released

2022-07-07 Thread Xingbo Huang
Thanks a lot for being our release manager David and everyone who contributed. Best, Xingbo David Anderson 于2022年7月8日周五 06:18写道: > The Apache Flink community is very happy to announce the release of Apache > Flink 1.15.1, which is the first bugfix release for the Apache Flink 1.15 > series. > >

Re: Guide for building Flink image with Python doesn't work

2022-07-07 Thread Xingbo Huang
Hi Gyula, According to the log, we can see that you downloaded the source package of pemja, not the wheel package of pemja[1]. I guess you are using the m1 machine. If you install pemja from the source package, you need to have JDK, gcc tools and CPython with Numpy in the environment. I believe th

Re: [ANNOUNCE] Apache Flink 1.15.2 released

2022-08-24 Thread Xingbo Huang
Thanks Danny for driving this release. Best, Xingbo Jing Ge 于2022年8月25日周四 05:50写道: > Thanks Danny for your effort! > > Best regards, > Jing > > On Wed, Aug 24, 2022 at 11:43 PM Danny Cranmer > wrote: > >> The Apache Flink community is very happy to announce the release of >> Apache Flink 1.15.2

Re: Unable to run pyflink job - NetUtils getAvailablePort Error

2022-09-06 Thread Xingbo Huang
Hi Raman, This problem comes from an inconsistency between your Flink version and your pyflink version. Best, Xingbo Ramana 于2022年9月6日周二 15:08写道: > Hello there, > > I have a pyflink setup of 1 : JobManager - 1 : Task Manager. > > Trying to run a pyflink job and no matter what i do, i get the follow
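A minimal sketch of the consistency check being suggested: the client-side `apache-flink` wheel should match the cluster's Flink version, at least on major.minor. The helper below is hypothetical:

```python
import re

def versions_match(pyflink_version, flink_cluster_version):
    """Return True when the two versions agree on major.minor.

    Patch-level differences (e.g. 1.15.2 vs 1.15.0) are tolerated;
    a 1.14 client against a 1.15 cluster is flagged as a mismatch.
    """
    def major_minor(v):
        m = re.match(r"(\d+)\.(\d+)", v)
        return (int(m.group(1)), int(m.group(2)))
    return major_minor(pyflink_version) == major_minor(flink_cluster_version)
```

Comparing `pip show apache-flink` output against the JobManager's reported version with a check like this catches the mismatch before submitting a job.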

[ANNOUNCE] Apache Flink 1.14.6 released

2022-09-27 Thread Xingbo Huang
The Apache Flink community is very happy to announce the release of Apache Flink 1.14.6, which is the fifth bugfix release for the Apache Flink 1.14 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming a

Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-10 Thread Xingbo Huang
+1 for reverting these changes in Flink 1.16, so I will cancel 1.16.0-rc1. +1 for `numXXXSend` as the alias of `numXXXOut` in 1.15.3. Best, Xingbo Chesnay Schepler 于2022年10月10日周一 19:13写道: > > I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut > > But that's not possible. If it

[ANNOUNCE] Apache Flink 1.16.0 released

2022-10-27 Thread Xingbo Huang
The Apache Flink community is very happy to announce the release of Apache Flink 1.16.0, which is the first release for the Apache Flink 1.16 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applicat

Re: Dependency resolution issue with apache-flink 1.16.0 python package.

2022-11-16 Thread Xingbo Huang
Hi Yogi, I think the problem comes from poetry depending on the metadata in PyPI. This problem has been reported in https://issues.apache.org/jira/browse/FLINK-29817 and I will fix it in 1.16.1. Best, Xingbo Yogi Devendra 于2022年11月17日周四 06:21写道: > Dear community/maintainers, > > Thanks for the

Re: Facing Issue in running Python Flink Program in flink cluster (Version=1.15.2)

2022-11-21 Thread Xingbo Huang
Hi Harshit, According to the stack trace you provided, I guess you define your Python function in the main file, and that function imports xgboost globally. The error occurs because the xgboost library is difficult for cloudpickle to serialize. There are two ways to solve this: 1. Move `impo
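The first fix (moving the import into the function body) can be sketched like this. To keep the snippet self-contained, `json` stands in for `xgboost`; in the real job, the function would additionally be wrapped with PyFlink's `@udf` decorator:

```python
def predict(value):
    # The heavy dependency is imported lazily, inside the function body,
    # so cloudpickle serializes only the function itself and the import
    # is re-executed on each worker at call time.
    import json  # stand-in for `import xgboost` in this sketch
    return json.dumps({"score": value})
```

With a module-level `import xgboost`, cloudpickle may try to pickle references to the module's state when shipping the function to the Python workers; the lazy import avoids that entirely.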

Re: Pyflink 1.20.1 packages can not be installed

2025-05-12 Thread Xingbo Huang
Hi Janus, Which Python version are you using? Flink 1.19 removed support for Python 3.7. Best, Xingbo janusgraph 于2025年5月12日周一 15:03写道: > Sorry about that the snapshot is broken. > > The errors are as follows. > ``` > # pip3 install apache-flink==1.20.1 > Collecting apache-flink==1.20.1 >