Welcome to create Pulsar + Flink solutions at Pulsar Hackathon!

2021-04-15 Thread Dianjin Wang
Hi Flink community members, I would like to share that the first-ever Apache Pulsar Hackathon (Virtual) is open for registration[1], until April 28, 2021. The goal of this event is to engage the open source community, drive contributions, and generate ideas to enhance Pulsar and its Big Data ecosy

Question about state processor data outputs

2021-04-15 Thread Chen-Che Huang
Hi all, We're going to use state processor to make our keyedstate data to be written to different files based on the keys. More specifically, we want our data to be written to files key1.txt, key2.txt, ..., and keyn.txt where the value with the same key is stored in the same file. In each file,

Re: WindowFunction is stuck until next message is processed although Watermark with idle timeout is applied.

2021-04-15 Thread David Anderson
The withIdleness option does not attempt to handle situations where all of the sources are idle. Flink operators with multiple input channels keep track of the current watermark from each channel, and use the minimum of these watermarks as their own watermark. withIdleness marks idle channels as i

[ANNOUNCE] Apache Flink Stateful Functions 3.0.0 released

2021-04-15 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Stateful Functions (StateFun) 3.0.0. StateFun is a cross-platform stack for building Stateful Serverless applications, making it radically simpler to develop scalable, consistent, and elastic distributed applications.

Re: Flink Hadoop config on docker-compose

2021-04-15 Thread Robert Metzger
Hi, I'm not aware of any known issues with Hadoop and Flink on Docker. I also tried what you are doing locally, and it seems to work: flink-jobmanager| 2021-04-15 18:37:48,300 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - Starting StandaloneSessionClusterEntrypoint.

Re: Running Apache Flink 1.12 on Apple Silicon M1 MacBook Pro

2021-04-15 Thread Robert Metzger
Hey Klemens, I'm sorry that you are running into this. Looks like you are the first (of probably many people) who use Flink on a M1 chip. If you are up for it, we would really appreciate a fix for this issue, as a contribution to Flink. Maybe you can distill the problem into an integration test,

Re: Running Apache Flink 1.12 on Apple Silicon M1 MacBook Pro

2021-04-15 Thread Klemens Muthmann
Hi, Since kindergarden time is shortened due to the pandemic I only get four hours of work into each day and I am supposed to do eight. So unfortunately I will not be able to develop a fix at the moment. -.- I am happy to provide any debug log you need or test adaptations and provide fixes as p

Re: Question about state processor data outputs

2021-04-15 Thread Robert Metzger
Hey Chen-Che Huang, I guess the StreamingFileSink is what you are looking for. It is documented here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html I drafted a short example (that is not production ready), which does roughly what you are asking for: htt

Re: Running Apache Flink 1.12 on Apple Silicon M1 MacBook Pro

2021-04-15 Thread Robert Metzger
Hi, a DEBUG log of the client would indeed be nice. Can you adjust this file: conf/log4j-cli.properties to the following contents: (basically TRACE logging with netty logs enabled) # Licensed to the Apache Softwa

Re: Flink Hadoop config on docker-compose

2021-04-15 Thread Flavio Pompermaier
Hi Robert, indeed my docker-compose does work only if I add also Hadoop and yarn home while I was expecting that those two variables were generated automatically just setting env.xxx variables in FLINK_PROPERTIES variable.. I just want to understand what to expect, if I really need to specify Hado

Iterate Operator Checkpoint Failure

2021-04-15 Thread Lu Niu
Hi, Flink Users When we migrate from flink 1.9.1 to flink 1.11, we notice job will always fail on checkpoint if job uses Iterator Operator, no matter we use unaligned checkpoint or not. Those jobs don't have checkpoint issues in 1.9. Is this a known issue? Thank you! Best Lu

Re: PyFlink Vectorized UDF throws NullPointerException

2021-04-15 Thread Yik San Chan
Hi Dian, I wonder if we can improve the error tracing and message so that it becomes more obvious where the problem is? To me, a NPE really says very little. Best, Yik San On Thu, Apr 15, 2021 at 11:07 AM Dian Fu wrote: > Great! Thanks for letting me know~ > > 2021年4月15日 上午11:01,Yik San Chan

Re: Flink Hadoop config on docker-compose

2021-04-15 Thread Yang Wang
It seems that we do not export HADOOP_CONF_DIR as environment variables in current implementation, even though we have set the env.xxx flink config options. It is only used to construct the classpath for the JM/TM process. However, in "HadoopUtils"[2] we do not support getting the hadoop configurat

Re: PyFlink Vectorized UDF throws NullPointerException

2021-04-15 Thread Dian Fu
Definitely agree with you. Have created https://issues.apache.org/jira/browse/FLINK-22297 as a following up. > 2021年4月16日 上午7:10,Yik San Chan 写道: > > Hi Dian, > > I wonder if we can improve the error tracing and message so that it becomes >

Re: Question about state processor data outputs

2021-04-15 Thread Chen-Che Huang
Hi Robert, Thanks for your code. It's really helpful! However, with the readKeyedState api of state processor, we get dataset for our data instead of datastream and it seems the dataset doesn't support streamfilesink (not addSink method like datastream). If not, I need to transform the dataset

PyFlink: called already closed and NullPointerException

2021-04-15 Thread Yik San Chan
The question is cross-posted on Stack Overflow https://stackoverflow.com/questions/67118743/pyflink-called-already-closed-and-nullpointerexception . Hi community, I run into an issue where a PyFlink job may end up with 3 very different outcomes, given very slight difference in input, and luck :(

Re: PyFlink Vectorized UDF throws NullPointerException

2021-04-15 Thread Yik San Chan
Hi Dian, Thank you so much for tracking the issue! I run into another NullPointerException when running pandas UDF, but this time I add an unit test to ensure the input and output type already ... And the new issue looks even more odd ... Do you mind taking a look? http://apache-flink-user-mailin

Re: PyFlink: called already closed and NullPointerException

2021-04-15 Thread Dian Fu
1) Regarding to Outcome 2: The logs are just warnings and currently it has chances to appear during the job shutdown. It doesn’t affect the functionality and so you can just ignore them. 2) Regarding to Outcome 3: It should be caused by the following input: 3708233,4,2,100,九江,3,0,1,"iPhone9,1",中

Re: PyFlink Vectorized UDF throws NullPointerException

2021-04-15 Thread Dian Fu
Sure. I have replied. Let’s discuss it in that thread. > 2021年4月16日 上午11:40,Yik San Chan 写道: > > Hi Dian, > > Thank you so much for tracking the issue! > > I run into another NullPointerException when running pandas UDF, but this > time I add an unit test to ensure the input and output type a

Re: Question about state processor data outputs

2021-04-15 Thread Robert Metzger
Hi, I assumed you are using the DataStream API, because you mentioned the streaming sink. But you also mentioned the state processor API (which I ignored a bit). I wonder why you are using the state processor API. Can't you use the streaming job that created the state also for writing it to files

Re: PyFlink: called already closed and NullPointerException

2021-04-15 Thread Yik San Chan
Hi Dian, Regarding outcome 2, sure I will ignore them for now. Regarding outcome 3, you have eagle eyes! Good finding! Thank you so much, I can't imagine trying PyFlink without your help. 感谢! Best, Yik San On Fri, Apr 16, 2021 at 1:54 PM Dian Fu wrote: > 1) Regarding to Outcome 2: The logs ar