Re: ByteSerializationSchema in PyFlink

2021-06-03 Thread Dian Fu
Hi Wouter, > E org.apache.flink.api.python.shaded.py4j.Py4JException: Constructor org.apache.flink.api.common.serialization.TypeInformationSerializationSchema([class org.apache.flink.api.common.typeinfo.IntegerTypeInfo, class org.apache.flink.configuration.Configurati

Re: Flink stream processing issue

2021-06-03 Thread yidan zhao
Yes, if you use KeyedCoProcess, Flink will ensure that. Qihua Yang wrote on Fri, Jun 4, 2021 at 12:32 AM: > > Sorry for the confusion. Yes, I mean multiple parallelism. Really thanks > for your help. > > Thanks, > Qihua > > On Thu, Jun 3, 2021 at 12:03 AM JING ZHANG wrote: >> >> Hi Qihua, >> >> I’m sorry I
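The guarantee above comes from keyed partitioning: both inputs of a KeyedCoProcessFunction are partitioned on the same key, so equal keys from either stream reach the same subtask. A plain-Python sketch of the idea (the toy modulo partitioner is an illustration only; Flink actually maps keys to key groups and key groups to subtasks):

```python
# Illustrative sketch: both input streams are routed with the same
# function of the key, so records sharing a key always meet on the
# same downstream subtask.
PARALLELISM = 4

def subtask_for(key: str) -> int:
    # Deterministic toy partitioner (stand-in for Flink's hash-based one).
    return sum(ord(c) for c in key) % PARALLELISM

stream_a = [("user-7", "click"), ("user-3", "view")]
stream_b = [("user-7", "purchase")]

routed = {}
for key, event in stream_a + stream_b:
    routed.setdefault(subtask_for(key), []).append((key, event))

# "user-7" records from both streams land on the same subtask index.
print(subtask_for("user-7"))
```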

Failed to cancel a job using the STOP rest API

2021-06-03 Thread Thomas Wang
Hi, Flink community, I'm trying to use the STOP rest API to cancel a job. So far, I'm seeing some inconsistent results. Sometimes, jobs could be cancelled successfully while other times, they couldn't. Either way, the POST request is accepted with a status code 202 and a "request-id". From the F

Re: Flink exported metrics scope configuration

2021-06-03 Thread Mason Chen
Hi Kai, You can use the excluded-variables config for the reporter. metrics.reporter.<name>.scope.variables.excludes: (optional) A semi-colon (;) separated list of variables that should be ignored by tag-based reporters (e.g., Prometheus, InfluxDB). https://ci.apache.org/projects/flink/flink-docs-rel
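The exclusion Mason describes is set per reporter in flink-conf.yaml. A sketch, assuming a Prometheus reporter named `prom` and that the high-cardinality variables to drop are `task_attempt_id` and `tm_id` (the reporter name and variable list are placeholders; pick the variables that blow up your cardinality):

```yaml
# flink-conf.yaml (sketch; reporter name and excluded variables are examples)
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.scope.variables.excludes: task_attempt_id;tm_id
```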

open checkpoint, send message to kafka OutOfOrderSequenceException

2021-06-03 Thread SmileSmile
Dear all: Flink version is 1.12.4, Kafka version is 1.1.1. The topology is very simple: source --> flatmap --> sink, with checkpointing enabled; the job fails after a few hours. The error message is: Caused by: org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Failed to send data to Kaf

subscribe

2021-06-03 Thread Boyang Chen

Flink exported metrics scope configuration

2021-06-03 Thread Kai Fu
Hi team, We noticed that the Prometheus metrics exporter exports all of the metrics at the most fine-grained level, which is a tremendous load on the Prometheus server, especially when the parallelism is high. The metrics volume crawled from a single host (parallelism 8) is around 40MB for us currently. This

Re: Unexpected end of ZLIB input stream

2021-06-03 Thread Chesnay Schepler
What filesystem are you using? Is it possible that the source tries to read a file that is in the process of being written to disk? On 6/3/2021 11:32 PM, Billy Bain wrote: We are getting this exception occasionally after a job runs for a month or more. The files don't seem to be corrupt from ou

Unexpected end of ZLIB input stream

2021-06-03 Thread Billy Bain
We are getting this exception occasionally after a job runs for a month or more. The files don't seem to be corrupt from our testing, so not sure what this error means. Task resources & network connectivity seem fine. How would you debug this? ink) (1/1)#52423 (595ced3edfe32bb7d826955f1a195a29)

Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-06-03 Thread Alexander Filipchik
On the checkpoints: what kind of issues should I check for? I was looking at the metrics and they were reporting successful checkpoints. It looks like some files were removed in the shared folder, but I'm not sure how to check what caused it. Savepoints were failing due to savepoi

Re: Flattening of events

2021-06-03 Thread Chesnay Schepler
Have a look at flatMap: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/operators/overview/#datastream-rarr-datastream-1 On 6/3/2021 8:28 PM, Satish Saley wrote: Hi team, I am trying to figure out a way to flatten events in my Flink app. Event that i am consum

Re: behavior differences when sinking side outputs to a custom KeyedCoProcessFunction

2021-06-03 Thread Jin Yi
Just to resolve this thread, I figured out the issue. There's a local version of a watermark strategy that we use when running locally for development that didn't work correctly on many events with the same timestamp, which the fake data generation that happens for local runs has a tendency to produce.

Flattening of events

2021-06-03 Thread Satish Saley
Hi team, I am trying to figure out a way to flatten events in my Flink app. The event that I am consuming from Kafka is UpperLevelData { int upperId; List<ModuleData> listOfModules } ModuleData { int moduleId; string info; } After consuming this event, I want to flatten it out in the following format - FlinkRes
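Since the message is truncated, here is a plain-Python sketch of the flatMap-style flattening being asked about (field names follow the UpperLevelData/ModuleData shapes quoted in the question; the tuple output shape is an assumption, as the target type is cut off):

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class ModuleData:
    module_id: int
    info: str

@dataclass
class UpperLevelData:
    upper_id: int
    list_of_modules: List[ModuleData] = field(default_factory=list)

def flatten(event: UpperLevelData) -> Iterator[Tuple[int, int, str]]:
    # One output record per module, keeping the parent id alongside;
    # this is the shape a Flink flatMap over this event would emit.
    for m in event.list_of_modules:
        yield (event.upper_id, m.module_id, m.info)

event = UpperLevelData(1, [ModuleData(10, "a"), ModuleData(11, "b")])
print(list(flatten(event)))  # [(1, 10, 'a'), (1, 11, 'b')]
```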

Re: streaming file sink OUT metrics not populating

2021-06-03 Thread Chesnay Schepler
This is a known issue, and cannot be fixed on the user side. The underlying problem is that this needs to be implemented separately for each source/sink and we haven't gotten around to doing that yet, but some progress is being made for 1.14 to that end. On 6/3/2021 6:06 PM, Vijayendra Yadav w

Re: Flink stream processing issue

2021-06-03 Thread Qihua Yang
Sorry for the confusion Yes, I mean multiple parallelism. Really thanks for your help. Thanks, Qihua On Thu, Jun 3, 2021 at 12:03 AM JING ZHANG wrote: > Hi Qihua, > > I’m sorry I didn’t understand what you mean by ‘replica’. Would you please > explain a little more? > If you means you job h

streaming file sink OUT metrics not populating

2021-06-03 Thread Vijayendra Yadav
Hi Team, I am using the streaming file sink and sinking data to S3. Also, using the Graphite exporter for metrics. I can see correct counters for the Source, Map, and Filter functions. But for SINK, only *numRecordsIn* is populating; I am looking to get *numRecordsOut* counts also, but it's always staying 0 alth

Re: StreamingFileSink output formatting to CSV

2021-06-03 Thread Chesnay Schepler
This is handled by the StringEncoder; the one you use (SimpleStringEncoder) just calls toString on the input element. I don't think Flink provides a CSV StringEncoder, but if all you want is to remove the parentheses, then you could wrap the SimpleStringEncoder and trim the first and last charact
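The trimming Chesnay suggests can be sketched in plain Python. A Flink Tuple's toString renders as "(a,b,c)"; the real encoder is Java, so this only illustrates the string manipulation:

```python
def tuple_to_csv_line(t) -> str:
    # Mimic a Flink-Tuple-style toString, e.g. "(job-1,42,3.5)" ...
    s = "(" + ",".join(str(x) for x in t) + ")"
    # ... then trim the first and last characters to get a CSV row.
    return s[1:-1]

print(tuple_to_csv_line(("job-1", 42, 3.5)))  # job-1,42,3.5
```

Note this naive trim does not quote fields that themselves contain commas; a real CSV encoder would need proper escaping.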

StreamingFileSink output formatting to CSV

2021-06-03 Thread Robert Cullen
I have a StreamingFileSink that writes to S3: final StreamingFileSink> sink = StreamingFileSink.forRowFormat( new Path("s3://argo-artifacts/files"), new SimpleStringEncoder>("UTF-8")) .withBucketAssigne

ByteSerializationSchema in PyFlink

2021-06-03 Thread Wouter Zorgdrager
Hi all, I have a PyFlink job connected to a KafkaConsumer and Producer. I want to directly work with the bytes from and to Kafka because I want to serialize/deserialize in my Python code rather than the JVM environment. Therefore, I can't use the SimpleStringSchema for (de)serialization (the messa

Events triggering JobListener notification

2021-06-03 Thread Barak Ben Nathan
Hi all, I am using Flink 1.12.1. I’m building a system that creates/cancels Flink jobs and monitors them. We planned to use org.apache.flink.core.execution.JobListener as a ‘push’ mechanism for job-status-change events. We based this idea on the documentation, which states that JobListener ‘…is

Re: Best practice for adding support for Kafka variants

2021-06-03 Thread deepthi Sridharan
Makes sense. Thanks for the confirmation. On Thu, Jun 3, 2021, 4:08 AM Arvid Heise wrote: > Just to add, we target that for 1.14. > > However, it's also not too complicated to add a new TableFactory that uses > the new sources (or your source). > > On Thu, Jun 3, 2021 at 10:04 AM Chesnay Scheple

Re: Best practice for adding support for Kafka variants

2021-06-03 Thread Arvid Heise
Just to add, we target that for 1.14. However, it's also not too complicated to add a new TableFactory that uses the new sources (or your source). On Thu, Jun 3, 2021 at 10:04 AM Chesnay Schepler wrote: > The FLIP-27 were primarily aimed at the DataStream API; the integration > into the SQL/Tab

Re: recover from savepoint

2021-06-03 Thread Till Rohrmann
Thanks for this insight. So the problem might be Flink using an internal Kafka API (the connector uses reflection to get hold of the TransactionManager), which changed between versions 2.4.1 and 2.5. I think this is a serious problem because it breaks our end-to-end exactly-once story when using new

Re: In native k8s application mode, how can I know whether the job is failed or finished?

2021-06-03 Thread LIU Xiao
Thank you for the timely help! I've tried session mode a little bit; it's better than I thought, as TaskManagers can be allocated and de-allocated dynamically. But it seems the memory size of a TaskManager is fixed when the session starts and cannot be adjusted for different jobs. I'll try to deploy a

Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-06-03 Thread Chesnay Schepler
Is there anything in the Flink logs indicating issues with writing the checkpoint data? When the savepoint could not be created, was anything logged from Flink? How did you shut down the cluster? On 6/3/2021 5:56 AM, Alexander Filipchik wrote: Hi, Trying to figure out what happened with our F

Re: recover from savepoint

2021-06-03 Thread Tianxin Zhao
I encountered the exact same issue before when experimenting in a testing environment. I was not able to spot the bug mentioned in this thread; the solution I used was to downgrade my own kafka-clients version from 2.5 to 2.4.1, matching the version used by flink-connector-kafka. In 2.4.1 Kafka, Transa
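The downgrade Tianxin describes amounts to pinning the client artifact to the connector's version. A Maven sketch (the surrounding pom structure is assumed; `org.apache.kafka:kafka-clients:2.4.1` matches what flink-connector-kafka shipped against at the time):

```xml
<!-- Pin kafka-clients to the version flink-connector-kafka was built with -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>2.4.1</version>
</dependency>
```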

Re: Best practice for adding support for Kafka variants

2021-06-03 Thread Chesnay Schepler
The FLIP-27 sources were primarily aimed at the DataStream API; the integration into the SQL/Table APIs will happen at a later date. On 6/1/2021 5:59 PM, deepthi Sridharan wrote: Thank you, Roman. I should have said our own flavor of Kafka and not version. Thanks for the reference of the new source and

Re: DSL for Flink CEP

2021-06-03 Thread Dipanjan Mazumder
Thanks a lot ... On Thursday, June 3, 2021, 12:49:58 PM GMT+5:30, Dawid Wysakowicz wrote: Hi, Just to add on top of what Fabian said. The only community-supported CEP library is the one that comes with Flink[1]. It is also used internally to support the MATCH_RECOGNIZE clause in

Re: DSL for Flink CEP

2021-06-03 Thread Dawid Wysakowicz
Hi, Just to add on top of what Fabian said. The only community-supported CEP library is the one that comes with Flink[1]. It is also used internally to support the MATCH_RECOGNIZE clause in Flink SQL[2]. Therefore there is both a programmatic library and a native DSL for defining patterns. Moreover you

Re: In native k8s application mode, how can I know whether the job is failed or finished?

2021-06-03 Thread Xintong Song
There are two ways to access the status of a job after it is finished. 1. You can try native k8s deployment in session mode. When jobs are finished in this mode, TMs will be automatically released after a short period of time, while JM will not be terminated until you explicitly shut down the sessi
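Monitoring a job this way typically means polling the JobManager's REST API; GET /jobs/<jobid> returns a JSON body with a top-level "state" field. A hedged Python sketch (the host, port, and job id below are placeholders):

```python
import json
from urllib.request import urlopen

def extract_state(payload_text: str) -> str:
    # The response body of GET /jobs/<jobid> carries a top-level "state",
    # e.g. RUNNING, FINISHED, FAILED, CANCELED.
    return json.loads(payload_text)["state"]

def job_state(rest_url: str, job_id: str) -> str:
    # rest_url like "http://jobmanager:8081" (placeholder address).
    with urlopen(f"{rest_url}/jobs/{job_id}") as resp:
        return extract_state(resp.read().decode())

print(extract_state('{"jid": "abc123", "state": "FINISHED"}'))  # FINISHED
```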

Re: recover from savepoint

2021-06-03 Thread Till Rohrmann
Thanks for the update. Skimming over the code, it does indeed look like we are overwriting the values of the static value ProducerIdAndEpoch.NONE. I am not 100% sure how this will cause the observed problem, though. I am also not a Flink Kafka connector or Kafka expert, so I would appreciate it if someone mo

Re: Flink stream processing issue

2021-06-03 Thread JING ZHANG
Hi Qihua, I’m sorry I didn’t understand what you mean by ‘replica’. Would you please explain a little more? If you mean your job has multiple parallelism, and you are asking whether the same data from two different inputs would be sent to the same downstream subtask after `keyedCoProcessFunction`: yes, Flink could d