[ANNOUNCE] Apache Flink 1.15.3 released

2022-11-25 Thread Fabian Paul
The Apache Flink community is very happy to announce the release of Apache Flink 1.15.3, which is the third bugfix release for the Apache Flink 1.15 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming a

Re: Alertmanager Sink Connector

2022-04-20 Thread Fabian Paul
ated firing (or to clear it from state if it is > resolved). From this ProcessFunction you send records to an AlertManagerSink. > > For 2. the AsyncSink, Fabian Paul (cc) might be the best person to judge if > it is a good fit. There is PR [1] to add a blog post about the Async Sink

Re: DataStream API: Parquet File Format with Scala Case Classes

2022-02-24 Thread Fabian Paul
eam = env.fromSource(hybridSource, watermarkStrategy, >> name) >> > > Let me know if you have any questions! > > Thanks, > Ryan van Huuksloot > Data Developer | Data Platform Engineering | Streaming Capabilities > [image: Shopify] > <https://www.shopify.com/?u

Re: DataStream API: Parquet File Format with Scala Case Classes

2022-02-21 Thread Fabian Paul
Hi Ryan, Thanks for bringing up this topic. Currently, your analysis is correct, and reading parquet files outside the Table API is rather difficult. The community started an effort in Flink 1.15 to restructure some of the formats to make them better applicable to the DataStream and Table API. You

Re: Kafka Consumer Group Name not set if no checkpointing?

2022-02-01 Thread Fabian Paul
umerGroup) > .setStartingOffsets(oi) > .setProperties(props) > .build(); > > > On Mon, 31 Jan 2022 at 12:03, Fabian Paul wrote: >> >> Hi John, >> First I would like to ask you to give us some more information about >> how you consume from Kafka with Flin

Re: How to put external data in EmbeddedRocksDB

2022-01-31 Thread Fabian Paul
Hi Surendra, I do not think there is an out-of-the-box way to do look-ups against the local rocksdb instance. In general, I am a bit skeptical about whether it is a good idea to use the rocksdb instance for your state management and for look-ups in parallel. It may overload the rocksdb and cause unexp

Re: Flink 1.14.2/3 - KafkaSink vs deprecated FlinkKafkaProducer

2022-01-31 Thread Fabian Paul
Hi Daniel, Thanks for reaching out, we are constantly trying to improve the reliability of our connectors. I assume you are running the KafkaSink with exactly-once delivery guarantee. On startup, the KafkaSink tries to abort lingering transactions from previous executions. Unfortunately, nothing c

Re: Kafka Consumer Group Name not set if no checkpointing?

2022-01-31 Thread Fabian Paul
Hi John, First I would like to ask you to give us some more information about how you consume from Kafka with Flink. It is currently recommended to use the KafkaSource that subsumes the deprecated FlinkKafkaConsumer. One thing to already note is that by default Flink does not commit the Kafka to o

Re: ParquetColumnarRowInputFormat - parameter description

2022-01-25 Thread Fabian Paul
Hi Krzysztof, sorry for the late reply. The community is very busy at the moment with the final two weeks of Flink 1.15. The parameters you have mentioned are mostly relevant for the internal conversion or representation from Parquet types to Flink's SQL type system. - isUtcTimestamp denotes whe

Re: FlinkKafkaConsumer and FlinkKafkaProducer and Kafka Cluster Migration

2022-01-14 Thread Fabian Paul
Hi Alexey, The bootstrap servers are not part of the state so you are good to go although please stop all your jobs with a savepoint and resume from it with the new properties. I guess to migrate the FlinkKafkaConsumer to an empty topic you can discard the state if you ensure that all messages beg

Re: Is FlinkKafkaProducer state compatible with KafkaSink sink? How to migrate?

2022-01-12 Thread Fabian Paul
Hi Kevin, No, the state is not compatible but it is also not necessary because if the FlinkKafkaProducer is stopped with a savepoint all transactions are finalized and the new KafkaSink uses a different mechanism to track transaction ids. [1] It should be enough to recover from the savepoint with

Re: StaticFileSplitEnumerator - keeping track 9f already processed files

2022-01-11 Thread Fabian Paul
> Could you shed some light on this and how the collectWithClient works > especially in case if Failover. > > Thanks, > Krzysztof Chmielewski > > czw., 6 sty 2022 o 09:29 Fabian Paul napisał(a): >> >> Hi, >> >> I think your analysis is correct. One thing to n

Re: Plans to update StreamExecutionEnvironment.readFiles to use the FLIP-27 compatible FileSource?

2022-01-11 Thread Fabian Paul
pic of this thread but I was wondering if you > could elaborate on that. Can the new FileSource interoperate with the old > .readFile operator state? Is there a smooth way to upgrade to the new > FileSource API from the old one without losing state? > > Thanks! > > On Mon, Jan

Re: Plans to update StreamExecutionEnvironment.readFiles to use the FLIP-27 compatible FileSource?

2022-01-10 Thread Fabian Paul
Hi Kevin, I created a ticket to track the effort [1]. Unfortunately, we are already in the last few weeks of the release cycle for 1.15, so I cannot guarantee that someone can implement it by then. Best, Fabian [1] https://issues.apache.org/jira/browse/FLINK-25591 On Fri, Jan 7, 2022 at 5:07

Re: StaticFileSplitEnumerator - keeping track 9f already processed files

2022-01-06 Thread Fabian Paul
Hi, I think your analysis is correct. One thing to note here is that, I guess, when implementing the StaticFileSplitEnumerator we only thought about the batch case where no checkpoints exist [1]. On the other hand, it is possible, as you have noted, to run a bounded source in streaming mode. Although i

Re: Hybrid Source with Parquet Files from GCS + KafkaSource

2021-12-15 Thread Fabian Paul
Hi Megh, Flink offers the ParquetVectorizedInputFormat which is already heavily optimized. Unfortunately, you need to implement some of the methods depending on your type. In general, the BulkFormat gives you more control and allows more optimizations but is harder to implement. Best, Fab

Re: FileSink in Apache Flink not generating logs in output folder

2021-12-08 Thread Fabian Paul
Hi Kajal, This does indeed look strange. Are you sure that records are being sent to the sink? You can verify this by checking the Flink metrics of the upstream tasks to see whether they emit anything. The sink should create a part file immediately when it receives a record and the rolling policy should ensu

Re: Stateful functions module configurations (module.yaml) per deployment environment

2021-12-08 Thread Fabian Paul
Hi Deniz, Great to hear from someone using Ververica Platform with StateFun. When deploying your job you can specify `additionalConfigurations`[1] that are also pulled and put into the classpath. Hopefully, that is suitable for your scenario. Best, Fabian [1] https://docs.ververica.com/user_gu

Re: KafkaSink.builder setDeliveryGuarantee is not a member

2021-12-02 Thread Fabian Paul
Hi Lars, Unfortunately, there is at the moment a small bug in our documentation [1]. You can set the DeliveryGuarantee on the builder object and not on the serialization schema. Sorry for the inconvenience. Best, Fabian [1] https://github.com/apache/flink/pull/17971 On Thu, Dec 2, 2021 at 12:34

Re: How to Fan Out to 100s of Sinks

2021-11-30 Thread Fabian Paul
ket as an Iceberg table row. We were hoping to continue using Flink for > this use case by just one job doing a conditional sink, but we are not sure > if that would be the right usage of Flink. > > Thanks, > Shree > ____ > From: Fabian Paul

Re: How to Fan Out to 100s of Sinks

2021-11-29 Thread Fabian Paul
Hi, What do you mean by "fan out" to 100 different sinks? Do you want to replicate the data in all buckets or is there some conditional branching logic? In general, Flink can easily support 100 different sinks but I am not sure if this is the right approach for your use case. Can you clarify your

Re: how to expose the current in-flight async i/o requests as metrics?

2021-11-10 Thread Fabian Paul
Hi Dongwon, Thanks for sharing the logs and the metrics screenshots with us. Unfortunately, I think we need more information to further isolate the problem, so I have a couple of suggestions. 1. Since you already set up PromQL, can you also share the JVM memory statistics, i.e. DirectMemory cons

Re: how to expose the current in-flight async i/o requests as metrics?

2021-11-09 Thread Fabian Paul
Hi Dongwon, Can you maybe share more about the setup and how you use the AsyncFunction with the Kafka client? As David already pointed out, it could indeed be a Kafka bug, but it could also mean that your defined async function leaks direct memory by not freeing some resources. We can definitely i

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-05 Thread Fabian Paul
Hi Krzysztof, The blog post is not building a lookup source but only a scan source. For scan sources you can choose between the old RichSourceFunction and the new unified Source interface. For lookup sources you need to implement either a TableFunction or an AsyncTableFunction; there are current

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-05 Thread Fabian Paul
I think neither Source nor RichSourceFunction is correct in this case. You can have a look at the Jdbc lookup source[1][2]. Your function needs to implement TableFunction. Best, Fabian [1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apac

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-05 Thread Fabian Paul
Hi Krzysztof, The unified Source is meant to be used for the DataStream API and Table API. Currently, we do not have a definition of lookup sources in the DataStream API; therefore, the new sources do not work as lookup sources, only as scan sources. Maybe in the future we also want to define look ups

Re: How to write unit test for stateful operators in Pyflink apps

2021-11-05 Thread Fabian Paul
Hi, Since you want to use the Table API, you can probably write a higher-level test around executing the complete program. Good examples are the pyflink example programs [1]. I also could not find anything similar to the test harness from Java. I cced Dian; maybe he knows more about it. Be

Re: How to filter these emails?

2021-11-05 Thread Fabian Paul
of consuming from > exchanges. Do you know of another rabbitmq connector that has this feature? > > Also, just now learning about stateful functions. Is there a way to sink to > them from a flink job? > > Thank you very much, > > Ivan > > From: Fabian Paul <

Re: "sink.partition-commit.success-file.name" option in the FileSystem connector does not work

2021-11-05 Thread Fabian Paul
Hi, Currently this is expected because the FileSink is built to support running with higher parallelism. Therefore it needs to periodically write files. The respective file names always have a descriptor so that the FileSink knows which files have already been written. You can read more about the

Re: How to filter these emails?

2021-11-05 Thread Fabian Paul
Hi Ivan, You can configure your email client to filter for messages going to user@flink.apache.org and move them into a separate folder. Best, Fabian

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-04 Thread Fabian Paul
Hi Krzysztof, It is great to hear that you have implemented your source with the unified Source interface. To integrate your source into the Table API you can use the SourceProvider. You can take a look at how our FileSource does it [1]. Btw I think you forgot to add your references ;) Best, Fabi

Re: Upgrading to the new KafkaSink + Flink 1.14 from Flink 1.13.2 + FlinkKafkaProducer

2021-11-04 Thread Fabian Paul
Hi Yuval, We tried to reproduce your behaviour but unfortunately did not succeed. It seems you can reliably reproduce the problem. Do you think there is any way to break the problem down and share with us the problematic case so that we can reproduce it? Best, Fabian

Re: Upgrading to the new KafkaSink + Flink 1.14 from Flink 1.13.2 + FlinkKafkaProducer

2021-11-02 Thread Fabian Paul
Hi Yuval, Ok no worries. One thing I would first check is why the TwoPhaseCommitSinkFunction is instantiated because the KafkaSink is not using it. It seems there is still an old FlinkKafkaProducer built somewhere. Best, Fabian

Re: Upgrading to the new KafkaSink + Flink 1.14 from Flink 1.13.2 + FlinkKafkaProducer

2021-11-02 Thread Fabian Paul
Hi Yuval, This error looks indeed strange. I do not think when switching to the unified KafkaSink the old serializer should be invoked at all. Can you maybe share more information about the job you are using or maybe share the program so that we can reproduce it? Best, Fabian

Re: Some question with flink rabbitmq connector?

2021-10-28 Thread Fabian Paul
Hi, You are right, the current RMQSink does not give any delivery guarantee. As far as I know there are also no ongoing efforts to improve the situation but we are always open for contributions if you would like to work on that. We are currently trying to move all Sinks to the new unified Sink inte

Re: FlinkKafkaConsumer -> KafkaSource State Migration

2021-10-27 Thread Fabian Paul
Hi, Sorry for the late reply but most of us were involved in the Flink Forward conference. The upgrade strategies for the Kafka sink and source are pretty similar. Source and sink do not rely on state migration but leverage Kafka as the source of truth. When running with FlinkKafkaConsumer Maso

Re: SplitEnumeratorContext callAsync() cleanup

2021-10-25 Thread Fabian Paul
Hi Mason, Thanks for opening the ticket. Can you also share the log with us when the KafkaEnumerator closed before the async call finished? Best, Fabian

Re: SplitEnumeratorContext callAsync() cleanup

2021-10-22 Thread Fabian Paul
Hi Mason, This seems to be a bug with the current KafkaSource and also the unified Sources in general. Can you open a bug ticket in jira? I think the enumerator should take care of first joining all the async threads before closing the enumerator. Best, Fabian
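
The suggestion in this reply, waiting for in-flight async work before the enumerator shuts down, can be sketched with plain java.util.concurrent. This is only an illustration of the pattern and not the Flink fix: JoiningEnumeratorSketch, callAsync and the 10-second timeout are hypothetical names and values chosen for the example, and no Flink classes are involved.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an enumerator that runs async calls on a worker pool. */
final class JoiningEnumeratorSketch implements AutoCloseable {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final AtomicInteger completed = new AtomicInteger();
    private final AtomicInteger lateCompletions = new AtomicInteger();
    private volatile boolean resourcesReleased = false;

    /** Schedules background work (assumed analogue of an async call on the enumerator). */
    void callAsync(Runnable work) {
        workers.submit(() -> {
            work.run();
            // Count work that finished after resources were released; should stay 0.
            if (resourcesReleased) {
                lateCompletions.incrementAndGet();
            }
            completed.incrementAndGet();
        });
    }

    @Override
    public void close() {
        workers.shutdown(); // stop accepting new tasks; submitted ones keep running
        try {
            // Join the async work before touching shared resources.
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                workers.shutdownNow(); // give up waiting and interrupt stragglers
            }
        } catch (InterruptedException e) {
            workers.shutdownNow();
            Thread.currentThread().interrupt();
        }
        resourcesReleased = true; // only now would clients or connections be closed
    }

    int completedCalls() {
        return completed.get();
    }

    int lateCompletions() {
        return lateCompletions.get();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        JoiningEnumeratorSketch enumerator = new JoiningEnumeratorSketch();
        for (int i = 0; i < 3; i++) {
            enumerator.callAsync(() -> sleepQuietly(100));
        }
        enumerator.close();
        System.out.println("completed=" + enumerator.completedCalls()
                + ", late=" + enumerator.lateCompletions());
    }
}
```

The ordering in close() is the point of the sketch: shutdown() followed by awaitTermination() guarantees that no callback observes released resources, at the cost of blocking the closing thread for up to the timeout.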

Re: Impossible to get pending file names/paths on checkpoint?

2021-10-21 Thread Fabian Paul
Hi Preston, To be honest I am a bit surprised that it worked with Flink 1.10. I was under the impression that we never supported writing to the Azure Filesystem. Unfortunately, it is very hard for us to implement a solid FileSystem implementation for Azure in Flink. We would need someone with

Re: Flink fault tolerance guarantees

2021-10-13 Thread Fabian Paul
Hi Yuval, If the pipeline fails before the next checkpoint, all the records in the buffer should be replayed beginning from the last taken checkpoint. The replay usually starts from the source, which reads the records again from the external system. The assumption is always that after a successful ch

Re: Flink fault tolerance guarantees

2021-10-13 Thread Fabian Paul
Hi Yuval, Whether your pipeline can provide an exactly-once delivery guarantee depends on the pipeline itself. Usually Flink’s fault tolerance mechanism is built around periodic snapshots of intermediate states called checkpoints. As long as checkpointing is enabled and all the operators you are usi

Re: Impossible to get pending file names/paths on checkpoint?

2021-10-12 Thread Fabian Paul
Hi Preston, I just noticed I forgot to cc to the user mailing list on my first reply …. I have a few thoughts about the design you are describing. > In the meantime I have a nasty hack in place that has unblocked me for now in > getting the target file off the LocalRecoverable/HadoopFsRecovera

Re: RocksDB: Spike in Memory Usage Post Restart

2021-10-06 Thread Fabian Paul
Hi Kevin, Since you are seeing the problem across multiple Flink versions and with the default RocksDb and custom configuration it might be related to something else. A lot of different components can allocate direct memory i.e. some filesystem implementations, the connectors or some user grpc

Re: In flight records on Flink : Newbie question

2021-10-06 Thread Fabian Paul
Hi Declan, As far as I know the FileSink does not buffer records but writes the records to temporary files which are bucketed later. For the Elasticsearch sink you are right: it buffers the records before flushing them to Elasticsearch, but you can control the flushing behaviour based on a given

Re: RocksDB: Spike in Memory Usage Post Restart

2021-10-06 Thread Fabian Paul
Hi Kevin, Sorry for the late reply. I collected some feedback from other folks and have two more questions. 1. Did you enable incremental checkpoints for your job and is the checkpoint you recover from incremental? 2. I saw in your configuration that you set `state.backend.rocksdb.block.cach

Re: RocksDB: Spike in Memory Usage Post Restart

2021-10-04 Thread Fabian Paul
Hi Kevin, We bumped the RocksDb version with Flink 1.14, which we expected to improve memory control [1]. In the past we also saw problems with the allocator used by the OS. We switched to jemalloc within our docker images, which handles memory fragmentation better [2]. Are you using the offi

Re: In flight records on Flink : Newbie question

2021-10-04 Thread Fabian Paul
Hi Declan, I forgot to ask which sink you are using. I do not think it is generally applicable that all sinks buffer records and only send them periodically. It depends a lot on the connector and what kind of capabilities the external system you are writing to offers. The amount of buffered da

Re: In flight records on Flink : Newbie question

2021-10-01 Thread Fabian Paul
Hi Declan, Thanks for reaching out, we always welcome new users to the Apache Flink community :) Your first question is a bit tricky. I am still trying to understand the motivation behind it. In general there is no generic way to access the records that one of the operators is currently processing. Are

Re: RocksDB: Spike in Memory Usage Post Restart

2021-10-01 Thread Fabian Paul
Hi Kevin, You are right, RocksDB is probably responsible for the memory consumption you are noticing. We have definitely seen similar issues in the past and with the latest Flink version 1.14 we tried to restrict the RocksDB memory consumption even more to make it more controllable. Can you t

Re: KafkaSource builder and checkpointing with parallelism > kafka partitions

2021-09-16 Thread Fabian Paul
Hi all, The problem you are seeing, Lars, is somewhat intended behaviour, unfortunately. With the batch/stream unification every Kafka partition is treated as a kind of workload assignment. If one subtask receives a signal that there is no workload anymore it goes into the FINISHED state. As alread

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread Fabian Paul
Hi Constantinos, I agree with David that it is not easily possible to remove a partition while a Flink job is running. Imagine the following scenario: Your Flink job initially works on 2 partitions belonging to two different topics and you have checkpointing enabled to guarantee exactly-once de

Re: Flink Performance Issue

2021-08-24 Thread Fabian Paul
Hi Mohammed, 200 records should definitely be doable. The first thing you can do is remove the print sinks, because they increase the load on your cluster through the additional IO operations and also prevent Flink from fusing operators. I am interested to see the updated job graph after

Re: Flink 1.13.1 Kafka Producer Error

2021-08-24 Thread Fabian Paul
Hi Debraj How do you run your application? If you run it from an IDE you can set a breakpoint and inspect the serializer class which is used. Best, Fabian

Re: Flink 1.13.1 Kafka Producer Error

2021-08-24 Thread Fabian Paul
Hi Debraj, The error does indeed look strange. We recommend not setting any `ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG` or `ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG` because the connector will take care of it. Can you try to remove these calls and check if it makes a difference? Only looking a

Re: Can't start FlinkKafkaProducer using SSL

2021-08-24 Thread Fabian Paul
Hi Wouter, Can you share the jars which are part of the classpath? I can imagine that something was not bundled correctly. Best, Fabian

Re: Flink Performance Issue

2021-08-24 Thread Fabian Paul
Hi Mohammed, Without diving too much into your business logic, one thing that catches my eye is the partitioning you are using. In general, all calls to `keyBy` or `rebalance` are very expensive because all the data is shuffled across downstream tasks. Flink tries to fuse operators with the same keyG

Re: How to check rocksdb log

2021-08-11 Thread Fabian Paul
Hi Li, Flink has disabled the RocksDb logs because of sizing problems, but you can have a look at this link [1] on how to enable them and set the log directory. Let me know if that answers your question. Best, Fabian [1] https://ververica.zendesk.com/hc/en-us/articles/360015933320-How-to-get

Re: Error using KafkaProducer EXACTLY_ONCE semantic + TaskManager Failure

2021-08-03 Thread Fabian Paul
Hi Kevin, Please excuse the rather long reply duration. The behaviour you are describing is definitely worrisome. I think I need some more information and the exact stacktrace. Can you provide your checkpointing configuration i.e. frequency, how many concurrent checkpoints, rough checkpoint dur

Re: Need help of deploying Flink HA on kubernetes cluster

2021-07-29 Thread Fabian Paul
Hi Dhiru, Sorry for the late reply. Once the cluster is successfully started the web UI should be reachable if you somehow forward the port of the running pod. Although with the exception you have shared I suspect the cluster never fully runs (or not long enough). Can you share the full stacktra

Re: Need help of deploying Flink HA on kubernetes cluster

2021-07-22 Thread Fabian Paul
Hi Dhiru, No worries I completely understand your point. Usually all the executable scripts from Flink can be found in the main repository [1]. We also provide a community edition of our commercial product [2] which manages the lifecycle of the cluster and you do not have to use these scripts an

Re: Questions about keyed streams

2021-07-22 Thread Fabian Paul
Hi Dan, 1) In general, there is no guarantee that your downstream operator is on the same TM although working on the same key group. Nevertheless, you can try to force this kind of behaviour to prevent the network transfer by either chaining the two operators (if no shuffle is in between) or confi

Re: Need help of deploying Flink HA on kubernetes cluster

2021-07-22 Thread Fabian Paul
Hi Dhirendra, Thanks for reaching out. A good way to start is to have a look at [1] and [2]. Once you have everything set up it should be possible to delete the pod of the JobManager while an application is running and the job successfully recovers. You can use one of the example Flink applicati

Re: Flink TaskManager container got restarted by K8S very frequently

2021-07-22 Thread Fabian Paul
CC user ML

Re: Suspected SPAM - RE: FW: Hadoop3 with Flink

2021-07-05 Thread Fabian Paul
Hi Suchithra, I suspect you want to connect to something like HDFS from the running k8s application. Is this correct? Yangze already gave some good hints: you have to make sure that the necessary Hadoop java libraries and configurations are part of your Flink docker images or are mounted from an

Re: upgrade kafka-schema-registry-client to 6.1.2 for Flink-avro-confluent-registry

2021-06-25 Thread Fabian Paul
Hi, Thanks for bringing this up. It looks to me like something we definitely want to fix. Unfortunately, I also do not see an easy workaround besides building your own flink-avro-confluent-registry and bumping the dependency. Can you create a JIRA ticket for bumping the dependencies and would y

Re: "Legacy Source Thread" line in logs

2021-06-24 Thread Fabian Paul
Hi Debraj, Sorry for the confusion. The FlinkKafkaConsumer is the old source; you can find the overhauled one here [1]. You would need to replace the FlinkKafkaConsumer with the KafkaSource to no longer see the message. Best Fabian [1] https://github.com/apache/flink/blob/2bd8fab01d2aba9

Re: "Legacy Source Thread" line in logs

2021-06-23 Thread Fabian Paul
Hi Debraj, By Legacy Source Thread we refer to all sources which do not implement the new interface yet [1]. Currently only the Hive, Kafka and File sources have been migrated. In general, there is no severe downside to using the older sources, but in the future we plan to implement only ones based

Re: avro-confluent supports authentication enabled schema registry

2021-06-02 Thread Fabian Paul
> > <https://github.com/confluentinc/schema-registry/blob/master/schema-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaSchemaSerDeConfig.java#L85> > On Wed, Jun 2, 2021 at 5:27 PM Fabian Paul <mailto:fabianp...@data-artisans.com>> wrote: >

Re: Parquet reading/writing

2021-06-02 Thread Fabian Paul
Hi Taras, At first glance this looks like a bug to me. Can you try the latest 1.12 version (1.12.4)? If the bug still persists, can you share the full job manager and task manager logs so that we can debug this problem further? Best, Fabian > On 2. Jun 2021, at 13:22, Taras Moisiuk wrote: > > Update: >

Re: DSL for Flink CEP

2021-06-02 Thread Fabian Paul
ere any plan to grow the flink CEP and build a > friendly DSL around flink CEP by any chance. > > Regards > Dipanjan > > On Wednesday, June 2, 2021, 03:22:46 PM GMT+5:30, Fabian Paul > wrote: > > > Hi Dipanjan, > > Unfortunately, I have no experience with

Re: DSL for Flink CEP

2021-06-02 Thread Fabian Paul
Hi Dipanjan, Unfortunately, I have no experience with Siddhi but I am not aware of any official joint efforts between Flink and Siddhi. I can imagine that not all Siddhi CEP expressions are compatible with Flink’s CEP. At the moment there is also no active development for Flink’s CEP. I think

Re: avro-confluent supports authentication enabled schema registry

2021-06-02 Thread Fabian Paul
Hi Tao, Thanks for reaching out. Have you tried the following 'value.avro-confluent.schema-registry.url' = 'https://{SR_API_KEY}:{SR_API_SECRET}@psrc-lzvd0.ap-southeast-2.aws.confluent.cloud', It may be possible to provide basic HTTP authentication by adding your username and password to t
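
If credentials are embedded in the registry URL as suggested in this reply, characters such as '/' or '+' in an API secret need percent-encoding first. Below is a minimal sketch using only the JDK (Java 10 or later). RegistryUrlSketch, withBasicAuth and the example host are hypothetical, and whether the schema registry client actually honours credentials placed in the URL is not confirmed by this thread.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that places percent-encoded credentials into a URL. */
final class RegistryUrlSketch {

    /** Returns baseUrl with "user:password@" inserted in front of the host. */
    static String withBasicAuth(String baseUrl, String user, String password) {
        URI base = URI.create(baseUrl);
        String userInfo = encode(user) + ":" + encode(password);
        int port = base.getPort(); // -1 when the URL has no explicit port
        String path = base.getRawPath() == null ? "" : base.getRawPath();
        return base.getScheme() + "://" + userInfo + "@" + base.getHost()
                + (port == -1 ? "" : ":" + port) + path;
    }

    /** URLEncoder performs form encoding, so its '+' for a space is turned into %20. */
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        // Placeholder host and credentials, not taken from the thread.
        System.out.println(withBasicAuth("https://registry.example.com", "KEY", "s/e+c"));
    }
}
```

The resulting string would then be used as the value of 'value.avro-confluent.schema-registry.url' in the table definition.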

Re: Convert DataStream to Table with the same columns in Row

2021-05-14 Thread Fabian Paul
Hi John, Can you maybe share more code about how you build the DataStream? It would also be good to know against which Flink version you are testing. I just tried the following code against the current master and: StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env); DataStream r

Re: How to setup HA properly with Kubernetes Standalone Application Cluster

2021-05-14 Thread Fabian Paul
Hi Chen, Can you tell us a bit more about the job you are using? The intended behaviour you are seeking can only be achieved if the Kubernetes HA Services are enabled [1][2]. Otherwise the job cannot recall past checkpoints. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-rel

Re: PyFlink UDF: When to use vectorized vs scalar

2021-04-16 Thread Fabian Paul
Hi Yik San, I think the usage of vectorized udfs highly depends on your input and output formats. For your example my first impression would say that parsing a JSON string is always a rather expensive operation and vectorization does not have much impact on that. I am ccing Dian Fu who is more

Re: Iterate Operator Checkpoint Failure

2021-04-16 Thread Fabian Paul
Hi Lu, Can you provide some more detailed logs of what happened during the checkpointing phase? If it is possible, please enable debug logs. It would also be great to know whether you have implemented your own Iterator Operator or what kind of Flink program you are trying to execute. Best,

Re: Set Readiness, liveness probes on task/job manager pods via Ververica Platform

2021-02-05 Thread Fabian Paul
We are currently working on supporting arbitrary pod template specs for the Flink pods. It allows you to specify a V1PodTemplateSpec for taskmanager and jobmanager. The feature will be included in the next upcoming release 2.4 of the ververica platform. We plan to release it in the next few mon

Re: Application Mode support on VVP v2.3

2020-12-10 Thread Fabian Paul
Hi Narasimha, I think without a major change in vvp it will not be possible to submit multiple jobs within one jar in the foreseeable future. Maybe you can explain more about your use case why it is inconvenient for you to put the jobs in multiple jars? With ververica platform 2.4 we will add supp

Re: Application Mode support on VVP v2.3

2020-12-08 Thread Fabian Paul
:1.11.1-stream1-scala_2.12 > <http://registry.ververica.com/v2.2/flink:1.11.1-stream1-scala_2.12> > > There are no errors as such. But it is just considering the first job. > > > On Thu, Dec 3, 2020 at 5:34 PM Fabian Paul <mailto:fabianp...@data-artisans.com>>

Re: Application Mode support on VVP v2.3

2020-12-03 Thread Fabian Paul
Hi Narasimha, Nothing comes to mind immediately as to why it should not work. We are using the StandaloneApplicationClusterEntryPoint to start the cluster. Can you provide some more information about which Flink image on vvp you are trying to use and maybe show the error message? Best, Fabian