Minicluster Flink tests, checkpoints and inprogress part files

2021-02-04 Thread Dan Hill
Hi Flink user group, *Background* I'm changing a Flink SQL job to use Datastream. I'm updating an existing Minicluster test in my code. It has a similar structure to other tests in flink-tests. I call StreamExecutionEnvironment.execute. My tests sink using StreamingFileSink Bulk Formats to tmp

Re: Proctime consistency

2021-02-04 Thread Rex Fenley
So if I'm reading this correctly, on checkpoint restore, if current machine time / proc time > checkpointed window proc time, the window will fire immediately with all the data it had aggregated. If current machine time < window proc time, the window will just continue where it left off until it hi

Re: Set Readiness, liveness probes on task/job manager pods via Ververica Platform

2021-02-04 Thread narasimha
Thanks Yang for confirming. I did try putting it in the config, and also modifying the deployment.yml in the helm chart. Adding Till if this can be taken up. On Fri, Feb 5, 2021 at 10:37 AM Yang Wang wrote: > I am not very familiar with ververica platform. But after checking the > documentation[1], >

Re: Set Readiness, liveness probes on task/job manager pods via Ververica Platform

2021-02-04 Thread Yang Wang
I am not very familiar with the ververica platform. But after checking the documentation[1], I am afraid that setting a liveness check could not be supported in VVP. [1]. https://docs.ververica.com/user_guide/application_operations/deployments/configure_kubernetes.html Best, Yang narasimha on 2021-02-05

Re: Watermarks on map operator

2021-02-04 Thread Kezhu Wang
> it is not clear to me if watermarks are also used by map/flatmap operators or just by window operators. Watermarks are most likely only used by timing segmented aggregation operators to trigger result materialization. In streaming, this “timing segmentation” is usually called “windowing”, so in t

Re: Set Readiness, liveness probes on task/job manager pods via Ververica Platform

2021-02-04 Thread narasimha
Thanks Yang, that was really helpful. But is there a way to add probes? I could find an example for setup via docker-compose, but nothing with VVP. It will be helpful to have it for the community for other cases as well. Can you please help in setting it up? On Fri, Feb 5, 2021 at 8:3

Re: Set Readiness, liveness probes on task/job manager pods via Ververica Platform

2021-02-04 Thread Yang Wang
If the JobManager and TaskManager have some fatal errors which they cannot correctly handle, then both of them will directly exit with a non-zero code. In such a case, the pod will be restarted. One possible scenario I could imagine where the liveness and readiness probes could help is a long GC. Duri

Re: Set Readiness, liveness probes on task/job manager pods via Ververica Platform

2021-02-04 Thread narasimha
I have been asked at the org to set these up per org-level standards, so I am trying to set them. These are health checks with k8s, so that k8s can report if there are any intermittent issues. Do the JobManager and TaskManager handle failures diligently? On Fri, Feb 5, 2021 at 7:53 AM Yang Wan

Re: Set Readiness, liveness probes on task/job manager pods via Ververica Platform

2021-02-04 Thread Yang Wang
Do you mean setting the liveness check like the following could not take effect?
```
livenessProbe:
  tcpSocket:
    port: 6123
  initialDelaySeconds: 30
  periodSeconds: 60
```
AFAIK, setting the liveness and the readiness probe is not very necessary for the Flink

Re: Re: Re: Using Kafka as bounded source with DataStream API in batch mode (Flink 1.12)

2021-02-04 Thread Abhishek Rai
I had a similar need recently and ended up using KafkaDeserializationSchemaWrapper to wrap a given DeserializationSchema. The resulting KafkaDeserializationSchema[Wrapper] can be passed directly to the `FlinkKafkaConsumer` constructor. ``` class BoundingDeserializationSchema extends KafkaDese
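The approach above can be sketched without Flink on the classpath. This is an illustrative, simplified stand-in for the thread's pattern: the class and method names below are hypothetical, mirroring only the contract of Flink's `KafkaDeserializationSchema#isEndOfStream`, where returning true tells `FlinkKafkaConsumer` to stop consuming.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Flink-free sketch of a "bounding" deserialization schema: wrap an inner
// deserializer and report end-of-stream once the number of consumed records
// passes a configured bound. In the actual thread this logic lives in a
// subclass of KafkaDeserializationSchemaWrapper.
class BoundingDeserializer<T> {
    private final Function<byte[], T> inner; // wrapped deserializer
    private final long lastOffset;           // last offset we still consume
    private long currentOffset = -1;

    BoundingDeserializer(Function<byte[], T> inner, long lastOffset) {
        this.inner = inner;
        this.lastOffset = lastOffset;
    }

    T deserialize(byte[] message) {
        currentOffset++;
        return inner.apply(message);
    }

    // Mirrors KafkaDeserializationSchema#isEndOfStream: true stops consumption.
    boolean isEndOfStream() {
        return currentOffset >= lastOffset;
    }
}
```

The real wrapper would read the record's actual Kafka offset (and partition) rather than counting calls, but the stopping condition is the same shape.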

StateFun scalability

2021-02-04 Thread Martijn de Heus
Hi all, I’ve been working with StateFun for a bit for my university project. I am now trying to increase the number of StateFun workers and the parallelism, however this barely seems to increase the throughput of my system. I have 5000 function instances in my system during my tests. Once I inc

Re: [Stateful Functions] JDBC Sink Problems

2021-02-04 Thread Jan Brusch
Hi Igal, thanks for the quick reply (as always) and the corresponding issue. Building StateFun from source is potentially an option for us until the feature makes it into an upcoming release. If there is anything we can do to help with the issue, please let us know. At least once is good eno

Set Readiness, liveness probes on task/job manager pods via Ververica Platform

2021-02-04 Thread narasimha
Hi, I'm using the ververica platform to host Flink jobs. I need help setting up readiness and liveness probes on the taskmanager and jobmanager pods. I tried it locally by adding the probe details in the respective deployment.yml files, but it didn't work. Can someone help me with setting up the probes?
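For reference, on plain Kubernetes such probes attach to the container spec of the JobManager/TaskManager deployment. The sketch below uses standard Flink ports (6123 RPC, 8081 REST) and the JobManager's `/overview` REST endpoint; whether Ververica Platform exposes a way to inject this into its managed pods is exactly the open question in this thread, so treat it as a plain-Kubernetes example, not a VVP feature.

```
livenessProbe:
  tcpSocket:
    port: 6123          # JobManager RPC port
  initialDelaySeconds: 30
  periodSeconds: 60
readinessProbe:
  httpGet:
    path: /overview     # served by the JobManager REST endpoint
    port: 8081
  initialDelaySeconds: 10
  periodSeconds: 10
```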

Re: [Stateful Functions] JDBC Sink Problems

2021-02-04 Thread Igal Shilman
Hi Jan, StateFun enables object reuse automatically, and it can't be disabled with a configuration. There is a technical reason for that that has to do with how we translate StateFun concepts to Flink concepts. I've created an issue to remove this limitation [1]. I might come up with a workaround

Re: DynamicTableSink's equivalent of UpsertStreamTableSink.setKeyFields

2021-02-04 Thread Timo Walther
For this you might need to go a level deeper. Maybe the legacy util org.apache.flink.table.planner.plan.utils.UpdatingPlanChecker can help you. It analyzes the plan to figure out the keys. org.apache.flink.table.planner.plan.metadata.FlinkRelMdUniqueKeys seems to be the newer version. Regards, Ti

Watermarks on map operator

2021-02-04 Thread Antonis Papaioannou
Hi, reading through the documentation regarding watermarks, it is not clear to me if watermarks are also used by map/flatmap operators or just by window operators. My application reads from a kafka topic (with multiple partitions) and extracts and assigns a timestamp on each tuple based on some fi

[Stateful Functions] JDBC Sink Problems

2021-02-04 Thread Jan Brusch
Hello, we are currently trying to implement a JDBC Sink in Stateful Functions as documented here: https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/io-module/flink-connectors.html However, when starting the application we are running into this error: --

Re: AbstractMethodError while writing to parquet

2021-02-04 Thread Till Rohrmann
In order to answer this question you would need to figure out where the second parquet-avro dependency comes from. You can check your job via `mvn dependency:tree` and then check whether you have another dependency which pulls in parquet-avro. Another source where the additional dependency could co

AW: AbstractMethodError while writing to parquet

2021-02-04 Thread Jan Oelschlegel
Okay, this is helpful. The problem arrives when adding parquet-avro to the dependencies. But then the question is, why do I need this dependency? It is not mentioned in the docs, and I'm using a standard setup for writing into hdfs with parquet format, nothing special. Best, Jan Von: Till Rohrmann

Kafka connector doesn't support consuming update and delete changes in Table SQL API

2021-02-04 Thread meneldor
Hello, Flink 1.12.1 (pyflink). I am deduplicating CDC records coming from Maxwell in a kafka topic. Here is the SQL:
```
CREATE TABLE stats_topic(
  `data` ROW<`id` BIGINT, `account` INT, `upd_ts` BIGINT>,
  `ts` BIGINT,
  `xid` BIGINT,
  row_ts AS TO_TIMESTAMP(
```
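For context on the error in the subject line: Flink's documented SQL deduplication pattern (`ROW_NUMBER()` over a partition, keeping `rn = 1`) produces an update/retract stream, which the plain `kafka` connector cannot consume as a sink; `upsert-kafka`, added in Flink 1.12, is the connector intended for such changelog output. A sketch of the dedup query, with column names taken from the snippet above and the rest of the job assumed:

```
SELECT id, account, upd_ts
FROM (
  SELECT `data`.`id` AS id, `data`.`account` AS account, `data`.`upd_ts` AS upd_ts,
         ROW_NUMBER() OVER (PARTITION BY `data`.`id` ORDER BY row_ts DESC) AS rn
  FROM stats_topic
)
WHERE rn = 1
```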

Re: AbstractMethodError while writing to parquet

2021-02-04 Thread Till Rohrmann
I guess it depends on where the other dependency is coming from. If you have multiple dependencies which conflict, then you have to resolve it. One way to detect these things is to configure dependency convergence [1]. [1] https://maven.apache.org/enforcer/enforcer-rules/dependencyConvergence.html Ch
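A minimal enforcer setup for the linked dependencyConvergence rule might look like the following pom.xml fragment (the plugin version shown is illustrative; pick a current release):

```
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <version>3.0.0-M3</version>
  <executions>
    <execution>
      <id>enforce</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <dependencyConvergence/>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, `mvn validate` fails the build whenever two paths in the dependency tree resolve the same artifact (e.g. parquet-avro) to different versions.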

Cannot connect to queryable state proxy

2021-02-04 Thread 陳昌倬
Hi, We have a problem connecting to the queryable state client proxy as described in [0]. Any help is appreciated. The following is our setup:
* Flink 1.12.1
* Standalone Kubernetes
* Related config in flink-conf.yaml
```
queryable-state.enable: true
queryable-state.proxy.ports: 6125
```
*