Re: Proper way to get DataStream

2021-04-08 Thread Arvid Heise
Hi Maminspapin, I just answered another question similarly, so let me just c&p it here: The beauty of Avro lies in having reader and writer schema and schema compatibility, such that if your schema evolves over time (which will happen in streaming naturally but is also very common in batch), you

Re: Avro schema

2021-04-08 Thread Arvid Heise
Hi Sumeet, The beauty of Avro lies in having reader and writer schema and schema compatibility, such that if your schema evolves over time (which will happen in streaming naturally but is also very common in batch), you can still use your application as is without modification. For streaming, this
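The reader/writer-schema resolution described above can be sketched with a toy example. This is plain Python over dicts, not Avro's or Flink's actual API, and the field names and defaults are hypothetical; it only illustrates why an evolved reader schema can still consume old records without modification:

```python
# Toy sketch of Avro-style schema resolution: a reader with an evolved
# schema consumes a record written under an older schema by filling
# missing fields from the reader's defaults. Not Avro's real API.

def resolve(record, reader_defaults):
    """Project a written record onto the reader's schema: keep fields the
    reader knows about, fill absent ones with the reader's defaults."""
    return {name: record.get(name, default)
            for name, default in reader_defaults.items()}

# Writer schema had {id, name}. The reader schema later added 'country'
# with a default, so old records remain readable as-is.
old_record = {"id": 1, "name": "alice"}
reader_defaults = {"id": 0, "name": "", "country": "unknown"}

print(resolve(old_record, reader_defaults))
# -> {'id': 1, 'name': 'alice', 'country': 'unknown'}
```

Real Avro resolution also handles renames via aliases and type promotions; the default-filling shown here is just the part relevant to adding fields over time.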

Re: Organizing Flink Applications: Mono repo or polyrepo

2021-04-08 Thread Arvid Heise
Hi Bin, I would put Flink applications into separate repos. It reduces compile times and makes automatic deployment much easier (if you update master/release branch of application X, you simply deploy it - potentially with some manual trigger in your CI/CD pipeline). You can also easily bump Flin

Re: Async + Broadcast?

2021-04-08 Thread Arvid Heise
Hi Alex, The easiest way to verify if what you tried is working out is to look at Flink's Web UI and check the topology. The broadcast side of the input will always be ... well broadcasted (=not chained). So you need to disable chaining only on the non-broadcasted dataset. val parsed: DataStream

Re: FLINK Kinesis consumer Checkpointing data loss

2021-04-08 Thread Arvid Heise
Hi Vijay, if you don't specify a checkpoint, then Flink assumes you want to start from scratch (e.g., you had a bug in your business logic and need to start completely without state). If there is any failure and Flink restarts automatically, it will always pick up from the latest checkpoint [1].

Flink Metrics emitted from a Kubernetes Application Cluster

2021-04-08 Thread Claude M
Hello, I've setup Flink as an Application Cluster in Kubernetes. Now I'm looking into monitoring the Flink cluster in Datadog. This is what is configured in the flink-conf.yaml to emit metrics: metrics.scope.jm: flink.jobmanager metrics.scope.jm.job: flink.jobmanager.job metrics.scope.tm: flink
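A minimal sketch of how those scope formats pair with a Datadog reporter in flink-conf.yaml. The reporter name `dghttp`, the tag, and the API key are placeholders; the reporter class is provided by the optional flink-metrics-datadog jar:

```yaml
# Datadog HTTP reporter (requires flink-metrics-datadog on the classpath)
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: <your-datadog-api-key>
metrics.reporter.dghttp.tags: env:staging

# Scope formats as quoted in the message above
metrics.scope.jm: flink.jobmanager
metrics.scope.jm.job: flink.jobmanager.job
```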

how to submit jobs remotely when a rest proxy like nginx is used and REST endpoint is bind to loopback interface?

2021-04-08 Thread Ming Li
Hi, The flink official document clearly states that "Simple mutual authentication may be enabled by configuration if authentication of connections to the REST endpoint is required, but we recommend to deploy a “side car proxy”: Bind the REST endpoint to the loopback interface (or the pod-local int
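The "side car proxy" pattern quoted above can be sketched as follows: Flink binds its REST endpoint to loopback (`rest.bind-address: 127.0.0.1` in flink-conf.yaml), and an nginx proxy co-located with the JobManager terminates TLS and forwards to it. Ports and certificate paths here are assumptions:

```nginx
server {
    listen 443 ssl;                      # externally reachable port
    ssl_certificate     /etc/nginx/tls/server.crt;
    ssl_certificate_key /etc/nginx/tls/server.key;

    location / {
        # Flink REST endpoint, bound to the loopback interface only
        proxy_pass http://127.0.0.1:8081;
    }
}
```

Remote job submission then goes through the proxy's address instead of the (unreachable) loopback endpoint, with authentication enforced at the proxy.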

Re: Re: period batch job lead to OutOfMemoryError: Metaspace problem

2021-04-08 Thread 太平洋
I have tried to add 'classloader.parent-first-patterns.additional: "ru.yandex.clickhouse"' to flink-conf, but the problem still exists. Is there a lightweight way to put the ClickHouse JDBC driver in Flink's lib/ folder?

Re: Flink 1.13 and CSV (batch) writing

2021-04-08 Thread Kurt Young
My DDL is: CREATE TABLE csv ( id BIGINT, name STRING ) WITH ( 'connector' = 'filesystem', 'path' = '.', 'format' = 'csv' ); Best, Kurt On Fri, Apr 9, 2021 at 10:00 AM Kurt Young wrote: > Hi Flavio, > > We would recommend you to use new table source & sin

Re: Why does flink-quickstart-scala suggests adding connector dependencies in the default scope, while Flink Hive integration docs suggest the opposite

2021-04-08 Thread Yik San Chan
Hi Till, I have 2 follow-ups. (1) Why is Hive special, while for connectors such as kafka, the docs suggest simply bundling the kafka connector dependency with my user code? (2) it seems the document misses the "before you start the cluster" part - does it always require a cluster restart wheneve

Re: Flink 1.13 and CSV (batch) writing

2021-04-08 Thread Kurt Young
Hi Flavio, We would recommend you to use new table source & sink interfaces, which have different property keys compared to the old ones, e.g. 'connector' v.s. 'connector.type'. You can follow the 1.12 doc [1] to define your csv table, everything should work just fine. *Flink SQL> set table.dml-

Re: Async + Broadcast?

2021-04-08 Thread Alex Cruise
Thanks Arvid! I'm not completely clear on where to apply your suggestions. I've included a sketch of my job below, and I have a couple questions: 1. It looks like enableObjectReuse() is a global setting, should I worry about whether I'm using any mutable data between stages? 2. Should I disableCh

Re: FLINK Kinesis consumer Checkpointing data loss

2021-04-08 Thread Vijayendra Yadav
Thanks it was working fine with: bin/flink run -s s3://bucketxx-app/flink/checkpoint/iv/f1ef373e27203907d2d568bca31e6556/chk-384/ \ On Thu, Apr 8, 2021 at 11:42 AM Vijayendra Yadav wrote: > Hi Arvid, > > Thanks for your response. I did not restart from the checkpoint. I assumed > Flink would lo

Re: Re: period batch job lead to OutOfMemoryError: Metaspace problem

2021-04-08 Thread Maciek Próchniak
Hi, Did you put the clickhouse JDBC driver on Flink main classpath (in lib folder) and not in user-jar - as described here: https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/debugging/debugging_classloading.html#unloading-of-dynamically-loaded-classes-in-user-code? When we enco
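For reference, the two alternatives discussed in this thread, sketched for flink-conf.yaml (the pattern string is the one quoted in the thread):

```yaml
# Option 1: load the driver classes parent-first so they never live in a
# per-job ChildFirstClassLoader and cannot leak Metaspace per submission
classloader.parent-first-patterns.additional: "ru.yandex.clickhouse"
```

Option 2 is to place the ClickHouse JDBC driver jar directly into Flink's lib/ directory (e.g. copying clickhouse-jdbc-<version>.jar into $FLINK_HOME/lib/ on every node) and restart the cluster, so the driver is loaded once by the main classpath as the linked debugging_classloading page describes.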

Re: Flink 1.13 and CSV (batch) writing

2021-04-08 Thread Flavio Pompermaier
Hi Till, since I was using the same WITH-clause both for reading and writing, I discovered that overwrite is actually supported in the Sinks, while in the Sources an exception is thrown (I was thinking that those properties were simply ignored). However, the quote-character is not supported in the si

Re: Flink does not cleanup some disk memory after submitting jar over rest

2021-04-08 Thread Maciek Próchniak
Hi, don't know if this is the problem you're facing, but some time ago we encountered two issues connected to REST API and increased disk usage after each submission: https://issues.apache.org/jira/browse/FLINK-21164 https://issues.apache.org/jira/browse/FLINK-9844 - they're closed ATM, but

Re: FLINK Kinesis consumer Checkpointing data loss

2021-04-08 Thread Vijayendra Yadav
Hi Arvid, Thanks for your response. I did not restart from the checkpoint. I assumed Flink would look for a checkpoint upon restart automatically. *I should restart like below ?* bin/flink run -s s3://bucketxx-app/flink/checkpoint/iv/f1ef373e27203907d2d568bca31e6556/chk-384/ \ Thanks, Vijay O

Re: Compression with rocksdb backed state

2021-04-08 Thread deepthi Sridharan
Thank you, that makes sense. On Thu, Apr 8, 2021 at 12:37 AM Timo Walther wrote: > Hi Deepthi, > > 1. Correct > 2. Correct > 3. Incremental snapshots simply manage references to RocksDB's sstables. > You can find a full explanation here [1]. Thus, the payload is a > blackbox for Flink and Flink'

Flink does not cleanup some disk memory after submitting jar over rest

2021-04-08 Thread Great Info
I have deployed my own Flink setup in AWS ECS: one service for the JobManager and one service for the TaskManagers. I am running one ECS task for the job manager and 3 ECS tasks for the task managers. I have a kind of batch job which I upload using the Flink REST API every day with new arguments; when I submi

Re: Why does flink-quickstart-scala suggests adding connector dependencies in the default scope, while Flink Hive integration docs suggest the opposite

2021-04-08 Thread Till Rohrmann
Hi Yik San, for future reference, I copy my answer from the SO here: The reason for this difference is that for Hive it is recommended to start the cluster with the respective Hive dependencies. The documentation [1] states that it's best to put the dependencies into the lib directory before you
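The recommendation above amounts to marking the Hive integration as provided in the user jar, while the actual jars sit in Flink's lib directory before cluster start. A hedged Maven sketch; the artifact name and versions are assumptions for a Scala 2.11 / Flink 1.12 setup:

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-hive_2.11</artifactId>
  <version>1.12.2</version>
  <!-- provided: supplied at runtime by jars placed in Flink's lib/ -->
  <scope>provided</scope>
</dependency>
```

Connectors like Kafka, by contrast, are typically bundled in the user fat jar with the default (compile) scope, which is the difference the question asks about.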

Re: Flink 1.13 and CSV (batch) writing

2021-04-08 Thread Till Rohrmann
Hi Flavio, I tried to execute the code snippet you have provided and I could not reproduce the problem. Concretely I am running this code: final EnvironmentSettings envSettings = EnvironmentSettings.newInstance() .useBlinkPlanner() .inStreamingMode() .build(); final TableEnvironment

Re: flink cdc postgres connector 1.1 - heartbeat.interval.ms - WAL consumption

2021-04-08 Thread bat man
Thanks Till. Hi Jark, Any inputs, going through the code of 1.1 and 1.3 in the meantime. Thanks, Hemant On Thu, Apr 8, 2021 at 3:52 PM Till Rohrmann wrote: > Hi Hemant, > > I am pulling in Jark who is most familiar with Flink's cdc connector. He > might also be able to tell whether the fix ca

Re: UniqueKey constraint is lost with multiple sources join in SQL

2021-04-08 Thread Kai Fu
As identified with the community, it's a bug; more information is in issue https://issues.apache.org/jira/browse/FLINK-22113 On Sat, Apr 3, 2021 at 8:43 PM Kai Fu wrote: > Hi team, > > We have a use case to join multiple data sources to generate a > continuously updated view. We defined primary key

Re: Reducing Task Manager Count Greatly Increases Savepoint Restore

2021-04-08 Thread Till Rohrmann
Hi Kevin, when decreasing the TaskManager count I assume that you also decrease the parallelism of the Flink job. There are three aspects which can then cause a slower recovery. 1) Each Task gets a larger key range assigned. Therefore, each TaskManager has to download more data in order to restar

Re: flink cdc postgres connector 1.1 - heartbeat.interval.ms - WAL consumption

2021-04-08 Thread Till Rohrmann
Hi Hemant, I am pulling in Jark who is most familiar with Flink's cdc connector. He might also be able to tell whether the fix can be backported. Cheers, Till On Thu, Apr 8, 2021 at 10:42 AM bat man wrote: > Anyone who has faced similar issues with cdc with Postgres. > > I see the restart_lsn

Re: period batch job lead to OutOfMemoryError: Metaspace problem

2021-04-08 Thread Yangze Guo
IIUC, your program will finally generate 100 ChildFirstClassLoaders in a TM. But they should always be GC'd when the job finishes. So, as Arvid said, you'd better check who is referencing those ChildFirstClassLoaders. Best, Yangze Guo On Thu, Apr 8, 2021 at 5:43 PM 太平洋 <495635...@qq.com> wrote: > > My app
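The "check who is referencing those ChildFirstClassLoader" advice can be illustrated with a language-neutral toy (Python stand-ins, not Flink code): a loader object only becomes collectable once the last strong reference to it is dropped, which is exactly why a lingering reference (e.g. a registered JDBC driver) keeps Metaspace from being freed:

```python
import gc
import weakref

class FakeClassLoader:
    """Stand-in for Flink's ChildFirstClassLoader (illustration only)."""
    pass

loader = FakeClassLoader()
probe = weakref.ref(loader)      # observe collectability without keeping it alive

lingering = loader               # e.g. a driver registry still holding the loader
del loader
gc.collect()
assert probe() is not None       # still referenced -> cannot be freed

del lingering                    # drop the last reference...
gc.collect()
assert probe() is None           # ...and the loader becomes collectable
```

In the JVM the equivalent investigation is done with a heap dump: find the ChildFirstClassLoader instances and walk their GC roots to see what still holds them.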

Re: period batch job lead to OutOfMemoryError: Metaspace problem

2021-04-08 Thread 太平洋
My application program looks like this. Does this structure have some problem? public class StreamingJob { public static void main(String[] args) throws Exception { int i = 0; while (i < 100) { try {

Why does flink-quickstart-scala suggests adding connector dependencies in the default scope, while Flink Hive integration docs suggest the opposite

2021-04-08 Thread Yik San Chan
The question is cross-posted on Stack Overflow https://stackoverflow.com/questions/67001326/why-does-flink-quickstart-scala-suggests-adding-connector-dependencies-in-the-de . ## Connector dependencies should be in default scope This is what [flink-quickstart-scala]( https://github.com/apache/flin

Re: Flink 1.13 and CSV (batch) writing

2021-04-08 Thread Flavio Pompermaier
Any help here? Moreover, if I use the DataStream APIs there's no left/right outer join yet; are those meant to be added in Flink 1.13 or 1.14? On Wed, Apr 7, 2021 at 12:27 PM Flavio Pompermaier wrote: > Hi to all, > I'm testing writing to a CSV using Flink 1.13 and I get the following > error: >

Re: flink cdc postgres connector 1.1 - heartbeat.interval.ms - WAL consumption

2021-04-08 Thread bat man
Anyone who has faced similar issues with cdc with Postgres. I see the restart_lsn and confirmed_flush_lsn constant since the snapshot replication records were streamed even though I have tried inserting a record in the whitelisted table. select * from pg_replication_slots; slot_name | plugin

Re: FLINK Kinesis consumer Checkpointing data loss

2021-04-08 Thread Arvid Heise
Hi Vijay, edit: After re-reading your message: are you sure that you restarted from a checkpoint/savepoint? If you just start the application anew and use the LATEST initial position, this is the expected behavior. --- original intended answer: if you restart from a checkpoint this is definitively not the

Re: SingleValueAggFunction received more than one element error with LISTAGG

2021-04-08 Thread Timo Walther
Hi, which Flink version are you using? Could you also share the resulting plan with us using `TableEnvironment.explainSql()`? Thanks, Timo On 07.04.21 17:29, soumoks123 wrote: I receive the following error when trying to use the LISTAGG function in Table API. java.lang.RuntimeException:

Re: Compression with rocksdb backed state

2021-04-08 Thread Timo Walther
Hi Deepthi, 1. Correct 2. Correct 3. Incremental snapshots simply manage references to RocksDB's sstables. You can find a full explanation here [1]. Thus, the payload is a blackbox for Flink and Flink's compression flag has no impact. So we fully rely on what RocksDB offers. 4. Correct I hope t

Re: Flink 1.12.2 sql api use parquet format error

2021-04-08 Thread Timo Walther
Hi, can you check the content of the JAR file that you are submitting? There should be a `META-INF/services` directory with a `org.apache.flink.table.factories.Factory` file that should list the Parquet format. See also here: https://ci.apache.org/projects/flink/flink-docs-master/docs/connec
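Concretely, inside the submitted jar one would expect something like the following. The exact factory class name is an assumption based on the flink-parquet module; check the file shipped in your own jar:

```text
META-INF/services/org.apache.flink.table.factories.Factory
    org.apache.flink.formats.parquet.ParquetFileFormatFactory
```

If several dependencies ship the same service file, a fat-jar build without service-file merging (e.g. the ServicesResourceTransformer in the Maven shade plugin) can silently drop entries, which then surfaces as a "could not find factory" error at runtime.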

Re: Zigzag shape in TM JVM used memory

2021-04-08 Thread Piotr Nowojski
Hi, I don't think there is a Flink specific answer to this question. Just do what you would normally do with a normal Java application running inside a JVM. If there is an OOM on heap space, you can either try to bump the heap space, or reduce usage of it. The only Flink specific part is probably