Re: Streaming SQL support for redis streaming connector

2021-09-14 Thread Yangze Guo
> What about the bahir streaming connectors? Are they considered canonical? > Would they be merged in to the main project at some point? Iiuc we can use > table API etc to write data to redis using that right? I think it still can be used with Flink 1.12 and 1.13. However, it has not been update

RE: Streaming SQL support for redis streaming connector

2021-09-14 Thread Osada Paranaliyanage
Hi David, What about the bahir streaming connectors? Are they considered canonical? Would they be merged in to the main project at some point? Iiuc we can use table API etc to write data to redis using that right? What more will be required for us to use SQL? (as in specify the connector in t

Re: Error while fetching data from Apache Kafka

2021-09-14 Thread Dhiru
Thanks @caizhi , let me look this  On Monday, September 13, 2021, 10:24:03 PM EDT, Caizhi Weng wrote: Hi! This seems to be caused by some mismatching types in your source definition and your workflow. If possible could you describe the schema of your Kafka source and paste your datast

Fast serialization for Kotlin data classes

2021-09-14 Thread Alex Cruise
Hi there, I appreciate the fact that Flink has built-in support for making POJO and Scala `case class` serialization faster, but in my project we use immutable Kotlin `data class`es (analogous to Scala `case class`es) extensively, and we'd really prefer not to make them POJOs, mostly for style/tas

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread Constantinos Papadopoulos
Thanks David. What you are saying makes sense. But, I keep hearing I shouldn't delete the topic externally, and I keep asking why doesn't Flink forget about the topic IF it has in fact been deleted externally (for whatever reason). I think I will drop this now. On Tue, Sep 14, 2021 at 5:50 PM Dav

Re: Beam with Flink runner - Issues when writing to S3 in Parquet Format

2021-09-14 Thread David Morávek
Hi Sandeep, Jan has already provided pretty good guidelines for getting more context on the issue ;) Because this is not for the first time, I would like to raise awareness, that it's not OK to send a user related question to four Apache mailing list (that I know of). Namely: - user@flink.apache

Re: RocksDB state not cleaned up

2021-09-14 Thread tao xiao
Hi David, If I read Stephan's comment correctly TTL doesn't work well for cases where we have too many levels, like fast growing state, as compaction doesn't clean up high level SST files in time, Is this correct? If yes should we register a timer with TTL time and manual clean up the state (stat

Re: Beam with Flink runner - Issues when writing to S3 in Parquet Format

2021-09-14 Thread Jan Lukavský
Hi Sandeep, a few questions:  a) which state backend do you use for Flink?  b) what is your checkpointingInterval set for FlinkRunner?  c) how much data is there in your input Kafka topic(s)? FileIO has to buffer all elements per window (by default) into state, so this might create a high pressu

Beam with Flink runner - Issues when writing to S3 in Parquet Format

2021-09-14 Thread Kathula, Sandeep
Hi, We have a simple Beam application which reads from Kafka, converts to parquet and write to S3 with Flink runner (Beam 2.29 and Flink 1.12). We have a fixed window of 5 minutes after conversion to PCollection and then writing to S3. We have around 320 columns in our data. Our intention is

Re: flink : Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/spi/LoggerContextShutdownAware

2021-09-14 Thread Ragini Manjaiah
hi David, yes . you are correct. solved the issue On Tue, Sep 14, 2021 at 5:57 PM David Morávek wrote: > From the stacktrace you've shared in the previous email, it seems that > you're running the code from IDE, is that correct? > > This is the part that makes me assume that, because it's touchi

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread David Morávek
We are basically describing the same thing with Fabian, just a different wording. The problem is that if you delete the topic externally, you're making an assumption that downstream processor (Flink in this case) has already consumed and RELIABLY processed all of the data from that topic (which ma

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread Constantinos Papadopoulos
Hi all, Thank you for the replies, they are much appreciated. I'm sure I'm missing something obvious here, so bear with me... Fabian, regarding: "Flink will try to recover from the previous checkpoint which is invalid by now because the partition is not available anymore." The above would happ

Re: Streaming SQL support for redis streaming connector

2021-09-14 Thread David Morávek
Hi Osada, in theory building a Redis table from "CDC stream" should definitely be doable. Unfortunately Flink currently doesn't have any official Redis Sink for the Table API and there is currently no on-going effort for adding it, so it would need to be implemented first. The resulting syntax wou

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread Jan Lukavský
On 9/14/21 3:57 PM, David Morávek wrote: Hi Jan, Notion of completeness is just one part of the problem. The second part is that once you remove the Kafka topic, you are no longer able to replay the data in case of failure. So you basically need a following workflow to ensure correctness: 1

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread David Morávek
Hi Jan, Notion of completeness is just one part of the problem. The second part is that once you remove the Kafka topic, you are no longer able to replay the data in case of failure. So you basically need a following workflow to ensure correctness: 1) Wait until there are no more elements in the

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread Jan Lukavský
Hi, just out of curiosity, would this problem be solvable by the ability to remove partitions, that declare, that do not contain more data (watermark reaching end of global window)? There is probably another problem with that topic can be recreated after being deleted, which could result in w

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread Fabian Paul
Hi Constantinos, I agree with David that it is not easily possible to remove a partition while a Flink job is running. Imagine the following scenario: Your Flink job initially works on 2 partitions belonging to two different topics and you have checkpointing enabled to guarantee exactly-once de

Streaming SQL support for redis streaming connector

2021-09-14 Thread Osada Paranaliyanage
Hi All, We are looking to use flink to build a materialized view of a relation db and a document db using cdc streams. For this purpose we would like to use redis for hosting the materialized view. Can we do this in streaming SQL? We have worked through https://github.com/ververica/flink-sql-CDC

Re: flink : Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/spi/LoggerContextShutdownAware

2021-09-14 Thread David Morávek
>From the stacktrace you've shared in the previous email, it seems that you're running the code from IDE, is that correct? This is the part that makes me assume that, because it's touching files from local maven repository. SLF4J: Found binding in [jar:file:/Users/z004t01/.m2/repository/org/slf4j

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread Constantinos Papadopoulos
Thank you for your answer David, which is a confirmation of what we see in the Flink code. A few thoughts below: "as this may easily lead to a data loss" Removing a topic/partition can indeed lead to data loss if not done carefully. However, *after* the topic has been deleted, I believe it woul

Re: flink : Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/spi/LoggerContextShutdownAware

2021-09-14 Thread Ragini Manjaiah
Hi David, please find my pom.xml . where I have excluded the slf4j-log4j12 dependency . even after excluding encountering this issue http://maven.apache.org/POM/4.0.0"; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 h

Re: flink : Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/spi/LoggerContextShutdownAware

2021-09-14 Thread David Morávek
Hi Ragini, I think you actually have the opposite problem that your classpath contains slf4j binding for log4j 1.2, which is no longer supported. Can you try getting rid of the slf4j-log4j12 dependency? Best, D. On Tue, Sep 14, 2021 at 1:51 PM Ragini Manjaiah wrote: > when I try to run flink .

flink : Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/spi/LoggerContextShutdownAware

2021-09-14 Thread Ragini Manjaiah
when I try to run flink .1.13 application encountering the below mentioned issue. what dependency I am missing . can you please help me SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/Users/z004t01/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.1

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread David Morávek
Hi Constantinos, The partition discovery doesn't support topic / partition removal as this may easily lead to a data loss (partition removal is not even supported by Kafka for the same reason) Dynamically adding and removing partitions as part of a business logic is just not how Kafka is designed

Re: RocksDB state not cleaned up

2021-09-14 Thread David Morávek
Hi Tao, my intuition is that the compaction of SST files is not triggering. By default, it's only triggered by the size ratios of different levels [1] and the TTL mechanism has no effect on it. Some reasoning from Stephan: It's very likely to have large files in higher levels that haven't been >

Re: FLINK-14316 happens on version 1.13.2

2021-09-14 Thread Xiangyu Su
Hi Guys, sorry for the late reply. we found out the issue is not related to flink, there is a connection issue with zookeeper. we deploy our whole infra on k8s and using aws spot ec2, once the pod get restarted or lost spot instances we lost the log files... so sorry for not being able to share th

Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread Constantinos Papadopoulos
We are on Flink 1.12.1, we initialize our FlinkKafkaConsumer with a topic name *pattern*, and we have partition discovery enabled. When our product scales up, it adds new topics. When it scales down, it removes topics. The problem is that the FlinkKafkaConsumer never seems to forget partitions th

Re: Flink Native Kubernetes - Configuration kubernetes.flink.log.dir not working

2021-09-14 Thread Guowei Ma
Hi Maybe you could try the `kubectl describe pod -n ${namespace} ${podname}` to see what happened atm. Best, Guowei On Tue, Sep 14, 2021 at 2:58 PM bat man wrote: > Hello Guowei, > > The pods terminate almost within a second so am unable to pull any logs. > Is there any way I can pull the log

Re: JVM Metaspace capacity planning

2021-09-14 Thread Guowei Ma
Hi, Puneet In general every job has its own classloader. You could find more detailed information from doc [1]. You could put some common jar into the "/lib" to avoid this [2]. [1] https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/ops/debugging/debugging_classloading/ [2] https://ni

Re: Flink-Zookeeper Security

2021-09-14 Thread David Morávek
Hi Beata, you need to upgrade zookeeper to 3.5+ on Flink side [1] and set JVM properties the same way you'd do with vanilla ZK client. Please refer to the following thread [2] for more details. It would be great if you could provide feedback for future reference, whether this approach has worked f