Re:Re: Questions of "State Processing API in Scala"

2020-08-31 Thread izual
I tried to fix a small mistake in the sample code of the State Processor API doc [1]. Could someone do a doc review [2] for me? Thank you. [1]: https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/state_processor_api.html#keyed-state [2]: https://github.com/apache/flink/pull/13266 At 2020-0

Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

2020-08-31 Thread Averell
Hello Robert, Arvid, I am running on EMR, and AWS currently only supports version 1.10. I tried both solutions that you suggested ((i) copying a SAXParser implementation to the plugins folder and (ii) using the S3FS Plugin from 1.10.1), and both worked - I could have successful checkpoints. Ho

Re: Exception on s3 committer

2020-08-31 Thread Ivan Yang
Hi Yun, Thank you so much for your suggestion. (1) The job couldn’t restore from the last checkpoint. The exception is in my original email. (2) No, I didn’t change any multipart upload settings. (3) The file is gone. I have another batch process that reads the Flink output S3 bucket and pushes obj

Re: Debezium Flink EMR

2020-08-31 Thread Marta Paes Moreira
Hey, Rex! This is likely due to the tombstone records that Debezium produces for DELETE operations (i.e. a record with the same key as the deleted row and a value of null). These are markers for Kafka to indicate that log compaction can remove all records for the given key, and the initial impleme
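The tombstone behavior described in this message can be illustrated with a minimal sketch. It assumes records arrive as plain (key, value) pairs where a tombstone has a value of None; the `drop_tombstones` helper is hypothetical and is not part of Flink, Kafka, or Debezium.

```python
from typing import Iterable, Iterator, Optional, Tuple

# A record is modeled as (key, value); a tombstone carries value None.
Record = Tuple[bytes, Optional[bytes]]


def drop_tombstones(records: Iterable[Record]) -> Iterator[Record]:
    """Yield only records with a non-null value (hypothetical helper)."""
    for key, value in records:
        if value is None:
            # Tombstone: same key as the deleted row, null value. Skip it.
            continue
        yield key, value


# A DELETE change event followed by its tombstone for the same key.
events = [(b"42", b'{"op": "d"}'), (b"42", None)]
kept = list(drop_tombstones(events))
```

Here `kept` contains only the DELETE change event; the null-valued marker for key `42` is skipped before any JSON decoding would be attempted.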

Re: Debezium Flink EMR

2020-08-31 Thread Rex Fenley
Hi, getting so close but ran into another issue: Flink successfully reads changes from Debezium/Kafka and writes them to Elasticsearch, but there's a problem with deletions. When I DELETE a row from MySQL the deletion makes it successfully all the way to Elasticsearch which is great, but then the

Re: Flink SQL Streaming Join Creates Duplicates

2020-08-31 Thread Austin Cawley-Edwards
Hey Arvid, Yes, I was able to self-answer this one. I was just confused by the non-deterministic behavior of the FULL OUTER join statement. I thought it through and took a harder read through the Dynamic Tables doc section [1], where "Result Updating" is hinted at, and the behavior makes total sense in

Re: Debezium Flink EMR

2020-08-31 Thread Rex Fenley
Ah, my bad, thanks for pointing that out Arvid! On Mon, Aug 31, 2020 at 12:00 PM Arvid Heise wrote: > Hi Rex, > > you still forgot > > 'debezium-json.schema-include' = true > > Please reread my mail. > > > On Mon, Aug 31, 2020 at 7:55 PM Rex Fenley wrote: > >> Thanks for the input, though I've

Re: Debezium Flink EMR

2020-08-31 Thread Arvid Heise
Hi Rex, you still forgot 'debezium-json.schema-include' = true Please reread my mail. On Mon, Aug 31, 2020 at 7:55 PM Rex Fenley wrote: > Thanks for the input, though I've certainly included a schema as is > reflected earlier in this thread. Including here again > ... > tableEnv.executeSql("

Re: Debezium Flink EMR

2020-08-31 Thread Rex Fenley
Thanks for the input, though I've certainly included a schema as is reflected earlier in this thread. Including here again ... tableEnv.executeSql(""" CREATE TABLE topic_addresses ( -- schema is totally the same to the MySQL "addresses" table id INT, customer_id INT, street STRING, city STRING, sta

Editing Rowtime for SQL Table

2020-08-31 Thread Satyam Shekhar
Hello, I use Flink for continuous evaluation of SQL queries on streaming data. One of the use cases requires us to run recursive SQL queries. I am unable to find a way to edit rowtime time attribute of the intermediate result table. For example, let's assume that there is a table T0 with schema -

Re: FileSystemHaServices and BlobStore

2020-08-31 Thread Khachatryan Roman
+ dev The blob store is used for jars, serialized job and task information, and logs. You can find some information at https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture I guess in your setup, Flink was able to pick up local files. HA setup presumes that

[ANNOUNCE] Weekly Community Update 2020/35

2020-08-31 Thread Konstantin Knauf
Dear community, happy to share a brief community update for the past week with configurable memory sharing between Flink and its Python "side car", stateful Python UDFs, an introduction of our GSoD participants and a little bit more. Flink Development == * [datastream api] Dawid has

Re: Packaging multiple Flink jobs from a single IntelliJ project

2020-08-31 Thread Manas Kale
Guess I figured out a solution for the first question as well - I am packaging multiple main() classes in the same JAR and specifying entrypoint classes when submitting the JAR. Most of my issues stemmed from an improperly configured POM file and a mismatch in Flink runtime versions. I'll assume th

Re: runtime memory management

2020-08-31 Thread Xintong Song
Well, that's a long story. In general, there are 2 steps. 1. *Which operators are deployed in the same slot?* Operators are first *chained*[1] together, then a *slot sharing strategy*[2] is applied by default. 2. *Which task managers are slots allocated from?* 1. For active deplo

Re: Security vulnerabilities of dependencies in Flink 1.11.1

2020-08-31 Thread Arvid Heise
Hi Shravan, we periodically bump version numbers, especially for major releases and basic dependencies such as netty. However, running a simple scan over dependencies is not that useful without also checking whether the reported issues are actually triggered by code. For example, we are not using

Re: Idle stream does not advance watermark in connected stream

2020-08-31 Thread Dawid Wysakowicz
Hey Arvid, The problem is that StreamStatus.IDLE is set on the Task level. It is not propagated to the operator. The watermarks for a TwoInputStreamOperator are combined in the AbstractStreamOperator: public void processWatermark(Watermark mark) throws Exception { if (timeS
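The combining step mentioned in this message can be sketched roughly as follows. This is a simplified Python illustration, not Flink's actual code: the class and method names are hypothetical, and it assumes the operator forwards the minimum of the latest watermark seen on each of its two inputs, with no knowledge of idleness.

```python
# Smallest possible timestamp, used before any watermark has arrived.
MIN_WATERMARK = -(2 ** 63)


class TwoInputWatermarkTracker:
    """Hypothetical model of watermark combining in a two-input operator."""

    def __init__(self):
        self.latest = {1: MIN_WATERMARK, 2: MIN_WATERMARK}
        self.combined = MIN_WATERMARK

    def on_watermark(self, input_index, timestamp):
        """Record a watermark; return the new combined value if it advanced."""
        self.latest[input_index] = timestamp
        new_min = min(self.latest.values())
        if new_min > self.combined:
            self.combined = new_min
            return new_min
        return None


tracker = TwoInputWatermarkTracker()
first = tracker.on_watermark(1, 100)   # input 2 has sent nothing yet
second = tracker.on_watermark(1, 200)  # still held back by input 2
third = tracker.on_watermark(2, 50)    # now the minimum can advance
```

Under this assumption, an input that never emits a watermark keeps the combined value at its initial minimum, which matches the symptom discussed in the thread: the operator cannot tell that the silent input is idle.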

Re: Issues with Flink Batch and Hadoop dependency

2020-08-31 Thread Arvid Heise
Hi Dan, Your approach in general is good. You might want to use the bundled hadoop uber jar [1] to save some time if you find the appropriate version. You can also build your own version and then include it in lib/. In general, I'd recommend moving away from sequence files. As soon as you change

Re: Debezium Flink EMR

2020-08-31 Thread Arvid Heise
Hi Rex, the connector expects a value without a schema, but the message contains a schema. You can tell Flink that the schema is included as written in the documentation [1]. CREATE TABLE topic_products ( -- schema is totally the same to the MySQL "products" table id BIGINT, name STRING,

Re: Flink Migration

2020-08-31 Thread Arvid Heise
Hi Navneeth, if everything worked before and you only started experiencing issues later, it would be interesting to know if your state size grew over time. An application usually needs gradually more resources over time. If the user base of your company grows, so does the number of messages (be it click str

Re: Flink SQL Streaming Join Creates Duplicates

2020-08-31 Thread Arvid Heise
Hi Austin, Do I assume correctly that you self-answered your question? If not, could you please share an update on your current progress? Best, Arvid On Thu, Aug 27, 2020 at 11:41 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Ah, I think the "Result Updating" is what got me -- INNER joins

Re: Idle stream does not advance watermark in connected stream

2020-08-31 Thread Arvid Heise
Hi Aljoscha, I don't quite follow your analysis. If both sources are configured with idleness, they should send a periodic watermark on timeout. So the code that you posted would receive watermarks on the idle source and thus advance watermarks periodically. If an idle source does not emit a wate

Re: Implementation of setBufferTimeout(timeoutMillis)

2020-08-31 Thread Pankaj Chand
Thank you so much, Yun! It is exactly what I needed. On Mon, Aug 31, 2020 at 1:50 AM Yun Gao wrote: > Hi Pankaj, > > I think it should be in > org.apache.flink.runtime.io.network.api.writer.RecordWriter$OutputFlusher. > > Best, > Yun > > > > -

Re: runtime memory management

2020-08-31 Thread lec ssmi
Thanks. When the program starts, how is each operator allocated to a taskmanager? For example, if I have 2 taskmanagers and 10 operators, and 9 operators are allocated to tm-A while the remaining one is placed in tm-B, the utilization of resources will be very low. On Mon, Aug 31, 2020 at 2:45 PM, Xintong Song wrote:

Security vulnerabilities of dependencies in Flink 1.11.1

2020-08-31 Thread shravan
issues.docx Hello, We are using Apache Flink version 1.11.1 and our security scans report the following issues. Please let us know your comments on these security vulnerabilities and fix plans for th