Re: Multiple select queries in single job on flink table API

2021-04-21 Thread tbud
/"TableResult result1 = stmtSet.execute(); result1.print();"/ I tried this, and the result is following : Job has been submitted with JobID 4803aa5edc31b3ddc884f922008c5c03 +++ | default_catalog.default_databas

Debezium CDC | OOM

2021-04-21 Thread Ayush Chauhan
Hi, I am using flink cdc to stream CDC changes in an iceberg table. When I first run the flink job for a topic which has all the data for a table, it get out of heap memory as flink try to load all the data during my 15mins checkpointing interval. Right now, only solution I have is to pass *-ytm 81

Re: Flink Statefun Python Batch

2021-04-21 Thread Timothy Bess
Hi Igal and Konstantin, Wow! I appreciate the offer of creating a branch to test with, but for now we were able to get it working by tuning a few configs and moving other blocking IO out of statefun, so no rush there. That said if you do add that, I'd definitely switch over. That's great! I'll tr

MemoryStateBackend Issue

2021-04-21 Thread Milind Vaidya
Hi I see MemoryStateBackend being used in TM Log org.apache.flink.streaming.runtime.tasks.StreamTask - No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchron

Re: java.io.StreamCorruptedException: unexpected block data

2021-04-21 Thread Jingsong Li
Hi Alokh, Maybe this is related to https://issues.apache.org/jira/browse/FLINK-20241 We can improve `SerializableConfiguration` to throw better exceptions. So the true reason may be "ClassNotFoundException" Can you check your dependencies? Like Hadoop related dependencies? Best, Jingsong On F

event-time window cannot become earlier than the current watermark by merging

2021-04-21 Thread Vishal Santoshi
Hey folks, I had a pipe with sessionization restarts and then fail after retries with this exception. The only thing I had done was to increase the lateness by 12 hours ( to a day ) in this pipe and restart from SP and it ran for 12 hours plus without issue. I cannot imagine that i

Kubernetes Setup - JM as job vs JM as deployment

2021-04-21 Thread Gil Amsalem
Hi, I found that there are 2 different approaches to setup Flink over kubernetes. 1. Deploy job manager as Job. 2. Deploy job manager as Deployment. What is the recommended way? What are the benefits of each? Thanks, Gil Amsalem

Long to Timestamp(3) Conversion

2021-04-21 Thread Aeden Jameson
I've probably overlooked something simple, but when converting a datastream to a table how does one convert a long to timestamp(3) that will not be your event or proc time. I've tried tEnv.createTemporaryView( "myTable" ,myDatastream ,

Re: Flink Savepoint fault tolerance

2021-04-21 Thread Arvid Heise
Just to add. You can also change parallelism from checkpoints (it's usually much faster than using savepoints). For that, you want to use external checkpoints that are retained after job completion. But savepoints are the way to go for any topology changes, version updates, etc. On Wed, Apr 21, 2

Re: Flink Event specific window

2021-04-21 Thread Arvid Heise
Hi Sunitha, the approach you are describing sounds like you want to use a session window. [1] If you only want to count them if they happen at the same hour then, you want to use a tumbling window. Your datastream approach looks solid. For SQL, there is also a session (and tumbling) window [2].

Re: Are configs stored as part of savepoints

2021-04-21 Thread Arvid Heise
Just to make it explicit: no the configuration is not stored. Only maxParallelism and the state backend choices are implicitly stored. Thus, you can use the same savepoint to perform some A/B testing based on configuration. On Tue, Apr 20, 2021 at 6:51 PM Austin Cawley-Edwards < austin.caw...@gma

Re: Flink support for Kafka versions

2021-04-21 Thread Arvid Heise
I'm wondering if we could shade scala 1.13 dependencies inside the Kafka connector? Then we would be independent of the rather big FLINK-20845. On Tue, Apr 20, 2021 at 5:54 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hi Prasanna, > > It looks like the Kafka 2.5.0 connector upgrad

Re: How to config the flink to load libs in myself path

2021-04-21 Thread Arvid Heise
Hi, I can't offer you a solution for your problem but I'd like to emphasize that connectors are most of the time put into the user jar. A connector should be a couple of MB and not cause too many issues. On Tue, Apr 20, 2021 at 4:02 PM cxydevelop wrote: > For example, now I had my custom table

Re: Task Local Recovery with mountable disks in the cloud

2021-04-21 Thread Stephan Ewen
/cc dev@flink On Tue, Apr 20, 2021 at 1:29 AM Sonam Mandal wrote: > Hello, > > We've been experimenting with Task-local recovery using Kubernetes. We > have a way to specify mounting the same disk across Task Manager > restarts/deletions for when the pods get recreated. In this scenario, we > n

Re: Correctly serializing "Number" as state in ProcessFunction

2021-04-21 Thread Arvid Heise
Hi Miguel, as Klemens said this is a rather general problem independent of Flink: How do you map Polymorphism in serialization? Flink doesn't have an answer on its own, as it's discouraged (A Number can have arbitrary many subclasses: how do you distinguish them except by classname? That adds a t

Re: How can I demarcate which event elements are the boundaries of a window?

2021-04-21 Thread Arvid Heise
Hi Marco, It basically works like this for windows: - For any incoming record, calculate the respective window based on the event timestamp (ts). Let's assume a tumbling window for now, then we calculate by ts / window size (simplified). - This means that at any given time, there could be an arbit

Re: [1.9.2] Flink SSL on YARN - NoSuchFileException

2021-04-21 Thread Arvid Heise
Hi Andreas, I'd check where the exception occurs (not clear from what you posted) and double-check that the part of the system can access the given path deploy-keys/rest.keystore. The brute-force solution is to manually copy the files onto all worker nodes on the respective directory + potential

Re: [1.9.2] Flink SSL on YARN - NoSuchFileException

2021-04-21 Thread Nico Kruber
Hi Andreas, judging from [1], it should work if you refer to it via security.ssl.rest.keystore: ./deploy-keys/rest.keystore security.ssl.rest.truststore: ./deploy-keys/rest.truststore Nico [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-KAFKA-KEYTAB-Kafkaconsumer-

Re: Interesting article about correctness and latency

2021-04-21 Thread Arvid Heise
Just for future reference, there has been a correction concerning Flink [1]. [1] https://scattered-thoughts.net/writing/internal-consistency-in-streaming-systems/#updates On Sun, Apr 18, 2021 at 11:11 PM Benoît Paris < benoit.pa...@centraliens-lille.org> wrote: > Hi all! > > I read this very int

Re: Multiple select queries in single job on flink table API

2021-04-21 Thread Arvid Heise
Hi tbud, you still have two executes; it should only be one. Can you try the following instead of using outputTable1? TableResult result1 = stmtSet.execute(); result1.print(); On Sun, Apr 18, 2021 at 12:05 AM tbud wrote: > I have tried that too For example : > > /tableEnv.createTemporaryView("

Re: idleTimeMsPerSecond exceeds 1000

2021-04-21 Thread Alexey Trenikhun
Great. Thank ypu From: Chesnay Schepler Sent: Wednesday, April 21, 2021 1:02:59 AM To: Alexey Trenikhun ; Flink User Mail List Subject: Re: idleTimeMsPerSecond exceeds 1000 This ticket seems related; the issue was fixed in 1.13: https://issues.apache.org/jira/

Re: Flink Savepoint fault tolerance

2021-04-21 Thread dhanesh arole
Hi Arvid, Thanks for taking time to answer this. Yeah, we are also using save points as only restore mechanism If job parallelism needs to be changed or some job graph properties need to be updated. Otherwise during other rolling deployments of task manager pods or job manager pods we solely rely

Re: Flink Savepoint fault tolerance

2021-04-21 Thread Arvid Heise
Hi Dhanesh, We recommend to use savepoints only for migrations, investigations, A/B testing, and time travel and rely completely on checkpoints for fault tolerance. Are you using it differently? Currently, we are triggering savepoints using REST apis. And query the > status of savepoint by the re

Re: Flink 1.11 FlinkKafkaConsumer not propagating watermarks

2021-04-21 Thread Arvid Heise
For reference: self answered on [1]. Turns out that Flink 1.12 defaults the TimeCharacteristic to EventTime and > deprecates the whole TimeCharacteristic flow. So to downgrade to Flink > 1.11, you must add the following statement to configure the > StreamExecutionEnvironment. > > env.setStreamTime

Re: Alternatives to JDBCAppendTableSink in Flink 1.11

2021-04-21 Thread Austin Cawley-Edwards
Great to hear! Austin On Wed, Apr 21, 2021 at 6:19 AM Sambaran wrote: > Hi Austin, > > Many thanks, we indeed were using the Api incorrectly. Now in local tests > we can see the data population happened in the postgres. > > Have a nice day! > > Regards > Sambaran > > On Tue, Apr 20, 2021 at 8:1

Re: Flink: Not able to sink a stream into csv

2021-04-21 Thread Yik San Chan
Hi Dian, Thanks for your help, again! Best, Yik San On Wed, Apr 21, 2021 at 8:39 PM Dian Fu wrote: > Hi Yik San, > > You need to set the rolling policy for filesystem. You could refer to the > Rolling Policy section [1] for more details. > > Actually there are output and you could execute comm

Re: Flink: Not able to sink a stream into csv

2021-04-21 Thread Dian Fu
Hi Yik San, You need to set the rolling policy for filesystem. You could refer to the Rolling Policy section [1] for more details. Actually there are output and you could execute command `ls -la /tmp/output/`, then you will see several files named “.part-xxx”. For your job, you need to set the

Flink: Not able to sink a stream into csv

2021-04-21 Thread Yik San Chan
The question is cross posted on Stack Overflow https://stackoverflow.com/questions/67195207/flink-not-able-to-sink-a-stream-into-csv . I am trying to sink a stream into filesystem in csv format using PyFlink, however it does not work. ```python # stream_to_csv.py from pyflink.table import Environ

Re: Alternatives to JDBCAppendTableSink in Flink 1.11

2021-04-21 Thread Sambaran
Hi Austin, Many thanks, we indeed were using the Api incorrectly. Now in local tests we can see the data population happened in the postgres. Have a nice day! Regards Sambaran On Tue, Apr 20, 2021 at 8:11 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hi Sambaran, > > I'm not sur

Re: Flink Statefun Python Batch

2021-04-21 Thread Konstantin Knauf
Hi Igal, Hi Timothy, this sounds very interesting. Both state introspection as well as OpenTracing support have been requested by multiple users before, so certainly something we are willing to invest into. Timothy, would you have time for a 30min call in the next days to understand your use case

Re: idleTimeMsPerSecond exceeds 1000

2021-04-21 Thread Chesnay Schepler
This ticket seems related; the issue was fixed in 1.13: https://issues.apache.org/jira/browse/FLINK-19174 On 4/21/2021 4:20 AM, Alexey Trenikhun wrote: Hello, When Flink job mostly idle, idleTimeMsPerSecond for given task_name and subtask_index sometimes exceeds 1000, I saw values up to 1350,

Re: Flink Statefun Python Batch

2021-04-21 Thread Igal Shilman
Hi Tim, Yes, I think that this feature can be implemented relatively fast. If this blocks you at the moment, I can prepare a branch for you to experiment with, in the following days. Regarding to open tracing integration, I think the community can benefit a lot out of this, and definitely contrib