Re: Properly using ConnectorDescriptor instead of registerTableSource

2020-05-18 Thread Timo Walther
Hi Nikola, the reason for deprecating `registerTableSource` is that we aim to have everything declarative in the Table API. A table program should simply declare what it needs, and the planner should find a suitable connector, regardless of how the underlying class structure looks. This might al

Re: "Fill in" notification messages based on event time watermark

2020-05-18 Thread Manas Kale
Hi Aljoscha, Thank you, that clarification helps. I am generating a new watermark in the getCurrentWatermark() method of my assigner, which causes the watermark to actually be updated every auto-watermark interval. I assumed that actual watermark updates were caused only by the setAutoWatermark() method

Re: Flink Dataset job submission very slow

2020-05-18 Thread Arvid Heise
Do you have any logs that could help us identify the issue? How many files does a long date range cover? In general, you could try out the same program with the DataStream API (use StreamExecutionEnvironment#readFile [1] with PROCESS_ONCE to get behavior equivalent to batch). DataStreams are only slight

Flink Dataset job submission very slow

2020-05-18 Thread ysnakie
I have many lzo files on HDFS in a path format like /logs/{id}/{date}/xxx[1-100].lzo, for example: /logs/a/ds=2018-01-01/xxx1.lzo, /logs/b/ds=2018-01-01/xxx1.lzo, ..., /logs/z/ds=2018-01-02/xxx1.lzo, ..., /logs/z/ds=2020-05-01/xxx100.lzo. I'm using Flink Dataset to read those files by a range of {date} and a

Re: Help with table-factory for SQL

2020-05-18 Thread Martin Frank Hansen
Hi Leonard, Thank you so much! It worked. I did get a new error, but it is unrelated to this question. On Mon, May 18, 2020 at 15:21, Leonard Xu wrote: > More precisely: the sink table `sql-sink` is missing the required version > option. > > > On May 18, 2020 at 21:13, Leonard Xu wrote: > > Hi, > > Lo

Re: Export user metrics with Flink Prometheus endpoint

2020-05-18 Thread Eleanore Jin
Hi Aljoscha, Thanks a lot for the suggestion! Best, Eleanore On Mon, May 18, 2020 at 2:16 AM Aljoscha Krettek wrote: > Now I see what you mean. I think you would have to somehow set up the > Flink metrics system as a backend for opencensus. Then the metrics would > be reported to the same syst

Re: run flink on edge vs hub

2020-05-18 Thread Eleanore Jin
Hi Arvid, Thanks for the suggestion! I will tryout to see how it works. Best, Eleanore On Mon, May 18, 2020 at 8:04 AM Arvid Heise wrote: > Hi Eleanore, > > The question in general is what you understand under edge data centers as > the term is pretty fuzzy. Since Flink is running on Java, it'

Re: Is it possible to change 'connector.startup-mode' option in the flink job

2020-05-18 Thread Thomas Huang
Hi Jingsong, Cool, Thanks for your reply. Best wishes. From: Jingsong Li Sent: Tuesday, May 19, 2020 10:46 To: Thomas Huang Cc: Flink Subject: Re: Is it possible to change 'connector.startup-mode' option in the flink job Hi Thomas, Good to hear from you. Th

Re: Is it possible to change 'connector.startup-mode' option in the flink job

2020-05-18 Thread Jingsong Li
Hi Thomas, Good to hear from you. This is a very common problem. In 1.11, we have two FLIPs that solve your problem. [1][2] You can take a look. I think dynamic table options (table hints) are enough for your requirement. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dyna
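The dynamic table options mentioned above (FLIP-113, Flink 1.11) let a single query override a table's declared properties inline with a `/*+ OPTIONS(...) */` hint. As a minimal sketch, the hinted SQL is assembled here as a plain string so it can be shown without a running cluster; in a real job it would be passed to `TableEnvironment#sqlQuery` or `#executeSql`. The table name and helper are illustrative.

```java
// Sketch of FLIP-113 dynamic table options: override 'connector.startup-mode'
// for one query without changing the catalog definition of the table.
public class TableHintSketch {
    // Hypothetical helper that builds the hinted query string.
    static String withStartupMode(String table, String mode) {
        return "SELECT * FROM " + table
            + " /*+ OPTIONS('connector.startup-mode'='" + mode + "') */";
    }

    public static void main(String[] args) {
        // In a real job: tableEnv.sqlQuery(withStartupMode("orders", "latest-offset"))
        System.out.println(withStartupMode("orders", "latest-offset"));
    }
}
```

The hint only affects the query it is attached to; other queries against the same table keep the declared startup mode.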

Is it possible to change 'connector.startup-mode' option in the flink job

2020-05-18 Thread Thomas Huang
Hi guys, I'm using Hive to store Kafka topic metadata as follows: CREATE TABLE orders ( user_id BIGINT, product STRING, order_time TIMESTAMP(3), WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND ) WITH ( 'connector.type' = 'kafka',

Re: 回复:Re: Writing _SUCCESS Files (Streaming and Batch)

2020-05-18 Thread Guowei Ma
Hi Theo, Sorry for the late reply and thanks for your detailed explanation. From your description, I understand: 1. Impala only handles the parquet files that it has been notified about (scanned). 2. You could accept updating the state only once per partition in your scenario. 3. There

Re: changing the output files names in Streamfilesink from part-00 to something else

2020-05-18 Thread Jingsong Li
Hi Rahul, Thanks for explaining; I see. There is currently no way to dynamically control the file name in StreamingFileSink. If the number of organizations is not huge, then, as Sivaprasanna said, you can use a "BucketAssigner" to create a bucket per organization ID. The bucket in StreamingFileSink is like Hive

Re: Process available data and stop with savepoint

2020-05-18 Thread Arvid Heise
I also previously had some low volume data sources that I wanted to process and I was always convinced that the proper solution would be to have auto-scaling and just decrease the used resources as much as possible (which is not trivial because of state rescaling). But thinking a bit further, it wo

Re: RocksDB savepoint recovery performance improvements

2020-05-18 Thread Joey Pereira
Thanks Yun for highlighting this, it's very helpful! I'll give it a go with that in mind. We have already begun using checkpoints for recovery. Having these improvements would still be immensely helpful to reduce downtime for savepoint recovery. On Mon, May 18, 2020 at 3:14 PM Yun Tang wrote: >

Re: RocksDB savepoint recovery performance improvements

2020-05-18 Thread Yun Tang
Hi Joey, Previously, I also looked at the mechanism to create on-disk SSTables, as I planned to use RocksDB's benchmark to mock scenarios in Flink. However, I found the main challenge is how to ensure the keys are inserted in a strictly increasing order. The key order in Java could differ from the
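The key-order pitfall Yun points at can be made concrete: RocksDB's default comparator orders keys lexicographically by unsigned byte value, while Java's `byte` is signed, so naively sorting keys in Java can produce an order that bulk SSTable creation would reject. A minimal sketch of the difference, assuming default comparators on both sides:

```java
// RocksDB's default comparator treats key bytes as unsigned; Java bytes are
// signed. A key byte 0x80 (-128 in Java) sorts *before* 0x7F under signed
// comparison but *after* it under RocksDB's unsigned order.
public class KeyOrderSketch {
    // Unsigned lexicographic comparison, matching RocksDB's default order.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] k1 = {0x7F};          // 127 signed, 127 unsigned
        byte[] k2 = {(byte) 0x80};   // -128 signed, 128 unsigned
        System.out.println(Byte.compare(k2[0], k1[0]) < 0);  // true: signed says k2 first
        System.out.println(compareUnsigned(k1, k2) < 0);     // true: RocksDB says k1 first
    }
}
```

Any approach that pre-sorts serialized keys before writing SSTables would therefore need to sort with an unsigned comparator (or with whatever custom comparator the column family uses), not Java's default.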

Re: Process available data and stop with savepoint

2020-05-18 Thread Sergii Mikhtoniuk
Thanks all for responding. To give a bit more context: - I'm building a tool that performs a *fully deterministic* stream processing of mission-critical data - all input data is in the form of an append-only event log (Parquet files) - users define streaming SQL transformations to do all kinds of

Flink 1.10 permanent JVM hang when stopped

2020-05-18 Thread Hunter Herman
Hi Flink users! TL;DR: My Flink taskmanagers frequently permanently hang in a shutdown handler’s Thread.sleep() call when I issue a stop. Hitting a wall trying to debug. https://issues.apache.org/jira/browse/FLINK-17470 I’m really scratching my head at this issue. On a particular environment i

Re: Using Queryable State within 1 job + docs suggestion

2020-05-18 Thread Yun Tang
Hi Annemarie, First of all, I'm afraid Flink does not currently support making window state queryable. It was planned, but hasn't been implemented due to a lack of continuous development in this area in the Flink community. Secondly, I think the doc just wants to tell users how to enable this feat

RocksDB savepoint recovery performance improvements

2020-05-18 Thread Joey Pereira
Hey, While running a Flink application with a large-state, savepoint recovery has been a painful part of operating the application because recovery time can be several hours. During some profiling that chohan (cc'd) had done, a red flag stood out — savepoint recovery consisted mostly of RocksDB Ge

Re: Rocksdb implementation

2020-05-18 Thread Yun Tang
Hi Jaswin, As Arvid suggested, it's not encouraged to query the internal RocksDB directly. Apart from Arvid's solution, I think queryable state [1] might also help you. I think you just want to know the remaining entries in both map states after several days, and querying the state should make the meet

Re: changing the output files names in Streamfilesink from part-00 to something else

2020-05-18 Thread dhurandar S
Hi Jingsong, We have a system where organizations keep getting added and removed on a regular basis. As new organizations get added, the data from these organizations starts flowing into the streaming system. We group by organization ID, which is part of the incoming event. If in the incomi

Re: [ANNOUNCE] Apache Flink 1.10.1 released

2020-05-18 Thread Zhijiang
Thanks Yu for serving as release manager, and thanks to everyone involved. Best, Zhijiang -- From: Arvid Heise Send Time: Mon, May 18, 2020, 23:17 To: Yangze Guo Cc: dev ; Apache Announce List ; user ; Yu Li ; user-zh Subject: Re: [ANNOUNCE] Apache Fli

Re: Process available data and stop with savepoint

2020-05-18 Thread Thomas Huang
Hi, Actually, it seems like Spark dynamic allocation saves more resources in that case. From: Arvid Heise Sent: Monday, May 18, 2020 11:15:09 PM To: Congxian Qiu Cc: Sergii Mikhtoniuk ; user Subject: Re: Process available data and stop with savepoint Hi Sergii,

Re: Rocksdb implementation

2020-05-18 Thread Arvid Heise
Hi Jaswin, I'd discourage using RocksDB directly. It's more of an implementation detail of Flink. I'd also discourage writing to Kafka directly without using our Kafka sink, as you will receive duplicates upon recovery. If you run the KeyedCoProcessFunction continuously anyway, I'd add a timer

Re: Incremental state with purging

2020-05-18 Thread Thomas Huang
I'm wondering why you use a beta feature for production. Why not push the latest state into a downstream sink like Redis, or HBase with Apache Phoenix? From: Annemarie Burger Sent: Monday, May 18, 2020 11:19:23 PM To: user@flink.apache.org Subject: Re: Incremental

Re: Incremental state with purging

2020-05-18 Thread Annemarie Burger
Hi, Thanks for your suggestions! However, as I'm reading the docs for queryable state, it says that it can only be used for Processing time, and my windows are defined using event time. So, I guess this means I should use the KeyedProcessFunction. Could you maybe suggest a rough implementation fo

Re: [ANNOUNCE] Apache Flink 1.10.1 released

2020-05-18 Thread Arvid Heise
Thank you very much! On Mon, May 18, 2020 at 8:28 AM Yangze Guo wrote: > Thanks Yu for the great job. Congrats everyone who made this release > possible. > Best, > Yangze Guo > > On Mon, May 18, 2020 at 10:57 AM Leonard Xu wrote: > > > > > > Thanks Yu for being the release manager, and everyone

Re: Process available data and stop with savepoint

2020-05-18 Thread Arvid Heise
Hi Sergii, your requirements feel a bit odd. It's neither batch nor streaming. Could you tell us why it's not possible to let the job run as a streaming job that runs continuously? Is it just a matter of saving costs? If so, you could monitor the number of records being processed and trigger stop

Using Queryable State within 1 job + docs suggestion

2020-05-18 Thread Annemarie Burger
Hi, I want to use Queryable State to communicate between PUs in the same Flink job. I'm aware this is not the intended use of Queryable State, but I was wondering if and how it could be done. More specifically, I want to query the (event-time) window state of one PU from another PU, while both

Re: run flink on edge vs hub

2020-05-18 Thread Arvid Heise
Hi Eleanore, The question in general is what you understand under edge data centers, as the term is pretty fuzzy. Since Flink runs on Java, it's not suitable for embedded clusters as of now. There is plenty of work done already to test that Flink runs on ARM clusters [1]. If you just mean i

Re: the savepoint problem of upgrading job from blink-1.5 to flink-1.10

2020-05-18 Thread Arvid Heise
Hi Roc, just as an addition to Congxian, Flink 1.11 will be released soonish. We just started the release process (created release branch to freeze the features), which should include 2-4 weeks of testing/bug fixing. Of course, if you are interested, you could use the current master/release branch

Re: "Fill in" notification messages based on event time watermark

2020-05-18 Thread Aljoscha Krettek
I think there is some confusion in this thread between the auto watermark interval and the interval (length) of an event-time window. Maybe clearing that up for everyone helps. The auto watermark interval is the periodicity (in processing time) at which Flink asks the source (or a watermark ge
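Aljoscha's distinction can be illustrated with a tiny simulation: the auto-watermark interval only controls how often (in processing time) the assigner is asked for a watermark, while an event-time window fires once the watermark passes the window end, regardless of how often it was probed. All names and the 2-unit lateness below are assumptions for illustration, not Flink API.

```java
// Simulation of periodic watermark generation vs. event-time window firing.
public class WatermarkIntervalSketch {
    static long maxTimestampSeen = Long.MIN_VALUE;
    static final long LATENESS = 2;  // hypothetical bounded out-of-orderness

    // What a periodic assigner's getCurrentWatermark() conceptually returns,
    // each time it is probed at the auto-watermark interval.
    static long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestampSeen - LATENESS;
    }

    // An event-time window [start, end) fires once the watermark reaches end.
    static boolean windowFires(long windowEnd) {
        return currentWatermark() >= windowEnd;
    }

    public static void main(String[] args) {
        for (long ts : new long[]{3, 7, 12}) {       // event timestamps observed
            maxTimestampSeen = Math.max(maxTimestampSeen, ts);
        }
        System.out.println(currentWatermark());      // 10
        System.out.println(windowFires(10));         // true: window [0, 10) is complete
        System.out.println(windowFires(11));         // false: not enough event time yet
    }
}
```

Probing `currentWatermark()` more often (a smaller auto-watermark interval) makes window results arrive with less processing-time delay, but never changes *which* event-time window a record belongs to.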

Re: Infer if a Table will create an AppendStream / RetractStream

2020-05-18 Thread Timo Walther
Hi Yuval, currently there is no API for getting those insights. I guess you need to use internal API for getting this information. Which planner and version are you using? Regards, Timo On 18.05.20 14:16, Yuval Itzchakov wrote: Hi, Is there any way to infer if a Table is going to generate

Re: Help with table-factory for SQL

2020-05-18 Thread Leonard Xu
More precisely: the sink table `sql-sink` is missing the required version option. > On May 18, 2020 at 21:13, Leonard Xu wrote: > > Hi, > > Looks like you missed two required parameters: version and topic[1]; you need > to add them for both the source table and the sink table. > > .connect( > new Kafka()

Re: Help with table-factory for SQL

2020-05-18 Thread Leonard Xu
Hi, Looks like you missed two required parameters: version and topic[1]; you need to add them for both the source table and the sink table. .connect( new Kafka() .version("0.11") // required: valid connector versions are // "0.8", "0.9", "0.10", "0.11", and "universal

Re: Flink BLOB server port exposed externally

2020-05-18 Thread Till Rohrmann
Hi Omar, I think Flink 1.11 should be released within the next couple of weeks. It mainly depends on the testing and bug fixing period, as Xintong said. Cheers, Till On Mon, May 18, 2020 at 1:48 PM Xintong Song wrote: > 1.11.0 is feature freezing today. The final release date depends on

Re: Publishing Sink Task watermarks outside flink

2020-05-18 Thread Timo Walther
Hi Shubham, great that tweaking the JDBC sink helped. Maybe I don't fully understand your logic, but: the watermark that you are receiving in an operator should already be the minimum of all subtasks, because it is sent to all subsequent operators by the preceding operator. So a watermark ca

Infer if a Table will create an AppendStream / RetractStream

2020-05-18 Thread Yuval Itzchakov
Hi, Is there any way to infer if a Table is going to generate an AppendStream or a RetractStream under the hood in order to figure out if I need to call `toAppendStream` vs `toRetractStream` for the DataStream conversion? Note this information is important to me further down the DAG so I want to p

[ANNOUNCE] release-1.11 branch cut

2020-05-18 Thread Piotr Nowojski
Hi community, I have cut the release-1.11 branch from the master branch based on commit bf5594cdde428be3521810cb3f44db0462db35df. If you will be merging something into the master branch, please make sure to set the correct fix version in JIRA, according to which branch you have merged your code into. E

Re: Flink BLOB server port exposed externally

2020-05-18 Thread Xintong Song
1.11.0 is feature freezing today. The final release date depends on the progress of release testing / bug fixing. Thank you~ Xintong Song On Mon, May 18, 2020 at 6:36 PM Omar Gawi wrote: > Thanks Till! > Do you know what is 1.11.0 release date? > > > On Mon, May 18, 2020 at 12:49 PM Till Roh

Re: Memory growth from TimeWindows

2020-05-18 Thread Aljoscha Krettek
On 15.05.20 15:17, Slotterback, Chris wrote: My understanding is that while all these windows build their memory state, I can expect heap memory to grow for the 24 hour length of the SlidingEventTimeWindow, and then start to flatten as the t-24hr window frames expire and release back to the JV

Re: Watermarks and parallelism

2020-05-18 Thread Arvid Heise
Hi Gnanasoundari, Your use case is very typical and pretty much the main motivation for event time and watermarks. It's supported out of the box. I recommend reading again the first resource of Alex. To make it clear, let's have a small example: Source 1 -\ +--> Window --> Sink

Re: Testing process functions

2020-05-18 Thread Manas Kale
I see, I had not considered the serialization; that was the issue. Thank you. On Mon, May 18, 2020 at 12:29 PM Chesnay Schepler wrote: > We don't publish sources for test classes. > > Have you considered that the sink will be serialized on job submission, > meaning that your myTestSink instance

Re: Developing Beam applications using Flink checkpoints

2020-05-18 Thread Arvid Heise
Hi Ivan, First let's address the issue with idle partitions. The solution is to use a watermark assigner that also emits a watermark after some idle timeout [1]. Now the second question: why Kafka offsets are committed for in-flight, checkpointed data. The basic idea is that you are not losing

Re: Flink BLOB server port exposed externally

2020-05-18 Thread Omar Gawi
Thanks Till! Do you know what the 1.11.0 release date is? On Mon, May 18, 2020 at 12:49 PM Till Rohrmann wrote: > Hi Omar, > > with FLINK-15154 [1] which will be released with the upcoming 1.11.0 > release, it will be possible to bind the Blob server to the hostname > specified via jobmanager.bind-

Re: Flink Key based watermarks with event time

2020-05-18 Thread Arvid Heise
Watermarks are maintained for each source separately, but need to be combined when you combine the different streams. If not, you would not be able to perform windows and any related join or aggregation. Having different timestamps in the sources is actually the normal case and the main motivation
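The combination rule described above (the watermark after a union or join is the minimum of the input watermarks, so the slowest source governs event-time progress) can be sketched in a few lines. This is a conceptual sketch, not Flink's internal implementation:

```java
// Sketch: an operator with several inputs advances its event-time clock to
// the minimum of its inputs' watermarks, so no input can be "left behind".
public class CombinedWatermarkSketch {
    static long combine(long[] inputWatermarks) {
        long min = Long.MAX_VALUE;
        for (long wm : inputWatermarks) {
            min = Math.min(min, wm);
        }
        return min;
    }

    public static void main(String[] args) {
        // Source A is at event time 100; source B lags at 40. A window or
        // join over both streams only advances to event time 40.
        System.out.println(combine(new long[]{100, 40})); // 40
    }
}
```

This is exactly why an idle or slow source can stall windows downstream: the combined watermark cannot pass the laggard.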

Re: Flink suggestions;

2020-05-18 Thread Arvid Heise
Hi Aissa, Your use case is quite unusual. You usually have an alert in a dashboard (potentially sending an email), if any of the sensors show an error. You usually want to retain the original error code to be able to quickly identify the issue. Your use case would make sense if you want to filte

Re: Flink performance tuning on operators

2020-05-18 Thread Arvid Heise
Hi Ivan, Just to add to the point about chaining: when splitting the map into two parts, objects need to be copied from one operator to the chained operator. Since your objects are very heavy, that can take quite long, especially if you don't have a specific serializer configured but rely on Kryo. You can avoi

Re: Flink BLOB server port exposed externally

2020-05-18 Thread Till Rohrmann
Hi Omar, with FLINK-15154 [1] which will be released with the upcoming 1.11.0 release, it will be possible to bind the Blob server to the hostname specified via jobmanager.bind-host. Per default it will still bind to the wildcard address but with this option you can bind it to localhost, for examp

Re: Flink Streaming Job Tuning help

2020-05-18 Thread Arvid Heise
Hi Senthil, since your records are so big, I recommend taking the time to evaluate some different serializers [1]. [1] https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html On Wed, May 13, 2020 at 5:40 PM Senthil Kumar wrote: > Zhijiang, > > > > Thanks for your sugges

Re: Windowed Stream Queryable State Support

2020-05-18 Thread Annemarie Burger
Hi, I was wondering: since it is possible to "query the state of an in-flight window", is it also possible to make sure we query *every* window at the proper time? That is, how can the in-flight window state of one PU be accessed from another PU with Queryable State? I want to query the window st

Re: Export user metrics with Flink Prometheus endpoint

2020-05-18 Thread Aljoscha Krettek
Now I see what you mean. I think you would have to somehow set up the Flink metrics system as a backend for opencensus. Then the metrics would be reported to the same system (prometheus) in this case. In Opencensus lingo, this would mean using a Flink-based Stats Exporter instead of the Prometh

Re: classpath of flink1.10.0 on yarn with Hadoop3.2.1

2020-05-18 Thread Robert Metzger
Hey, If you want to pass your Hadoop jars using the HADOOP_CLASSPATH environment variable, you need to remove the "flink-shaded-hadoop.jar" from the classpath by deleting the "lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar" file. This jar file contains all Hadoop classes as well. That's also the rea

classpath of flink1.10.0 on yarn with Hadoop3.2.1

2020-05-18 Thread ??????
Hi, I'm using Flink 1.10.0 and Hadoop 3.2.1. My Flink cluster is deployed on YARN. According to the Flink-Hadoop Integration doc, I set up the HADOOP_CLASSPATH environment variable on each cluster machine. At first, everything was OK: I could read from and write to HDFS/S3. And then, I want to try

Re: How to read UUID out of a JDBC table source

2020-05-18 Thread Dawid Wysakowicz
Hi Dario, What version of Flink are you using? Are you implementing your own TableSource or do you use the JdbcTableSource that comes with Flink? Which planner do you use blink or the old one? Unfortunately the rework of Table API type system is not finished yet, therefore there are still rough e

Re: Rocksdb implementation

2020-05-18 Thread Jaswin Shah
/** * Alipay.com Inc. * Copyright (c) 2004-2020 All Rights Reserved. */ package com.paytm.reconsys.functions.processfunctions; import com.paytm.reconsys.Constants; import com.paytm.reconsys.configs.ConfigurationsManager; import com.paytm.reconsys.enums.DescripancyTypeEnum; import com.paytm.reco

Rocksdb implementation

2020-05-18 Thread Jaswin Shah
Hi, I have implemented the Flink job with MapStates. The functionality is: 1. I have two datastreams which I connect with the connect operator and then call a CoProcessFunction for every pair of objects. 2. For an element of the first datastream, the processElement1 method is called, and for an elemen
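The two-stream pattern in this thread (a connected stream whose CoProcessFunction buffers each side in a MapState until the matching element from the other side arrives) can be sketched as follows. Plain `HashMap`s stand in for Flink's keyed `MapState` so the sketch is self-contained; the key/value shapes and the "emit as string" convention are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a keyed two-stream match: when one side arrives, look for the
// other side in its map; on a hit emit the joined pair, otherwise buffer.
public class TwoStreamMatchSketch {
    static final Map<String, String> left = new HashMap<>();   // ~ MapState for stream 1
    static final Map<String, String> right = new HashMap<>();  // ~ MapState for stream 2

    // Mirrors processElement1: try to match against the buffered right side.
    static String onLeft(String key, String value) {
        String match = right.remove(key);
        if (match != null) return "MATCHED:" + value + "|" + match;
        left.put(key, value);     // buffer until the other side arrives
        return "BUFFERED";
    }

    // Mirrors processElement2: try to match against the buffered left side.
    static String onRight(String key, String value) {
        String match = left.remove(key);
        if (match != null) return "MATCHED:" + match + "|" + value;
        right.put(key, value);
        return "BUFFERED";
    }

    public static void main(String[] args) {
        System.out.println(onLeft("order-1", "cart"));      // BUFFERED
        System.out.println(onRight("order-1", "payment"));  // MATCHED:cart|payment
    }
}
```

As Arvid suggests later in the thread, entries that stay buffered for too long are the discrepancies; in Flink one would register an event-time or processing-time timer per key to flush or report them instead of scanning the maps externally.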

Re: Using flink-connector-kafka-1.9.1 with flink-core-1.7.2

2020-05-18 Thread Arvid Heise
Hi Nick, yes, you can be lucky that no involved classes have changed (much), but there is no guarantee. You could try to fiddle around and add the respective class (ClosureCleanerLevel) from Flink 1.9 in your jar, but it's hacky at best. Another option is to bundle Flink 1.9 with your code if

[ANNOUNCE] Weekly Community Update 2020/20

2020-05-18 Thread Konstantin Knauf
Dear community, happy to share this week's community update. With the feature freeze for Flink 1.11 today, there are not a lot of feature-related discussions right now on the Apache Flink development mailing list. Still, I can share some news on Apache Flink 1.11 and 1.10.1, Flink Forward Global and a disc

Re: The order of Retract Record

2020-05-18 Thread Benchao Li
Yes. The retract message will be generated first, then the new result. lec ssmi wrote on Mon, May 18, 2020 at 3:00 PM: > Hi: > Regarding retraction, I have the following SQL: > select count(1) count, word group by word > > Suppose the current aggregation result is: > 'hello'->3 > When
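Benchao's ordering guarantee (the retraction of the old result is emitted before the new result) can be sketched with a tiny aggregate. Messages are modeled as `(isAdd, word, count)` tuples rendered as strings; this is a conceptual sketch of the retract-stream protocol, not Flink internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a count aggregate that, on update, first retracts the old
// result (false, word, old) and then adds the new one (true, word, new).
public class RetractOrderSketch {
    static final Map<String, Long> counts = new HashMap<>();

    static List<String> onRecord(String word) {
        List<String> out = new ArrayList<>();
        Long old = counts.get(word);
        if (old != null) {
            out.add("(false, '" + word + "', " + old + ")");  // retract first
        }
        long updated = (old == null ? 0 : old) + 1;
        counts.put(word, updated);
        out.add("(true, '" + word + "', " + updated + ")");   // then the new result
        return out;
    }

    public static void main(String[] args) {
        counts.put("hello", 3L);                // current result: 'hello' -> 3
        System.out.println(onRecord("hello"));  // [(false, 'hello', 3), (true, 'hello', 4)]
    }
}
```

A downstream consumer that applies messages in order therefore never sees both the old and the new count at the same time.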

The order of Retract Record

2020-05-18 Thread lec ssmi
Hi: Regarding retraction, I have the following SQL: select count(1) count, word group by word. Suppose the current aggregation result is: 'hello'->3. When another record arrives, the count of 'hello' will be changed to 4. The following two records will be generated i

Re: Testing process functions

2020-05-18 Thread Chesnay Schepler
We don't publish sources for test classes. Have you considered that the sink will be serialized on job submission, meaning that your myTestSink instance is not the one actually used by the job? This typically means that have to store stuff in a static field instead. Alternatively, depending on