Late data acquisition

2020-04-28 Thread lec ssmi
Hi, can we get data later than the watermark in SQL? Best, Lec Ssmi

Re: ML/DL via Flink

2020-04-28 Thread Becket Qin
Hi Max, Thanks for the question and for sharing your findings. To be honest, I was not aware of some of these projects until I saw your list. First, to answer your questions: > (i) Has anyone used them? While I am not sure about the number of users of every listed project, Alink is definitely used by Al

Re: Configuring taskmanager.memory.task.off-heap.size in Flink 1.10

2020-04-28 Thread Xintong Song
Hi Jiahui, 'taskmanager.memory.task.off-heap.size' accounts for the off-heap memory reserved for your job / operators. There are other configuration options accounting for the off-heap memory usage for other purposes, e.g., 'taskmanager.memory.framework.off-heap.size'. The default 'task.off-heap.size'
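For readers following this thread: the relationship between these options can be sketched in flink-conf.yaml. A minimal example against the Flink 1.10 option names (the sizes are illustrative placeholders, not recommendations):

```
# Total memory used by Flink itself (heap + managed + network + off-heap).
taskmanager.memory.flink.size: 1728m
# Off-heap memory reserved for user code in tasks / operators.
taskmanager.memory.task.off-heap.size: 128m
# Off-heap memory reserved for Flink's own framework needs.
taskmanager.memory.framework.off-heap.size: 128m
```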

Re: upgrade flink from 1.9.1 to 1.10.0 on EMR

2020-04-28 Thread Yang Wang
Hi Anuj, I think the exception you came across is still because the Hadoop version is 2.4.1. I have checked the Hadoop code; the code lines are exactly the same. For 2.8.1, I also checked the ruleParse. It could work. /** * A pattern for parsing a auth_to_local rule. */ private static final Patter

Re: K8s native - checkpointing to S3 with RockDBStateBackend

2020-04-28 Thread Yang Wang
Hi Averell, Hadoop directly supports S3AFileSystem. When you deploy a Flink job on YARN, the hadoop classpath will be added to JobManager/TaskManager automatically. That means you could use the "s3a" scheme without putting "flink-s3-fs-hadoop.jar" in the plugin directory. In K8s deployment, we d
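As a sketch of the plugin mechanism the thread refers to: in a custom Flink 1.10 image, the S3 filesystem jar goes into its own subdirectory under plugins/ (the jar version below is illustrative, and the bucket name in the path is hypothetical):

```
plugins/
└── s3-fs-hadoop/
    └── flink-s3-fs-hadoop-1.10.0.jar
```

With the plugin in place, checkpoint paths can then use the "s3a" scheme, e.g. s3a://my-bucket/checkpoints.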

Configuring taskmanager.memory.task.off-heap.size in Flink 1.10

2020-04-28 Thread Jiahui Jiang
Hello! We are migrating our pipeline from Flink 1.8 to 1.10 in Kubernetes. In the first try, we simply copied the old 'taskmanager.heap.size' over to 'taskmanager.memory.flink.size'. This caused the cluster to OOM. Eventually we had to allocate a small amount of memory to 'taskmanager.memory.tas

Re: join state TTL

2020-04-28 Thread LakeShen
Thank you for the clarification, Jark. Jark Wu wrote on Wed, Apr 29, 2020 at 10:39 AM: > If 'uu' in stream A is not updated for more than 24 hours, then it will be > cleared. (blink planner) > The state expiration strategy is "not updated for more than x time". > > Best, > Jark > > On Wed, 29 Apr 2020 at 10:19

Re: join state TTL

2020-04-28 Thread Jark Wu
If 'uu' in stream A is not updated for more than 24 hours, then it will be cleared. (blink planner) The state expiration strategy is "not updated for more than x time". Best, Jark On Wed, 29 Apr 2020 at 10:19, LakeShen wrote: > Hi Jark, > > I am a little confused about how double stream joini
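The expiration rule Jark describes ("not updated for more than x time") can be illustrated with a tiny plain-Java model, independent of Flink (class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of the "not updated for more than ttl" expiration rule.
public class IdleStateModel {
    private final Map<String, Long> lastUpdate = new HashMap<>();
    private final long ttlMillis;

    public IdleStateModel(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Record an update for a key at the given time.
    public void update(String key, long nowMillis) {
        lastUpdate.put(key, nowMillis);
    }

    // A key's state is expired once it has gone *more than* ttl without an update.
    public boolean isExpired(String key, long nowMillis) {
        Long ts = lastUpdate.get(key);
        return ts == null || nowMillis - ts > ttlMillis;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        IdleStateModel state = new IdleStateModel(24 * hour);
        state.update("uu", 0L);
        System.out.println(state.isExpired("uu", 23 * hour)); // prints false
        System.out.println(state.isExpired("uu", 25 * hour)); // prints true
    }
}
```

This mirrors the behavior in the reply: 'uu' survives as long as it keeps being updated within the 24-hour window and is cleared once it sits idle longer than that.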

Re: Blink window and cube

2020-04-28 Thread 刘建刚
Thank you. I created an issue: https://issues.apache.org/jira/browse/FLINK-17446 > On Apr 28, 2020, at 7:57 PM, Jark Wu-3 [via Apache Flink User Mailing List archive.] > wrote: > > Thanks for reporting this. I think this is a missing feature. We need

Re: Publishing Sink Task watermarks outside flink

2020-04-28 Thread Shubham Kumar
Hi Timo, Yeah, I got the idea of getting access to timers through a process function and reached the same conclusion you explained, namely that a side output doesn't guarantee that the data is written out to the sink (so maybe Fabian in that post pointed out something else which I am missing). If I am correct

Re: Possible Bug

2020-04-28 Thread Robert Metzger
Hey, thanks a lot for filing a ticket! I put a link into SO. It might take a few days till there's a response on the ticket. On Tue, Apr 28, 2020 at 10:42 PM Marie May wrote: > Thanks for responding Robert. I reported the issue here: > > issues.apache.org/jira/browse/FLINK-17444 > > I do not ha

Possible Bug

2020-04-28 Thread Marie May
Hello, I am running into the same issue this person posted here: https://stackoverflow.com/questions/61246683/flink-streamingfilesink-on-azure-blob-storage I see no one has answered, so I thought maybe I could report it as a bug, but the site said to mail here first if unsure whether it's a bug or not.

Re: RocksDB default logging configuration

2020-04-28 Thread Bajaj, Abhinav
Thanks Yun for your response. It seems creating the RocksDBStateBackend from the job requires providing the checkpoint URL, whereas the savepoint URL seems to default to “state.savepoints.dir” of the flink-conf.yaml. I was expecting similar behavior to create the RocksDBStateBackend without pro
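For context, the in-code construction discussed here looks roughly like the sketch below (the checkpoint path is hypothetical; this is written against the Flink 1.10 API, whose constructor declares IOException):

```java
import java.io.IOException;

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BackendSetup {
    public static void main(String[] args) throws IOException {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // The checkpoint URI must be passed explicitly here; the savepoint
        // directory can still default to state.savepoints.dir in flink-conf.yaml.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
    }
}
```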

Presentation - Real World Architectural Patterns with Apache Pulsar and Flink

2020-04-28 Thread Devin Bost
If anyone missed my presentation on Real-World Architectural Patterns with Apache Pulsar that covers a use case involving Apache *Flink* for distributed tracing, please check out the recording here: https://youtu.be/pmaCG1SHAW8 Devin G. Bost

Re: Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Gyula Fóra
Hi Timo, Row will definitely work at this point, thank you for helping out. I opened a jira ticket: https://issues.apache.org/jira/browse/FLINK-17442 Gyula On Tue, Apr 28, 2020 at 6:48 PM Timo Walther wrote: > Hi Gyula, > > does `toAppendStream(Row.class)` work for you? The othe

Re: ML/DL via Flink

2020-04-28 Thread Timo Walther
Hi Max, as far as I know a better ML story for Flink is in the making. I will loop in Becket in CC who may give you more information. Regards, Timo On 28.04.20 07:20, m@xi wrote: Hello Flinkers, I am building a *streaming* prototype system on top of Flink and I want ideally to enable ML tra

Re: Publishing Sink Task watermarks outside flink

2020-04-28 Thread Timo Walther
Hi Shubham, you can call stream.process(...). The context of ProcessFunction gives you access to TimerService, which lets you access the current watermark. I'm assuming you are using the Table API? As far as I remember, watermarks are travelling through the stream even if there is no time-ba
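A minimal sketch of what Timo describes, against the DataStream API (the class name is hypothetical and the record type T is a placeholder):

```java
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: observe the current watermark as elements pass through.
public class WatermarkObserver<T> extends ProcessFunction<T, T> {
    @Override
    public void processElement(T value, Context ctx, Collector<T> out) {
        long currentWatermark = ctx.timerService().currentWatermark();
        // ... publish currentWatermark to an external system here ...
        out.collect(value);
    }
}
```

Attached via stream.process(new WatermarkObserver<>()), this sees the watermark as of the operator it runs in, not the sink itself, which is the caveat discussed later in the thread.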

Re: Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Timo Walther
Hi Gyula, does `toAppendStream(Row.class)` work for you? The other methods take TypeInformation and might cause this problem. It is definitely a bug. Feel free to open an issue under: https://issues.apache.org/jira/browse/FLINK-12251 Regards, Timo On 28.04.20 18:44, Gyula Fóra wrote: Hi Tim
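For reference, the overload Timo points to looks like this in Flink 1.10 (variable names hypothetical; assumes a StreamTableEnvironment named tableEnv and a Table named table):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.types.Row;

// Convert the Table back to a DataStream of Row using the Class-based
// overload, rather than a TypeInformation-based overload.
DataStream<Row> stream = tableEnv.toAppendStream(table, Row.class);
// Array columns can then be read back via row.getField(i) and cast
// to String[] / Integer[] as appropriate.
```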

Re: "Fill in" notification messages based on event time watermark

2020-04-28 Thread Timo Walther
Hi Manas, Reg. 1: I would recommend using a debugger in your IDE and checking which watermarks are travelling through your operators. Reg. 2: All event-time operations are only performed once the watermark has arrived from all parallel instances. So roughly speaking, in machine time you can assume
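The "arrived from all parallel instances" part means an operator's effective watermark is the minimum over its inputs; a plain-Java sketch of that rule, independent of Flink (names hypothetical):

```java
// The effective watermark of an operator is the minimum over its input
// channels: an event-time timer can only fire once *every* input has
// advanced past its timestamp.
public class WatermarkMin {
    public static long combinedWatermark(long[] inputWatermarks) {
        long min = Long.MAX_VALUE;
        for (long w : inputWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }

    public static void main(String[] args) {
        // Three parallel inputs: the slowest one (9_000) gates progress.
        System.out.println(
                combinedWatermark(new long[] {12_000L, 9_000L, 15_000L})); // prints 9000
    }
}
```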

Re: Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Gyula Fóra
Hi Timo, I am trying to convert simply back to a DataStream. Let's say: DataStream> I can convert the DataStream into a table without a problem, the problem is getting a DataStream back. Thanks Gyula On Tue, Apr 28, 2020 at 6:32 PM Timo Walther wrote: > Hi Gyula, > > are you coming from Data

Re: Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Timo Walther
Hi Gyula, are you coming from DataStream API or are you trying to implement a source/sink? It looks like the array is currently serialized with Kryo. I would recommend to take a look at this class: org.apache.flink.table.types.utils.LegacyTypeInfoDataTypeConverter This is the current mapping

Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Gyula Fóra
Hi All! I have a Table with columns of ARRAY<STRING> and ARRAY<INT>; is there any way to convert them back to the respective Java arrays, String[] and Integer[]? It only seems to work for primitive types (non null), date, time and decimal. For String, for instance, I get the following error: Query schema: [f0: AR

Re: "Fill in" notification messages based on event time watermark

2020-04-28 Thread Manas Kale
Hi David and Piotrek, Thank you both for your inputs. I tried an implementation with the algorithm Piotrek suggested and David's example. Although notifications are being generated with the watermark, subsequent transition events are being received after the watermark has crossed their timestamps.

Re: Blink window and cube

2020-04-28 Thread Jark Wu
Thanks for reporting this. I think this is a missing feature. We need to do something in the optimizer to make this possible. Could you please help to create a JIRA issue for this? Best, Jark On Tue, 28 Apr 2020 at 14:55, 刘建刚 wrote: > Hi, I find that blink planner supports CUBE. CUBE can
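For context, a CUBE query of the kind the thread discusses looks like the sketch below (table and column names are hypothetical):

```
-- CUBE produces aggregates for every combination of the grouped columns:
-- (a, b), (a), (b), and the grand total.
SELECT a, b, COUNT(*) AS cnt
FROM t
GROUP BY CUBE (a, b)
```

The missing feature reported here concerns combining this with Flink's windowing, which is what FLINK-17446 tracks.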

Re: join state TTL

2020-04-28 Thread Jark Wu
Hi Lec, StateTtlConfig in DataStream API is a configuration on specific state, not a job level configuration. TableConfig#setIdleStateRetentionTime in TableAPI&SQL is a job level configuration which will enable state ttl for all non-time-based operator states. In blink planner, the underlying of T
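A sketch of the job-level setting Jark mentions, against the Flink 1.10 Table API (assumes a table environment named tEnv; the retention times are illustrative):

```java
import org.apache.flink.api.common.time.Time;

// Enable idle state retention for all non-time-based operator states.
// State idle longer than the max time becomes eligible for cleanup;
// Flink requires maxTime - minTime to be at least 5 minutes.
tEnv.getConfig().setIdleStateRetentionTime(Time.hours(12), Time.hours(24));
```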

Re: Flink Forward 2020 Recorded Sessions

2020-04-28 Thread Sivaprasanna
Awesome, thanks for the update! On Tue, Apr 28, 2020 at 3:43 PM Marta Paes Moreira wrote: > Hi again, > > You can find the first wave of recordings on Youtube already [1]. The > remainder will come over the course of the next few weeks. > > [1] > https://www.youtube.com/playlist?list=PLDX4T_cnKj

Publishing Sink Task watermarks outside flink

2020-04-28 Thread Shubham Kumar
Hi everyone, I have a Flink application with Kafka sources; it calculates some stats from them and pushes the results to JDBC. Now, I want to know up to what timestamp the data has been completely pushed to JDBC (i.e., no more data will be pushed with a timestamp smaller than or equal to this). There doesn't seem t

Re: Flink Forward 2020 Recorded Sessions

2020-04-28 Thread Marta Paes Moreira
Hi again, You can find the first wave of recordings on Youtube already [1]. The remainder will come over the course of the next few weeks. [1] https://www.youtube.com/playlist?list=PLDX4T_cnKjD0ngnBSU-bYGfgVv17MiwA7 On Fri, Apr 24, 2020 at 3:23 PM Sivaprasanna wrote: > Cool. Thanks for the inf

Re: Issue

2020-04-28 Thread Till Rohrmann
Hi Pavan, please post these kinds of questions to the user ML. I've cross-linked it now. Image attachments will be filtered out; consequently, we cannot see what you have posted. Moreover, it would be good if you could provide the community with a bit more detail about what the custom way is and what y