Re: multiple joins in one job

2020-05-05 Thread Fabian Hueske
You can in fact forward both time attributes, because Flink makes sure that the watermark is automatically adjusted to the "slower" of both input streams. You can run the following queries in the SQL CLI client (here taking an example from a Flink SQL training [1]): Flink SQL> CREATE VIEW ridesWithFa
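
To illustrate the point with a sketch (table and column names here are assumptions, not taken from the training): an interval join can forward one input's rowtime attribute, and the result stays usable for further joins or windows without re-declaring a watermark.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;

    public class IntervalJoinSketch {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());
            // Assumes tables Rides(rideId, rideTime) and Fares(rideId, payTime, amount)
            // are already registered with rowtime attributes and watermarks.
            Table ridesWithFares = tEnv.sqlQuery(
                    "SELECT r.rideId, r.rideTime, f.amount " + // r.rideTime remains a time attribute
                    "FROM Rides r, Fares f " +
                    "WHERE r.rideId = f.rideId " +
                    "AND f.payTime BETWEEN r.rideTime - INTERVAL '5' MINUTE AND r.rideTime");
            // The view can feed a second interval join or a window directly.
            tEnv.createTemporaryView("RidesWithFares", ridesWithFares);
        }
    }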

Re: MongoDB sink;

2020-05-05 Thread myflink
My solution: first, Flink sinks the data to Kafka; second, MongoDB reads the data from Kafka. Done. -- Original message from "Aissa Elaffani"
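
A sketch of the first hop (broker address, topic name, and the sample record are placeholders): the Flink job writes JSON strings to a Kafka topic, and a separate consumer, for example the Kafka Connect MongoDB sink connector, moves them into MongoDB.

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

    public class KafkaBridgeJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker

            env.fromElements("{\"sensor\":\"s1\",\"temp\":21.5}")     // stand-in for the real stream
               .addSink(new FlinkKafkaProducer<>("mongo-ingest", new SimpleStringSchema(), props));

            env.execute("flink-to-kafka bridge");
        }
    }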

MongoDB sink;

2020-05-05 Thread Aissa Elaffani
Hello, I want to sink my data to MongoDB, but as far as I know there is no sink connector for MongoDB. How can I implement a MongoDB sink? If there are any other solutions, I hope you can share them with me.
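
One common approach is a custom sink wrapping the MongoDB Java driver. A minimal sketch, assuming string-encoded JSON records; the URI, database, and collection names are placeholders, and batching, error handling, and delivery guarantees are left out:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.bson.Document;

    public class MongoSink extends RichSinkFunction<String> {
        private transient MongoClient client;

        @Override
        public void open(Configuration parameters) {
            client = MongoClients.create("mongodb://localhost:27017"); // assumed URI
        }

        @Override
        public void invoke(String json, Context context) {
            // One write per record; a real sink would batch and handle failures.
            client.getDatabase("mydb").getCollection("events").insertOne(Document.parse(json));
        }

        @Override
        public void close() {
            if (client != null) {
                client.close();
            }
        }
    }

Attach it with stream.addSink(new MongoSink()).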

Re: Flink: For terabytes of keyed state.

2020-05-05 Thread Gowri Sundaram
Hi Congxian, Thank you so much for your response! We will go ahead and do a POC to test out how Flink performs at scale. Regards, - Gowri On Wed, May 6, 2020 at 8:34 AM Congxian Qiu wrote: > Hi > > From my experience, you should care about the state size for a single task (not > the whole job state si

Autoscaling vs backpressure

2020-05-05 Thread Manish G
Hi, as Flink doesn't provide out-of-the-box support for autoscaling, can backpressure be considered an alternative to it? Autoscaling allows us to add/remove nodes as load goes up/down. With backpressure, if load goes up, the system signals upstream to release data more slowly, so we don't need to add

What is the RocksDB local directory in flink checkpointing?

2020-05-05 Thread LakeShen
Hi community, I have a question about the Flink checkpoint local directory. Our Flink version is 1.6 and the job mode is Flink on YARN, per-job. I read the Flink source code and found that the checkpoint local directory is /tmp when "state.backend.rocksdb.localdir" is not configured. But I go i
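
For reference, besides the "state.backend.rocksdb.localdir" key in flink-conf.yaml, the local directory can also be set programmatically on the state backend; a sketch with assumed paths:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LocalDirSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Checkpoints go to the durable URI; RocksDB keeps its working files
            // under the local path below instead of the default temp directory.
            RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");
            backend.setDbStoragePath("/data/flink/rocksdb"); // assumed local path
            env.setStateBackend(backend);
        }
    }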

Re: Flink: For terabytes of keyed state.

2020-05-05 Thread Congxian Qiu
Hi, from my experience, you should care about the state size for a single task (not the whole job state size); the download speed for a single thread is almost 100 MB/s (this may vary in different environments), and I do not have much performance data for loading state into RocksDB (we use an internal KV store in my comp
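
To make that concrete (numbers assumed purely for illustration): with 10 TB of total state spread evenly over 500 parallel tasks, each task holds about 20 GB; at roughly 100 MB/s per download thread, restoring one task's state takes on the order of 200 seconds, regardless of the total job size.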

Re: multiple joins in one job

2020-05-05 Thread Benchao Li
Yes. The watermark will be propagated correctly; it is the min of the two inputs. On Wed, May 6, 2020 at 9:46 AM, lec ssmi wrote: > Even if the time attribute field is retained, will the related watermark > be retained? > If not, and there is no SQL syntax to declare the watermark again, it is > equivalent to not b

Re: multiple joins in one job

2020-05-05 Thread lec ssmi
Even if the time attribute field is retained, will the related watermark be retained? If not, and there is no SQL syntax to declare the watermark again, it is equivalent to not being able to do multiple joins in one job. On Tue, May 5, 2020 at 9:23 PM, Benchao Li wrote: > You cannot select more than one time attri

Re: Writing _SUCCESS Files (Streaming and Batch)

2020-05-05 Thread Jingsong Li
Hi Peter, The troublesome part is how to know the "end" of a bucket in a streaming job. In 1.11, we are trying to implement a watermark-related bucket-ending mechanism [1] in Table/SQL. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-115%3A+Filesystem+connector+in+Table Best, Jingsong Lee

Re: Cannot start native K8s

2020-05-05 Thread Yang Wang
Hi Dongwon Kim, I think it is a known issue. The native Kubernetes integration does not work with JDK 8u252 due to an okhttp issue [1]. Currently, you can upgrade your JDK to a newer version as a workaround. [1] https://issues.apache.org/jira/browse/FLINK-17416 On Wed, May 6, 2020 at 7:15 AM, Dongwon Kim wrote:

Export user metrics with Flink Prometheus endpoint

2020-05-05 Thread Eleanore Jin
Hi all, I just wonder whether it is possible to use the Flink metrics endpoint to allow Prometheus to scrape user-defined metrics? Context: in addition to Flink metrics, we also collect some application-level metrics using opencensus, and we run the opencensus agent as a sidecar in the Kubernetes pod to collect metri
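
For the Flink side, user-defined metrics registered on the runtime metric group are exposed through whatever reporter is configured, e.g. the PrometheusReporter set up in flink-conf.yaml. A minimal sketch (class and metric names are illustrative):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;

    public class CountingMapper extends RichMapFunction<String, String> {
        private transient Counter processed;

        @Override
        public void open(Configuration parameters) {
            // Registered metrics show up on the configured reporter endpoints.
            processed = getRuntimeContext().getMetricGroup().counter("processedRecords");
        }

        @Override
        public String map(String value) {
            processed.inc();
            return value;
        }
    }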

Casting from org.apache.flink.api.java.tuple.Tuple2 to scala.Product; using java and scala in flink job

2020-05-05 Thread Nick Bendtner
Hi guys, in our Flink job we use a Java source for deserializing messages from Kafka using a Kafka deserializer. The signature is as below: public class CustomAvroDeserializationSchema implements KafkaDeserializationSchema<Tuple2<..., ...>> The other parts of the streaming job are in Scala. When data has to b
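
One workaround, sketched with illustrative element types: org.apache.flink.api.java.tuple.Tuple2 does not implement scala.Product, but scala.Tuple2 does, so an explicit map can bridge the two.

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;

    public class ToScalaTuple {
        public static DataStream<scala.Tuple2<String, Long>> toScala(
                DataStream<Tuple2<String, Long>> in) {
            return in.map(new MapFunction<Tuple2<String, Long>, scala.Tuple2<String, Long>>() {
                @Override
                public scala.Tuple2<String, Long> map(Tuple2<String, Long> t) {
                    // scala.Tuple2 implements scala.Product, so the Scala side
                    // of the job can consume the result directly.
                    return new scala.Tuple2<>(t.f0, t.f1);
                }
            });
        }
    }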

MongoDB as a Sink;

2020-05-05 Thread Aissa Elaffani
Hello Guys, I am looking for some help concerning my Flink sink; I want the output to be stored in a MongoDB database. As far as I know, there is no sink connector for MongoDB, so I need to implement one myself, and I don't know how to do that. Can you please help me with this?

Cannot start native K8s

2020-05-05 Thread Dongwon Kim
Hi, I'm using Flink-1.10 and tested everything [1] successfully. While trying [2], I got the following message. Can anyone help please? [root@DAC-E04-W06 bin]# ./kubernetes-session.sh > 2020-05-06 08:10:49,411 INFO > org.apache.flink.configuration.GlobalConfiguration- Loading > confi

Flink pipeline;

2020-05-05 Thread Aissa Elaffani
Hello Guys, I am new to the real-time streaming field, and I am trying to build a big data architecture for processing real-time streams. I have some sensors that generate data in JSON format; they are sent to an Apache Kafka cluster, and then I want to consume them with Apache Flink in order to do some a
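
A sketch of the consuming side (broker address, group id, and topic name are placeholders): read the JSON strings from Kafka into a DataStream, then parse and aggregate them downstream.

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class SensorPipeline {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker
            props.setProperty("group.id", "sensor-consumer");         // assumed group

            env.addSource(new FlinkKafkaConsumer<>("sensors", new SimpleStringSchema(), props))
               // Parse the JSON and aggregate here, e.g. map + keyBy + window.
               .print();

            env.execute("sensor pipeline");
        }
    }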

Overriding hadoop core-site.xml keys using the flink-fs-hadoop-shaded assemblies

2020-05-05 Thread Jeff Henrikson
Has anyone had success overriding hadoop core-site.xml keys using the flink-fs-hadoop-shaded assemblies? If so, what versions were known to work? Using btrace, I am seeing a bug in the hadoop shaded dependencies distributed with 1.10.0. Some (but not all) of the core-site.xml keys cannot be

Flink - Hadoop Connectivity - Unable to read file

2020-05-05 Thread Samik Mukherjee
Hi All, I am trying to read a file from HDFS, which is installed locally, but I am not able to. I tried both of these ways, but every time the program ends with "Process finished with exit code 239." Any help would be appreciated. public class Processor { public static void main(String[]
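
For comparison, a minimal working read with the DataSet API (namenode address and path are assumptions; flink-shaded-hadoop or HADOOP_CLASSPATH must be on the classpath for the hdfs:// scheme to resolve):

    import org.apache.flink.api.java.ExecutionEnvironment;

    public class HdfsRead {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            env.readTextFile("hdfs://localhost:9000/path/to/input.txt") // assumed URI
               .first(10)   // sample a few lines to verify connectivity
               .print();
        }
    }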

Re: table.show() in Flink

2020-05-05 Thread Fabian Hueske
There's also the Table API approach if you want to avoid typing a "full" SQL query: Table t = tEnv.from("myTable"); Cheers, Fabian On Tue, May 5, 2020 at 4:34 PM, Őrhidi Mátyás < matyas.orh...@gmail.com> wrote: > Thanks guys for the prompt answers! > > On Tue, May 5, 2020 at 2:49 PM Kurt You

Re: table.show() in Flink

2020-05-05 Thread Őrhidi Mátyás
Thanks guys for the prompt answers! On Tue, May 5, 2020 at 2:49 PM Kurt Young wrote: > A more straightforward way after FLIP-84 would be: > TableResult result = tEnv.executeSql("select xxx ..."); > result.print(); > > And if you are using 1.10 now, you can use TableUtils#collectToList(table) > t

Re: multiple joins in one job

2020-05-05 Thread Benchao Li
You cannot select more than one time attribute; the planner will give you an exception if you do that. On Tue, May 5, 2020 at 8:34 PM, lec ssmi wrote: > As you said, if I select all the time attribute fields from > both, which will be the final one? > > On Tue, May 5, 2020 at 5:26 PM, Benchao Li wrote: > >

Re: Restore from savepoint with Iterations

2020-05-05 Thread ashish pok
Let me see if I can add an artificial throttle somewhere. The volume of data is really high, hence I'm trying to avoid round trips through Kafka too. Looks like the options are "not so elegant" until FLIP-15. Thanks for the pointers again! On Monday, May 4, 2020, 11:06 PM, Ken Krugler wrote: Hi Ashish, The workaroun
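
If it helps, a crude artificial throttle can be as simple as a sleeping map (a sketch, not a production rate limiter; each parallel subtask is capped independently):

    import org.apache.flink.api.common.functions.MapFunction;

    public class ThrottleMap<T> implements MapFunction<T, T> {
        private final long pauseMillis;

        public ThrottleMap(long maxPerSecond) {
            this.pauseMillis = 1000L / maxPerSecond; // per-subtask cap
        }

        @Override
        public T map(T value) throws Exception {
            Thread.sleep(pauseMillis); // blocks the subtask between records
            return value;
        }
    }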

Re: table.show() in Flink

2020-05-05 Thread Kurt Young
A more straightforward way after FLIP-84 would be: TableResult result = tEnv.executeSql("select xxx ..."); result.print(); And if you are using 1.10 now, you can use TableUtils#collectToList(table) to collect the result into a list and then print the rows yourself. Best, Kurt On Tue, May 5, 2020
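
A sketch of the 1.10 variant, assuming the blink planner (where the TableUtils helper mentioned above lives) and a registered table small enough to collect client-side:

    import java.util.List;

    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableUtils;
    import org.apache.flink.table.api.java.StreamTableEnvironment;
    import org.apache.flink.types.Row;

    public class ShowTable {
        static void show(StreamTableEnvironment tEnv) {
            Table table = tEnv.sqlQuery("SELECT * FROM my_table"); // assumed table
            List<Row> rows = TableUtils.collectToList(table);      // materialize client-side
            rows.forEach(System.out::println);
        }
    }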

Re: table.show() in Flink

2020-05-05 Thread Jark Wu
Hi Matyas, AFAIK, this is currently the recommended way to print the result of a table. In FLIP-84 [1], which is targeted for 1.11, we will introduce some new APIs to do fluent printing like this: Table table2 = tEnv.sqlQuery("select yy ..."); TableResult result2 = table2.execute(); result2.print(

Re: Reading from sockets using dataset api

2020-05-05 Thread Arvid Heise
Hi Kaan, explicitly mapping to physical nodes is currently not supported and would need some workarounds. I have re-added the user mailing list (please always include it in your responses); maybe someone can help you with that. Best, Arvid On Thu, Apr 30, 2020 at 10:12 AM Kaan Sancak wrote: > One q

Re: multiple joins in one job

2020-05-05 Thread lec ssmi
As you said, if I select all the time attribute fields from both, which will be the final one? On Tue, May 5, 2020 at 5:26 PM, Benchao Li wrote: > Hi lec, > > You don't need to specify the time attribute again like `TUMBLE_ROWTIME`, you > just select the time attribute field > from one of the inp

table.show() in Flink

2020-05-05 Thread Őrhidi Mátyás
Dear Flink Community, I'm missing Spark's table.show() method in Flink. I'm using the following alternative at the moment: Table results = tableEnv.sqlQuery("SELECT * FROM my_table"); tableEnv.toAppendStream(results, Row.class).print(); Is this the recommended way to print the content of a table?

Flink on Kubernetes unable to Recover from failure

2020-05-05 Thread Morgan Geldenhuys
Community, I am currently doing some fault-tolerance testing for Flink (1.10) running on Kubernetes (1.18) and am encountering an error where, after a running job experiences a failure, the job fails completely. A Flink session cluster has been created according to the documentation containe

Re: ML/DL via Flink

2020-05-05 Thread m@xi
Hello Becket, I just watched your Flink Forward talk. Really interesting! I leave the link here as it is related to the post. AI Flow (FF20) Best, Max

Re: multiple joins in one job

2020-05-05 Thread Benchao Li
Hi lec, You don't need to specify the time attribute again like `TUMBLE_ROWTIME`; you just select the time attribute field from one of the inputs, and it will be a time attribute automatically. On Tue, May 5, 2020 at 4:42 PM, lec ssmi wrote: > But I have not found there is any syntax to specify time > a

Re: Autoscaling flink application

2020-05-05 Thread Manish G
Thanks. It would help. On Tue, May 5, 2020 at 2:12 PM David Anderson wrote: > There's no explicit, out-of-the-box support for autoscaling, but it can be > done. > > Autoscaling came up a few times at the recent Virtual Flink Forward, > including a talk on Autoscaling Flink at Netflix [1] by Timo

Re: Autoscaling flink application

2020-05-05 Thread David Anderson
There's no explicit, out-of-the-box support for autoscaling, but it can be done. Autoscaling came up a few times at the recent Virtual Flink Forward, including a talk on Autoscaling Flink at Netflix [1] by Timothy Farkas. [1] https://www.youtube.com/watch?v=NV0jvA5ZDNc Regards, David On Mon,

Re: multiple joins in one job

2020-05-05 Thread lec ssmi
But I have not found any syntax to specify the time attribute field and watermark again in pure SQL. On Tue, May 5, 2020 at 3:47 PM, Fabian Hueske wrote: > Sure, you can write a SQL query with multiple interval joins that preserve > event-time attributes and watermarks. > There's no ne

Re: multiple joins in one job

2020-05-05 Thread Fabian Hueske
Sure, you can write a SQL query with multiple interval joins that preserve event-time attributes and watermarks. There's no need to feed data back to Kafka just to inject it again to assign new watermarks. On Tue, May 5, 2020 at 1:45 AM, lec ssmi wrote: > I mean using pure sql statement to ma