Kafka dynamic topic for Sink in SQL

2021-05-20 Thread Benoît Paris
Hi all! I'm looking for a way to write to different Kafka topics based on some column value in SQL. I think it's possible with Java, using KafkaSerializationSchema, and ProducerRecord(topic, ...), but I was wondering if I could somewhat access that feature in SQL. I'm also trying to evaluate the

Interesting article about correctness and latency

2021-04-18 Thread Benoît Paris
Hi all! I read this very interesting and refreshing article today, about correctness and (vs?) latency in streaming systems. I thought I'd share it. https://scattered-thoughts.net/writing/internal-consistency-in-streaming-systems/ With some comments on Hacker News: https://news.ycombinator.com/

Re: Writing retract streams to Kafka

2020-03-05 Thread Benoît Paris
wing error: >>> >>> "AppendStreamTableSink requires that Table has only insert changes." >>> >>> If I understand correctly the problem here is that since updates are >>> possible within a single group, we have a retract stream and the Kafka Sink >>> cannot handle that. I tried to search for the solution but I haven't found >>> any satisfying answers. >>> >>> How can I simply tell the INSERT logic to ignore previous values and >>> just always keep sending the latest (like you would see it on the CLI >>> output). >>> >>> Thank you! >>> Gyula >>> >> -- Benoît Paris Ingénieur Machine Learning Explicable Tél : +33 6 60 74 23 00 http://benoit.paris http://explicable.ml

Re: Colocating Sub-Tasks across jobs / Sharing Task Slots across jobs

2020-02-25 Thread Benoît Paris
1], > which restarts only the tasks connected by pipelined edges instead of the > whole job graph. > > Thank you~ > > Xintong Song > > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/task_failure_recovery.html#restart-pipelined-region-failover-str

Colocating Sub-Tasks across jobs / Sharing Task Slots across jobs

2020-02-24 Thread Benoît Paris
Hello all! I have a setup composed of several streaming pipelines. These have different deployment lifecycles: I want to be able to modify and redeploy the topology of one while the other is still up. I am thus putting them in different jobs. The problem is I have a Co-Location constraint between

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-27 Thread Benoît Paris
on the intermediate state is on the roadmap" I > mentioned is referring to integrate Table API & SQL with Queryable State. > We also have an early issue FLINK-6968 to tracks this. > > Best, > Jark > > > On Fri, 24 Jan 2020 at 00:26, Benoît Paris < > benoit.pa...@

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-23 Thread Benoît Paris
teViews.java#L45 >>>> >>>> >>>> 2020年1月21日 18:01,kant kodali 写道: >>>> >>>> Hi All, >>>> >>>> If I run a query like this >>>> >>>> StreamTableEnvironment.sqlQuery("select * from table1 join table2 on >>>> table1.col1 = table2.col1") >>>> >>>> 1) Where will flink store the intermediate result? Imagine >>>> flink-conf.yaml says state.backend = 'rocksdb' >>>> >>>> 2) If the intermediate results are stored in rockdb then what is the >>>> key and value in this case(given the query above)? >>>> >>>> 3) What is the best way to query these intermediate results from an >>>> external application? while the job is running and while the job is not >>>> running? >>>> >>>> Thanks! >>>> >>>> >>>> -- Benoît Paris Ingénieur Machine Learning Explicable Tél : +33 6 60 74 23 00 http://benoit.paris http://explicable.ml

TableSource being duplicated

2020-01-22 Thread Benoît Paris
Hello all! I'm having a problem with TableSources' DataStream being duplicated when pulled on from 2 sinks. I understand that sometimes the best plan might just be to duplicate and read both times a TableSource/SourceFunction; but in my case I can't quite reproduce the data as say Kafka would. I

Re: Implementing a tick service

2020-01-21 Thread Benoît Paris
de the onTimer method? On Mon, Jan 20, 2020 at 7:04 PM Benoît Paris < benoit.pa...@centraliens-lille.org> wrote: > Hello all :)! > > I'm having trouble creating a tick service. > > Goal: register a TableSource that emits a Row roughly every 200ms in > processing t

Implementing a tick service

2020-01-20 Thread Benoît Paris
Hello all :)! I'm having trouble creating a tick service. Goal: register a TableSource that emits a Row roughly every 200ms in processing time. The Row would contain only one column "counter" that is incremented by 1 each Row. Current attempt: Using TimerService A TableSource with public DataSt

Re: Controlling the Materialization of JOIN updates

2020-01-05 Thread Benoît Paris
T JOIN DimensionAtJoinTime1 d1 ON e.uid > = d1.uid). It's possible that > when a record with uid=x comes from Event table, but the dimension table > doesn't have any data around > uid=x yet due to the window aggregation. In this case, you don't want them > to join? >

Re: [DISCUSS] Set default planner for SQL Client to Blink planner in 1.10 release

2020-01-03 Thread Benoît Paris
Changing the default planner for the whole Table API & SQL is another >>> topic >>> >> and is out of scope of this discussion. >>> >> >>> >> What do you think? >>> >> >>> >> Best, >>> >> Jark >>> >> >>> >> [1]: >>> >> >>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/joins.html#join-with-a-temporal-table >>> >> [2]: >>> >> >>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#top-n >>> >> [3]: >>> >> >>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#deduplication >>> >> [4]: >>> >> >>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tuning/streaming_aggregation_optimization.html >>> >> [5]: >>> >> >>> https://github.com/apache/flink/blob/master/flink-table/flink-sql-client/conf/sql-client-defaults.yaml#L100 >>> > >>> >>> >> >> -- >> Best, Jingsong Lee >> > > > -- > Best Regards > > Jeff Zhang > -- Benoît Paris Ingénieur Machine Learning Explicable Tél : +33 6 60 74 23 00 http://benoit.paris http://explicable.ml

Controlling the Materialization of JOIN updates

2020-01-02 Thread Benoît Paris
Hello all! I'm trying to design a stream pipeline, and have trouble controlling when a JOIN is triggering an update: Setup: - The Event table; "probe side", "query side", the result of earlier stream processing - The DimensionAtJoinTimeX tables; of updating nature, "build side", the

Re: Flink ML feature

2019-12-10 Thread Benoît Paris
ed in v1.9. It looks like ML feature >>> in Flink going to be enhanced. >>> >>> What is the recommended approach for implementing production grade ML >>> based apps using Flink? v1.9 is ok?or should wait for 1.10? >>> >>> Thanks, >>> Chandu >>> >> -- Benoît Paris Ingénieur Machine Learning Explicable Tél : +33 6 60 74 23 00 http://benoit.paris http://explicable.ml

Re: Joining multiple temporal tables

2019-12-06 Thread Benoît Paris
la:937) >>> at scala.collection.Iterator.foreach$(Iterator.scala:937) >>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) >>> at scala.collection.IterableLike.foreach(IterableLike.scala:70) >>> at scala.collection.IterableLike.forea

Re: 【flink sql】table in subquery join temporal table raise java.lang.NullPointerException

2019-10-14 Thread Benoît Paris
; > @Override > public DataStream getDataStream(StreamExecutionEnvironment > execEnv) { > return execEnv.addSource(new SourceFunction() { > > @Override > public void run(SourceContext ctx) throws Exception { > Random random = new Random(); > > while (true) { > > Row row = new Row(3); > row.setField(0, "Euro" + random.nextLong() % 3); > row.setField(1, random.nextLong() % 200 ); > row.setField(2, new > java.sql.Timestamp(System.currentTimeMillis())); > ctx.collect(row); > Thread.sleep(100); > } > > } > > @Override > public void cancel() { > System.out.println("cancelling > --"); > > } > }, getReturnType()); > } > > @Override > public String getProctimeAttribute() { > return fieldNames[2]; > } > } > > } > > -- Benoît Paris Ingénieur Machine Learning Explicable Tél : +33 6 60 74 23 00 http://benoit.paris http://explicable.ml

Re: Random forest - Flink ML

2019-03-12 Thread Benoît Paris
There has been some developments at Apache SAMOA for a forest of decision trees. This is not regular Random Forest, but a form of trees that can be incrementally learned fast. If I recall correctly they also have adaptive algorithms as well. Here are some r