Re: [DISCUSS] Semantic and implementation of per-job mode

2019-11-04 Thread tison
Hi Peter, I checked out your proposal FLIP-85 and think that we are heading in a very similar direction. As in your proposal, we can create the PackagedProgram on the server side, and thus, if we can configure the environment properly, we can directly invoke the main method. In addition to your design document

Using RocksDB as lookup source in Flink

2019-11-04 Thread srikanth flink
Hi there, Can someone help me implement a Flink pipeline from a Kafka source to a RocksDB sink, while using a UDF to look up RocksDB in SQL queries? Context: I get a list of IP addresses in a stream which I wish to store in RocksDB. The other stream then performs a lookup to match the IP address. Thank

Re: Using RocksDB as lookup source in Flink

2019-11-04 Thread OpenInx
Hi, The Kafka table source & sink connector has been implemented (at least Flink 1.9 supports it), but a RocksDB connector is not supported yet; you may need to implement it yourself. Here [1] we have a brief wiki showing which interfaces need to be implemented, but it seems it's not detailed enough per
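Until a dedicated connector exists, the workaround discussed in this thread is a rich function that opens RocksDB directly. A minimal sketch, assuming the `rocksdbjni` artifact is on the classpath; the DB path and the CSV-style output format are hypothetical:

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

import java.nio.charset.StandardCharsets;

// Sketch only: enrich each incoming IP address with the value stored in a
// local RocksDB instance. The path "/data/ip-lookup" is a made-up example.
public class RocksDbLookup extends RichFlatMapFunction<String, String> {

    private transient RocksDB db;

    @Override
    public void open(Configuration parameters) throws Exception {
        RocksDB.loadLibrary();
        db = RocksDB.open(new Options().setCreateIfMissing(true), "/data/ip-lookup");
    }

    @Override
    public void flatMap(String ip, Collector<String> out) throws Exception {
        byte[] value = db.get(ip.getBytes(StandardCharsets.UTF_8));
        if (value != null) { // emit only IPs that are present in the lookup store
            out.collect(ip + "," + new String(value, StandardCharsets.UTF_8));
        }
    }

    @Override
    public void close() {
        if (db != null) {
            db.close();
        }
    }
}
```

Note that RocksDB is an embedded store, so each parallel subtask opens its own local instance; two subtasks on the same TaskManager cannot share one DB path because RocksDB takes an exclusive lock on it.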

is Flink a database ?

2019-11-04 Thread Hanan Yehudai
This seems like a controversial subject.. on purpose 😊 I have my data lake in parquet files – should I use Flink batch mode to run ad-hoc queries over historical data? Or should I use a dedicated “database”, e.g. Drill / Dremio / Hive and their likes? What advantage will Flink give me f

Re: is Flink a database ?

2019-11-04 Thread Piotr Nowojski
Hi :) What do you mean by “a database”? A SQL-like query engine? Flink is already that [1]. A place where you store the data? Flink kind of is that as well [2], and many users are using Flink as the source of truth, not just as a data processing framework. With the Flink Table API/SQL [1], you can

Re: low performance in running queries

2019-11-04 Thread Habib Mostafaei
Hi, On 11/1/2019 4:40 PM, Piotr Nowojski wrote: Hi, More important would be the code profiling output. I think VisualVM allows sharing the code profiling result as “snapshots”? If you could analyse or share this, it would be helpful. Enclosed is a snapshot of VisualVM. From the attached s

Re: [ANNOUNCE] Progress of Apache Flink 1.10 #2

2019-11-04 Thread Becket Qin
Hi Steven and Thomas, Sorry about missing the update on FLIP-27. I am working on the implementation of FLIP-27 at this point. It is about 70% done. Right now I am integrating the source coordinator into the job master. Hopefully I can get the basics of the Kafka connector working end to end by this S

Re: [ANNOUNCE] Progress of Apache Flink 1.10 #2

2019-11-04 Thread Thomas Weise
Hi Becket, Thanks for the reply; it is good to know that there is activity on FLIP-27. A while ago I was wondering whether event time alignment is on the radar [1]; can you please clarify that? There is a parallel discussion about adding it to the existing Kafka consumer [2]; could you please take a loo

Finite source without blocking save-points

2019-11-04 Thread Gaël Renoux
Hello everyone, I have a job which runs continuously, but it also needs to send a single, specific Kafka message on startup. I tried the obvious approach of using StreamExecutionEnvironment.fromElements and adding a Kafka sink; however, that's not possible: the source being finished, it becomes impossibl

Re: Finite source without blocking save-points

2019-11-04 Thread bupt_ljy
Hi Gael, I had a similar situation before. Actually, you don't need to accomplish this in such a complicated way. I guess you already have a rules source, and you can send the rules in the #open function on startup if your rules source inherits from #RichParallelSourceFunction. Best, Jiayi Liao O

Re: Finite source without blocking save-points

2019-11-04 Thread Gaël Renoux
Hi Jiayi, This would allow me to call the Kafka producer without risking a race condition, but it comes with its own problem: unless the source has a parallelism of 1, it will trigger multiple times. I can create a specific source that doesn't produce anything, has a parallelism of 1, and calls th

Re: Finite source without blocking save-points

2019-11-04 Thread bupt_ljy
Oh, the parallelism problem didn't bother me because we used to set the parallelism of the rule source to one :o). Maybe a more elegant way is guarding the rule emission with #RuntimeContext#getIndexOfThisSubtask. Best, Jiayi Liao Original Message Sender: Gaël Renoux Recipient: bupt_ljy Cc: user
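Jiayi's suggestion can be sketched as follows: guard the one-off startup emission with the subtask index so it fires exactly once regardless of parallelism, and keep the source alive so savepoints remain possible. The class name and message payload are hypothetical:

```java
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

// Sketch: emits a single startup message from subtask 0 only, then idles so
// the source never finishes (a finished source blocks checkpoints/savepoints).
public class StartupMessageSource extends RichParallelSourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        if (getRuntimeContext().getIndexOfThisSubtask() == 0) {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect("startup-marker"); // hypothetical payload
            }
        }
        // Keep all subtasks alive without emitting anything further.
        while (running) {
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```

Because only subtask 0 emits, the message is sent once even when the source runs with a parallelism greater than one.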

Re: Using RocksDB as lookup source in Flink

2019-11-04 Thread Yun Tang
Hi Srikanth, As RocksDB is a single-node DB, just like InfluxDB, I recommend you refer to the implementation of the InfluxDB sink. [1] [1] https://github.com/apache/bahir-flink/tree/master/flink-connector-influxdb Best, Yun Tang From: OpenInx Date: Monday, November 4, 2019 at 6:28 PM To:

Ordered events in broadcast state

2019-11-04 Thread Filip Niksic
Hi all, The documentation for the broadcast state explicitly says that the order of broadcast events may differ across tasks, so the state updates should not depend on a particular order. [1] But what to do in the use cases where the order matters? Is there some way to enforce the order even at th

Re: [ANNOUNCE] Progress of Apache Flink 1.10 #2

2019-11-04 Thread Becket Qin
Hi Thomas, Event time alignment is absolutely one of the important considerations in FLIP-27. That said, we are not implementing it in FLIP-27, but just making sure such a feature can be easily added in the future. The design was to make the communication between SplitEnumerator and SourceReader qui

Checkpoint in FlinkSQL

2019-11-04 Thread Simon Su
Hi All, Does Flink currently support setting checkpoint properties while using Flink SQL? For example, the state backend choice, the checkpoint interval, and so on ... Thanks, Simon

Re: Checkpoint in FlinkSQL

2019-11-04 Thread Yun Tang
Hi Simon, If you are using the Table API, you could set the state backend via the environment, like `env.setStateBackend()`. If you just launch a cluster with the SQL client, you could configure the state backend and checkpoint options [1] within `flink-conf.yaml` before launching the cluster. [1] https://ci.apac
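For the SQL-client case, the cluster-wide defaults go into `flink-conf.yaml` before launch. A minimal sketch; the paths and interval are illustrative examples, and the exact key names depend on the Flink version (check the configuration docs for yours):

```yaml
# State backend and checkpoint defaults picked up by the whole cluster
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink/checkpoints   # example path
state.savepoints.dir: hdfs:///flink/savepoints     # example path
```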

Re: Checkpoint in FlinkSQL

2019-11-04 Thread vino yang
Hi Simon, Absolutely, yes. Before using Flink SQL, you need to initialize a StreamExecutionEnvironment instance [1], then call StreamExecutionEnvironment#setStateBackend or StreamExecutionEnvironment#enableCheckpointing to specify what you want. [1]: https://ci.apache.org/projects/
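The steps above can be sketched as follows; the checkpoint interval and backend path are example values, not recommendations:

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class SqlCheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60s; state goes to the (example) file-system path.
        env.enableCheckpointing(60_000);
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));

        // The table environment created from this env inherits the settings,
        // so subsequent SQL queries are checkpointed accordingly.
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
        // ... register tables and run SQL as usual
    }
}
```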

Does changing the number of YARN containers affect restoring from state?

2019-11-04 Thread wangl...@geekplus.com.cn
I consume data from RocketMQ and process it. The maximum parallelism in the code is 8. When submitting the job with -ys 4, 2 containers are automatically allocated as TaskManagers. After running for a while, I stopped the job with a savepoint. When restoring from the savepoint with -ys 2, 4 containers are allocated as TaskManagers, but after the job is submitted the program no longer consumes data from RocketMQ; the consumption TPS stays at 0. What could be the reason? Thanks, Wang Lei wangl...@geekplus.com.cn

Re: Does changing the number of YARN containers affect restoring from state?

2019-11-04 Thread Yun Tang
Hi, First, check whether the job keeps failing over and whether there are any “maximum parallelism” related exceptions. If so, the changed parallelism is incompatible and the job has never actually recovered from the checkpoint. If the job did recover from the checkpoint successfully, then check whether the task side is still redistributing state because of the parallelism change; if your state is large, this step can take quite a while. In that case the source consumes data but cannot send it downstream, so the whole job looks stuck. You can run jstack on the task side and check whether any restore-related stack is hanging. If it is neither of the above, then please run jstack yourself on the source and the down
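The jstack check mentioned in this thread might look like the following; the grep patterns are just examples, and `<pid>` stands for the TaskManager process id:

```shell
# Find the TaskManager JVM on the task host, then look for restore-related
# frames hanging in its thread dump.
jps -m | grep TaskManager          # note the pid printed here
jstack <pid> | grep -B 2 -A 5 -i "restore"
```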

Fwd: Does changing the number of YARN containers affect restoring from state?

2019-11-04 Thread Biao Liu
Hi, this topic should be sent to the user-zh mailing list. Just forwarding it there. Thanks, Biao /'bɪ.aʊ/ -- Forwarded message - From: Yun Tang Date: Tue, 5 Nov 2019 at 13:20 Subject: Re: Does changing the number of YARN containers affect restoring from state? To: wangl...@geekplus.com.cn , user < user@flink.apache.org

Flink SQL GroupBy Exception

2019-11-04 Thread Polarisary
Hi ALL, I have a problem when using Flink SQL; my code is like this: ``` tEnv.registerDataStream("mytable", actionStream, "uid, device_id, rowtime.rowtime"); ``` actionStream is a Kafka consumer, but this cannot run; the exception is as follows: ``` org.apache.flink.client.program.ProgramIn

Re: Flink SQL GroupBy Exception

2019-11-04 Thread Terry Wang
Hi, Polarisary~ The reason should be that `uid, device_id` cannot be automatically derived from the type of your Kafka actionStream; you should check it and make sure actionStream returns a suitable type. Best, Terry Wang > On Nov 5, 2019, at 15:11, Polarisary wrote: > > Hi ALL, > I have a problem when
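One way to make the field derivation work is to give the stream an explicit tuple type before registering it. A sketch under assumptions: both fields are strings, and the accessors `getUid()`/`getDeviceId()` on the Kafka record type are hypothetical; the stream also needs event-time timestamps assigned for `rowtime.rowtime` to work:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;

// Map the raw Kafka stream to a typed Tuple2 so the Table API can derive
// the positional fields `uid` and `device_id` (types here are assumed).
DataStream<Tuple2<String, String>> typed = actionStream
        .map(record -> Tuple2.of(record.getUid(), record.getDeviceId()))
        .returns(Types.TUPLE(Types.STRING, Types.STRING));

tEnv.registerDataStream("mytable", typed, "uid, device_id, rowtime.rowtime");
```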

Re: low performance in running queries

2019-11-04 Thread Piotr Nowojski
Hi, Unfortunately your VisualVM snapshot doesn't contain the profiler output. It should look like this [1]. > Checking the timeline of execution shows that the source operation is done in > less than a second, while the Map and Reduce operations take a long time. It could well be that the ove