Re: [ANNOUNCE] Apache Flink 1.9.3 released

2020-04-27 Thread Till Rohrmann
Thanks Dian for being our release manager and thanks to everyone who helped make this release possible. Cheers, Till On Mon, Apr 27, 2020 at 3:26 AM Jingsong Li wrote: > Thanks Dian for managing this release! > > Best, > Jingsong Lee > > On Sun, Apr 26, 2020 at 7:17 PM Jark Wu wrote: > >> Th

Re: Task Assignment

2020-04-27 Thread Marta Paes Moreira
Sorry — I didn't understand you were dealing with multiple keys. In that case, I'd recommend you read about key-group assignment [1] and check the KeyGroupRangeAssignment class [2]. Key-groups are assigned to parallel tasks as ranges before the job is started — this is also a well-defined behavio
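Marta's pointer to KeyGroupRangeAssignment can be illustrated with a small sketch. The range math below mirrors Flink's key-group-to-subtask formula; the hash step is simplified (Python's hash() stands in for Flink's murmur hash of key.hashCode()), so the exact key-to-group mapping is illustrative only.

```python
# Sketch of Flink's key-group assignment (see KeyGroupRangeAssignment).
# NOTE: Flink uses murmurHash(key.hashCode()); hash() here is a stand-in.

def assign_key_group(key, max_parallelism: int) -> int:
    # Flink: MathUtils.murmurHash(key.hashCode()) % maxParallelism
    return hash(key) % max_parallelism

def operator_index_for_key_group(key_group: int, parallelism: int,
                                 max_parallelism: int) -> int:
    # Flink: keyGroupId * parallelism / maxParallelism (integer division),
    # which splits key groups into contiguous ranges across subtasks.
    return key_group * parallelism // max_parallelism

# With max parallelism 128 and parallelism 4, subtask 0 owns key groups
# 0-31, subtask 1 owns 32-63, and so on.
for kg in (0, 31, 32, 127):
    print(kg, "->", operator_index_for_key_group(kg, 4, 128))
```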

RE: Fault tolerance in Flink file Sink

2020-04-27 Thread Eyal Pe'er
Hi Dawid, Thanks for the very detailed answer and the correct assumptions (I am using row format). I tried not using NFS/S3, but seems like it is the only option I have. Best regards Eyal Peer From: Dawid Wysakowicz Sent: Friday, April 24, 2020 4:20 PM To: Eyal Pe'er ; user Subject: Re: Fault to

Re: [ANNOUNCE] Apache Flink 1.9.3 released

2020-04-27 Thread Zhijiang
Thanks Dian for the release work and thanks everyone involved. Best, Zhijiang -- From: Till Rohrmann Send Time: 2020 Apr. 27 (Mon.) 15:13 To: Jingsong Li Cc: dev; Leonard Xu; Benchao Li; Konstantin Knauf; jincheng sun; Hequn C

Re: Fault tolerance in Flink file Sink

2020-04-27 Thread Kostas Kloudas
Hi Eyal and Dawid, @Eyal I think Dawid explained pretty well what is happening and why in distributed settings, the underlying FS on which the StreamingFileSink writes has to be durable and accessible to all parallel instances of the job. Please let us know if you have any further questions. Chee

How to disable the state behind `COUNT` sql?

2020-04-27 Thread izual
I implement my DimTable by extending `LookupTableSource` [1], which stores data like: id=1 -> (SH, BJ, SD), id=2 -> (...), and then extend `TableFunction` to return the value corresponding to the lookup keys, possibly returning multiple rows; for example, when the lookup key is id=1, then in the `TableFun

Re: How to disable the state behind `COUNT` sql?

2020-04-27 Thread Benchao Li
Hi, There is indeed a state for the aggregation result; however, we cannot disable it, it's by design. StreamQueryConfig.maxIdleStateRetentionTime can control how long the state will be kept. If you can ensure the time gap between two records of the same id is larger than, for example, 1 min, then sett
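A minimal sketch of the idle-state-retention idea Benchao describes, using a toy in-memory map rather than Flink's managed state: per-key aggregation state is dropped once the key has been idle longer than the retention time, after which the count restarts from scratch.

```python
# Toy illustration of idle state retention (NOT Flink's implementation):
# state for a key is discarded if no record for that key arrived within
# the configured retention time.

class RetainedCounts:
    def __init__(self, retention):
        self.retention = retention
        self.state = {}  # key -> (count, last_access_time)

    def on_record(self, key, now):
        count, _ = self.state.get(key, (0, now))
        count += 1
        self.state[key] = (count, now)
        return count

    def cleanup(self, now):
        # Drop entries idle longer than the retention time.
        self.state = {k: v for k, v in self.state.items()
                      if now - v[1] <= self.retention}

c = RetainedCounts(retention=60)
assert c.on_record("id=1", now=0) == 1
c.cleanup(now=100)                        # idle for 100s > 60s: dropped
assert c.on_record("id=1", now=100) == 1  # count restarts from scratch
```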

Re: History Server Not Showing Any Jobs - File Not Found?

2020-04-27 Thread Chesnay Schepler
overview.json is a generated file that is placed in the local directory controlled by `historyserver.web.tmpdir`. Have you configured this option to point to some non-local filesystem? (Or if not, is the java.io.tmpdir property pointing somewhere funny?) On 24/04/2020 18:24, Hailu, Andreas wr

"Fill in" notification messages based on event time watermark

2020-04-27 Thread Manas Kale
Hi, I have an upstream operator that outputs device state transition messages with event timestamps. Meaning it only emits output when a transition takes place. For example, state1 @ 1 PM, state2 @ 2 PM, and so on. Using a downstream operator, I want to emit notification messages as per some config

Cannot map nested Tuple fields to table columns

2020-04-27 Thread Gyula Fóra
Hi All! I was trying to flatten a nested tuple into named columns with the fromDataStream method and I hit some problems with mapping tuple fields to column names. It seems like the `f0 as ColumnName` kind of expressions are not parsed correctly. It is very easy to reproduce: tableEnv.fromDataSt

Flink Lookup Filter Pushdown

2020-04-27 Thread forideal
Hello, my friend. I have a dimension table: create table dim_u_score (u_id bigint, varchar, score_a double, score_b double) with {xxx}. In one scene, the lookup condition is the filter score_a > 0.9; in another scenario, the lookup condition is the filter score_b > 1. In Flink, at present, lookup join can use on to

Re: Streaming Job eventually begins failing during checkpointing

2020-04-27 Thread Yu Li
Sorry, just noticed this thread... @Stephan I cannot remember the discussion but I think it's an interesting topic, will find some time to consider it (unregister states). @Eleanore Glad to know that Beam community has fixed it and thanks for the reference. Best Regards, Yu On Sun, 26 Apr 2020

Re: Cannot map nested Tuple fields to table columns

2020-04-27 Thread Leonard Xu
Hi, gyula.fora If you're trying to convert a Table from a Tuple DataStream, aliasing the fields by an `as` expression is not supported yet, because all fields are referenced by position at this point. You can simply refer to them with the following syntax: ``` tableEnv.fromDataStream(env.fromElements(Tuple2.of("a", 1))

Updating Closure Variables

2020-04-27 Thread Senthil Kumar
Hello Flink Community! We have a flink streaming application with a particular use case where a closure variable Set is used in a filter function. Currently, the variable is set at startup time. It’s populated from an S3 location, where several files exist (we consume the one with the last upd

Re: Cannot map nested Tuple fields to table columns

2020-04-27 Thread Gyula Fóra
Hi Leonard, The tuple fields can also be referenced by their POJO names (f0, f1), and they can be reordered similar to POJO fields, however you cannot alias them. (The link I sent shows how this is supposed to work, but it throws an exception when I try it.) Also what I am trying

RE: History Server Not Showing Any Jobs - File Not Found?

2020-04-27 Thread Hailu, Andreas
My machine's /tmp directory is not large enough to support the archived files, so I changed my java.io.tmpdir to be in some other location which is significantly larger. I hadn't set anything for historyserver.web.tmpdir, so I suspect it was still pointing at /tmp. I just tried setting historys

Re: Flink 1.9.2 why always checkpoint expired

2020-04-27 Thread Congxian Qiu
Hi, the image is not very clear. For RocksDBStateBackend, do you enable incremental checkpoints? Currently, a checkpoint on the TM side contains these steps: 1) barrier alignment, 2) sync snapshot, 3) async snapshot. For an expired checkpoint, could you please check the tasks in the first operator of the DAG to find o

Re: History Server Not Showing Any Jobs - File Not Found?

2020-04-27 Thread Chesnay Schepler
If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so that should be fine. What are the contents of /local/scratch/flink_historyserver_tmpdir? I assume there are already archives in HDFS? On 27/04/2020 16:02, Hailu, Andreas wrote: My machine’s /tmp directory is not large enou

Re: Cannot map nested Tuple fields to table columns

2020-04-27 Thread Dawid Wysakowicz
Hi Gyula, I think you are hitting a bug with the naming/aliasing of the fields of a Tuple. The bug is in the org.apache.flink.table.typeutils.FieldInfoUtils#isReferenceByPosition method, as it does not work correctly for aliases. Would you mind creating an issue for it? You should be able to alia

Re:Re: How to disable the state behind `COUNT` sql?

2020-04-27 Thread izual
Thanks, Benchao. Maybe changing the dimension table will work, but this changes a lot, including that `size/count` is not a column of one dim table. I notice that users can define Aggregate Functions [1], but this page also says: > Accumulators are automatically backup-ed by Flink's checkpointing mechan

Re: Cannot map nested Tuple fields to table columns

2020-04-27 Thread Gyula Fóra
Hi Dawid, Thanks for the clarification on this issue and I agree that there is too much going on with these conversions already. What do you mean by "Unfortunately you can not reorder the fields that way." ? I can reorder POJO fields even after aliasing and also tuple fields (f1, f0) so I assume

Re: Re: How to disable the state behind `COUNT` sql?

2020-04-27 Thread Benchao Li
Hi izual, IMO, implementing your own COUNT/SUM UDAF doesn't solve the problem. The state is not managed in the UDAF; it's managed by the aggregation operator, and your UDAF's accumulator will be handled by the operator using state. On Mon, 27 Apr 2020 at 23:21, izual wrote: > Thanks, Benchao. > > Maybe change the dimen

Re: Cannot map nested Tuple fields to table columns

2020-04-27 Thread Dawid Wysakowicz
What I meant by "Unfortunately you can not reorder the fields that way." is that tableEnv.fromDataStream(input, "name, age, height"); uses the so-called referenceByPosition mode. It will name the f0 field -> name, the f1 -> age and f2 -> height. If it wasn't for the bug you could reorder and

RE: History Server Not Showing Any Jobs - File Not Found?

2020-04-27 Thread Hailu, Andreas
bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/ total 8 drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9 drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76 There are jus

Re: Updating Closure Variables

2020-04-27 Thread Yun Gao
Hi Senthil, I think you are right that you cannot update closure variables directly and expect them to show up at the workers. If the variable values are read from S3 files, I think currently you will need to define a source explicitly to read the latest value of the file. Wh
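The pattern Yun suggests (a dedicated control source instead of a fixed closure variable) can be sketched without Flink, roughly in the shape of a broadcast-state function. The class and method names below are illustrative, not Flink API.

```python
# Flink-free sketch of the broadcast-state pattern: a control stream
# (e.g. a source polling S3) broadcasts updated filter sets, and the
# data-stream side always filters against the most recent set.
# Names are illustrative; this is not the Flink BroadcastProcessFunction API.

class BroadcastFilter:
    def __init__(self, initial_allowed):
        self.allowed = set(initial_allowed)  # the "broadcast state"

    def process_control(self, new_allowed):
        # Called when the control source emits a fresh set (new S3 file).
        self.allowed = set(new_allowed)

    def process_element(self, value):
        # The data side filters against the latest broadcast state.
        return value in self.allowed

f = BroadcastFilter({"a", "b"})
assert f.process_element("a") and not f.process_element("c")
f.process_control({"c"})  # an updated file arrives
assert f.process_element("c") and not f.process_element("a")
```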

Re: upgrade flink from 1.9.1 to 1.10.0 on EMR

2020-04-27 Thread aj
Hello Yang, My Hadoop version is Hadoop 3.2.1-amzn-0 and I have put in flink/lib. flink-shaded-hadoop-2-uber-2.8.3-10.0.jar then I am getting below error : SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/appcache/application_15879

Re: Task Assignment

2020-04-27 Thread Piotr Nowojski
Hi Navneeth, But what's the problem with using `keyBy(…)`? If you have a set of keys that you want to process together, in other words they are basically equal from the `keyBy(…)` perspective, why can't you use this in your `KeySelector`? Maybe to make it clear, you can think about this in

Re: "Fill in" notification messages based on event time watermark

2020-04-27 Thread Piotr Nowojski
Hi, I'm not sure, but I don't think there is an existing window that would do exactly what you want. I would suggest going back to the `keyedProcessFunction` (or a custom operator?) and having a MapState currentStates field. Your key would be, for example, the timestamp of the beginning of your win
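Piotr's timer-based idea can be sketched as plain logic, leaving out Flink's state and timer APIs: keep the latest state per device and, as the event-time watermark advances, emit one notification per elapsed interval. The function below is a hypothetical illustration using the 1 PM / 2 PM example from the question.

```python
# Illustration only (not Flink API): emit a "fill in" notification for the
# current device state at every interval boundary up to the watermark.

def fill_in(transitions, watermark, interval):
    """transitions: sorted list of (timestamp, state).
    Returns [(notify_ts, state), ...] for every interval boundary
    at or before the watermark."""
    out = []
    for i, (ts, state) in enumerate(transitions):
        # A state is "current" until the next transition (or the watermark).
        end = transitions[i + 1][0] if i + 1 < len(transitions) else watermark + 1
        t = ts
        while t < end and t <= watermark:
            out.append((t, state))
            t += interval
    return out

# state1 @ 1 PM (t=13), state2 @ 2 PM (t=14), hourly interval, watermark 16:
# state1 is notified at 13; state2 at 14, 15 and 16.
print(fill_in([(13, "state1"), (14, "state2")], watermark=16, interval=1))
```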

Re: "Fill in" notification messages based on event time watermark

2020-04-27 Thread David Anderson
Following up on Piotr's outline, there's an example in the documentation of how to use a KeyedProcessFunction to implement an event-time tumbling window [1]. Perhaps that can help you get started. Regards, David [1] https://ci.apache.org/projects/flink/flink-docs-master/tutorials/event_driven.htm

Re: RocksDB default logging configuration

2020-04-27 Thread Bajaj, Abhinav
It seems requiring the checkpoint URL to create the RocksDBStateBackend mixes up the operational aspects of cluster within the job. RocksDBStateBackend stateBackend = new RocksDBStateBackend("CHECKPOINT_URL", true); stateBackend.setDbStoragePath("DB_STORAGE_PATH"); Also, noticed that the RocksDB

Re: Flink 1.9.2 why always checkpoint expired

2020-04-27 Thread qq
Hi Jiayi Liao. Thanks for your reply. I've added the attachment, but can't get any useful messages from it. > On 27 Apr 2020 at 12:40, Jiayi Liao wrote: > > <pasted graphic-1.tiff>

Re: Flink Lookup Filter Pushdown

2020-04-27 Thread Jark Wu
Hi forideal, Currently, dimension table join requires at least one join key. That is the `u_id` in your example. The join key will be used as lookup key. If you have some additional filters on dimension table, that's fine, Flink will help to filter the fetched data. That means Flink supports follo
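Jark's point (lookup by join key, then filter) can be sketched outside Flink. The dimension data and predicate below are made up to match forideal's score_a/score_b example.

```python
# Illustration of lookup join + post-lookup filtering (not Flink internals):
# the lookup itself uses only the join key (u_id); extra predicates such as
# score_a > 0.9 are applied to the fetched rows afterwards.

dim_u_score = {1: {"score_a": 0.95, "score_b": 0.5},
               2: {"score_a": 0.1,  "score_b": 2.0}}

def lookup_join(events, predicate):
    out = []
    for u_id in events:
        row = dim_u_score.get(u_id)               # lookup by join key only
        if row is not None and predicate(row):    # filter the fetched row
            out.append((u_id, row))
    return out

print(lookup_join([1, 2], lambda r: r["score_a"] > 0.9))  # keeps only u_id=1
```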

Re: Re: How to disable the state behind `COUNT` sql?

2020-04-27 Thread Jark Wu
Hi izual, In such a case, I think you should try COUNT DISTINCT instead of COUNT. DISTINCT will help to deduplicate, so no matter how many times you receive id=1, the region count will always be 3. SELECT tblEvent.id, COUNT(DISTINCT tblDim.region) FROM tblEvent JOIN tblDim FOR SYSTEM_TIME AS OF tblEvent
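Why COUNT(DISTINCT ...) is stable under duplicate events can be shown with a toy version of the aggregation state (a set per key); this is an illustration, not Flink's actual operator.

```python
# Toy model of COUNT(DISTINCT region) per id: the state is a set of seen
# regions per key, so re-receiving id=1 does not inflate the count.

from collections import defaultdict

dim = {1: ["SH", "BJ", "SD"]}        # dimension table: id -> regions
distinct_regions = defaultdict(set)  # aggregation state per id

def on_event(event_id):
    for region in dim.get(event_id, []):
        distinct_regions[event_id].add(region)
    return len(distinct_regions[event_id])

assert on_event(1) == 3  # first id=1 event: count = 3
assert on_event(1) == 3  # duplicate id=1 event: still 3
```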

Re:Re: Re: How to disable the state behind `COUNT` sql?

2020-04-27 Thread izual
Thank you, Jark. I also have tried COUNT DISTINCT ^_^; the only problem is that if `tblEvent` generates two events with the same id, such as: t1: {"id": 1} t2: {"id": 1}, then the SQL will output a result only when the t1 record is received. I think maybe some optimizer works in the background when the result doe

Re: RocksDB default logging configuration

2020-04-27 Thread Yun Tang
Hi Bajaj, Currently "state.checkpoints.dir" defines the cluster-wide checkpoint location, and each job creates its specific checkpoint location under it in a job-id sub-directory. It is the same for the checkpoint URL in RocksDB. And the configuration option "state.backend.rocksdb.localdir" [1] s

Re: Re: Re: How to disable the state behind `COUNT` sql?

2020-04-27 Thread Jark Wu
Yes. This is an optimization: if the previous result is the same as the new result, then the operator will not generate records for the new result. Best, Jark On Tue, 28 Apr 2020 at 11:05, izual wrote: > Thank you, Jark. > > I also have tried COUNT DISTINCT ^_^, the only problem is that if the > `tb

ML/DL via Flink

2020-04-27 Thread m@xi
Hello Flinkers, I am building a *streaming* prototype system on top of Flink, and ideally I want to enable ML training (if possible, DL) in Flink. It would be nice to lay out all the existing libraries that provide primitives enabling the training of ML models. I assume it is more efficient to

join state TTL

2020-04-27 Thread lec ssmi
Hi: When a stream is joined with another stream, the cached stream data is saved as state and deleted as the watermark advances. I found that there is also a parameter that can set the state expiration time, such as StateTtlConfig in the DataStream API and TableConfig in Table API & SQL. This
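The watermark-driven cleanup lec ssmi describes (as opposed to StateTtlConfig) can be sketched as follows: in an event-time interval join, a buffered row is dropped once the watermark passes its upper join bound. The eviction rule below is a simplified illustration, not Flink's internals.

```python
# Simplified illustration of watermark-driven join-state cleanup: a row can
# no longer match once watermark > timestamp + upper_bound, so it is removed
# from the buffer regardless of any TTL configuration.

def evict_by_watermark(buffer, watermark, upper_bound):
    """buffer: list of (timestamp, row); keep only rows that can still join."""
    return [(ts, row) for ts, row in buffer if watermark <= ts + upper_bound]

buf = [(10, "a"), (20, "b"), (30, "c")]
# With a 5-unit upper bound and watermark 26, rows at ts=10 and ts=20 have
# expired (10+5 < 26 and 20+5 < 26); only the row at ts=30 survives.
print(evict_by_watermark(buf, watermark=26, upper_bound=5))
```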

Blink window and cube

2020-04-27 Thread 刘建刚
Hi, I find that blink planner supports CUBE. CUBE can be used together with a field but not window. For example, the following SQL is not supported: SELECT A, B, sum(C) FROM person GROUP BY cube(A, B), TUMBLE(curTimestamp, interval '1' minute) The following error is reported. Is th

Re: Blink SQL java.lang.ArrayIndexOutOfBoundsException

2020-04-27 Thread 刘建刚
Thank you very much. It solved my problem. > On 22 Apr 2020 at 17:15, Jingsong Li [via Apache Flink User Mailing List archive.] wrote: > > Hi, > > Sorry for the mistake, [1] is related, but this bug has been fixed totally in > [2], so the safe version should be 1.9.3+ and 1.10.1+, so there is