Thanks Dian for being our release manager and thanks to everyone who helped
make this release possible.
Cheers,
Till
On Mon, Apr 27, 2020 at 3:26 AM Jingsong Li wrote:
> Thanks Dian for managing this release!
>
> Best,
> Jingsong Lee
>
> On Sun, Apr 26, 2020 at 7:17 PM Jark Wu wrote:
>
>> Th
Sorry — I didn't understand you were dealing with multiple keys.
In that case, I'd recommend you read about key-group assignment [1] and
check the KeyGroupRangeAssignment class [2].
Key-groups are assigned to parallel tasks as ranges before the job is
started — this is also a well-defined behavio
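For reference, a small sketch of how a key maps to a key-group and then to a parallel subtask index (the key value and parallelism figures are made up for illustration):
```
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyGroupProbe {
    public static void main(String[] args) {
        int maxParallelism = 128; // example values, not taken from any specific job
        int parallelism = 4;      // operator parallelism of the job

        // Hypothetical key value, just for illustration.
        String key = "device-42";

        // Key -> key-group (stable as long as maxParallelism stays the same).
        int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);

        // Key -> index of the parallel subtask that will own it.
        int subtaskIndex =
            KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, parallelism);

        System.out.printf("key=%s keyGroup=%d subtaskIndex=%d%n", key, keyGroup, subtaskIndex);
    }
}
```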
Hi Dawid,
Thanks for the very detailed answer and the correct assumptions (I am using row
format).
I tried to avoid using NFS/S3, but it seems like it is the only option I have.
Best regards
Eyal Peer
From: Dawid Wysakowicz
Sent: Friday, April 24, 2020 4:20 PM
To: Eyal Pe'er ; user
Subject: Re: Fault to
Thanks Dian for the release work and thanks everyone involved.
Best,
Zhijiang
--
From: Till Rohrmann
Send Time: 2020 Apr. 27 (Mon.) 15:13
To: Jingsong Li
Cc:dev ; Leonard Xu ; Benchao Li
; Konstantin Knauf ; jincheng
sun ; Hequn C
Hi Eyal and Dawid,
@Eyal I think Dawid explained pretty well what is happening and why, in
distributed settings, the underlying FS on which the StreamingFileSink
writes has to be durable and accessible to all parallel instances of
the job. Please let us know if you have any further questions.
Chee
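For completeness, a minimal row-format StreamingFileSink sketch (the S3 path and the String input are placeholders); the point is simply that the base path has to live on a durable filesystem that every parallel instance can reach:
```
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class DurableSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // part files are finalized on checkpoints

        DataStream<String> lines = env.fromElements("a", "b", "c"); // placeholder input

        StreamingFileSink<String> sink = StreamingFileSink
            // Base path must be on a shared, durable FS (S3/HDFS/NFS), not a single machine's disk.
            .forRowFormat(new Path("s3://my-bucket/output"), new SimpleStringEncoder<String>("UTF-8"))
            .build();

        lines.addSink(sink);
        env.execute("streaming-file-sink-sketch");
    }
}
```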
I implement my DimTable by extending `LookupTableSource` [1], which stores data
like:
id=1 -> (SH, BJ, SD)
id=2 -> (...)
and then extend `TableFunction` to return the value corresponding to the
lookup keys, which may return multiple rows; for example, when the lookup key is id=1,
then in the `TableFun
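For illustration only, a rough sketch of what such a lookup `TableFunction` might look like (the backing map, `Row` layout, and sample data are made up; this is not the actual implementation from the thread):
```
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;

// Emits one Row per region stored for the given id, so a single lookup key
// can produce multiple rows (e.g. id=1 -> SH, BJ, SD).
public class RegionLookupFunction extends TableFunction<Row> {

    private transient Map<Long, List<String>> data;

    @Override
    public void open(FunctionContext context) {
        data = new HashMap<>();
        data.put(1L, Arrays.asList("SH", "BJ", "SD")); // sample data only
    }

    // Called once per probe-side record with the lookup key.
    public void eval(Long id) {
        for (String region : data.getOrDefault(id, Collections.emptyList())) {
            collect(Row.of(id, region));
        }
    }
}
```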
Hi,
There is indeed state for the aggregation result; however, we cannot
disable it, it's by design.
StreamQueryConfig.maxIdleStateRetentionTime can control how long the state
will be kept.
If you can ensure the time gap between two records of the same id is larger
than, for example,
1 min, then sett
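For reference, a minimal sketch of configuring that retention interval; the values are placeholders, and this uses the equivalent setter on TableConfig, which I believe exposes the same min/max idle-state retention knobs:
```
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class RetentionSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Keep idle per-key state for at least 1 minute, clean it up after at most 6 minutes.
        tableEnv.getConfig().setIdleStateRetentionTime(Time.minutes(1), Time.minutes(6));
    }
}
```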
overview.json is a generated file that is placed in the local directory
controlled by historyserver.web.tmpdir.
Have you configured this option to point to some non-local filesystem?
(Or if not, is the java.io.tmpdir property pointing somewhere funny?)
On 24/04/2020 18:24, Hailu, Andreas wr
Hi,
I have an upstream operator that outputs device state transition messages
with event timestamps, meaning it only emits output when a transition takes
place.
For example,
state1 @ 1 PM
state2 @ 2 PM
and so on.
*Using a downstream operator, I want to emit notification messages as per
some config
Hi All!
I was trying to flatten a nested tuple into named columns with the
fromDataStream method and I hit some problems with mapping tuple fields to
column names.
It seems like the `f0 as ColumnName` kind of expressions are not parsed
correctly.
It is very easy to reproduce:
tableEnv.fromDataSt
Hello, my friend.
I have a dimension table.
create table dim_u_score (u_id bigint, varchar, score_a double, score_b double) with {xxx}
In one scenario,
the condition of the lookup is filter score_a > 0.9.
In another scenario,
the condition of the lookup is filter score_b > 1.
In Flink, at present, lookup join can use on to
Sorry, just noticed this thread...
@Stephan I cannot remember the discussion but I think it's an interesting
topic, will find some time to consider it (unregister states).
@Eleanore Glad to know that Beam community has fixed it and thanks for the
reference.
Best Regards,
Yu
On Sun, 26 Apr 2020
Hi, gyula.fora
If you're trying to convert a Table from a Tuple DataStream, aliasing the field
with an `as` expression is not supported yet,
because all fields are referenced by position in this case. You can simply
alias with the following syntax:
```
tableEnv.fromDataStream(env.fromElements(Tuple2.of("a", 1))
Hello Flink Community!
We have a flink streaming application with a particular use case where a
closure variable Set is used in a filter function.
Currently, the variable is set at startup time.
It’s populated from an S3 location, where several files exist (we consume the
one with the last upd
Hi Leonard,
The tuple fields can also be referenced by their POJO names (f0, f1); they
can be reordered similarly to POJO fields, however you cannot alias them. (If
you look at the link I have sent, it shows how it is supposed to work, but
it throws an exception when I try it.)
Also what I am trying
My machine's /tmp directory is not large enough to support the archived files,
so I changed my java.io.tmpdir to be in some other location which is
significantly larger. I hadn't set anything for historyserver.web.tmpdir, so I
suspect it was still pointing at /tmp. I just tried setting
historys
Hi
The image is not very clear.
For RocksDBStateBackend, do you enable incremental checkpoint?
Currently, a checkpoint on the TM side consists of these steps:
1. barrier alignment
2. synchronous snapshot
3. asynchronous snapshot
For the expired checkpoint, could you please check the tasks in the first
operator of the DAG to find o
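For reference, a minimal sketch of enabling incremental checkpoints for RocksDB (the checkpoint path is a placeholder); incremental mode is the second constructor argument:
```
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalCheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);
        // `true` enables incremental checkpoints for the RocksDB backend.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
    }
}
```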
If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so
that should be fine.
What are the contents of /local/scratch/flink_historyserver_tmpdir?
I assume there are already archives in HDFS?
On 27/04/2020 16:02, Hailu, Andreas wrote:
My machine’s /tmp directory is not large enou
Hi Gyula,
I think you are hitting a bug with the naming/aliasing of the fields of
a Tuple. The bug is in the
org.apache.flink.table.typeutils.FieldInfoUtils#isReferenceByPosition
method, as it does not work correctly for aliases. Would you mind
creating an issue for it?
You should be able to alia
Thanks, Benchao.
Maybe changing the dimension table would work, but that changes a lot; besides,
`size/count` is not a column of one dim table.
I notice that users can define Aggregate Functions [1], but this page also says:
> Accumulators are automatically backup-ed by Flink's checkpointing mechan
Hi Dawid,
Thanks for the clarification on this issue and I agree that there is too
much going on with these conversions already.
What do you mean by "Unfortunately you can not reorder the fields that
way." ?
I can reorder POJO fields even after aliasing and also tuple fields (f1,
f0) so I assume
Hi izual,
IMO, implementing your own COUNT/SUM UDAF doesn't solve the problem.
The state is not managed in the UDAF; it's managed by the aggregation operator, and
your UDAF's accumulator will be handled by the operator using state.
On Mon, Apr 27, 2020 at 11:21 PM, izual wrote:
> Thanks, Benchao.
>
> Maybe change the dimen
What I meant by "Unfortunately you can not reorder the fields that way."
is that
tableEnv.fromDataStream(input, "name, age, height");
uses the so-called referenceByPosition mode. It will name the f0 field
-> name, the f1 -> age and f2 -> height.
If it wasn't for the bug you could reorder and
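To make the two modes concrete, a small sketch (the tuple contents and column names are made up):
```
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class FieldMappingSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        DataStream<Tuple3<String, Integer, Double>> input =
            env.fromElements(Tuple3.of("bob", 30, 180.0));

        // Reference by position: f0 -> name, f1 -> age, f2 -> height.
        Table byPosition = tableEnv.fromDataStream(input, "name, age, height");

        // Reference by name: fields are picked by their original names (f0, f1, f2)
        // and may be reordered; aliasing them with `as` is where the reported bug bites.
        Table byName = tableEnv.fromDataStream(input, "f2, f0, f1");
    }
}
```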
bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/
total 8
drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43
flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22
flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76
There are jus
Hi Senthil,
I think you are right that you cannot update closure variables
directly and expect them to show up at the workers.
If the variable values are read from S3 files, I think currently you
will need to define a source explicitly to read the latest value of the file.
Wh
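One possible shape for that (a sketch only; the event type, the allow-list key, and the update stream that would periodically re-read S3 are all placeholders) is to broadcast the latest set to the filter operator via broadcast state:
```
import java.util.HashSet;
import java.util.Set;

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastFilterSketch {

    // Broadcast state holding the current allow-list under a single key.
    static final MapStateDescriptor<String, Set<String>> ALLOW_LIST =
        new MapStateDescriptor<>(
            "allow-list",
            Types.STRING,
            TypeInformation.of(new TypeHint<Set<String>>() {}));

    public static DataStream<String> filterWithUpdates(
            DataStream<String> events, DataStream<Set<String>> setUpdates) {

        BroadcastStream<Set<String>> broadcastUpdates = setUpdates.broadcast(ALLOW_LIST);

        return events
            .connect(broadcastUpdates)
            .process(new BroadcastProcessFunction<String, Set<String>, String>() {

                @Override
                public void processElement(String value, ReadOnlyContext ctx, Collector<String> out)
                        throws Exception {
                    Set<String> allowed = ctx.getBroadcastState(ALLOW_LIST).get("current");
                    if (allowed != null && allowed.contains(value)) {
                        out.collect(value);
                    }
                }

                @Override
                public void processBroadcastElement(Set<String> update, Context ctx, Collector<String> out)
                        throws Exception {
                    // Replace the allow-list whenever the update source emits a new version.
                    ctx.getBroadcastState(ALLOW_LIST).put("current", new HashSet<>(update));
                }
            });
    }
}
```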
Hello Yang,
My Hadoop version is Hadoop 3.2.1-amzn-0
and I have put flink-shaded-hadoop-2-uber-2.8.3-10.0.jar in flink/lib.
Then I am getting the below error:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/mnt/yarn/usercache/hadoop/appcache/application_15879
Hi Navneeth,
But what’s the problem with using `keyBy(…)`? If you have a set of keys that
you want to process together, in other words they are basically equal from
the `keyBy(…)` perspective, why can’t you use this in your `KeySelector`?
Maybe to make it clear, you can think about this in
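For example, a hypothetical KeySelector that maps several device ids onto one group key (the mapping is made up):
```
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.java.functions.KeySelector;

// Maps individual device ids onto a coarser "group" key, so all ids of the same
// group land on the same parallel subtask after keyBy(...).
public class DeviceGroupKeySelector implements KeySelector<String, String> {

    private static final Map<String, String> GROUP_OF = new HashMap<>();
    static {
        GROUP_OF.put("device-1", "group-a");
        GROUP_OF.put("device-2", "group-a");
        GROUP_OF.put("device-3", "group-b");
    }

    @Override
    public String getKey(String deviceId) {
        return GROUP_OF.getOrDefault(deviceId, deviceId);
    }
}
```
Used as `events.keyBy(new DeviceGroupKeySelector())`, all ids of the same group share keyed state and are processed together.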
Hi,
I’m not sure, but I don’t think there is an existing window that would do
exactly what you want. I would suggest going back to the `KeyedProcessFunction`
(or a custom operator?), and have a MapState
currentStates field. Your key would be for example a timestamp of the beginning
of your win
Following up on Piotr's outline, there's an example in the documentation of
how to use a KeyedProcessFunction to implement an event-time tumbling
window [1]. Perhaps that can help you get started.
Regards,
David
[1]
https://ci.apache.org/projects/flink/flink-docs-master/tutorials/event_driven.htm
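A condensed sketch of that pattern (event and output types are simplified to String, the window size is a placeholder, and it assumes event timestamps/watermarks are already assigned):
```
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Key: device id; input: state-transition events; output: one notification per window.
public class TumblingStateWindow extends KeyedProcessFunction<String, String, String> {

    private static final long WINDOW_SIZE = 60 * 60 * 1000L; // 1 hour, placeholder

    // Last observed state per window start timestamp.
    private transient MapState<Long, String> currentStates;

    @Override
    public void open(Configuration parameters) {
        currentStates = getRuntimeContext().getMapState(
            new MapStateDescriptor<>("currentStates", Types.LONG, Types.STRING));
    }

    @Override
    public void processElement(String stateTransition, Context ctx, Collector<String> out)
            throws Exception {
        // Requires an event-time timestamp on the record.
        long windowStart = ctx.timestamp() - (ctx.timestamp() % WINDOW_SIZE);
        currentStates.put(windowStart, stateTransition);
        // Fire when the watermark passes the end of this window.
        ctx.timerService().registerEventTimeTimer(windowStart + WINDOW_SIZE - 1);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out)
            throws Exception {
        long windowStart = timestamp + 1 - WINDOW_SIZE;
        out.collect("key=" + ctx.getCurrentKey() + " window=" + windowStart
            + " lastState=" + currentStates.get(windowStart));
        currentStates.remove(windowStart);
    }
}
```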
It seems that requiring the checkpoint URL to create the RocksDBStateBackend mixes
the operational aspects of the cluster into the job.
RocksDBStateBackend stateBackend = new RocksDBStateBackend("CHECKPOINT_URL",
true);
stateBackend.setDbStoragePath("DB_STORAGE_PATH");
Also, noticed that the RocksDB
Hi Jiayi Liao,
Thanks for your reply. I have added the attachment, and I can't get any useful messages.
> On Apr 27, 2020, at 12:40, Jiayi Liao wrote:
>
> <pasted image-1.tiff>
Hi forideal,
Currently, dimension table join requires at least one join key. That is the
`u_id` in your example. The join key will be used as the lookup key.
If you have some additional filters on dimension table, that's fine, Flink
will help to filter the fetched data.
That means Flink supports follo
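So a query along these lines should be possible (a sketch following the earlier example; the processing-time attribute `proctime` on the probe side is an assumption):
```
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class LookupJoinFilterSketch {
    // Assumes tblEvent (with a proctime attribute) and dim_u_score are registered on tableEnv.
    public static Table scoreAJoin(StreamTableEnvironment tableEnv) {
        return tableEnv.sqlQuery(
            "SELECT e.u_id, d.score_a " +
            "FROM tblEvent AS e " +
            "JOIN dim_u_score FOR SYSTEM_TIME AS OF e.proctime AS d " +
            "  ON e.u_id = d.u_id " +     // u_id is the lookup key
            "WHERE d.score_a > 0.9");     // additional filter on the dimension table
    }
}
```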
Hi izual,
In such case, I think you should try COUNT DISTINCT instead of COUNT.
DISTINCT will help to deduplicate, so no matter how many times you received
id=1, the region count should always 3.
SELECT tblEvent.id, COUNT(DISTINCT tblDim.region) FROM tblEvent JOIN tblDim
FOR SYSTEM AS OF tblEvent
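For illustration, one way the full query might look (this is a guess at the truncated part; the `proctime` attribute and the GROUP BY clause are assumptions, not a copy of the original mail):
```
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class DistinctRegionCountSketch {
    // Assumes tblEvent (with a proctime attribute) and tblDim are registered on tableEnv.
    public static Table regionCounts(StreamTableEnvironment tableEnv) {
        return tableEnv.sqlQuery(
            "SELECT e.id, COUNT(DISTINCT d.region) AS region_cnt " +
            "FROM tblEvent AS e " +
            "JOIN tblDim FOR SYSTEM_TIME AS OF e.proctime AS d " +
            "  ON e.id = d.id " +
            "GROUP BY e.id");
    }
}
```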
Thank you, Jark.
I also have tried COUNT DISTINCT ^_^; the only problem is that if
`tblEvent` generates two records with the same id, such as:
t1: {"id": 1}
t2: {"id": 1}
then the SQL will output a result only when the t1 record is received.
I think maybe some optimizer works in the background when the result doe
Hi Bajaj
Current "state.checkpoints.dir" defines cluster-wide location for cluster and
each job would create the specific checkpoint location under it with job-id
sub-directory. It is the same for the checkpoint URL in RocksDB.
And the configuration option "state.backend.rocksdb.localdir" [1] s
Yes.
This is an optimization: if the previous result is the same as the new
result, the operator will not emit records for the new result.
Best,
Jark
On Tue, 28 Apr 2020 at 11:05, izual wrote:
> Thank you, Jark.
>
> I also have tried COUNT DISTINCT ^_^, the only problem is that if the
> `tb
Hello Flinkers,
I am building a *streaming* prototype system on top of Flink, and ideally I
want to enable ML training (if possible DL) in Flink. It would be nice to
lay down all the existing libraries that provide primitives that enable the
training of ML models.
I assume it is more efficient to
Hi:
When the stream is joined with another stream , the cached stream data
will be saved as a state and deleted as the watermark advances.
I found that there is also a parameter that can set the state expiration
time, such as StateTtlConfig in the DataStream API and TableConfig in Table
API & SQL. This
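For the DataStream API side, a minimal sketch of attaching a TTL to a state descriptor (the retention time and state name are placeholders):
```
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.common.typeinfo.Types;

public class TtlSketch {
    public static ValueStateDescriptor<String> cachedRecordDescriptor() {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.hours(1)) // state expires 1 hour after the last write
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            .build();

        ValueStateDescriptor<String> descriptor =
            new ValueStateDescriptor<>("cached-record", Types.STRING);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
```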
Hi, I find that the blink planner supports CUBE. CUBE can be used together
with a field but not with a window. For example, the following SQL is not supported:
SELECT A, B, sum(C)
FROM person
GROUP BY cube(A, B), TUMBLE(curTimestamp, interval '1' minute)
The following error is reported. Is th
Thank you very much. It solved my problem.
> On Apr 22, 2020, at 5:15 PM, Jingsong Li [via Apache Flink User Mailing List archive.]
> wrote:
>
> Hi,
>
> Sorry for the mistake, [1] is related, but this bug has been fixed totally in
> [2], so the safe version should be 1.9.3+ and 1.10.1+, so there is