Hi Nikola,
the reason for deprecating `registerTableSource` is that we aim to have
everything declarative in the Table API. A table program should simply
declare what it needs and the planner should find a suitable connector,
regardless of how the underlying class structure looks. This might
al
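As a rough illustration of the declarative style (the connector
properties, topic, and field names below are made up, not taken from
your program):

import org.apache.flink.table.api.TableEnvironment;

public class DeclarativeRegistration {
    // Instead of registering a TableSource instance imperatively via
    // registerTableSource, declare the table and let the planner pick a
    // suitable connector from the properties.
    public static void register(TableEnvironment tableEnv) {
        tableEnv.sqlUpdate(
                "CREATE TABLE orders (" +
                "  user_id BIGINT," +
                "  product STRING" +
                ") WITH (" +
                "  'connector.type' = 'kafka'," +
                "  'connector.version' = 'universal'," +
                "  'connector.topic' = 'orders'," +
                "  'connector.properties.bootstrap.servers' = 'localhost:9092'," +
                "  'format.type' = 'json'" +
                ")");
    }
}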
Hi Aljoscha,
Thank you, that clarification helps. I am generating a new watermark in the
getCurrentWatermark() method of my assigner, which causes the watermark to
actually be updated every auto-watermark interval. I assumed that actual
watermark updates were caused only by the setAutoWatermark() method
Do you have any logs that could help us identify the issue? How many files
is a long date range?
In general, you could try out the same program with the DataStream API (use
StreamExecutionEnvironment#readFile [1] with PROCESS_ONCE to get a behavior
equivalent to batch). DataStreams are only slight
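For illustration, a rough sketch of that approach (the path is a
placeholder and I'm assuming plain text files here; for your files you
would plug in an InputFormat that can read them):

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ReadOnceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        String basePath = "hdfs:///logs"; // placeholder path
        // PROCESS_ONCE scans the input once and then finishes the source,
        // which gives batch-like behavior on the DataStream API.
        DataStream<String> lines = env.readFile(
                new TextInputFormat(new Path(basePath)),
                basePath,
                FileProcessingMode.PROCESS_ONCE,
                -1);

        lines.print();
        env.execute("read-once");
    }
}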
I have many lzo files on HDFS in such a path format: /logs/{id}/{date}/xxx[1-100].lzo
/logs/a/ds=2018-01-01/xxx1.lzo
/logs/b/ds=2018-01-01/xxx1.lzo
...
/logs/z/ds=2018-01-02/xxx1.lzo
...
/logs/z/ds=2020-05-01/xxx100.lzo
I'm using a Flink DataSet to read those files by a range of {date} and a
Hi Leonard,
Thank you so much! It worked, I did get a new error but it is unrelated to
this question.
On Mon, May 18, 2020 at 15:21, Leonard Xu wrote:
> More precisely: the sink table `sql-sink` missed the required version
> option.
>
>
> On May 18, 2020, at 21:13, Leonard Xu wrote:
>
> Hi,
>
> Lo
Hi Aljoscha,
Thanks a lot for the suggestion!
Best,
Eleanore
On Mon, May 18, 2020 at 2:16 AM Aljoscha Krettek
wrote:
> Now I see what you mean. I think you would have to somehow set up the
> Flink metrics system as a backend for opencensus. Then the metrics would
> be reported to the same syst
Hi Arvid,
Thanks for the suggestion! I will tryout to see how it works.
Best,
Eleanore
On Mon, May 18, 2020 at 8:04 AM Arvid Heise wrote:
> Hi Eleanore,
>
> The question in general is what you understand by edge data centers, as
> the term is pretty fuzzy. Since Flink is running on Java, it'
Hi Jingsong,
Cool, thanks for your reply.
Best wishes.
From: Jingsong Li
Sent: Tuesday, May 19, 2020 10:46
To: Thomas Huang
Cc: Flink
Subject: Re: Is it possible to change 'connector.startup-mode' option in the
flink job
Hi Thomas,
Good to hear from you. Th
Hi Thomas,
Good to hear from you. This is a very common problem.
In 1.11, we have two FLIPs to solve your problem. [1][2] You can take a look.
I think dynamic table options (table hints) are enough for your requirement.
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dyna
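For illustration only (the option key below is the legacy
'connector.startup-mode' from your subject line; with the new 1.11
connectors the key differs, and the table name is made up):

import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class StartupModeHint {
    // Override the startup mode per query via a dynamic table option
    // (FLIP-113) instead of changing the table definition.
    public static Table readFromEarliest(TableEnvironment tableEnv) {
        // Dynamic table options are disabled by default and must be switched on.
        tableEnv.getConfig().getConfiguration()
                .setBoolean("table.dynamic-table-options.enabled", true);

        return tableEnv.sqlQuery(
                "SELECT * FROM orders "
                + "/*+ OPTIONS('connector.startup-mode'='earliest-offset') */");
    }
}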
Hi guys,
I'm using Hive to store Kafka topic metadata as follows:
CREATE TABLE orders (
  user_id    BIGINT,
  product    STRING,
  order_time TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector.type' = 'kafka',
Hi Theo
Sorry for the late reply and thanks for your detailed explanation.
From your description, I understand:
1. Impala only handles the parquet files that it has been
notified about (scanned).
2. You could accept updating the state only once per partition in
your scenario.
3. There
Hi Rahul,
Thanks for explaining. I see. Right now there is no way to dynamically control the
file name in StreamingFileSink.
If the number of organizations is not so huge then, like Sivaprasanna said, you
can use a "BucketAssigner" to create a bucket per organization ID. The
bucket in StreamingFileSink is like Hive
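A rough sketch of such an assigner (the Event type and its orgId field
are made up):

import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

public class OrgIdBucketing {

    /** Minimal made-up event carrying an organization id. */
    public static class Event {
        public String orgId;
        public String payload;
    }

    /** Routes every record into a directory named after its organization id. */
    public static class OrgIdBucketAssigner implements BucketAssigner<Event, String> {

        @Override
        public String getBucketId(Event element, Context context) {
            return "org=" + element.orgId; // e.g. <base-path>/org=acme/part-...
        }

        @Override
        public SimpleVersionedSerializer<String> getSerializer() {
            return SimpleVersionedStringSerializer.INSTANCE;
        }
    }
}

You would then plug it in via
StreamingFileSink.forRowFormat(...).withBucketAssigner(new OrgIdBucketAssigner()).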
I also previously had some low volume data sources that I wanted to process
and I was always convinced that the proper solution would be to have
auto-scaling and just decrease the used resources as much as possible
(which is not trivial because of state rescaling). But thinking a bit
further, it wo
Thanks Yun for highlighting this, it's very helpful! I'll give it a go with
that in mind.
We have already begun using checkpoints for recovery. Having these
improvements would still be immensely helpful to reduce downtime for
savepoint recovery.
On Mon, May 18, 2020 at 3:14 PM Yun Tang wrote:
>
Hi Joey
Previously, I also looked at the mechanism to create on-disk SSTables, as I
planned to use RocksDB's benchmark to mock a scenario in Flink. However, I found
the main challenge is how to ensure the keys are inserted in a strictly
increasing order. The key order in Java could differ from the
Thanks all for responding.
To give a bit more context:
- I'm building a tool that performs a *fully deterministic* stream
processing of mission-critical data
- all input data is in the form of an append-only event log (Parquet files)
- users define streaming SQL transformations to do all kinds of
Hi Flink users!
TL;DR: My Flink taskmanagers frequently permanently hang in a shutdown
handler’s Thread.sleep() call when I issue a stop. Hitting a wall trying to
debug. https://issues.apache.org/jira/browse/FLINK-17470
I’m really scratching my head at this issue. On a particular environment i
Hi Annemarie
First of all, I'm afraid Flink currently does not support making window state
queryable. It was planned but has not been implemented, due to a lack
of continuous development in this area in the Flink community.
Secondly, I think the doc just wants to tell users how to enable this feat
Hey,
While running a Flink application with a large-state, savepoint recovery
has been a painful part of operating the application because recovery time
can be several hours. During some profiling that chohan (cc'd) had done, a
red flag stood out — savepoint recovery consisted mostly of RocksDB Ge
Hi Jaswin
As Arvid suggested, it's not encouraged to query the internal RocksDB directly.
Apart from Arvid's solution, I think queryable state [1] might also help you. I
think you just want to know the entries left in both map states after several
days, and querying the state should meet the
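If you go that route, a very rough sketch of the client side (host,
port, state name, key and value types are all placeholders; in the job
the MapStateDescriptor would need setQueryable("pending-entries")):

import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.queryablestate.client.QueryableStateClient;

import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class QueryLeftEntries {
    public static void main(String[] args) throws Exception {
        // args[0]: the job id as printed on submission.
        QueryableStateClient client = new QueryableStateClient("taskmanager-host", 9069);

        MapStateDescriptor<String, String> descriptor =
                new MapStateDescriptor<>("pending-entries", String.class, String.class);

        CompletableFuture<MapState<String, String>> future = client.getKvState(
                JobID.fromHexString(args[0]),
                "pending-entries",          // name used in setQueryable(...)
                "some-key",                 // the key you want to inspect
                BasicTypeInfo.STRING_TYPE_INFO,
                descriptor);

        MapState<String, String> state = future.get();
        for (Map.Entry<String, String> e : state.entries()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }

        client.shutdownAndWait();
    }
}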
Hi Jingsong,
We have a system where organizations keep getting added and removed on a
regular basis. As new organizations get added, the data from these
organizations starts flowing into the streaming system. We do a group-by on
the Organisation ID, which is part of the incoming event. If in the incomi
Thanks Yu for being the release manager, and everyone involved.
Best,
Zhijiang
--
From: Arvid Heise
Send Time: Monday, May 18, 2020, 23:17
To: Yangze Guo
Cc: dev; Apache Announce List;
user; Yu Li; user-zh
Subject: Re: [ANNOUNCE] Apache Fli
Hi,
Actually, it seems like Spark dynamic allocation saves more resources in that
case.
From: Arvid Heise
Sent: Monday, May 18, 2020 11:15:09 PM
To: Congxian Qiu
Cc: Sergii Mikhtoniuk ; user
Subject: Re: Process available data and stop with savepoint
Hi Sergii,
Hi Jaswin,
I'd discourage using RocksDB directly. It's more of an implementation
detail of Flink. I'd also discourage writing to Kafka directly without
using our Kafka sink, as you will receive duplicates upon recovery.
If you run the KeyedCoProcessFunction continuously anyway, I'd add a timer
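Roughly along these lines (all types, state names, and the retention
period below are placeholders):

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class UnmatchedEmitter
        extends KeyedCoProcessFunction<String, String, String, String> {

    private static final long RETENTION_MS = 24 * 60 * 60 * 1000L; // e.g. one day

    private transient MapState<String, String> left;
    private transient MapState<String, String> right;

    @Override
    public void open(Configuration parameters) {
        left = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("left", String.class, String.class));
        right = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("right", String.class, String.class));
    }

    @Override
    public void processElement1(String value, Context ctx, Collector<String> out) throws Exception {
        left.put(value, value);
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + RETENTION_MS);
    }

    @Override
    public void processElement2(String value, Context ctx, Collector<String> out) throws Exception {
        right.put(value, value);
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + RETENTION_MS);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        // Emit the still-unmatched entries (e.g. to a Kafka sink) and clean up.
        for (String v : left.values()) {
            out.collect("unmatched-left: " + v);
        }
        for (String v : right.values()) {
            out.collect("unmatched-right: " + v);
        }
        left.clear();
        right.clear();
    }
}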
I’m wondering why you use a beta feature in production. Why not push the
latest state into a downstream sink like Redis, or HBase with Apache Phoenix?
From: Annemarie Burger
Sent: Monday, May 18, 2020 11:19:23 PM
To: user@flink.apache.org
Subject: Re: Incremental
Hi,
Thanks for your suggestions!
However, as I'm reading the docs for queryable state, it says that it can
only be used with processing time, and my windows are defined using event
time. So, I guess this means I should use the KeyedProcessFunction. Could
you maybe suggest a rough implementation fo
Thank you very much!
On Mon, May 18, 2020 at 8:28 AM Yangze Guo wrote:
> Thanks Yu for the great job. Congrats everyone who made this release
> possible.
> Best,
> Yangze Guo
>
> On Mon, May 18, 2020 at 10:57 AM Leonard Xu wrote:
> >
> >
> > Thanks Yu for being the release manager, and everyone
Hi Sergii,
your requirements feel a bit odd. It's neither batch nor streaming.
Could you tell us why it's not possible to let the job run as a streaming
job that runs continuously? Is it just a matter of saving costs?
If so, you could monitor the number of records being processed and trigger
stop
Hi,
I want to use Queryable State to communicate between PUs in the same Flink
job. I'm aware this is not the intended use of Queryable State, but I was
wondering if and how it could be done.
More specifically, I want to query the (event-time) window state of one PU,
from another PU, while both
Hi Eleanore,
The question in general is what you understand by edge data centers, as
the term is pretty fuzzy. Since Flink is running on Java, it's not suitable
for embedded clusters as of now. There is plenty of work done already to
test that Flink runs on ARM clusters [1].
If you just mean i
Hi Roc,
just as an addition to Congxian, Flink 1.11 will be released soonish. We
just started the release process (created release branch to freeze the
features), which should include 2-4 weeks of testing/bug fixing. Of course,
if you are interested, you could use the current master/release branch
I think there is some confusion in this thread between the auto
watermark interval and the interval (length) of an event-time window.
Maybe clearing that up for everyone helps.
The auto watermark interval is the periodicity (in processing time) at
which Flink asks the source (or a watermark ge
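For illustration (the 200 ms value is arbitrary):

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AutoWatermarkConfig {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        // How often (in processing time) Flink asks the periodic assigner
        // for a new watermark; unrelated to any window length.
        env.getConfig().setAutoWatermarkInterval(200);
    }
}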
Hi Yuval,
currently there is no API for getting those insights. I guess you need
to use internal APIs to get this information. Which planner and
version are you using?
Regards,
Timo
On 18.05.20 14:16, Yuval Itzchakov wrote:
Hi,
Is there any way to infer if a Table is going to generate
More precisely: the sink table `sql-sink` missed the required version
option.
> On May 18, 2020, at 21:13, Leonard Xu wrote:
>
> Hi,
>
> Looks like you missed two required parameters: version and topic [1]; you need
> to add them for both the source table and the sink table.
>
> .connect(
> new Kafka()
Hi,
Looks like you missed two required parameters: version and topic [1]; you need
to add them for both the source table and the sink table.
.connect(
new Kafka()
.version("0.11")   // required: valid connector versions are
                   // "0.8", "0.9", "0.10", "0.11", and "universal"
Hi Omar,
I think in the next couple of weeks the Flink 1.11 release should be
released. It mainly depends on the testing and bug fixing period as Xintong
said.
Cheers,
Till
On Mon, May 18, 2020 at 1:48 PM Xintong Song wrote:
> 1.11.0 is feature freezing today. The final release date depends on
Hi Shubham,
great that tweaking the JDBC sink helped. Maybe I don't fully understand
your logic but:
The watermark that you are receiving in an operator should already be
the minimum of all subtasks, because it is sent to all subsequent
operators by the preceding operator. So a watermark ca
Hi,
Is there any way to infer if a Table is going to generate an AppendStream
or a RetractStream under the hood in order to figure out if I need to call
`toAppendStream` vs `toRetractStream` for the DataStream conversion? Note
this information is important to me further down the DAG so I want to
p
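For reference, the two conversion calls in question look roughly like
this (Row is just a placeholder output type):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class ConversionExample {
    public static void convert(StreamTableEnvironment tableEnv, Table table) {
        // Only valid if the table is append-only (e.g. a simple projection/filter).
        DataStream<Row> appendStream = tableEnv.toAppendStream(table, Row.class);

        // Always valid; the Boolean flag marks additions (true) and retractions (false).
        DataStream<Tuple2<Boolean, Row>> retractStream =
                tableEnv.toRetractStream(table, Row.class);
    }
}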
Hi community,
I have cut release-1.11 from the master branch, based
on commit bf5594cdde428be3521810cb3f44db0462db35df.
If you will be merging something into the master branch, please make sure
to set the correct fix version in JIRA according to which branch
you have merged your code into. E
The feature freeze for 1.11.0 is today. The final release date depends on the
progress of release testing / bug fixing.
Thank you~
Xintong Song
On Mon, May 18, 2020 at 6:36 PM Omar Gawi wrote:
> Thanks Till!
> Do you know what is 1.11.0 release date?
>
>
> On Mon, May 18, 2020 at 12:49 PM Till Roh
On 15.05.20 15:17, Slotterback, Chris wrote:
My understanding is that while all these windows build their memory state, I
can expect heap memory to grow for the 24 hour length of the
SlidingEventTimeWindow, and then start to flatten as the t-24hr window frames
expire and release back to the JV
Hi Gnanasoundari,
Your use case is very typical and pretty much the main motivation for event
time and watermarks. It's supported out of the box. I recommend reading
again the first resource of Alex.
To make it clear, let's have a small example:
Source 1 -\
+--> Window --> Sink
I see, I had not considered the serialization; that was the issue.
Thank you.
On Mon, May 18, 2020 at 12:29 PM Chesnay Schepler
wrote:
> We don't publish sources for test classes.
>
> Have you considered that the sink will be serialized on job submission,
> meaning that your myTestSink instance
Hi Ivan,
First let's address the issue with idle partitions. The solution is to use
a watermark assigner that also emits a watermark with some idle timeout [1].
Now the second question, on why Kafka offsets are committed for in-flight,
checkpointed data. The basic idea is that you are not losing
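A rough sketch of the idle-timeout idea only (not the exact assigner
from [1]; the timeouts are made up and the element is assumed to be its
own timestamp):

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

public class IdleAwareAssigner implements AssignerWithPeriodicWatermarks<Long> {

    private static final long IDLE_TIMEOUT_MS = 60_000L;        // made-up threshold
    private static final long MAX_OUT_OF_ORDERNESS_MS = 5_000L; // made-up bound

    private long maxTimestamp = Long.MIN_VALUE + MAX_OUT_OF_ORDERNESS_MS;
    private long lastRecordProcessingTime = System.currentTimeMillis();

    @Override
    public long extractTimestamp(Long element, long previousElementTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, element);
        lastRecordProcessingTime = System.currentTimeMillis();
        return element;
    }

    @Override
    public Watermark getCurrentWatermark() {
        long now = System.currentTimeMillis();
        if (now - lastRecordProcessingTime > IDLE_TIMEOUT_MS) {
            // No records for a while: let the watermark follow processing time
            // so downstream windows are not blocked by this idle partition.
            return new Watermark(
                    Math.max(maxTimestamp - MAX_OUT_OF_ORDERNESS_MS, now - IDLE_TIMEOUT_MS));
        }
        return new Watermark(maxTimestamp - MAX_OUT_OF_ORDERNESS_MS);
    }
}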
Thanks Till!
Do you know what is 1.11.0 release date?
On Mon, May 18, 2020 at 12:49 PM Till Rohrmann wrote:
> Hi Omar,
>
> with FLINK-15154 [1] which will be released with the upcoming 1.11.0
> release, it will be possible to bind the Blob server to the hostname
> specified via jobmanager.bind-
Watermarks are maintained for each source separately, but need to be
combined when you combine the different streams. If not, you would not be
able to perform windowing or any related join or aggregation. Having
different timestamps in the sources is actually the normal case and the
main motivation
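For example, if source 1 is at watermark 12:00:05 and source 2 is at
12:00:01, an operator consuming both streams works with 12:00:01, so a
window ending at 12:00:03 only fires once source 2 advances past it.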
Hi Aissa,
Your use case is quite unusual. You usually have an alert in a dashboard
(potentially sending an email), if any of the sensors show an error. You
usually want to retain the original error code to be able to quickly
identify the issue.
Your use case would make sense if you want to filte
Hi Ivan,
Just to add to the point about chaining: when splitting the map into two parts, objects
need to be copied from one operator to the chained operator. Since your
objects are very heavy, that can take quite long, especially if you don't
have a specific serializer configured but rely on Kryo.
You can avoi
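For illustration, two switches that are often relevant here (only enable
object reuse if your functions do not hold on to or modify input records
after emitting them):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReuseConfig {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Skip the defensive copy between chained operators.
        env.getConfig().enableObjectReuse();

        // Fail fast when a type would fall back to the (slow) Kryo serializer,
        // so you notice which of your heavy types need an explicit serializer.
        env.getConfig().disableGenericTypes();
    }
}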
Hi Omar,
with FLINK-15154 [1] which will be released with the upcoming 1.11.0
release, it will be possible to bind the Blob server to the hostname
specified via jobmanager.bind-host. Per default it will still bind to the
wildcard address but with this option you can bind it to localhost, for
examp
Hi Senthil,
since your records are so big, I recommend taking the time to evaluate
some different serializers [1].
[1]
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
On Wed, May 13, 2020 at 5:40 PM Senthil Kumar wrote:
> Zhijiang,
>
>
>
> Thanks for your sugges
Hi,
I was wondering, since it is possible to "query the state of an
in-flight window", whether it is also possible to make sure we query *every*
window at the proper time. So, how can the in-flight window state of a
window of one PU be accessed from another PU with Queryable State?
I want to query the window st
Now I see what you mean. I think you would have to somehow set up the
Flink metrics system as a backend for opencensus. Then the metrics would
be reported to the same system (prometheus) in this case. In Opencensus
lingo, this would mean using a Flink-based Stats Exporter instead of the
Prometh
Hey,
If you want to pass your Hadoop jars using the HADOOP_CLASSPATH environment
variable, you need to remove the "flink-shaded-hadoop.jar" from the
classpath by deleting the "lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar"
file.
This jar file contains all Hadoop classes as well. That's also the rea
Hi,
I'm using Flink 1.10.0 and Hadoop 3.2.1. My Flink cluster is deployed on YARN.
According to the Flink Hadoop integration doc, I set up the HADOOP_CLASSPATH
environment variable on each cluster machine.
From the beginning, everything is OK; I can read from/write to HDFS/S3.
And then, I want to try
Hi Dario,
What version of Flink are you using? Are you implementing your own
TableSource or do you use the JdbcTableSource that comes with Flink?
Which planner do you use blink or the old one?
Unfortunately the rework of Table API type system is not finished yet,
therefore there are still rough e
/**
* Alipay.com Inc.
* Copyright (c) 2004-2020 All Rights Reserved.
*/
package com.paytm.reconsys.functions.processfunctions;
import com.paytm.reconsys.Constants;
import com.paytm.reconsys.configs.ConfigurationsManager;
import com.paytm.reconsys.enums.DescripancyTypeEnum;
import com.paytm.reco
Hi,
I have implemented the Flink job with MapStates. The functionality is as follows:
1. I have two datastreams which I connect with the connect operator and then
call a CoProcessFunction for every pair of objects.
2. For an element of the first datastream, the processElement1 method is called, and for
an elemen
Hi Nick,
yes, you may be lucky that no involved classes have changed (much), but
there is no guarantee.
You could try to fiddle around and add the respective class
(ClosureCleanerLevel) from Flink 1.9 to your jar, but it's hacky at best.
Another option is to bundle Flink 1.9 with your code if
Dear community,
happy to share this week's community update. With the feature freeze for Flink
1.11 today, there are not a lot of feature-related discussions right now on
the Apache Flink development mailing list. Still, I can share some news on
Apache Flink 1.11 and 1.10.1, Flink Forward Global, and a disc
Yes. The retract message will be generated first, then the new result.
lec ssmi wrote on Monday, May 18, 2020 at 3:00 PM:
> Hi:
> When encountering retraction, there is the following SQL:
> select count(1) count, word group by word
>
> Suppose the current aggregation result is:
> 'hello' -> 3
> When
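For example, on a retract stream this shows up as (false, ('hello', 3))
immediately followed by (true, ('hello', 4)).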
Hi:
When encountering retraction, there is the following SQL:
select count(1) count, word group by word
Suppose the current aggregation result is:
'hello' -> 3
When another record comes in, the count of 'hello' will be changed to
4.
The following two records will be generated i
We don't publish sources for test classes.
Have you considered that the sink will be serialized on job submission,
meaning that your myTestSink instance is not the one actually used by
the job? This typically means that you have to store stuff in a static field
instead.
Alternatively, depending on
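A rough sketch of what that typically looks like (the class name is made
up; sufficient for single-JVM mini-cluster tests):

import org.apache.flink.streaming.api.functions.sink.SinkFunction;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CollectingTestSink<T> implements SinkFunction<T> {

    // The sink instance is serialized on job submission, so results are
    // collected in a static field that the test reads after the job finishes.
    public static final List<Object> RESULTS =
            Collections.synchronizedList(new ArrayList<>());

    @Override
    public void invoke(T value, Context context) {
        RESULTS.add(value);
    }
}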