Flink handles its parallelism independently from the number of partitions
in the topic(s) being read. The parallelism comes from whatever is set in
the cluster configuration, without any concern for the source's native
parallelism. If there are fewer Kafka partitions than the Flink
parallelism, the
My guess is that this only fails when PyFlink is used with the heap state
backend, in which case one possible workaround is to use the RocksDB state
backend instead. Another workaround would be to rely on timers in the
process function, and clear the state yourself.
David
On Fri, Mar 8, 2024 at 1
With streaming execution, the entire pipeline is always running, which is
necessary so that results can be continuously produced. But with batch
execution, the job graph can be segmented into separate pipelined stages
that can be executed sequentially, each running to completion before the
next beg
For a collection of several complete sample applications using Flink with
Kafka, see https://github.com/confluentinc/flink-cookbook.
And I agree with Marco -- in fact, I would go farther, and say that using
Spring Boot with Flink is an anti-pattern.
David
On Wed, Feb 7, 2024 at 4:37 PM Marco Vil
I've seen enough demand for a streaming broadcast join in the community to
justify a FLIP -- I think it's a good idea, and look forward to the
discussion.
David
On Fri, Feb 2, 2024 at 6:55 AM Feng Jin wrote:
> +1 a FLIP for this topic.
>
>
> Best,
> Feng
>
> On Fri, Feb 2, 2024 at 10:26 PM Mart
When it comes to decoupling the state store from Flink, I suggest taking a
look at FlinkNDB, which is an experimental state backend for Flink that
puts the state into an external distributed database. There's a Flink
Forward talk [1] and a master's thesis [2] available.
[1] https://www.youtube.com
While the readFile method would monitor changes to existing files, it would
completely re-ingest each changed file after every change. This behavior
wasn't very user friendly.
David
On Fri, Jan 5, 2024 at 2:22 AM Martijn Visser
wrote:
> Hi Prasanna,
>
> I think this is as expected. There is no
Hi, Alex!
Yes, in PyFlink the various flatmap and process functions are implemented
as generator functions, so they use yield to emit results.
David
On Tue, Nov 7, 2023 at 1:16 PM Alexander Fedulov <
alexander.fedu...@gmail.com> wrote:
> Java ProcessFunction API defines a clear way to collect d
Rui,
I don't have any direct experience with this topic, but given the
motivation you shared, the proposal makes sense to me. Given that the new
default feels more complex than the current behavior, if we decide to do
this I think it will be important to include the rationale you've shared in
the
> As you suggested message broker below then how it is feasible in this
case?
To my mind, the idea would be to use something like a socket source for
Kafka Connect. This would give you a simple, reliable way to get the data
stored into a replayable data store. You'd then be able to start, stop, an
I believe bloom filters are off by default because they add overhead and
aren't always helpful. I.e., in workloads that are write heavy and have few
reads, bloom filters aren't worth the overhead.
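If you do want to turn them on, a rough sketch of how that might look (this assumes the state.backend.rocksdb.use-bloom-filter option; depending on your Flink version it may need to go in flink-conf.yaml rather than the job configuration):
Configuration conf = new Configuration();
// enable RocksDB bloom filters; worthwhile mainly for read-heavy workloads
conf.setString("state.backend.rocksdb.use-bloom-filter", "true");
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);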
David
On Fri, Oct 20, 2023 at 11:31 AM Mate Czagany wrote:
> Hi,
>
> There have been no reports ab
In Flink, all user functions, including KeyedBroadcastProcessFunction,
are (effectively) single threaded, so the processBroadcastElement
method will run to completion before any further messages are
processed in the processElement method. (I said "effectively" because
in the case of processing time
This may or may not help, but you can get the execution plan from
inside the client, by doing something like this (I printed the plan to
stderr):
...
System.err.println(env.getExecutionPlan());
env.execute("my job");
The result is a JSON-encoded representation of the job graph, which
Back in 2020, there was a Flink Forward talk [1] about how Lyft was
doing blue green deployments. Earlier (all the way back in 2017)
Drivetribe described [2] how they were doing so as well.
David
[1] https://www.youtube.com/watch?v=Hyt3YrtKQAM
[2] https://www.ververica.com/blog/drivetribe-cqrs-ap
This join optimization sounds promising, but I'm wondering why Flink
SQL isn't taking advantage of the N-Ary Stream Operator introduced in
FLIP-92 [1][2] to implement an n-way join in a single operator. Is
there something that makes this impossible/impractical?
[1] https://cwiki.apache.org/confluen
There's already a built-in concept of WindowStagger that provides an
interface for accomplishing this.
It's not as well integrated (or documented) as it might be, but the
basic mechanism exists. To use it, I believe you would do something
like this:
assigner = new TumblingEventTimeWindows(Time.se
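The call above is cut off; as a rough sketch, the factory method that accepts a WindowStagger (assuming Flink 1.12+, with a made-up Event type and reducer) looks like this:
events
    .keyBy(Event::getKey)
    .window(TumblingEventTimeWindows.of(Time.minutes(1), Time.seconds(0), WindowStagger.RANDOM))
    .reduce(new MyReducer());
With WindowStagger.RANDOM the window firings are spread out rather than all happening at the same instant.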
I'm not 100% certain what "alignment duration" is measuring exactly in
the context of unaligned checkpoints -- however, even with unaligned
checkpointing each operator still has to wait until all of the
barriers are present in the operator's input queues. It doesn't have
to wait for the barriers to
UE if the subtask is
>>> restarted and no event from source is processed.
>>>
>>> Best,
>>> Shammon FY
>>>
>>> On Tue, Mar 14, 2023 at 4:58 PM Alexis Sarda-Espinosa
>>> wrote:
>>>>
>>>> H
I believe there is some noticeable overhead if you are using the
heap-based state backend, but with RocksDB I think the difference is
negligible.
David
On Tue, Mar 7, 2023 at 11:10 PM Hangxiang Yu wrote:
>
> Hi, Tony.
> "be detrimental to performance" means that some extra space overhead of the
Watermarks always follow the corresponding event(s). I'm not sure why
they were designed that way, but that is how they are implemented.
Windows maintain this contract by emitting all of their results before
forwarding the watermark that triggered the results.
David
On Mon, Mar 13, 2023 at 5:28 P
I can't respond to the python-specific aspects of this situation, but
I don't believe you need to use the same OutputTag instance. It should
be enough that the various tag instances involved all have the same
String id. (That's why the id exists.)
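In the Java API, for instance, something like this sketch works because the tag instances share the same id (the Event type, tag name, and process function are made up):
// inside MyProcessFunction, late records are emitted with
// ctx.output(new OutputTag<Event>("late-events") {}, value)
SingleOutputStreamOperator<Event> mainStream = events.process(new MyProcessFunction());
// a different OutputTag instance, but with the same id, retrieves that side output
DataStream<Event> lateStream = mainStream.getSideOutput(new OutputTag<Event>("late-events") {});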
David
On Tue, Feb 14, 2023 at 11:51 AM Andrew Ott
DataStream time windows and Flink SQL make assumptions about the timestamps
and watermarks being milliseconds since the epoch. But the underlying
machinery does not. So if you limit yourself to process functions (for
example), then nothing will assign any semantics to the time values.
David
On Th
Flink only officially supports Scala 2.12 up to 2.12.7 -- you are running
into the binary compatibility check, intended to keep you from unknowingly
running into problems. You can disable japicmp, and everything will
hopefully work:
mvn clean install -DskipTests -Djapicmp.skip -Dscala-2.12
-Dscala
When it comes to event time processing and watermarks, I believe that if
you stick to the lower level APIs, then the milliseconds assumption is
indeed arbitrary, but at higher levels that assumption is baked in.
In other words, that rules out using Flink SQL, or things
like TumblingEventTimeWindow
Yes, that will work as you expect. So long as you don't put another shuffle
or rebalance in between, the keyed partitioning that's already in place
will carry through the async i/o operator, and beyond. In most cases you
can even use reinterpretAsKeyedStream on the output (so long as you haven't
do
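As a rough sketch (with a made-up EnrichmentFunction that doesn't change the key):
DataStream<Event> enriched = AsyncDataStream.orderedWait(
        keyedStream, new EnrichmentFunction(), 30, TimeUnit.SECONDS, 100);
// the keys are unchanged, so the stream can be reinterpreted as keyed without a shuffle
KeyedStream<Event, String> rekeyed =
        DataStreamUtils.reinterpretAsKeyedStream(enriched, Event::getKey);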
I was wrong about this. The AS OF style processing join has been disabled
at a higher level,
in
org.apache.flink.table.planner.plan.nodes.exec.stream.StreamExecTemporalJoin#createJoinOperator
David
On Thu, Oct 6, 2022 at 9:59 AM David Anderson wrote:
> Salva,
>
> Have you tried doing
As for your original question, the documentation states that a temporal
table function can only be registered via the Table API, and I believe this
is true.
David
On Thu, Oct 6, 2022 at 9:59 AM David Anderson wrote:
> Salva,
>
> Have you tried doing an AS OF style processing time temp
Salva,
Have you tried doing an AS OF style processing time temporal join? I know
the documentation leads one to believe this isn't supported, but I think it
actually works. I'm basing this on this comment [1] in the code for
the TemporalProcessTimeJoinOperator:
The operator to temporal join a str
I want to clarify one point here, which is that modifying jobs written in
Scala to use Flink's Java API does not require porting them to Java. I can
readily understand why folks using Scala might rather use Java 17 than Java
11, but sticking to Scala will remain an option even if Flink's Scala API
Logically it would make sense to be able to initialize BroadcastState in
the open method of a BroadcastProcessFunction, but in practice I don't
believe it can be done -- because the necessary Context isn't made
available.
Perhaps you could use the State Processor API to bootstrap some state into
t
Vishal,
If you decide you can't live with dropping that state, [1] is a complete
example showing how to migrate from Kryo by using the state processor API.
David
[1]
https://www.docs.immerok.cloud/docs/cookbook/migrating-state-away-from-kryo/
On Fri, Sep 16, 2022 at 8:32 AM Vishal Santoshi
wr
The way that Flink handles session windows is that every new event is
initially assigned to its own session window, and then overlapping sessions
are merged. I imagine this is why you are seeing so many calls
to createAccumulator.
This implementation choice is deeply embedded in the code; I don't
If I remember correctly, there's a fix for this in Flink 1.14 (but the
feature is disabled by default in 1.14, and enabled by default in 1.15).
(I'm thinking
that execution.checkpointing.checkpoints-after-tasks-finish.enabled [1]
takes care of this.)
With Flink 1.13 I believe you'll have to handle
The role of CREATE TABLE is to provide the necessary metadata for the table
-- the location of the data, its format, etc. Executing CREATE TABLE
creates an entry in the catalog, but otherwise doesn't do anything.
In a case like this one
CREATE TABLE Table1 (column_name1 STRING, column_name2 DOUBL
If you have two watermark strategies in your job, the
downstream TimestampsAndWatermarksOperator will absorb incoming watermarks
and not forward them downstream, but it will have no effect upstream.
The only exception to this is that watermarks equal to Long.MAX_VALUE are
forwarded downstream, sin
You can keep the same transaction ID if you are restarting the job as a
continuation of what was running before. You need distinct IDs for
different jobs that will be running against the same kafka brokers. I think
of the transaction ID as an application identifier.
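For example, with the newer KafkaSink the same idea looks roughly like this (the broker, topic, and prefix names are made up):
KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("broker:9092")
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("output-topic")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        // keep this prefix stable when restarting the same job; use a distinct
        // prefix for each different job writing to the same brokers
        .setTransactionalIdPrefix("my-app")
        .build();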
See [1] for a complete list of
. The problem was quite obvious when I
> enabled idleness and data flowed through much faster with different results
> even though the topics were not idle.
>
> Regards.
>
> On Mon, Aug 15, 2022 at 12:12 AM David Anderson
> wrote:
>
>> Although I'm not very familiar wi
Although I'm not very familiar with the design of the code involved, I also
looked at the code, and I'm inclined to agree with you that this is a bug.
Please do raise an issue.
I'm wondering how you noticed this. I was thinking about how to write a
failing test, and I'm wondering if this has some
The configuration parameter passed to the open method is a legacy holdover
that has been retained to avoid breaking a public API, but is no longer
used.
Your options are to either get the global job parameters from the execution
context as described in [1], or to pass the configuration to a constr
You need to add
'csv.field-delimiter'=';'
to the definition of Table1 so that the input from test4.txt can be
correctly parsed:
tEnv.executeSql("CREATE TABLE Table1 (column_name1 STRING,
column_name2 DOUBLE) WITH ('connector.type' = 'filesystem',
'connector.path' = 'file:///C:/temp/test4
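For comparison, with the newer filesystem connector options the full DDL would look roughly like this (a sketch; the 'connector', 'path', and 'format' keys replace the legacy 'connector.type' style shown above):
tEnv.executeSql(
    "CREATE TABLE Table1 (" +
    "  column_name1 STRING," +
    "  column_name2 DOUBLE" +
    ") WITH (" +
    "  'connector' = 'filesystem'," +
    "  'path' = 'file:///C:/temp/test4.txt'," +
    "  'format' = 'csv'," +
    "  'csv.field-delimiter' = ';'" +
    ")");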
What did change was the default starting position when not starting from a
checkpoint. With FlinkKafkaConsumer, it starts from the committed offsets
by default. With KafkaSource, it starts from the earliest offset.
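If you want KafkaSource to match the old default, a sketch (broker, topic, and group names are made up):
KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("broker:9092")
        .setTopics("input-topic")
        .setGroupId("my-group")
        // start from committed offsets, falling back to earliest if none exist
        .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();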
David
On Fri, Jul 15, 2022 at 5:57 AM Chesnay Schepler wrote:
> I'm not sure abo
This is, in fact, the expected behavior. Let me explain why:
In order for Flink to provide exactly-once guarantees, the input sources
must be able to rewind and then replay any events since the last checkpoint.
In the scenario you shared, the last checkpoint was checkpoint 2, which
occurred befor
available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351546
We would like to thank all contributors of the Apache Flink community who
made this release possible!
Regards,
David Anderson
ption:
> org.apache.flink.api.common.InvalidProgramException: Table program cannot
> be compiled. This is a bug. Please file an issue.
>
> at
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:94)
> ~[flink-table-runtime-1.15.0.jar:1.15.0]
>
A Table can have at most one time attribute. In your Table the proc_time
column is a processing time attribute, and when you define a watermark on
the event_time column then that column becomes an event-time attribute.
If you want to combine event time and processing time, you can use
the PROCTIME
Sweta,
Flink does not include watermarks in savepoints, nor are they included in
aligned checkpoints. For what it's worth, I believe that with unaligned
checkpoints in-flight watermarks are included in checkpoints, but I don't
believe that would solve the problem, since the watermark strategy's st
I've taken care of this.
David
On Sun, May 22, 2022 at 4:12 AM Shubham Bansal
wrote:
> Hi Everyone,
>
> I am not sure who to reach out for the reviews of these changesets, so I
> am putting this on the mailing list here.
>
> I have raised the review for
> FLINK-27507 - https://github.com/apache
> Great, that all makes sense to me. Thanks again.
>
> On Thu, May 19, 2022 at 11:42 AM David Anderson
> wrote:
> >
> > Sure, happy to try to help.
> >
> > What's happening with the hadoop filesystem is that before it writes
> each key it checks to see if the
sto plugin with S3 I would
> conclude that this increases the size of the cluster that would
> require entropy injection, yes? But that it doesn't necessarily get
> rid of the need because one could have a large enough cluster and say
> a lifecycle policy that could still end up requi
Aeden, this is probably happening because you are using the Hadoop
implementation of S3.
The Hadoop S3 filesystem tries to imitate a filesystem on top of S3. In so
doing it makes a lot of HEAD requests. These are expensive, and they
violate read-after-create visibility, which is what you seem to b
ths of any and all checkpoint
data files. With presto the metadata path won't include "_entropy_" at all
(it will disappear, rather than being replaced by something specific).
For point 2, I'm not sure.
David
On Thu, May 19, 2022 at 2:37 PM David Anderson wrote:
> This so
This sounds like it could be FLINK-17359 [1]. What version of Flink are you
using?
Another likely explanation arises from the fact that only the
checkpoint data files (the ones created and written by the task managers)
will have the _entropy_ replaced. The job manager does not inject entropy
into
This sounds like it might be a use case for something like a
KeyedCoProcessFunction (or possibly a KeyedBroadcastProcessFunction,
depending on the details). These operators can receive inputs from two
different sources, and share state between them.
The rides and fares exercise [1] from the flink-
pared
>>> >> to the user-zh@ ML, which I'd attribute to the improvement of
>>> interaction
>>> >> experiences. Admittedly, there are questions being repeatedly asked &
>>> >> answered, but TBH I don't think that compares to the benefit
I have mixed feelings about this.
I have been rather visible on stack overflow, and as a result I get a lot
of DMs asking for help. I enjoy helping, but want to do it on a platform
where the responses can be searched and shared.
It is currently the case that good questions on stack overflow frequ
Alexis,
Compaction isn't an all-at-once procedure. RocksDB is organized as a series
of levels, each 10x larger than the one below. There are a few different
compaction algorithms available, and they are tunable, but what's typically
happening during compaction is that one SST file at level n is be
The DataStream API's BATCH execution mode first sorts by key, and within
each key, it sorts by timestamp. By operating this way, it only needs to
keep state for one key at a time, so this keeps the runtime simple and
efficient.
Regards,
David
P.S. I see you also asked this question on stack overf
Ty,
Usually what's done is to run a separate instance of the app to handle the
re-ingestion of the historic data while another instance is processing live
data. That way the backfill job won't be confused by observing events with
recent timestamps -- it will only see the historic data. But you wil
In this situation, changing your configuration [1] to include
cluster.evenly-spread-out-slots: true
should change the scheduling behavior to what you are looking for.
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#cluster-evenly-spread-out-slots
Regar
I remember a Flink Forward talk several years ago where the speaker shared
how they were running on spot instances. They were catching the
notification that the instance was being shutdown, taking a savepoint, and
relaunching. They were also proactively monitoring spot instance prices
around the wo
Matthias,
You can use a CROSS JOIN UNNEST, as mentioned very briefly in the docs [1].
Something like this should work:
SELECT
id, customerid, productid, quantity, ...
FROM
orders
CROSS JOIN UNNEST(entries) AS items (productid, quantity, unit_price,
discount);
[1]
https://nightlies.apache.or
ts will be dramatically smaller.
David
On Wed, Feb 16, 2022 at 10:17 PM David Anderson
wrote:
> I'm afraid not. The DataStream window implementation uses internal APIs to
> manipulate the state backend namespace, which isn't possible to do with the
> public-facing API. And without this
I'm afraid not. The DataStream window implementation uses internal APIs to
manipulate the state backend namespace, which isn't possible to do with the
public-facing API. And without this, you can't implement this as
efficiently.
David
On Wed, Feb 16, 2022 at 12:04 PM Ruibin Xing wrote:
> Hi,
>
You are probably running with Java 11 (with Java 8 these messages aren't
produced). The Flink docs [1] say
These warnings are considered harmless and will be addressed in future
Flink releases.
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/release-notes/flink-1.10/#java-11-su
Flink uses watermarks to indicate when a stream has become complete up
through some point in time. Various operations on streams wait for
watermarks in order to know when they can safely stop waiting for
further input, and so go ahead and produce their results. These
operations include event-time w
Before Kafka introduced their universal client, Flink had version-specific
connectors, e.g., for versions 0.8, 0.9, 0.10, and 0.11. Those were
eventually removed in favor of FlinkKafkaConsumer, which is/was backward
compatible back to Kafka version 0.10.
FlinkKafkaConsumer itself was deprecated in
I agree.
The Twitter connector is used in a few (unofficial) tutorials, so if we
remove it that will make it more difficult for those tutorials to be
maintained. On the other hand, if I recall correctly, that connector uses
V1 of the Twitter API, which has been deprecated, so it's really not very
Hussein,
To use a JsonRowDeserializationSchema you'll need to use the Table API, and
not DataStream.
You'll want to use a JsonRowSchemaConverter to convert your json schema
into the TypeInformation needed by Flink, which is done for you by
the JsonRowDeserializationSchema builder:
json_row_s
For questions like this one, please address them to either Stack Overflow
or the user mailing list, but not both at once. Those two forums are
appropriate places to get help with using Flink's APIs. And once you've
asked a question, please allow some days for folks to respond before trying
again.
Another approach that I find quite natural is to use Flink's Stateful
Functions API [1] for model serving, and this has some nice advantages,
such as zero-downtime deployments of new models, and the ease with which
you can use Python. [2] is an example of this approach.
[1] https://flink.apache.or
One way to solve this with Flink SQL would be to use MATCH_RECOGNIZE. [1]
is an example illustrating a very similar use case.
[1] https://stackoverflow.com/a/62122751/2000823
On Fri, Jan 7, 2022 at 11:32 AM Ali Bahadir Zeybek
wrote:
> Hello Hans,
>
> If you would like to see some hands-on examp
Most of the inquiries I've had about Gelly in recent memory have been from
folks looking for a streaming solution, and it's only been a handful.
+1 for dropping Gelly
David
On Mon, Jan 3, 2022 at 2:41 PM Till Rohrmann wrote:
> I haven't seen any changes or requests to/for Gelly in ages. Hence,
Event ordering in Flink is only maintained between pairs of events that
take exactly the same path through the execution graph. So if you
have multiple instances of A (let's call them A1 and A2), each broadcasting
a partition of the total rule space, then one instance of B (B1) might
receive rule1
org/confluence/display/FLINK/FLIP-180%3A+Adjust+StreamStatus+and+Idleness+definition
>
> On Mon, Nov 22, 2021 at 10:37 AM David Anderson
> wrote:
>
>> I've seen a few questions recently from folks migrating from
>> FlinkKafkaConsumer to KafkaSource that make me suspec
I've seen a few questions recently from folks migrating from
FlinkKafkaConsumer to KafkaSource that make me suspect that something has
changed.
In FlinkKafkaConsumerBase we have this code which sets a source subtask to
idle if all of its partitions are empty when the subtask starts:
// ma
>
> It seems to work for some jobs and not for others. Maybe jobs with little
> or empty state don't have _entropy_ swapped out correctly?
This is done by design. As the documentation explains:
The Flink runtime currently passes the option to inject entropy only to
> checkpoint data files. All
Another possibility, if you know in advance the values of the keys, is to
find a mapping that transforms the original keys into new keys that will,
in fact, end up in disjoint key groups that will, in turn, be assigned to
different slots (given a specific parallelism). This is ugly, but feasible.
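As a sketch of what I mean (this leans on KeyGroupRangeAssignment, which is an internal Flink class, so treat it accordingly):
// search for a suffix that makes the rewritten key land on the desired subtask
static String remapKey(String originalKey, int desiredSubtask, int maxParallelism, int parallelism) {
    for (int i = 0; ; i++) {
        String candidate = originalKey + "#" + i;
        if (KeyGroupRangeAssignment.assignKeyToParallelOperator(candidate, maxParallelism, parallelism)
                == desiredSubtask) {
            return candidate;
        }
    }
}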
FYI, I've responded to this on stack overflow:
https://stackoverflow.com/questions/68715430/apache-flink-accessing-keyed-state-from-late-window
On Mon, Aug 9, 2021 at 3:16 AM suman shil wrote:
> I am writing a Flink application which consumes time series data from
> kafka topic. Time series dat
I am hearing quite often from users who are struggling to manage memory
usage, and these are all users using RocksDB. While I don't know for
certain that RocksDB is the cause in every case, from my perspective,
getting the better memory stability of version 6.20 in place is critical.
Regards,
Davi
The StreamingFileSink requires that you have checkpointing enabled. I'm
guessing that you don't have checkpointing enabled, since that would
explain the behavior you are seeing.
The relevant section of the docs [1] explains:
Checkpointing needs to be enabled when using the StreamingFileSink. Part
By the way, views that use MATCH_RECOGNIZE don't work in Flink 1.11. [1]
[1] https://issues.apache.org/jira/browse/FLINK-20077
On Thu, May 13, 2021 at 11:06 AM David Anderson
wrote:
> I was able to get something like this working, but only by introducing a
> view:
>
> CREATE T
I was able to get something like this working, but only by introducing a
view:
CREATE TEMPORARY VIEW mmm AS SELECT id FROM events MATCH_RECOGNIZE (...);
SELECT * FROM events WHERE id IN (SELECT id FROM mmm);
Regards,
David
On Tue, May 11, 2021 at 9:22 PM Tejas wrote:
> Hi,
> I am using flink 1
Well, I was thinking you could have avoided overwhelming your internal
services by using something like Flink's async i/o operator, tuned to limit
the total number of concurrent requests. That way the pipeline could have
uniform parallelism without overwhelming those services, and then you'd
rely o
Interesting. So if I understand correctly, basically you limited the
parallelism of the sources in order to avoid running the job with constant
backpressure, and then scaled up the windows to maximize throughput.
On Tue, May 4, 2021 at 11:23 PM vishalovercome wrote:
> In one of my jobs, windowin
Could you describe a situation in which hand-tuning the parallelism of
individual operators produces significantly better throughput than the
default approach? I think it would help this discussion if we could have a
specific use case in mind where this is clearly better.
Regards,
David
On Tue, M
I think you'd be better off using the State Processor API [1] instead. The
State Processor API has cleaner semantics -- as you'll be seeing a
self-consistent snapshot of all the state -- and it's also much more
performant.
Note also that the Queryable State API is "approaching end of life" [2].
Th
eprecated and isn't supported by
the savepoint API.
David
On Fri, Apr 30, 2021 at 5:42 PM Abdullah bin Omar <
abdullahbinoma...@gmail.com> wrote:
> Hi,
>
> So, can't we extract all previous savepoint data by using
> ExistingSavepoint?
>
>
> Thank you
>
>
>
>
> I am trying to *import*
> org.apache.flink.training.exercises.common.sources.TaxiFareGenerator;
> However, it can not resolve.
>
> What is dependency (in pom.xml) for the org.apache.flink.training?
>
>
> Thank you
>
> On Fri, Apr 30, 2021 at 10:12 AM David Anderso
file location will be the location
> that set up in the flink conf and FileSystemBackend will have to use
> instead of MemoryStateBackend. *is this correct?*
>
>
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html
>
> [2]
>
You can read about assigning unique IDs to stateful operators in the docs
[1][2]. What the uid() method does is to establish a stable and unique
identifier for a stateful operator. Then as you evolve your application,
this helps ensure that future versions of your job will be able to restore
savepo
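For example (the operator and key selector here are made up):
DataStream<Event> deduped = events
        .keyBy(Event::getUserId)
        .flatMap(new Deduplicator())
        // stable identifier used to match this operator's state when restoring a savepoint
        .uid("deduplicator")
        .name("Deduplicator");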
The isBackPressured metric is a Boolean -- it reports true or false, rather
than 1 or 0. The Flink web UI cannot display it (it shows NaN); perhaps
the same is true for Datadog.
https://issues.apache.org/jira/browse/FLINK-15753 relates to this.
Regards,
David
On Tue, Apr 13, 2021 at 12:13 PM Cl
Apr 25, 2021 at 12:25 PM Omngr
wrote:
> Thank you David. That's perfect.
>
> Now, I'm just worried about the state size. State size will grow forever.
> There is no TTL.
>
> 24 Nis 2021 Cmt 17:42 tarihinde David Anderson
> şunu yazdı:
>
>&g
for bootstrapping rocksdb state?
>
> David Anderson , 24 Nis 2021 Cmt, 15:43 tarihinde
> şunu yazdı:
>
>> Oguzhan,
>>
>> Note, the state size is very large and I have to feed the state from
>>> batch flow firstly. Thus I can not use the internal state like ro
Oguzhan,
Note, the state size is very large and I have to feed the state from batch
> flow firstly. Thus I can not use the internal state like rocksdb.
How large is "very large"? Using RocksDB, several users have reported
working with jobs using many TBs of state.
And there are techniques for b
Abdullah,
ReadRidesAndFaresSnapshot [1] is an example that shows how to use the State
Processor API to display the contents of a snapshot taken while running
RidesAndFaresSolution [2].
Hopefully that will help you get started.
[1]
https://github.com/ververica/flink-training/blob/master/state-pro
The withIdleness option does not attempt to handle situations where all of
the sources are idle.
Flink operators with multiple input channels keep track of the current
watermark from each channel, and use the minimum of these watermarks as
their own watermark. withIdleness marks idle channels as i
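For example, a sketch of a strategy with idleness handling (the event type and durations are illustrative):
WatermarkStrategy<Event> strategy = WatermarkStrategy
        .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(10))
        // a channel with no events for 1 minute is marked idle and stops
        // holding back the watermarks of downstream operators
        .withIdleness(Duration.ofMinutes(1));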
Yes, since the two streams have the same type, you can union the two
streams, key the resulting stream, and then apply something like a
RichFlatMapFunction. Or you can connect the two streams (again, they'll
need to be keyed so you can use state), and apply a RichCoFlatMapFunction.
You can use whic
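A sketch of the union variant (assuming both streams carry the same Event type, keyed by id):
DataStream<Event> merged = streamA.union(streamB);
merged.keyBy(Event::getId)
      // a RichFlatMapFunction that keeps keyed state shared across both inputs
      .flatMap(new MyStatefulFlatMap());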
Prometheus is a metrics system; you can use Flink's Prometheus metrics
reporter to send metrics to Prometheus.
Grafana can also be connected to influxdb, and to databases like mysql and
postgresql, for which sinks are available.
And the Elasticsearch sink can be used to create visualizations with
There needs to be a Flink session cluster available to the SQL client on
which it can run the jobs created by your queries. See the Getting Started
[1] section of the SQL Client documentation for more information:
The SQL Client is bundled in the regular Flink distribution and thus
runnable out-of