It's probably worth noting that these jobs were originally run with
taskmanager.numberOfTaskSlots set to 1 before being increased to 2, which may
also explain the issue. I thought I'd mention it for context in case it's
relevant.
On Mon, Dec 9, 2024 at 8:13 AM Rion Williams wrote:
Hi all,
In trying to optimize the performance of some existing Flink jobs running in
production environments, I've recently been experimenting with the
taskmanager.numberOfTaskSlots configuration and noticed an issue.
It appears w
Hi Sigalit,
To restart from an explicit savepoint, you need to set not only your
initialSavepointPath but also the savepointRedeployNonce, which should trigger
the job to restart using that savepoint.
Try setting/incrementing that property and you should see the job run from your
manual savepo
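For reference, a minimal sketch of the relevant fields on a FlinkDeployment resource, assuming the Flink Kubernetes Operator is in use here (the savepoint path below is a placeholder):

```yaml
spec:
  job:
    upgradeMode: savepoint
    # Placeholder path; point this at the savepoint to restore from.
    initialSavepointPath: s3://my-bucket/savepoints/savepoint-123456
    # Increment this value to trigger a redeploy from initialSavepointPath.
    savepointRedeployNonce: 1
```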
, at 9:39 AM, Rion Williams wrote:
Hi all,
I'm bubbling this request up from a thread that I had started on both the Flink Slack channel as well as the User mailing list. I recently had the need to upgrade a series of my Flink jobs that heavily interact with Elasticsearch and noticed that many
...@flink.apache.org
Best Regards
Ahmed Hamdy
On Wed, 31 Jul 2024 at 15:06, Rion Williams <rionmons...@gmail.com> wrote:
Hi again all,
Just following up on this as I’ve scoured around trying to find any documentation for using the ES 8.x connector; however, everything only appears to reference 6/7.
The ES 8.x seems t
like to avoid doing something like forking the bits I need into my local
repository and building it on my own, if possible.
Thanks in advance,
Rion
> On Jul 30, 2024, at 1:00 PM, Rion Williams wrote:
>
> Hi all,
>
> I see that the Elasticsearch Connector for 8.x is supported pe
Hi all,
I see that the Elasticsearch Connector for 8.x is supported per the repo (and
completed JIRAs). Is there a way to reference this via Maven, or is it required
to build the connector from source directly?
We recently upgraded an Elasticsearch cluster to 8.x and some of the writes are
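For anyone landing here, a hedged sketch of what the dependency declaration would look like. The artifactId and version are assumptions patterned after the 6.x/7.x connectors; check Maven Central for the coordinates actually published for 8.x:

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <!-- Assumed name, by analogy with flink-connector-elasticsearch7 -->
  <artifactId>flink-connector-elasticsearch8</artifactId>
  <!-- Placeholder; pick the connector release matching your Flink version -->
  <version>x.y.z</version>
</dependency>
```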
Hi Eric,
I believe you might be referring to use of the adaptive scheduler, which should support these “in-place” scaling operations via:
jobmanager.scheduler: adaptive
You can see the documentation for Elastic Scaling here for additional details and configuration.
On Jun 24, 2024, at 11:56 PM, Enric Ot
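For context, enabling the adaptive scheduler is a one-line change in flink-conf.yaml; a minimal sketch (the commented line shows the related reactive mode, which only applies to standalone deployments):

```yaml
jobmanager.scheduler: adaptive
# Related, standalone-only option that rescales as TaskManagers come and go:
# scheduler-mode: reactive
```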
feel like in this case I'd need to share the lib directory for
the image running the job to copy it over from the original image. Does
that make sense? Is there a better way to handle this in this scenario?
On Thu, Feb 22, 2024 at 6:31 AM Rion Williams wrote:
> Correct! Building a custo
> in the lib folder (don’t confuse this with the usrlib folder).
>
> Kind Regards
> Dominik
> From: Rion Williams
> Date: Thursday, 22 February 2024 at 13:09
> To: Bünzli Dominik, INI-DNA-INF
> Cc: user@flink.apache.org
> Subject: Re: Using Custom JSON Formatting with Fli
that I also needed to add the additional
> dependencies (I guess JsonTemplateLayout is one of them) to the lib folder of
> the deployment.
>
> Kind regards
> Dominik
>
> From: Rion Williams
> Date: Thursday, 22 February 2024 at 00:46
> To: Flink User List
Hey Flinkers,
Recently I’ve been in the process of migrating a series of older Flink jobs to
use the official operator and have run into a snag on the logging front.
I’ve attempted to use the following configuration for the job:
```
logConfiguration:
  log4j-console.properties: |+
    rootLogge
```
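The preview above is cut off; below is a hedged sketch of what a complete logConfiguration using Log4j's JsonTemplateLayout could look like (property names follow Log4j 2's properties format; EcsLayout.json is one of the templates bundled with log4j-layout-template-json, which, as noted elsewhere in this thread, must also be placed in the lib folder):

```yaml
logConfiguration:
  log4j-console.properties: |+
    rootLogger.level = INFO
    rootLogger.appenderRef.console.ref = ConsoleAppender
    appender.console.name = ConsoleAppender
    appender.console.type = CONSOLE
    appender.console.layout.type = JsonTemplateLayout
    appender.console.layout.eventTemplateUri = classpath:EcsLayout.json
```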
Hi all,
I have a pipeline that is currently reading from Kafka and writing to
Elasticsearch. I recently was doing some testing for how it handles failures
and was wondering if there’s a best practice or recommendation for doing so.
Specifically, if I have a batch of 100 records being sent via a
, Ken Krugler wrote:
Hi Rion,
I’m using Gson to deserialize to a Map. 1-2 records/second sounds way too slow, unless each record is enormous.
— Ken
On Mar 21, 2023, at 6:18 AM, Rion Williams <rionmons...@gmail.com> wrote:
Hi Ken,
Thanks for the response. I hadn't tried exploring the use of the R
and other types supported by flink [1]
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/
Best,
Shammon FY
On Sun, Mar 19, 2023 at 7:44 AM Rion Williams <rionmons...@gmail.com> wrote:
Hi all,
I’m reaching out today for some suggestions (and hopefully a solutio
Hi all,
I’m reaching out today for some suggestions (and hopefully a solution) for a
Flink job that I’m working on. The job itself reads JSON strings from a Kafka
topic and parses them into JSONObjects (currently via Gson), which are then
operated against, before ultimately being written out t
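For illustration, a minimal sketch of the parsing step being described (not the poster's actual code):

```java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import org.apache.flink.api.common.functions.MapFunction;

// Parses each raw Kafka record (a JSON string) into a Gson JsonObject
// so downstream operators can work against its fields.
public class ParseJson implements MapFunction<String, JsonObject> {
    @Override
    public JsonObject map(String value) {
        return JsonParser.parseString(value).getAsJsonObject();
    }
}
```

Note that JsonObject is not a Flink POJO and falls back to Kryo serialization, which is one common source of the slow throughput discussed elsewhere in this thread.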
Hey Flinkers,
Firstly, early Happy New Year’s to everyone in the community. I’ve been digging
a bit into exactly-once processing with Flink and Pinot and I came across this
session from Flink Forward last year:
-
https://www.slideshare.net/FlinkForward/exactlyonce-financial-data-processing-at-s
+dev
> On Aug 30, 2022, at 11:20 AM, Rion Williams wrote:
>
>
> Hi all,
>
> I wasn't sure if this would be the best audience; if not, please advise if
> you know of a better place to ask it. I figured that at least some folks here
> either work for Verve
Hi all,
I wasn't sure if this would be the best audience; if not, please advise if
you know of a better place to ask it. I figured that at least some folks
here either work for Ververica or might have used their platform.
tl;dr: I'm trying to migrate an existing stateful Flink job to run in
Verv
Hi all,
I've recently been encountering some issues that I've noticed in the logs
of my Flink job that handles writing to an Elasticsearch index. I was
hoping to leverage some of the metrics that Flink exposes (or piggyback on
them) to update metric counters when I encounter specific kinds of erro
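For illustration, a minimal sketch of the piggybacking idea (the group and metric names are made up; the real job would increment counters wherever the Elasticsearch errors are observed):

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

public class ErrorCountingMap extends RichMapFunction<String, String> {
    private transient Counter malformedRecords;

    @Override
    public void open(Configuration parameters) {
        // Registers a counter under a hypothetical "elasticsearch" group.
        malformedRecords = getRuntimeContext()
                .getMetricGroup()
                .addGroup("elasticsearch")
                .counter("malformedRecords");
    }

    @Override
    public String map(String value) {
        if (value == null || value.isEmpty()) {
            malformedRecords.inc();
        }
        return value;
    }
}
```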
Great news all! Looking forward to it!
> On Sep 29, 2021, at 10:43 AM, Theo Diefenthal
> wrote:
>
>
> Awesome, thanks for the release.
>
> - Original Message -
> From: "Dawid Wysakowicz"
> To: "dev" , "user" ,
> annou...@apache.org
> Sent: Wednesday, 29 September 2021 15:59:47
ing extra
> that the unit test doesn't in this case, so I'd recommend it as the next
> step (I'm also a bit concerned that this test would take a long time to
> execute / be resource intensive as it would need to spawn more elastic
> clusters?).
>
> Best,
> D.
>
use-cases, but also don't want to go overkill.
Thanks again for all of your help,
Rion
On Wed, Aug 25, 2021 at 2:10 PM Rion Williams wrote:
> Thanks again David,
>
> I've spun up a JIRA issue for the ticket
> <https://issues.apache.org/jira/browse/FLINK-23977> while I wo
on" as data. Most complete generic work on this topic
> that I'm aware of are Splittable DoFn based IOs in Apache Beam.
>
> I think the best module for the contribution would be
> "elasticsearch-base", because this could be easily reused for all ES
> versions that we c
[3]
> https://github.com/apache/flink/blob/release-1.13.2/flink-connectors/flink-connector-elasticsearch-base/src/test/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBaseTest.java
>
> Best,
> D.
>
> On Tue, Aug 24, 2021 at 12:03 AM Rion Williams
> wrote:
element.f0.httpHosts,
> (ElasticsearchSinkFunction<Tuple2<...>>) (el, ctx, indexer) -> {
>     // Constru
mentioned that the configuration fetching is very infrequent, why
> don't you use a blocking approach to send HTTP requests and receive
> responses? This seems like a more reasonable solution to me.
>
> On Tue, Aug 17, 2021 at 4:00 AM, Rion Williams wrote:
>> Hi all,
>>
>> I
Hi all,
I've been exploring a few different options for storing tenant-specific
configurations within Flink state based on the messages I have flowing
through my job. Initially I had considered creating a source that would
periodically poll an HTTP API and connect that stream to my original event
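For context, a minimal sketch of the broadcast-state pattern this thread circles around, with hypothetical Event/TenantConfig types standing in for the real ones (the HTTP-polling source that would feed the configs stream is left out):

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class TenantConfigExample {
    // Hypothetical POJOs standing in for the actual event/config types.
    public static class Event { public String tenantId; public String payload; }
    public static class TenantConfig { public String tenantId; public String settings; }

    public static DataStream<Event> enrich(DataStream<Event> events, DataStream<TenantConfig> configs) {
        final MapStateDescriptor<String, TenantConfig> descriptor = new MapStateDescriptor<>(
                "tenantConfigs", Types.STRING, Types.POJO(TenantConfig.class));

        BroadcastStream<TenantConfig> broadcast = configs.broadcast(descriptor);

        return events
                .keyBy(e -> e.tenantId)
                .connect(broadcast)
                .process(new KeyedBroadcastProcessFunction<String, Event, TenantConfig, Event>() {
                    @Override
                    public void processElement(Event event, ReadOnlyContext ctx, Collector<Event> out)
                            throws Exception {
                        // May be null until a config for this tenant has been broadcast.
                        TenantConfig config = ctx.getBroadcastState(descriptor).get(event.tenantId);
                        out.collect(event); // enrich/filter using config as needed
                    }

                    @Override
                    public void processBroadcastElement(TenantConfig config, Context ctx, Collector<Event> out)
                            throws Exception {
                        // Store/refresh the per-tenant config in broadcast state.
                        ctx.getBroadcastState(descriptor).put(config.tenantId, config);
                    }
                });
    }
}
```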
wrote:
> To give you a better idea, in high-level I think could look something like
> this <https://gist.github.com/dmvk/3f8124d585cd33e52cacd4a38b80f8c8> [1].
>
> [1] https://gist.github.com/dmvk/3f8124d585cd33e52cacd4a38b80f8c8
>
> On Fri, Aug 13, 2021 at 2:57 PM Rion Williams
n to Flink (I'm able to provide some guidance).
>
> [1]
> https://github.com/apache/flink/blob/release-1.13.2/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkFunction.java
>
> Best,
> D.
Hi folks,
I have a use-case that I wanted to pose to the mailing list first, as I'm not
terribly familiar with the Elasticsearch connector, to make sure I'm not going
down the wrong path trying to accomplish this in Flink (or whether something
downstream might be a better option).
Basically, I have
iek
>
> On Tue, Aug 3, 2021 at 18:07, Rion Williams wrote:
>>
>> Thanks Maciek,
>>
>> It looks like my initial issue had another problem with a bad interface that
>> was being used (or an improper one), but after changing that and ensuring
>> all of th
>
> As far as I see, you're not overriding functions like open,
> setRuntimeContext, snapshotState, initializeState - the calls need to
> be passed to the inner sink function.
>
> On Mon, Aug 2, 2021 at 19:31, Rion Williams wrote:
> >
> > Hi again Maciek (and all),
&g
functions like: open, close,
> notifyCheckpointComplete, snapshotState, initializeState and
> setRuntimeContext.
>
> The problem is that if you want to catch a problematic record you need
> to set the batch size to 1, which gives very bad performance.
>
> Regards,
> Maciek
>
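Sketching the wrapper idea described above, under the assumption that the inner sink may be a RichSinkFunction and/or CheckpointedFunction; the error handling in invoke() is illustrative only, and CheckpointListener.notifyCheckpointComplete would be delegated the same way:

```java
import org.apache.flink.api.common.functions.RichFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

// Delegates every lifecycle call to the inner sink so checkpointing still
// works, while allowing failures to be intercepted around invoke().
public class DelegatingSink<T> extends RichSinkFunction<T> implements CheckpointedFunction {
    private final SinkFunction<T> inner;

    public DelegatingSink(SinkFunction<T> inner) { this.inner = inner; }

    @Override
    public void open(Configuration parameters) throws Exception {
        if (inner instanceof RichFunction) {
            ((RichFunction) inner).setRuntimeContext(getRuntimeContext());
            ((RichFunction) inner).open(parameters);
        }
    }

    @Override
    public void invoke(T value, Context context) throws Exception {
        try {
            inner.invoke(value, context);
        } catch (Exception e) {
            // e.g. route the failing record to a dead-letter sink instead
            throw e;
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        if (inner instanceof CheckpointedFunction) {
            ((CheckpointedFunction) inner).snapshotState(ctx);
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception {
        if (inner instanceof CheckpointedFunction) {
            ((CheckpointedFunction) inner).initializeState(ctx);
        }
    }

    @Override
    public void close() throws Exception {
        if (inner instanceof RichFunction) {
            ((RichFunction) inner).close();
        }
    }
}
```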
my own
JdbcSink?)
Thanks,
Rion
On Wed, Jul 14, 2021 at 9:56 AM Maciej Bryński wrote:
> Hi Rion,
> We have implemented such a solution with Sink Wrapper.
>
>
> Regards,
> Maciek
>
> On Wed, Jul 14, 2021 at 16:21, Rion Williams wrote:
> >
> > Hi all,
Hi all,
Recently I've been encountering an issue where some external dependency
or process causes writes within my JDBCSink to fail (e.g. something is
being inserted with an explicit constraint that never made its way there).
I'm trying to see if there's a pattern or recommendation for handling
Hey Flink folks,
I was discussing the use of the Broadcast Pattern with some colleagues today
for a potential enrichment use-case and noticed that it wasn’t currently backed
by RocksDB. This seems to indicate that it would be solely limited to the
memory allocated, which might not support a lar
Hey all,
I’ve been taking a very TDD-oriented approach to developing many of the Flink
apps I’ve worked on, but recently I’ve encountered a problem that has me
scratching my head.
A majority of my integration tests leverage a few external technologies such as
Kafka and typically a relational d
o executing the Flink pipeline
>> to ensure synchronicity)
>> - Use this to initialize the state of my broadcast stream (if possible)
>> - At this point that stream would be broadcasting any new records coming in,
>> so I “should” stay up to date at that point.
obviously better / well known
approach to handling this?
Thanks,
Rion
> On May 14, 2021, at 9:51 AM, Rion Williams wrote:
>
>
> Hi all,
>
> I've encountered a challenge within a Flink job that I'm currently working
> on. The gist of it is that I have a job tha
Hi all,
I've encountered a challenge within a Flink job that I'm currently working
on. The gist of it is that I have a job that listens to a series of events
from a Kafka topic and eventually sinks those down into Postgres via the
JDBCSink.
A requirement recently came up for the need to filter th
Hey all,
I've been working with JdbcSink and it's really made my life much easier,
but I had two questions about it that folks might be able to answer or
provide some clarity around.
*Accessing Statement Execution / Results*
Is there any mechanism in place (or out of the box) to support reading
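For context, typical JdbcSink usage looks like the sketch below, where Event is a hypothetical POJO and the table, columns, and connection details are placeholders; nothing in this API surfaces the executed statement's results back to the caller, which is what the question is getting at:

```java
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

// Event is assumed to have public String id / payload fields.
SinkFunction<Event> jdbcSink = JdbcSink.sink(
        "INSERT INTO events (id, payload) VALUES (?, ?)",   // placeholder SQL
        (statement, event) -> {
            statement.setString(1, event.id);
            statement.setString(2, event.payload);
        },
        JdbcExecutionOptions.builder()
                .withBatchSize(100)
                .withBatchIntervalMs(1000)
                .withMaxRetries(3)
                .build(),
        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                .withUrl("jdbc:postgresql://localhost:5432/mydb")   // placeholder
                .withDriverName("org.postgresql.Driver")
                .withUsername("user")
                .withPassword("password")
                .build());

events.addSink(jdbcSink);
```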
oup =
> new InterceptingTaskMetricGroup() {
>     @Override
>     public OperatorMetricGroup getOrAddOperator(OperatorID id, String name) {
>         return operatorMetricGroup;
>     }
> };
> new MockEnvironmentBuilder()
>     .setMetric
6, 2021 at 9:36 AM Chesnay Schepler wrote:
> Are you actually running a job, or are you using a harness for testing
> your function?
>
> On 3/16/2021 3:24 PM, Rion Williams wrote:
>
> Hi Chesnay,
>
> Thanks for the prompt response and feedback, it's very much appreci
2 reporter instances; one for the JM and
> one for the TM.
> To remedy this, I would recommend creating a factory that returns a static
> reporter instance instead; overall this tends to be cleaner.
>
> Alternatively, when using the testing harnesses IIRC you can also set
> a
Hi all,
Recently, I was working on adding some custom metrics to a Flink job that
required the use of dynamic labels (i.e. capturing various counters that
were "slicable" by things like tenant / source, etc.).
I ended up handling it in a very naive fashion that would just keep a
dictionary of met
).
Any ideas/recommendations/workarounds would be greatly welcome and I’d be happy
to share my specific code / use-cases if needed.
Thanks much,
Rion
> On Mar 12, 2021, at 10:19 AM, Rion Williams wrote:
>
>
> Hi all,
>
> I've been using the KafkaSource API as opposed
Hi all,
I've been using the KafkaSource API as opposed to the classic consumer and
things have been going well. I configured my source such that it could be
used in either a streaming or bounded mode, with the bounded approach
specifically aimed at improving testing (unit/integration).
I've notic
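For reference, a minimal sketch of the bounded configuration being described (broker/topic values are placeholders, and `env` is a StreamExecutionEnvironment):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;

KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("localhost:9092")     // placeholder
        .setTopics("events")                       // placeholder
        .setGroupId("test-group")
        .setStartingOffsets(OffsetsInitializer.earliest())
        // Bounded mode: read up to the offsets present at startup, then finish.
        // Drop this line for normal unbounded/streaming execution.
        .setBounded(OffsetsInitializer.latest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();

env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source");
```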
Hey folks,
The community here has been awesome with my recent questions about Flink, so
I’d like to give back. I’m already a member of the ASF JIRA but I was wondering
if I could get access to the Flink Project.
I’ve contributed a good bit to Apache Beam in the past, but I figured that I’ll
b
Hi all,
I’ve been playing around with a proof-of-concept application with Flink to
assist a colleague of mine. The application is fairly simple (take in a
single input and identify various attributes about it) with the goal of
outputting those to separate tables in Postgres:
object AttributeIdent
ion automatically calls
> open(), thus any mutations on the harness happen too late.
>
> I'd suggest to take a look at the implementation of that method and
> essentially copy the code.
> You can then call the harness constructor manually and mutate the
> execution config before c
2021 at 6:47 AM Chesnay Schepler wrote:
> Could you show us how you create test harness?
>
> On 3/4/2021 5:13 AM, Rion Williams wrote:
>
> Hi all,
>
> Earlier today I had asked a few questions regarding the use of the many
> testing constructs available within Flink and b
Hi all,
Earlier today I had asked a few questions regarding the use of the many
testing constructs available within Flink and believe that I have things in
a good direction at present. I did run into a specific case that either may
not be supported, or just isn't documented well enough for me to det
s like that anyway)
>
> On 3/3/2021 8:10 PM, Rion Williams wrote:
>> Hi all!
>>
>> Is it possible to apply assertions against the underlying state stores
>> within a KeyedProcessFunction using the existing
>> KeyedOneInputStreamOperatorTestHarness class within
Hi all!
Is it possible to apply assertions against the underlying state stores
within a KeyedProcessFunction using the existing
KeyedOneInputStreamOperatorTestHarness class within unit tests? Basically I
wanted to ensure that if I passed in two elements each with unique keys
that I would be able t
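A hedged sketch of the kind of assertion being asked about, where MyFunction is a stand-in for a KeyedProcessFunction<String, String, String> that writes per-key state:

```java
import static org.junit.Assert.assertEquals;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.operators.KeyedProcessOperator;
import org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness;
import org.junit.Test;

public class StateAssertionTest {
    @Test
    public void twoKeysProduceTwoStateEntries() throws Exception {
        KeyedOneInputStreamOperatorTestHarness<String, String, String> harness =
                new KeyedOneInputStreamOperatorTestHarness<>(
                        new KeyedProcessOperator<>(new MyFunction()),
                        value -> value,   // key selector: the element itself
                        Types.STRING);
        harness.open();

        harness.processElement("a", 1L);
        harness.processElement("b", 2L);

        // One keyed-state entry per distinct key that touched state.
        assertEquals(2, harness.numKeyedStateEntries());
        harness.close();
    }
}
```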
t's a fairly common use-case for testing, but maybe not?
Thanks much!
Rion
On Sat, Feb 27, 2021 at 10:56 AM Rion Williams
wrote:
> Thanks David,
>
> I figured that the correct approach would obviously be to adopt a keying
> strategy upstream to ensure the same data that I used a
ting metadata from context into metric labels.
>
> If this doesn't work for you, then consider encoding the tenant identifier
> into job names and extracting it via a metric_relabel_config [2]
>
> [0]: https://github.com/prometheus/node_exporter/issues/319
> [1]:
> ht
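A hedged sketch of that metric_relabel_config suggestion, assuming the tenant id was encoded into the Flink job name with a "-tenant-<id>" suffix (the pattern is illustrative only):

```yaml
metric_relabel_configs:
  - source_labels: [job_name]
    regex: ".*-tenant-(.*)"
    target_label: tenant
    replacement: "$1"
```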
]: https://github.com/prometheus/node_exporter/issues/319
> [1]:
> https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
> [2]:
> https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs
>
>
> Fro
rks is you are using the
> metric counter.
>
> Prasanna.
>
>> On Sat, Feb 27, 2021 at 9:01 PM Rion Williams wrote:
>> Hi folks,
>>
>> I’ve just recently started working with Flink and I was in the process of
>> adding some metrics through my existing
the processing you need to
> do. Otherwise, a simple change that may help would be to increase the bounded
> delay you use in calculating your own per-tenant watermarks, thereby making
> late events less likely.
>
> David
>
>> On Sat, Feb 27, 2021 at 3:29 AM Rion William
Hi folks,
I’ve just recently started working with Flink and I was in the process of
adding some metrics through my existing pipeline with the hopes of building
some Grafana dashboards with them to help with observability.
Initially I looked at the built-in Flink metrics that were available, but
that Flink provides but calling
>> your own logic based on the timestamps that enter your process function
>> and the stored state.
>>
>> Regards,
>> Timo
>>
>>
>> On 26.02.21 00:29, Rion Williams wrote:
>> >
>> > Hi David,
>
aightforward.
>
> Hope this helps,
> David
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/learn-flink/event_driven.html#example
>
>> On Thu, Feb 25, 2021 at 9:05 PM Rion Williams wrote:
>> Hey folks, I have a somewhat high-level/a
Hey folks, I have a somewhat high-level/advice question regarding Flink and
if it has the mechanisms in place to accomplish what I’m trying to do. I’ve
spent a good bit of time using Apache Beam, but recently pivoted over to
native Flink simply because some of the connectors weren’t as mature or
di