Re: Issues with Kafka-based Exactly Once Processing and taskmanager.numberOfTaskSlots

2024-12-09 Thread Rion Williams
It's probably worth noting that these jobs were originally run with taskmanager.numberOfTaskSlots of 1 before increasing to 2, which may also explain the issue. I thought I'd mention it for context in case it's relevant. On Mon, Dec 9, 2024 at 8:13 AM Rion Williams wrote: &

Issues with Kafka-based Exactly Once Processing and taskmanager.numberOfTaskSlots

2024-12-09 Thread Rion Williams
Hi all, In trying to optimize the performance of some of the existing Flink jobs that are running in production environments, I've recently done some experimenting with taking advantage of the taskmanager.numberOfTaskSlots configuration for some of my Flink jobs and noticed an issue. It appears w

Re: flink operator - restart a pipeline from a manually trigger savepoint

2024-08-14 Thread Rion Williams
Hi Sigalit, To restart from an explicit savepoint, you need to not only set your initialSavepointPath but also the savepointRedeployNonce which should trigger the job to restart using that savepoint. Try setting/incrementing that property and you should see the job run from your manual savepo

Re: Elasticsearch 8.x Connector in Maven

2024-08-05 Thread Rion Williams
, at 9:39 AM, Rion Williams wrote:Hi all,I’m bubbling this request up from a thread that I had started on both the Flink Slack channel as well as the User mailing list . I recently had the need to upgrade a series of my Flink jobs that heavily interact with Elasticsearch and noticed that many

Re: Elasticsearch 8.x Connector in Maven

2024-07-31 Thread Rion Williams
...@flink.apache.orgBest RegardsAhmed HamdyOn Wed, 31 Jul 2024 at 15:06, Rion Williams <rionmons...@gmail.com> wrote:Hi again all, Just following up on this as I’ve scoured around trying to find any documentation for using the ES 8.x connector, however everything only appears to reference 6/7. The ES 8.x seems t

Re: Elasticsearch 8.x Connector in Maven

2024-07-31 Thread Rion Williams
like to avoid doing something like forking the bits I need in my local repository if possible and building it on my own. Thanks in advance, Rion > On Jul 30, 2024, at 1:00 PM, Rion Williams wrote: > > Hi all, > > I see that the Elasticsearch Connector for 8.x is supported pe

Elasticsearch 8.x Connector in Maven

2024-07-30 Thread Rion Williams
Hi all, I see that the Elasticsearch Connector for 8.x is supported per the repo (and completed JIRAs). Is there a way to reference this via Maven? Or is it required to build the connector from the source directly? We recently upgraded an Elasticsearch cluster to 8.x and some of the writes are

Re: flink kubernetes flink autoscale behavior

2024-06-24 Thread Rion Williams
Hi Eric,I believe you might be referring to use of the adaptive scheduler which should support these “in-place” scaling operations via:jobmanager.scheduler: adaptiveYou can see the documentation for Elastic Scaling here for additional details and configuration.On Jun 24, 2024, at 11:56 PM, Enric Ot

Re: Using Custom JSON Formatting with Flink Operator

2024-02-22 Thread Rion Williams
feel like in this case I'd need to share the lib directory for the image running the job to copy it over from the original image. Does that make sense? Is there a better way to handle this in this scenario? On Thu, Feb 22, 2024 at 6:31 AM Rion Williams wrote: > Correct! Building a custo

Re: Using Custom JSON Formatting with Flink Operator

2024-02-22 Thread Rion Williams
> in the lib folder (don’t confuse this with the usrlib folder). > > Kind Regards > Dominik > From: Rion Williams > Date: Thursday, 22 February 2024 at 13:09 > To: Bünzli Dominik, INI-DNA-INF > Cc: user@flink.apache.org > Subject: Re: Using Custom JSON Formatting with Fli

Re: Using Custom JSON Formatting with Flink Operator

2024-02-22 Thread Rion Williams
that I also needed to add the additional > dependencies (I guess JsonTemplateLayout is one of them) to the lib folder of > the deployment. > > Kind regards > Dominik > > From: Rion Williams > Date: Thursday, 22 February 2024 at 00:46 > To: Flink User List &g

Using Custom JSON Formatting with Flink Operator

2024-02-21 Thread Rion Williams
Hey Flinkers, Recently I’ve been in the process of migrating a series of older Flink jobs to use the official operator and have run into a snag on the logging front. I’ve attempted to use the following configuration for the job: ``` logConfiguration: log4j-console.properties: |+ rootLogge

Handling Batched Failures in ElasticsearchSink

2023-03-23 Thread Rion Williams
Hi all, I have a pipeline that is currently reading from Kafka and writing to Elasticsearch. I recently was doing some testing for how it handles failures and was wondering if there’s a best practice or recommendation for doing so. Specifically, if I have a batch of 100 records being sent via a

Re: Handling JSON Serialization without Kryo

2023-03-22 Thread Rion Williams
, Ken Krugler wrote:Hi Rion,I’m using Gson to deserialize to a Map.1-2 records/second sounds way too slow, unless each record is enormous.— KenOn Mar 21, 2023, at 6:18 AM, Rion Williams <rionmons...@gmail.com> wrote:Hi Ken,Thanks for the response. I hadn't tried exploring the use of the R

Re: Handling JSON Serialization without Kryo

2023-03-20 Thread Rion Williams
and other types supported by flink[1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/Best,Shammon FYOn Sun, Mar 19, 2023 at 7:44 AM Rion Williams <rionmons...@gmail.com> wrote:Hi all, I’m reaching out today for some suggestions (and hopefully a solutio

Handling JSON Serialization without Kryo

2023-03-18 Thread Rion Williams
Hi all, I’m reaching out today for some suggestions (and hopefully a solution) for a Flink job that I’m working on. The job itself reads JSON strings from a Kafka topic and reads those into JSONObjects (currently via Gson), which are then operated against, before ultimately being written out t

Handling JSON Serialization without Kryo

2023-03-18 Thread Rion Williams
Hi all, I’m reaching out today for some suggestions (and hopefully a solution) for a Flink job that I’m working on. The job itself reads JSON strings from a Kafka topic and reads those into JSONObjects (currently via Gson), which are then operated against, before ultimately being written out to

Flink Forward Session Question

2022-12-31 Thread Rion Williams
Hey Flinkers, Firstly, early Happy New Year’s to everyone in the community. I’ve been digging a bit into exactly-once processing with Flink and Pinot and I came across this session from Flink Foward last year: - https://www.slideshare.net/FlinkForward/exactlyonce-financial-data-processing-at-s

Re: Question Regarding State Migrations in Ververica Platform

2022-08-31 Thread Rion Williams
+dev > On Aug 30, 2022, at 11:20 AM, Rion Williams wrote: > >  > Hi all, > > I wasn't sure if this would be the best audience, if not, please advise if > you know of a better place to ask it. I figured that at least some folks here > either work for Verve

Question Regarding State Migrations in Ververica Platform

2022-08-30 Thread Rion Williams
Hi all, I wasn't sure if this would be the best audience, if not, please advise if you know of a better place to ask it. I figured that at least some folks here either work for Ververica or might have used their platform. *tl;dr; I'm trying to migrate an existing stateful Flink job to run in Verv

Exception Handling in ElasticsearchSink

2022-04-21 Thread Rion Williams
Hi all, I've recently been encountering some issues that I've noticed in the logs of my Flink job that handles writing to an Elasticsearch index. I was hoping to leverage some of the metrics that Flink exposes (or piggyback on them) to update metric counters when I encounter specific kinds of erro

Re: [ANNOUNCE] Apache Flink 1.14.0 released

2021-09-29 Thread Rion Williams
Great news all! Looking forward to it! > On Sep 29, 2021, at 10:43 AM, Theo Diefenthal > wrote: > >  > Awesome, thanks for the release. > > - Ursprüngliche Mail - > Von: "Dawid Wysakowicz" > An: "dev" , "user" , > annou...@apache.org > Gesendet: Mittwoch, 29. September 2021 15:59:47

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-09-04 Thread Rion Williams
ing extra > that the unit test doesn't in this case, so I'd recommend it as the next > step (I'm also bit concerned that this test would take a long time to > execute / be resource intensive as it would need to spawn more elastic > clusters?). > > Best, > D. >

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-26 Thread Rion Williams
use-cases, but also don't want to go overkill. Thanks again for all of your help, Rion On Wed, Aug 25, 2021 at 2:10 PM Rion Williams wrote: > Thanks again David, > > I've spun up a JIRA issue for the ticket > <https://issues.apache.org/jira/browse/FLINK-23977> while I wo

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-25 Thread Rion Williams
on" as data. Most complete generic work on this topic > that I'm aware of are Splittable DoFn based IOs in Apache Beam. > > I think the best module for the contribution would be > "elasticsearch-base", because this could be easily reused for all ES > versions that we c

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-25 Thread Rion Williams
[3] > https://github.com/apache/flink/blob/release-1.13.2/flink-connectors/flink-connector-elasticsearch-base/src/test/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBaseTest.java > > Best, > D. > > On Tue, Aug 24, 2021 at 12:03 AM Rion Williams > wrote: &

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-23 Thread Rion Williams
element.f0.httpHosts, > (ElasticsearchSinkFunction< > >Tuple2>) >(el, ctx, indexer) -> { >// Constru

Re: Handling HTTP Requests for Keyed Streams

2021-08-17 Thread Rion Williams
mentioned that the configuration fetching is very infrequent, why > don't you use a blocking approach to send HTTP requests and receive > responses? This seems like a more reasonable solution to me. > > Rion Williams 于2021年8月17日周二 上午4:00写道: >> Hi all, >> >> I&

Handling HTTP Requests for Keyed Streams

2021-08-16 Thread Rion Williams
Hi all, I've been exploring a few different options for storing tenant-specific configurations within Flink state based on the messages I have flowing through my job. Initially I had considered creating a source that would periodically poll an HTTP API and connect that stream to my original event

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-16 Thread Rion Williams
wrote: > To give you a better idea, in high-level I think could look something like > this <https://gist.github.com/dmvk/3f8124d585cd33e52cacd4a38b80f8c8> [1]. > > [1] https://gist.github.com/dmvk/3f8124d585cd33e52cacd4a38b80f8c8 > > On Fri, Aug 13, 2021 at 2:57 PM Rion Williams &g

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-13 Thread Rion Williams
n to Flink (I'm able to provide some guidance). > > [1] > https://github.com/apache/flink/blob/release-1.13.2/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkFunction.java > > Best, > D. &

Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-08 Thread Rion Williams
Hi folks, I have a use-case that I wanted to initially pose to the mailing list as I’m not terribly familiar with the Elasticsearch connector to ensure I’m not going down the wrong path trying to accomplish this in Flink (or if something downstream might be a better option). Basically, I have

Re: Dead Letter Queue for JdbcSink

2021-08-03 Thread Rion Williams
iek > > wt., 3 sie 2021 o 18:07 Rion Williams napisał(a): >> >> Thanks Maciek, >> >> It looks like my initial issue had another problem with a bad interface that >> was being used (or an improper one), but after changing that and ensuring >> all of th

Re: Dead Letter Queue for JdbcSink

2021-08-03 Thread Rion Williams
> > As far as I see, you're not overriding functions like open, > setRuntimeContext, snapshotState, initializeState - the calls needs to > be passed to the inner sink function. > > pon., 2 sie 2021 o 19:31 Rion Williams napisał(a): > > > > Hi again Maciek (and all), &g

Re: Dead Letter Queue for JdbcSink

2021-08-02 Thread Rion Williams
functions like: open, close, > notifyCheckpointComplete, snapshotState, initializeState and > setRuntimeContext. > > The problem is that if you want to catch problematic record you need > to set batch size to 1, which gives very bad performance. > > Regards, > Maciek >

Re: Dead Letter Queue for JdbcSink

2021-07-14 Thread Rion Williams
my own JdbcSink?) Thanks, Rion On Wed, Jul 14, 2021 at 9:56 AM Maciej Bryński wrote: > Hi Rion, > We have implemented such a solution with Sink Wrapper. > > > Regards, > Maciek > > śr., 14 lip 2021 o 16:21 Rion Williams napisał(a): > > > > Hi all, > &g

Dead Letter Queue for JdbcSink

2021-07-14 Thread Rion Williams
Hi all, Recently I've been encountering an issue where some external dependencies or process causes writes within my JDBCSink to fail (e.g. something is being inserted with an explicit constraint that never made it's way there). I'm trying to see if there's a pattern or recommendation for handling

Handling Large Broadcast States

2021-06-16 Thread Rion Williams
Hey Flink folks, I was discussing the use of the Broadcast Pattern with some colleagues today for a potential enrichment use-case and noticed that it wasn’t currently backed by RocksDB. This seems to indicate that it would be solely limited to the memory allocated, which might not support a lar

Guidance for Integration Tests with External Technologies

2021-05-18 Thread Rion Williams
Hey all, I’ve been taking a very TDD-oriented approach to developing many of the Flink apps I’ve worked on, but recently I’ve encountered a problem that has me scratching my head. A majority of my integration tests leverage a few external technologies such as Kafka and typically a relational d

Re: Handling "Global" Updating State

2021-05-17 Thread Rion Williams
o executing the Flink pipeline >> to ensure synchronicity) >> - Use this to initialize the state of my broadcast stream (if possible) >> - At this point that stream would be broadcasting any new records coming in, >> so I “should” stay up to date at that point. >&

Re: Handling "Global" Updating State

2021-05-16 Thread Rion Williams
obviously better / well known approach to handling this? Thanks, Rion > On May 14, 2021, at 9:51 AM, Rion Williams wrote: > >  > Hi all, > > I've encountered a challenge within a Flink job that I'm currently working > on. The gist of it is that I have a job tha

Handling "Global" Updating State

2021-05-14 Thread Rion Williams
Hi all, I've encountered a challenge within a Flink job that I'm currently working on. The gist of it is that I have a job that listens to a series of events from a Kafka topic and eventually sinks those down into Postgres via the JDBCSink. A requirement recently came up for the need to filter th

Capturing Statement Execution / Results within JdbcSink

2021-03-19 Thread Rion Williams
Hey all, I've been working with JdbcSink and it's really made my life much easier, but I had two questions about it that folks might be able to answer or provide some clarity around. *Accessing Statement Execution / Results* Is there any mechanism in place (or out of the box) to support reading

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Rion Williams
oup = > new InterceptingTaskMetricGroup() { > @Overridepublic OperatorMetricGroup > getOrAddOperator(OperatorID id, String name) { > return operatorMetricGroup; } > };new MockEnvironmentBuilder() > .setMetric

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Rion Williams
6, 2021 at 9:36 AM Chesnay Schepler wrote: > Are you actually running a job, or are you using a harness for testing > your function? > > On 3/16/2021 3:24 PM, Rion Williams wrote: > > Hi Chesnay, > > Thanks for the prompt response and feedback, it's very much appreci

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Rion Williams
2 reporter instances; one for the JM and > one for the TM. > To remedy this, I would recommend creating a factory that returns a static > reporter instance instead; overall this tends to be cleaner. > > Alternatively, when using the testing harnesses IIRC you can also set set > a

Unit Testing for Custom Metrics in Flink

2021-03-15 Thread Rion Williams
Hi all, Recently, I was working on adding some custom metrics to a Flink job that required the use of dynamic labels (i.e. capturing various counters that were "slicable" by things like tenant / source, etc.). I ended up handling it in a very naive fashion that would just keep a dictionary of met

Re: Handling Bounded Sources with KafkaSource

2021-03-13 Thread Rion Williams
). Any ideas/recommendations/workarounds would be greatly welcome and I’d be happy to share my specific code / use-cases if needed. Thanks much, Rion > On Mar 12, 2021, at 10:19 AM, Rion Williams wrote: > >  > Hi all, > > I've been using the KafkaSource API as opposed

Handling Bounded Sources with KafkaSource

2021-03-12 Thread Rion Williams
Hi all, I've been using the KafkaSource API as opposed to the classic consumer and things have been going well. I configured my source such that it could be used in either a streaming or bounded mode, with the bounded approach specifically aimed at improving testing (unit/integration). I've notic

Request for Flink JIRA Access

2021-03-07 Thread Rion Williams
Hey folks, The community here has been awesome with my recent questions about Flink, so I’d like to give back. I’m already a member of the ASF JIRA but I was wondering if I could get access to the Flink Project. I’ve contributed a good bit to Apache Beam in the past, but I figured that I’ll b

Dynamic JDBC Sink Support

2021-03-05 Thread Rion Williams
Hi all, I’ve been playing around with a proof-of-concept application with Flink to assist a colleague of mine. The application is fairly simple (take in a single input and identify various attributes about it) with the goal of outputting those to separate tables in Postgres: object AttributeIdent

Re: Defining GlobalJobParameters in Flink Unit Testing Harnesses

2021-03-04 Thread Rion Williams
ion automatically calls > open(), thus any mutations on the harness happen too late. > > I'd suggest to take a look at the implementation of that method and > essentially copy the code. > You can then call the harness constructor manually and mutate the > execution config before c

Re: Defining GlobalJobParameters in Flink Unit Testing Harnesses

2021-03-04 Thread Rion Williams
2021 at 6:47 AM Chesnay Schepler wrote: > Could you show us how you create test harness? > > On 3/4/2021 5:13 AM, Rion Williams wrote: > > Hi all, > > Early today I had asked a few questions regarding the use of the many > testing constructs available within Flink and b

Defining GlobalJobParameters in Flink Unit Testing Harnesses

2021-03-03 Thread Rion Williams
Hi all, Early today I had asked a few questions regarding the use of the many testing constructs available within Flink and believe that I have things in a good direction at present. I did run into a specific case that either may not be supported, or just isn't documented well enough for me to det

Re: Unit Testing State Stores in KeyedProcessFunctions

2021-03-03 Thread Rion Williams
s like that anyway) > > On 3/3/2021 8:10 PM, Rion Williams wrote: >> Hi all! >> >> Is it possible to apply assertions against the underlying state stores >> within a KeyedProcessFunction using the existing >> KeyedOneInputStreamOperatorTestHarness class within

Unit Testing State Stores in KeyedProcessFunctions

2021-03-03 Thread Rion Williams
Hi all! Is it possible to apply assertions against the underlying state stores within a KeyedProcessFunction using the existing KeyedOneInputStreamOperatorTestHarness class within unit tests? Basically I wanted to ensure that if I passed in two elements each with unique keys that I would be able t

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-03-01 Thread Rion Williams
t's a fairly common use-case for testing, but maybe not? Thanks much! Rion On Sat, Feb 27, 2021 at 10:56 AM Rion Williams wrote: > Thanks David, > > I figured that the correct approach would obviously be to adopt a keying > strategy upstream to ensure the same data that I used a

Re: Using Prometheus Client Metrics in Flink

2021-02-28 Thread Rion Williams
ting metadata from context into metric labels. > > If this doesn't work for you, then consider encoding tenant identifier > into job names, and extract this identifier in a metric_relabel_config [2] > > [0]: https://github.com/prometheus/node_exporter/issues/319 > [1]: > ht

Re: Using Prometheus Client Metrics in Flink

2021-02-28 Thread Rion Williams
]: https://github.com/prometheus/node_exporter/issues/319 > [1]: > https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config > [2]: > https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs > > > Fro

Re: Using Prometheus Client Metrics in Flink

2021-02-28 Thread Rion Williams
rks is you are using the > metric counter. > > Prasanna. > >> On Sat, Feb 27, 2021 at 9:01 PM Rion Williams wrote: >> Hi folks, >> >> I’ve just recently started working with Flink and I was in the process of >> adding some metrics through my existing

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-27 Thread Rion Williams
the processing you need to > do. Otherwise, a simple change that may help would be to increase the bounded > delay you use in calculating your own per-tenant watermarks, thereby making > late events less likely. > > David > >> On Sat, Feb 27, 2021 at 3:29 AM Rion William

Using Prometheus Client Metrics in Flink

2021-02-27 Thread Rion Williams
Hi folks, I’ve just recently started working with Flink and I was in the process of adding some metrics through my existing pipeline with the hopes of building some Grafana dashboards with them to help with observability. Initially I looked at the built-in Flink metrics that were available, but

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-26 Thread Rion Williams
that Flink provides but calling >> your own logic based on the timestamps that enter your process function >> and the stored state. >> >> Regards, >> Timo >> >> >> On 26.02.21 00:29, Rion Williams wrote: >> >  >> > Hi David, >

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-25 Thread Rion Williams
aightforward. > > Hope this helps, > David > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.12/learn-flink/event_driven.html#example > >> On Thu, Feb 25, 2021 at 9:05 PM Rion Williams wrote: >> Hey folks, I have a somewhat high-level/a

Handling Data Separation / Watermarking from Kafka in Flink

2021-02-25 Thread Rion Williams
Hey folks, I have a somewhat high-level/advice question regarding Flink and if it has the mechanisms in place to accomplish what I’m trying to do. I’ve spent a good bit of time using Apache Beam, but recently pivoted over to native Flink simply because some of the connectors weren’t as mature or di