IllegalStateException: invalid BLOB

2024-05-21 Thread Lars Skjærven
Hello, We're facing the bug reported in https://issues.apache.org/jira/browse/FLINK-32212 More specifically, when Kubernetes decides to drain a node and a job manager restarts (but not the task manager), the job fails with: java.lang.IllegalStateException: The library registration references a diffe

Checkpointing while loading causing issues

2024-05-14 Thread Lars Skjærven
Hello, When restarting jobs (e.g. after upgrade) with "large" state a task can take some time to "initialize" (depending on the state size). During this time I noticed that Flink attempts to checkpoint. In many cases checkpointing will fail repeatedly, and cause the job to hit the tolerable-failed

There is no savepoint operation with triggerId

2024-03-25 Thread Lars Skjærven
Hello, My job manager is constantly complaining with the following error: "Exception occurred in REST handler: There is no savepoint operation with triggerId= for job bc0cb60b710e64e23195aa1610ad790a". "logger_name":"org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers$Savepoint

Stream enrichment with ingest mode

2024-02-13 Thread Lars Skjærven
Dear all, A reoccurring challenge we have with stream enrichment in Flink is a robust mechanism to estimate that all messages of the source(s) have been consumed/processed before output is collected. A simple example is two sources of catalogue metadata: - source A delivers products, - source B d

Re: Metrics with labels

2023-10-30 Thread Lars Skjærven
s.apache.org/flink/flink-docs-release-1.17/docs/ops/metrics/#user-variables > > On 17/10/2023 14:31, Lars Skjærven wrote: > > Hello, > > We're experiencing difficulties in using Flink metrics in a generic way > since various properties are included in the name of the met

Metrics with labels

2023-10-17 Thread Lars Skjærven
Hello, We're experiencing difficulties in using Flink metrics in a generic way since various properties are included in the name of the metric itself. This makes it difficult to generate sensible (and general) dashboards (with aggregations). One example is the metric for rocksdb estimated live da

Kafka coordinator not available

2023-07-20 Thread Lars Skjærven
only issue that they are unsuccessful in committing group offsets. Restarting the job from checkpoint/savepoint resolves the issue, but I would rather not restart all jobs after every kafka maintenance update. Any ideas? Kind regards, Lars Skjærven

Kubernetes Operator resource limits and requests

2023-02-08 Thread Lars Skjærven
Hello, How can we define *limit* and *request* for the kubernetes pods as described here: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits Looks like we can only set one value for CPU and memory: https://nightlies.apache.org/flink/flink-kubernetes

Re: Could not restore keyed state backend for KeyedProcessOperator

2022-12-15 Thread Lars Skjærven
Same error again today. Any tips? I'm considering downgrading to Flink 1.14. On Wed, Dec 14, 2022 at 11:51 AM Lars Skjærven wrote: > As far as I understand we are not specifying anything on restore mode. so > I guess default (NO_CLAIM) is what we're using. > > We'

Re: Could not restore keyed state backend for KeyedProcessOperator

2022-12-14 Thread Lars Skjærven
storage[1] which is not recommended in the flink checkpoint usage? >>> >>> [1] https://cloud.google.com/storage/docs/lifecycle >>> >>> On Fri, Dec 9, 2022 at 2:02 AM Lars Skjærven wrote: >>> >>>> Hello, >>>> We had an incident today w

Re: Could not restore keyed state backend for KeyedProcessOperator

2022-12-09 Thread Lars Skjærven
ycle > > On Fri, Dec 9, 2022 at 2:02 AM Lars Skjærven wrote: > >> Hello, >> We had an incident today with a job that could not restore after crash >> (for unknown reason). Specifically, it fails due to a missing checkpoint >> file. We've experienced this a tota

Could not restore keyed state backend for KeyedProcessOperator

2022-12-08 Thread Lars Skjærven
Hello, We had an incident today with a job that could not restore after crash (for unknown reason). Specifically, it fails due to a missing checkpoint file. We've experienced this a total of three times with Flink 1.15.2, but never with 1.14.x. Last time was during a node upgrade, but that was not

Switching kafka brokers

2022-10-06 Thread Lars Skjærven
Hello, What is the recommended approach for migrating flink jobs to a new kafka server? I was naively hoping to use Kafka Mirror Maker to sync the old server with the new server, and simply continue from savepoint with updated URLs. Unfortunately, the kafka offsets are not identical for log compa

Re: Cassandra sink with Flink 1.15

2022-09-08 Thread Lars Skjærven
e Flink > efforts. Either add an explicit dependency on flink-streaming-scala or > migrate to Flink tuples. > > On 07/09/2022 14:17, Lars Skjærven wrote: > > Hello, > > When upgrading from 1.14 to 1.15 we bumped into a type issue when > attempting to sink to Cassandra

Cassandra sink with Flink 1.15

2022-09-07 Thread Lars Skjærven
Hello, When upgrading from 1.14 to 1.15 we bumped into a type issue when attempting to sink to Cassandra (scala 2.12.13). This was working nicely in 1.14. Any tip is highly appreciated. Using a MapFunction() to generate the stream of tuples: CassandraSink .addSink( mystream.map(new ToTupleM

Window function - flush on job stop

2022-01-21 Thread Lars Skjærven
We're doing a stream.keyBy().window().aggregate() to aggregate customer feedback into sessions. Every now and then we have to update the job, e.g. change the key, so that we can't easily continue from the previous state. Cancelling the job (without restarting from last savepoint) will result in l

WindowOperator TestHarness

2021-12-10 Thread Lars Skjærven
Hello, We're trying to write a test for an implementation of *AggregateFunction* following an *EventTimeSessionWindows.withGap*. We gave it a try using *WindowOperator*() which we hoped could be used as an argument to *KeyedOneInputStreamOperatorTestHarness*. We're a bit stuck, and we're hoping som

Re: Scala Case Class Serialization

2021-12-07 Thread Lars Skjærven
ntext. > But have you looked into using the TypeInfoFactory to define the schema > [1]? > > Best, > Matthias > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/serialization/types_serialization/#defining-type-information-usi

Scala Case Class Serialization

2021-12-07 Thread Lars Skjærven
Hello, We're running Flink 1.14 with scala, and we're suspecting that performance is suffering due to serialization of some scala case classes. Specifically we're seeing that our Case Class "cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as Generic

KafkaSink.builder setDeliveryGuarantee is not a member

2021-12-02 Thread Lars Skjærven
Hello, upgrading to 1.14 I bumped into an issue with the kafka sink builder when defining delivery guarantee: value setDeliveryGuarantee is not a member of org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchemaBuilder[] Seems to be working with the default value (i.e. without m

Building a flink connector

2021-09-17 Thread Lars Skjærven
We're in need of a Google Bigtable flink connector. Do you have any tips on how this could be done, e.g. general guidelines on how to write a connector? Thanks, Lars

Re: KafkaSource builder and checkpointing with parallelism > kafka partitions

2021-09-16 Thread Lars Skjærven
Thanks for the feedback. > May I ask why you have less partitions than the parallelism? I would be happy to learn more about your use-case to better understand the > motivation. The use case is that topic A, contains just a few messages with product metadata that rarely gets updated, while topic

Re: KafkaSource builder and checkpointing with parallelism > kafka partitions

2021-09-15 Thread Lars Skjærven
]. > This configuration parameter is going to be introduced in the upcoming > Flink 1.14 release. > > Best, > Matthias > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-checkpointing-checkpoints-after-tasks-finish-enabled > > On Wed,
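
A minimal sketch of setting the option linked in [1], assuming Flink 1.14 or later and a standard flink-conf.yaml. The key name is read off the anchor of that documentation URL; whether it resolves the checkpointing problem for a particular job with FINISHED tasks is an assumption to verify against the release in use.

```yaml
# flink-conf.yaml (sketch; key taken from the documentation anchor in [1])
# Allows checkpoints to continue after some tasks have reached FINISHED,
# e.g. source subtasks that were assigned no Kafka partition.
execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
```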

KafkaSource builder and checkpointing with parallelism > kafka partitions

2021-09-15 Thread Lars Skjærven
Using KafkaSource builder with a job parallelism larger than the number of kafka partitions, the job is unable to checkpoint. With a job parallelism of 4, 3 of the tasks are marked as FINISHED for the kafka topic with one partition. For this reason checkpointing seems to be disabled. When using F

Re: KafkaSourceBuilder causing invalid negative offset on checkpointing

2021-04-29 Thread Lars Skjærven
: Thursday, April 29, 2021 18:44 To: Lars Skjærven Cc: Becket Qin ; user@flink.apache.org Subject: Re: KafkaSourceBuilder causing invalid negative offset on checkpointing Thanks for the additional information Lars. Could you maybe also share the full stack traces of the errors you see when the

Re: KafkaSourceBuilder causing invalid negative offset on checkpointing

2021-04-29 Thread Lars Skjærven
rom: Till Rohrmann Sent: Thursday, April 29, 2021 09:16 To: Lars Skjærven ; Becket Qin Cc: user@flink.apache.org Subject: Re: KafkaSourceBuilder causing invalid negative offset on checkpointing Hi Lars, The KafkaSourceBuilder constructs the new KafkaSource which has not been fully hardene

KafkaSourceBuilder causing invalid negative offset on checkpointing

2021-04-28 Thread Lars Skjærven
Hello, I ran into an issue when using the new KafkaSourceBuilder (running Flink 1.12.2, scala 2.12.13, on ververica platform in a container with java 8). Initially it generated warnings on kafka configuration, but the job was able to consume and produce messages. The configuration 'client.id.prefi

KafkaSourceBuilder causing invalid negative offset on checkpointing

2021-04-28 Thread Lars Skjærven
Hello, I ran into some issues when using the new KafkaSourceBuilder (running Flink 1.12.2, scala 2.12.13, on ververica platform in a container with java 8). Initially it generated warnings on kafka configuration, but the job was able to consume and produce messages. The configuration 'client.i