Re: Messages ending up on the wrong topic, bug in Kafka client?

2025-02-09, Oleksandr Shulgin
On Sun, Feb 9, 2025 at 3:24 AM Ismael Juma wrote:
> One more thing: when this happens, is the client authorized to write to
> both topicA and topicB?
We, for one, do not use per-topic authorization, as it is enforced one layer up. So the answer is "yes" in our case.
-- Alex

Re: Messages ending up on the wrong topic, bug in Kafka client?

2025-02-05, Oleksandr Shulgin
On Mon, Jan 20, 2025 at 9:58 AM Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:
> On Fri, Jan 17, 2025 at 8:31 PM Donny Nadolny wrote:
>> We're experiencing messages very occasionally ending up on a different
>> topic than what they were publishe

Re: Messages ending up on the wrong topic, bug in Kafka client?

2025-01-20, Oleksandr Shulgin
On Fri, Jan 17, 2025 at 8:31 PM Donny Nadolny wrote:
> We're experiencing messages very occasionally ending up on a different
> topic than what they were published to. That is, we publish a message to
> topicA and consumers of topicB see it and fail to parse it because the
> message contents are

Re: Kafka cluster collapsed for no reason

2024-12-02, Oleksandr Shulgin
On Sat, Nov 30, 2024 at 10:02 AM Rybalka, Grigoriy (Fortebank) wrote:
> And these logs appear on the remaining 2 broker nodes. It seems that the Kafka
> brokers lost the connection between nodes, but SSH and other traffic to/from
> the nodes worked! And on the network side there are no problems. What el

Re: Spread primary and replica of a partition across different zones in AWS

2024-11-20, Oleksandr Shulgin
On Wed, Nov 20, 2024 at 11:11 AM Soham Chakraborty wrote:
> Fair enough. For a one-time task, we can probably use some manual work with
> a sprinkling of automation thrown in. I will look into Cruise Control.
>
> Question: now that Kafka knows about "rack", future assignments should
> strictly be zone
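On the question of how rack information shapes future assignments, here is a simplified illustration of the idea, not Kafka's actual assignment code. It assumes a hypothetical `broker_racks` map (broker ID to rack/AZ name), interleaves brokers by rack, and places each partition's replicas on consecutive entries of the interleaved list, with the first replica acting as the preferred leader.

```python
from collections import defaultdict
from itertools import zip_longest


def rack_alternated_brokers(broker_racks):
    """Interleave broker IDs so that neighbouring entries come from different racks."""
    by_rack = defaultdict(list)
    for broker, rack in sorted(broker_racks.items()):
        by_rack[rack].append(broker)
    # Take one broker from each rack in turn: a0, b0, c0, a1, b1, c1, ...
    ordered = []
    for group in zip_longest(*(by_rack[rack] for rack in sorted(by_rack))):
        ordered.extend(b for b in group if b is not None)
    return ordered


def assign_partitions(broker_racks, num_partitions, replication_factor):
    """Hypothetical helper: replicas of partition p go to consecutive interleaved brokers."""
    brokers = rack_alternated_brokers(broker_racks)
    n = len(brokers)
    # The first broker in each replica list is the preferred leader; starting
    # at offset p spreads leadership across all brokers.
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


if __name__ == "__main__":
    racks = {0: "az-a", 1: "az-b", 2: "az-c", 3: "az-a", 4: "az-b", 5: "az-c"}
    for partition, replicas in assign_partitions(racks, 6, 3).items():
        print(partition, replicas, [racks[b] for b in replicas])
```

With equally sized racks and a replication factor no larger than the number of racks, every window of consecutive brokers in this list spans distinct racks. With uneven racks this simple scheme can still co-locate replicas, which is one reason real assignment logic is more involved.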

Re: Spread primary and replica of a partition across different zones in AWS

2024-11-20, Oleksandr Shulgin
On Wed, Nov 20, 2024 at 9:15 AM Soham Chakraborty wrote:
> So my goal is to know whether there is any knob by which I can force the
> leader and replica to go to different AZs, and whether there is any automated
> way/tool to handle this for existing partitions. In other words, I want the
> leader and
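For existing partitions, a first step is detecting which ones have two replicas in the same AZ. A minimal sketch, assuming the current replica lists and each broker's `broker.rack` value have already been collected into plain dictionaries; the function name and data shapes are hypothetical. It only reports violations; actually moving replicas would still need a reassignment tool such as Cruise Control, mentioned in this thread.

```python
def rack_spread_violations(assignment, broker_racks):
    """Return partitions whose replicas do not all sit in distinct racks/AZs.

    assignment:   {(topic, partition): [broker IDs, preferred leader first]}
    broker_racks: {broker ID: rack name}, i.e. each broker's broker.rack value
    """
    violations = {}
    for topic_partition, replicas in sorted(assignment.items()):
        racks = [broker_racks[b] for b in replicas]
        # Fewer distinct racks than replicas means at least two share a zone.
        if len(set(racks)) < len(racks):
            violations[topic_partition] = racks
    return violations


if __name__ == "__main__":
    # Hypothetical example data.
    broker_racks = {1: "zone-a", 2: "zone-b", 3: "zone-c", 4: "zone-a"}
    assignment = {("topicA", 0): [1, 2, 3], ("topicA", 1): [1, 4, 2]}
    for tp, racks in rack_spread_violations(assignment, broker_racks).items():
        print(f"{tp[0]}-{tp[1]}: replicas share a zone: {racks}")
```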

Re: Messages sent to the wrong topic-partitions during broker outage

2023-09-08, Oleksandr Shulgin
On Wed, Sep 6, 2023 at 3:07 PM Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:
> ... there were no changes to the publishing path recently that may have
> directly caused this ...
>
I wrote "directly", because there was a change that might have influenced it i

Messages sent to the wrong topic-partitions during broker outage

2023-09-06, Oleksandr Shulgin
Hello! We are facing a really weird issue with our production Kafka cluster (60x AWS EC2 instances split evenly across 3 "racks" / Availability Zones, with EBS storage). The Kafka server and client version is 3.3.2. Around 10 days ago one of the brokers all of a sudden became very slow as it was

How to disable metric collection for client quotas?

2023-05-26, Oleksandr Shulgin
Hello! We have a setup where clients are not connecting to Kafka directly, but rather using a middle layer API for both producing and consuming messages. This layer is centrally managed by the same tech team that is operating Kafka, so we don't see a use for the Kafka client quotas mechanism. At

Idempotent producer vs. JVM heap usage

2022-05-02, Oleksandr Shulgin
Hello, We are running Apache Kafka v2.7.1 on a total of 54 brokers distributed evenly across 3 racks. All machines are identical (c6g.4xlarge Amazon AWS EC2) and have 32 GB of RAM, of which 12 GB we dedicate to the JVM heap. This cluster hosts some thousands of topics (each replicated to all 3 r

Re: Log directory offline on AWS EBS

2022-02-10, Oleksandr Shulgin
On Thu, Feb 10, 2022 at 8:45 AM Audrius Petrosius wrote:
> Hello,
>
> We are encountering such issues on an AWS EBS-based system; there is nothing
> in the AWS logs.
>
> Is it a memory or an IO issue, as it states in one line:
>
> in dir /srv/kafka/disk1 due to IOException
> (kafka.server.LogDirFailureChannel)
> java.

Re: Failing to connect to ZooKeeper on first start

2021-07-09, Oleksandr Shulgin
On Fri, Jul 9, 2021 at 7:35 AM Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:
>
> Since version 2.7.0 we have observed the issue from the subject line on the first start of the
> Kafka process when we rotate the EC2 instances (for the sake of a software
> upgrade).
> Our supervisor scrip

Failing to connect to ZooKeeper on first start

2021-07-08, Oleksandr Shulgin
Hi, We are running Apache Kafka 2.7.1 on AWS EC2 with ZooKeeper running in the same VPC (private network). Since version 2.7.0 we have observed the issue from the subject line on the first start of the Kafka process when we rotate the EC2 instances (for the sake of a software upgrade). Our supervisor script notices th

Re: Replica selection in unclean leader election and min.insync.replicas=2

2021-06-29, Oleksandr Shulgin
On Tue, Jun 29, 2021 at 5:45 PM Péter Sinóros-Szabó wrote:
> Hey,
>
> we had the same issue as you.
>
> I checked the code and it chooses the first live replica from the
> assignment list. So if you describe a topic with kafka-topics, you will see
> the brokers list that has the replica of each p
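The selection rule described in the quoted message can be modelled roughly as below. This is a simplified sketch, not the controller's code: the clean path prefers the first replica in assignment order that is both alive and in the ISR, and unclean election falls back to the first live replica in assignment order, as described in the message. With `unclean.leader.election.enable=false` and no live ISR member, the partition stays offline (`None` here). The function and parameter names are hypothetical.

```python
def elect_leader(assignment, live_brokers, isr, unclean_enabled):
    """Return the new leader's broker ID, or None if the partition stays offline."""
    # Clean election: first replica in assignment order that is alive and in the ISR.
    for broker in assignment:
        if broker in live_brokers and broker in isr:
            return broker
    # Unclean election: first live replica in assignment order, even though
    # it may be missing writes that were already acknowledged.
    if unclean_enabled:
        for broker in assignment:
            if broker in live_brokers:
                return broker
    return None


if __name__ == "__main__":
    assignment = [1, 2, 3]  # replica list in the order kafka-topics --describe shows it
    print(elect_leader(assignment, live_brokers={2, 3}, isr={1}, unclean_enabled=False))  # None
    print(elect_leader(assignment, live_brokers={2, 3}, isr={1}, unclean_enabled=True))   # 2
```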

Re: Replica selection in unclean leader election and min.insync.replicas=2

2021-06-28, Oleksandr Shulgin
On Mon, Jun 21, 2021 at 12:33 PM Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:
>
> In summary: is there a risk of data loss in such a scenario? Is this risk avoidable and if so, what are
> the prerequisites?
Apologies if I messed up line breaks and that made reading ha

Replica selection in unclean leader election and min.insync.replicas=2

2021-06-21, Oleksandr Shulgin
Hi, We are running Apache Kafka v2.7.0 in production in a 3-rack setup (3 AZs in a single AWS region) with a per-topic replication factor of 3 and the following global settings:
unclean.leader.election.enable=false
min.insync.replicas=2
replica.lag.time.max.ms=1
replica.selector.class=org.a
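For context on the settings listed above: with `min.insync.replicas=2`, a partition leader rejects `acks=all` writes (with a NotEnoughReplicas error) while the ISR has fewer than 2 members, whereas `acks=0` and `acks=1` writes are not gated by this setting. A minimal model of that rule, with a hypothetical function name:

```python
def leader_accepts_write(acks, isr_size, min_insync_replicas):
    """Model of the min.insync.replicas check on the partition leader."""
    # acks=all is equivalent to acks=-1; only those writes are gated by min ISR.
    if acks in ("all", "-1", -1):
        return isr_size >= min_insync_replicas
    return True


if __name__ == "__main__":
    # Replication factor 3, min.insync.replicas=2 (the settings above).
    for isr_size in (3, 2, 1):
        print(isr_size, leader_accepts_write("all", isr_size, 2))
```

So with a replication factor of 3, one replica can drop out of the ISR without blocking `acks=all` producers; a second one cannot.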