[Question] about breaking zero copy

2022-10-15 Thread Sergio Daniel Troiano
My question is about conversions of produced batches (i.e., breaking zero
copy).

I am producing with Kafka Streams 3.0 (the same version as the Kafka cluster).

Messages are compressed on the producer side.


I am seeing a sustained rate of produce conversions
(ProduceMessageConversionsPerSec) on a topic. As we know, recompressing
batches breaks "zero copy", which means the brokers have to spend more
resources.
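
For anyone who wants to reproduce the measurement: the meter in question is
the per-topic ProduceMessageConversionsPerSec from BrokerTopicMetrics, which
can be read over JMX. A minimal Scala sketch, assuming the broker runs with
JMX enabled (e.g. JMX_PORT=9999); broker-host, the port, and my-topic are
placeholders:

import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

object ProduceConversionRate {
  def main(args: Array[String]): Unit = {
    // Assumption: the broker was started with remote JMX enabled.
    val url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi")
    val connector = JMXConnectorFactory.connect(url)
    try {
      val mbs = connector.getMBeanServerConnection
      // Per-topic conversion meter registered by the broker.
      val meter = new ObjectName(
        "kafka.server:type=BrokerTopicMetrics,name=ProduceMessageConversionsPerSec,topic=my-topic")
      // Yammer meters expose Count plus rate attributes such as OneMinuteRate.
      println(mbs.getAttribute(meter, "OneMinuteRate"))
    } finally connector.close()
  }
}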



Checking the source code, I found several conditions that break "in-place"
batch assignment (condensed into a sketch after the list):


1. sourceCompression != destinationCompression (e.g., the producer uses GZIP
while the topic is configured for LZ4)

2. Magic number < 2, or a magic number mismatch

3. Non-contiguous inner offsets
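
To make those conditions concrete, here is a minimal, self-contained Scala
sketch that condenses them into one predicate. The names (Codec, BatchMeta,
keepsZeroCopy) are illustrative, not the broker's actual API:

// Illustrative sketch only: these names are hypothetical, not Kafka's API.
object ZeroCopyCheck {
  sealed trait Codec
  case object Gzip extends Codec
  case object Lz4 extends Codec

  final case class BatchMeta(sourceCodec: Codec, magic: Byte, offsetsContiguous: Boolean)

  // The broker can leave a batch "in place" (zero copy) only when none of
  // the three conditions above applies.
  def keepsZeroCopy(batch: BatchMeta, targetCodec: Codec, toMagic: Byte): Boolean =
    batch.sourceCodec == targetCodec &&               // 1. no recompression
      batch.magic == toMagic && batch.magic >= 2 &&   // 2. no format conversion
      batch.offsetsContiguous                         // 3. offsets are 0, 1, 2, ...

  def main(args: Array[String]): Unit = {
    val batch = BatchMeta(Lz4, magic = 2, offsetsContiguous = true)
    println(keepsZeroCopy(batch, targetCodec = Lz4, toMagic = 2))  // true
    println(keepsZeroCopy(batch, targetCodec = Gzip, toMagic = 2)) // false: recompression needed
  }
}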


I think the snippet at the bottom of this email is the code that triggers the
broker to break zero copy.



What I suspect is that, for some reason, the offsets are not contiguous in
the produced batches, which leads to my main question: in what scenario could
that happen?
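
To pin down what "contiguous" means here, a small hedged sketch (Record and
contiguous are illustrative names, not Kafka internals): the broker expects
record i of a compressed batch to carry inner offset i, exactly as the
expectedInnerOffset counter does in the code quoted below:

// Hypothetical sketch of what "contiguous inner offsets" means for one batch.
object InnerOffsets {
  final case class Record(offset: Long)

  // Record i must carry inner offset i, i.e. offsets run 0, 1, 2, ..., n-1.
  def contiguous(records: Seq[Record]): Boolean =
    records.zipWithIndex.forall { case (r, i) => r.offset == i.toLong }

  def main(args: Array[String]): Unit = {
    println(contiguous(Seq(Record(0), Record(1), Record(2))))          // true
    // A client writing absolute instead of zero-based relative offsets
    // fails the check, so the broker rewrites (recompresses) the batch.
    println(contiguous(Seq(Record(1042), Record(1043), Record(1044)))) // false
  }
}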


I tried to verify this with the kafka-dump-log.sh tool, but of course that is
not possible, as Kafka has already converted the batches by the time they are
written to the log.


I also considered that transactions could be the reason for the conversions,
since they use control batches (the isControl flag), but from what I saw a
control batch always contains exactly one record (the control record), so I
assume control batches never carry other client-generated records.



So I would appreciate an example of non-contiguous offsets within a batch. In
parallel I will continue my own investigation, as I am intrigued by these
conversions.



Afterwards I want to write a public document about the performance impact of
batch conversions.


Thanks in advance


Best regards.


Sergio Troiano






-

recordsIterator.forEachRemaining { record =>
  val expectedOffset = expectedInnerOffset.getAndIncrement()
  val recordError = validateRecordCompression(batchIndex, record).orElse {
    validateRecord(batch, topicPartition, record, batchIndex, now,
      timestampType, timestampDiffMaxMs, compactedTopic, brokerTopicStats).orElse {
      if (batch.magic > RecordBatch.MAGIC_VALUE_V0 && toMagic > RecordBatch.MAGIC_VALUE_V0) {
        if (record.timestamp > maxTimestamp)
          maxTimestamp = record.timestamp

        // Some older clients do not implement the V1 internal offsets correctly.
        // Historically the broker handled this by rewriting the batches rather
        // than rejecting the request. We must continue this handling here to
        // avoid breaking these clients.
        if (record.offset != expectedOffset)
          inPlaceAssignment = false
      }

      None
    }
  }
  // ... (rest of the loop body elided)
}


[jira] [Created] (KAFKA-14306) add uniform serializer and deserializer to the kafka protocol

2022-10-15 Thread maayan shani (Jira)
maayan shani created KAFKA-14306:


 Summary: add uniform serializer and deserializer to the kafka 
protocol
 Key: KAFKA-14306
 URL: https://issues.apache.org/jira/browse/KAFKA-14306
 Project: Kafka
  Issue Type: New Feature
  Components: clients, core, protocol
Reporter: maayan shani


Recently I noticed that every client (Java, JS, Python, ...) implements its
own serializer and deserializer.

By adding Protocol Buffers (.proto) files to the Kafka project that describe
the Kafka protocol, all clients, and even the brokers, could interoperate
more easily.

There would be less duplication of code, it would be easier to add new client
implementations quickly, and the clients would be more reliable and stay
up-to-date with the official Kafka protocol.

 * Protocol Buffers is a uniform format developed by Google for describing
serialization and deserialization protocols.

Related information:

Protobuf by Google:
https://developers.google.com/protocol-buffers

For example, Pulsar added protobuf files to its project.

Pulsar protocol buffer:
https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/proto/PulsarApi.proto
https://pulsar.apache.org/docs/developing-binary-protocol

Pulsar Node.js client:
https://github.com/ayeo-flex-org/pulsar-flex





[jira] [Created] (KAFKA-14307) KRaft controller time based snapshots

2022-10-15 Thread Jira
José Armando García Sancio created KAFKA-14307:

 Summary: KRaft controller time based snapshots
 Key: KAFKA-14307
 URL: https://issues.apache.org/jira/browse/KAFKA-14307
 Project: Kafka
  Issue Type: Sub-task
Reporter: José Armando García Sancio
Assignee: José Armando García Sancio





