Hi, Jonah,

The Kafka protocol started by using a short int to represent the length of
a string. compact_string was introduced in KIP-482
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields#KIP482:TheKafkaProtocolshouldSupportOptionalTaggedFields-CompactString>.
It's used as a new serialization method when a flexible version is
specified. So, it makes sense to keep the maximum length of the string
within the range of a short integer. I agree that it would be useful to
document this in the protocol.
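
As a rough illustration of that limit, here is a minimal Java sketch of how
a COMPACT_STRING could be encoded while keeping the INT16-sized cap. The
names CompactStringSketch, writeUnsignedVarint and encodeCompactString are
hypothetical, and this is not the generated Kafka code; only the 0x7fff
check mirrors the snippet quoted further down in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class CompactStringSketch {
    // Assumption: same cap as the 0x7fff check in the generated code quoted in this thread.
    static final int MAX_STRING_BYTES = 0x7fff;

    // Writes value as an unsigned varint: 7 data bits per byte, low-order group
    // first, with the high bit set on every byte except the last.
    static void writeUnsignedVarint(int value, ByteArrayOutputStream out) {
        while ((value & 0xffffff80) != 0) {
            out.write((value & 0x7f) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes s as: UNSIGNED_VARINT(N + 1) followed by the N UTF-8 bytes.
    static byte[] encodeCompactString(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > MAX_STRING_BYTES) {
            throw new IllegalArgumentException("string is too long to be serialized");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUnsignedVarint(bytes.length + 1, out);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 1 length byte + 3 payload bytes
        System.out.println(encodeCompactString("abc").length);              // prints 4
        // 3 length bytes (32768 as a varint) + 32767 payload bytes
        System.out.println(encodeCompactString("x".repeat(0x7fff)).length); // prints 32770
    }
}
```

The varint itself could carry a much larger length; in this sketch the cap
comes only from the explicit check, which is why a note in the protocol
documentation would help.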

Thanks,

Jun

On Mon, Oct 27, 2025 at 9:56 AM Jonah Hooper <[email protected]>
wrote:

> Hi Jun,
>
> Thanks for responding!
>
> When a field is flexible, we represent STRING as COMPACT_STRING, but it has
> > the same requirement on length.
>
>
> Do you know whether this is specified in the documentation (or a KIP)
> somewhere?
>
> COMPACT_STRING is specified as:
>
> Represents a sequence of characters. First the length N + 1 is given as an
> > UNSIGNED_VARINT. Then N bytes follow which are the UTF-8 encoding of the
> > character sequence.
>
>
> I noticed that `UNSIGNED_VARINT` is never specified in terms of size
> anywhere but it seems that VARINT is specified as part of the
> primitive-types <https://kafka.apache.org/protocol#protocol_types>
> section.
>
> Represents an integer between -2^31 and 2^31-1 inclusive. Encoding follows
> > the variable-length zig-zag encoding from Google Protocol Buffers
> > <https://code.google.com/apis/protocolbuffers/docs/encoding.html>.
> >
>
> I'd assume that UNSIGNED_VARINT would be [0, 2^32-1]? IMO we may wish to
> clarify the language of the documentation if the same length restrictions
> still apply for COMPACT_STRING.
>
> Best,
> Jonah
>
> On Fri, Oct 24, 2025 at 5:08 PM Jun Rao <[email protected]> wrote:
>
> > Hi, Jonah,
> >
> > In https://kafka.apache.org/protocol#protocol_types, we define the
> > primitive type of String as the following.
> >
> > STRING Represents a sequence of characters. First the length N is given
> as
> > an INT16. Then N bytes follow which are the UTF-8 encoding of the
> character
> > sequence. Length must not be negative.
> >
> > So, a STRING can only have length up to 32767 bytes.
> >
> > When a field is flexible, we represent STRING as COMPACT_STRING, but it
> has
> > the same requirement on length.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Oct 24, 2025 at 12:44 PM Jonah Hooper
> <[email protected]
> > >
> > wrote:
> >
> > > Hi Kafka Developers,
> > >
> > > I'd like to discuss something I've noticed about the generated
> > > serialization code of the Kafka Protocol
> > > <https://kafka.apache.org/protocol.html>.
> > >
> > > I'm attempting to create a topic using the most recent KafkaAdminClient
> > > implementation on maven
> > > <
> https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients/4.1.0
> > >.
> > > The CREATE_TOPIC
> > > <
> > >
> >
> https://github.com/apache/kafka/blob/trunk/clients/src/main/resources/common/message/CreateTopicsRequest.json
> > > >
> > > RPC specifies that configuration values may be COMPACT_STRING
> > >
> > > CreateTopics Request (Version: 7) => [topics] timeout_ms validate_only
> > > _tagged_fields
> > >   topics => name num_partitions replication_factor [assignments]
> > [configs]
> > > _tagged_fields
> > >     name => COMPACT_STRING
> > >     ...
> > >     configs => name value _tagged_fields
> > >       name => COMPACT_STRING
> > >       value => COMPACT_NULLABLE_STRING
> > >     ...
> > >
> > > COMPACT_STRING is defined as follows:
> > >
> > > > Represents a sequence of characters. First the length N + 1 is given
> as
> > > an UNSIGNED_VARINT. Then N bytes follow which are the UTF-8 encoding
> of
> > > the character sequence.
> > >
> > > I'm not sure whether a maximum size has been specified for
> > UNSIGNED_VARINT.
> > > So I assumed that these strings can have an arbitrary size.
> > >
> > > When I generate a CREATE_TOPIC request in the KafkaAdminClient:
> > >
> > > String longValue = "x".repeat(524_288);
> > > Map<String, String> newTopicConfig = new HashMap<>();
> > > newTopicConfig.put(TopicConfig.COMPRESSION_TYPE_CONFIG, longValue);
> > >
> > > And send the request - I end up with the following exception:
> > >
> > > java.lang.RuntimeException: 'value' field is too long to be serialized
> > > at
> > >
> > >
> >
> org.apache.kafka.common.message.CreateTopicsRequestData$CreatableTopicConfig.addSize(CreateTopicsRequestData.java:1219)
> > > at
> > >
> > >
> >
> org.apache.kafka.common.message.CreateTopicsRequestData$CreatableTopic.addSize(CreateTopicsRequestData.java:576)
> > > at
> > >
> > >
> >
> org.apache.kafka.common.message.CreateTopicsRequestData.addSize(CreateTopicsRequestData.java:207)
> > > at
> > >
> > >
> >
> org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218)
> > > at
> > >
> > >
> >
> org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187)
> > > at
> > >
> > >
> >
> org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:110)
> > > at
> org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:608)
> > > at
> org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:582)
> > >
> > > It seems that the generated code of CreateTopicsRequestData contains:
> > >
> > > byte[] _stringBytes = value.getBytes(StandardCharsets.UTF_8);
> > > if (_stringBytes.length > 0x7fff) {
> > >   throw new RuntimeException("'value' field is too long to be
> > serialized");
> > > }
> > >
> > > This code sample is generated by this function
> > > <
> > >
> >
> https://github.com/apache/kafka/blob/409a43eff77511e89bba2f95934cb1ebc417236d/generator/src/main/java/org/apache/kafka/message/MessageDataGenerator.java#L1117
> > > >
> > > and
> > > is what causes the exception to occur.
> > >
> > > Is it intended that RPCs using COMPACT_STRING should have this size
> > limit?
> > >
> > > Thanks!
> > >
> >
>
