Hi, Jonah,

The Kafka protocol originally used a short int to represent the length of a string. COMPACT_STRING was introduced in KIP-482 <https://cwiki.apache.org/confluence/display/KAFKA/KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields#KIP482:TheKafkaProtocolshouldSupportOptionalTaggedFields-CompactString>. It is only a new serialization method, used when a flexible version is specified. So, it makes sense to keep the string length within the range of a short integer. I agree that it would be useful to document this in the protocol.
Thanks,

Jun

On Mon, Oct 27, 2025 at 9:56 AM Jonah Hooper <[email protected]> wrote:

> Hi Jun,
>
> Thanks for responding!
>
> > When a field is flexible, we represent STRING as COMPACT_STRING, but it has
> > the same requirement on length.
>
> Do you know whether this is specified in the documentation (or a KIP)
> somewhere?
>
> COMPACT_STRING is specified as:
>
> > Represents a sequence of characters. First the length N + 1 is given as an
> > UNSIGNED_VARINT. Then N bytes follow which are the UTF-8 encoding of the
> > character sequence.
>
> I noticed that `UNSIGNED_VARINT` is never specified in terms of size
> anywhere but it seems that VARINT is specified as part of the
> primitive-types <https://kafka.apache.org/protocol#protocol_types>
> section.
>
> > Represents an integer between -2^31 and 2^31-1 inclusive. Encoding follows
> > the variable-length zig-zag encoding from Google Protocol Buffers
> > <https://code.google.com/apis/protocolbuffers/docs/encoding.html>.
>
> I'd assume that UNSIGNED_VARINT would be [0, 2^32-1]? IMO we may wish to
> clarify the language of the documentation if the same length restrictions
> still apply for COMPACT_STRING.
>
> Best,
> Jonah
>
> On Fri, Oct 24, 2025 at 5:08 PM Jun Rao <[email protected]> wrote:
>
> > Hi, Jonah,
> >
> > In https://kafka.apache.org/protocol#protocol_types, we define the
> > primitive type of String as the following.
> >
> > STRING Represents a sequence of characters. First the length N is given as
> > an INT16. Then N bytes follow which are the UTF-8 encoding of the character
> > sequence. Length must not be negative.
> >
> > So, a STRING can only have length up to 32767 characters.
> >
> > When a field is flexible, we represent STRING as COMPACT_STRING, but it has
> > the same requirement on length.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Oct 24, 2025 at 12:44 PM Jonah Hooper <[email protected]>
> > wrote:
> >
> > > Hi Kafka Developers,
> > >
> > > I'd like to discuss something I've noticed about the generated
> > > serialization code of the Kafka Protocol
> > > <https://kafka.apache.org/protocol.html>.
> > >
> > > I'm attempting to create a topic using the most recent KafkaAdminClient
> > > implementation on maven
> > > <https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients/4.1.0>.
> > > The CREATE_TOPIC
> > > <https://github.com/apache/kafka/blob/trunk/clients/src/main/resources/common/message/CreateTopicsRequest.json>
> > > RPC specifies that configuration values may be COMPACT_STRING
> > >
> > > CreateTopics Request (Version: 7) => [topics] timeout_ms validate_only _tagged_fields
> > >   topics => name num_partitions replication_factor [assignments] [configs] _tagged_fields
> > >     name => COMPACT_STRING
> > >     ...
> > >     configs => name value _tagged_fields
> > >       name => COMPACT_STRING
> > >       value => COMPACT_NULLABLE_STRING
> > >       ...
> > >
> > > COMPACT_STRING is defined as follows:
> > >
> > > > Represents a sequence of characters. First the length N + 1 is given as
> > > > an UNSIGNED_VARINT. Then N bytes follow which are the UTF-8 encoding of
> > > > the character sequence.
> > >
> > > I'm not sure whether a maximum size has been specified for UNSIGNED_VARINT.
> > > So I assumed that these strings can have an arbitrary size.
> > >
> > > When I generate a CREATE_TOPIC request in the KafkaAdminClient:
> > >
> > > String longValue = "x".repeat(524_288);
> > > Map<String, String> newTopicConfig = new HashMap<>();
> > > newTopicConfig.put(TopicConfig.COMPRESSION_TYPE_CONFIG, longValue);
> > >
> > > And send the request - I end up with the following exception:
> > >
> > > java.lang.RuntimeException: 'value' field is too long to be serialized
> > >     at org.apache.kafka.common.message.CreateTopicsRequestData$CreatableTopicConfig.addSize(CreateTopicsRequestData.java:1219)
> > >     at org.apache.kafka.common.message.CreateTopicsRequestData$CreatableTopic.addSize(CreateTopicsRequestData.java:576)
> > >     at org.apache.kafka.common.message.CreateTopicsRequestData.addSize(CreateTopicsRequestData.java:207)
> > >     at org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218)
> > >     at org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187)
> > >     at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:110)
> > >     at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:608)
> > >     at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:582)
> > >
> > > It seems that the generated code of CreateTopicsRequestData contains:
> > >
> > > byte[] _stringBytes = value.getBytes(StandardCharsets.UTF_8);
> > > if (_stringBytes.length > 0x7fff) {
> > >     throw new RuntimeException("'value' field is too long to be serialized");
> > > }
> > >
> > > This code sample is generated by this function
> > > <https://github.com/apache/kafka/blob/409a43eff77511e89bba2f95934cb1ebc417236d/generator/src/main/java/org/apache/kafka/message/MessageDataGenerator.java#L1117>
> > > and is what causes the exception to occur.
> > >
> > > Is it intended that RPCs using COMPACT_STRING should have this size limit?
> > >
> > > Thanks!
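
For reference, the behaviour discussed in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the generated Kafka code: the class name CompactStringSketch, the method names, and the use of IllegalArgumentException are all hypothetical (the generated check quoted in the thread throws RuntimeException). The sketch writes the length N + 1 as an unsigned varint followed by N UTF-8 bytes, and applies the same 0x7fff cap, which shows that the limit comes from the size check rather than from the varint encoding itself.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of kafka-clients.
class CompactStringSketch {
    // Same cap as the generated check quoted in the thread (0x7fff == Short.MAX_VALUE).
    static final int MAX_STRING_BYTES = 0x7fff;

    // Base-128 unsigned varint: 7 payload bits per byte, least significant
    // group first, high bit set on every byte except the last.
    static void writeUnsignedVarint(int value, ByteArrayOutputStream out) {
        while ((value & 0xffffff80) != 0) {
            out.write((value & 0x7f) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes a non-null string as: unsigned varint (N + 1), then N UTF-8 bytes.
    static byte[] encodeCompactString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > MAX_STRING_BYTES) {
            // Assumption: exception type chosen for the sketch only.
            throw new IllegalArgumentException(
                "string is too long to be serialized: " + utf8.length + " bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUnsignedVarint(utf8.length + 1, out);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 1 length byte + 4 payload bytes
        System.out.println(encodeCompactString("gzip").length);
        try {
            encodeCompactString("x".repeat(524_288));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch a 32767-byte value still encodes (with a 3-byte length prefix), while the 524,288-character value from the original report is rejected before anything is written.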
