Hi Jun,

Thanks for responding!

> When a field is flexible, we represent STRING as COMPACT_STRING, but it has
> the same requirement on length.

Do you know whether this is specified in the documentation (or a KIP)
somewhere? COMPACT_STRING is specified as:

> Represents a sequence of characters. First the length N + 1 is given as an
> UNSIGNED_VARINT. Then N bytes follow which are the UTF-8 encoding of the
> character sequence.

I noticed that `UNSIGNED_VARINT` is never specified in terms of size anywhere
but it seems that VARINT is specified as part of the primitive-types
<https://kafka.apache.org/protocol#protocol_types> section.

> Represents an integer between -2^31 and 2^31-1 inclusive. Encoding follows
> the variable-length zig-zag encoding from Google Protocol Buffers
> <https://code.google.com/apis/protocolbuffers/docs/encoding.html>.

I'd assume that UNSIGNED_VARINT would be [0, 2^32-1]? IMO we may wish to
clarify the language of the documentation if the same length restrictions
still apply for COMPACT_STRING.

Best,
Jonah

On Fri, Oct 24, 2025 at 5:08 PM Jun Rao <[email protected]> wrote:

> Hi, Jonah,
>
> In https://kafka.apache.org/protocol#protocol_types, we define the
> primitive type of String as the following.
>
> STRING Represents a sequence of characters. First the length N is given as
> an INT16. Then N bytes follow which are the UTF-8 encoding of the character
> sequence. Length must not be negative.
>
> So, a STRING can only have length up to 32767 characters.
>
> When a field is flexible, we represent STRING as COMPACT_STRING, but it has
> the same requirement on length.
>
> Thanks,
>
> Jun
>
> On Fri, Oct 24, 2025 at 12:44 PM Jonah Hooper <[email protected]>
> wrote:
>
> > Hi Kafka Developers,
> >
> > I'd like to discuss something I've noticed about the generated
> > serialization code of the Kafka Protocol
> > <https://kafka.apache.org/protocol.html>.
> >
> > I'm attempting to create a topic using the most recent KafkaAdminClient
> > implementation on maven
> > <https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients/4.1.0>.
> > The CREATE_TOPIC
> > <https://github.com/apache/kafka/blob/trunk/clients/src/main/resources/common/message/CreateTopicsRequest.json>
> > RPC specifies that configuration values may be COMPACT_STRING
> >
> > CreateTopics Request (Version: 7) => [topics] timeout_ms validate_only _tagged_fields
> >   topics => name num_partitions replication_factor [assignments] [configs] _tagged_fields
> >     name => COMPACT_STRING
> >     ...
> >     configs => name value _tagged_fields
> >       name => COMPACT_STRING
> >       value => COMPACT_NULLABLE_STRING
> >   ...
> >
> > COMPACT_STRING is defined as follows:
> >
> > > Represents a sequence of characters. First the length N + 1 is given as
> > > an UNSIGNED_VARINT. Then N bytes follow which are the UTF-8 encoding of
> > > the character sequence.
> >
> > I'm not sure whether a maximum size has been specified for UNSIGNED_VARINT.
> > So I assumed that these strings can have an arbitrary size.
> >
> > When I generate a CREATE_TOPIC request in the KafkaAdminClient:
> >
> > String longValue = "x".repeat(524_288);
> > Map<String, String> newTopicConfig = new HashMap<>();
> > newTopicConfig.put(TopicConfig.COMPRESSION_TYPE_CONFIG, longValue);
> >
> > And send the request - I end up with the following exception:
> >
> > java.lang.RuntimeException: 'value' field is too long to be serialized
> >     at org.apache.kafka.common.message.CreateTopicsRequestData$CreatableTopicConfig.addSize(CreateTopicsRequestData.java:1219)
> >     at org.apache.kafka.common.message.CreateTopicsRequestData$CreatableTopic.addSize(CreateTopicsRequestData.java:576)
> >     at org.apache.kafka.common.message.CreateTopicsRequestData.addSize(CreateTopicsRequestData.java:207)
> >     at org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218)
> >     at org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187)
> >     at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:110)
> >     at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:608)
> >     at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:582)
> >
> > It seems that the generated code of CreateTopicsRequestData contains:
> >
> > byte[] _stringBytes = value.getBytes(StandardCharsets.UTF_8);
> > if (_stringBytes.length > 0x7fff) {
> >     throw new RuntimeException("'value' field is too long to be serialized");
> > }
> >
> > This code sample is generated by this function
> > <https://github.com/apache/kafka/blob/409a43eff77511e89bba2f95934cb1ebc417236d/generator/src/main/java/org/apache/kafka/message/MessageDataGenerator.java#L1117>
> > and is what causes the exception to occur.
> >
> > Is it intended that RPCs using COMPACT_STRING should have this size limit?
> >
> > Thanks!
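
For reference, the wire format discussed in this thread can be sketched roughly as follows. This is only an illustration, not Kafka's generated code: the class `CompactStringSketch` and its method names are hypothetical, and the varint loop assumes that UNSIGNED_VARINT uses the same base-128 encoding as Protocol Buffers varints, without the zig-zag step. The only part taken from the thread is the `0x7fff` check on the UTF-8 byte count.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are illustrative and not part of kafka-clients.
class CompactStringSketch {
    // Same bound as the generated check quoted in the thread (0x7fff = 32767).
    static final int MAX_STRING_BYTES = 0x7fff;

    // Assumed encoding: 7 payload bits per byte, low-order group first,
    // high bit set on every byte except the last.
    static void writeUnsignedVarint(int value, ByteArrayOutputStream out) {
        while ((value & 0xffffff80) != 0) {
            out.write((value & 0x7f) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // COMPACT_STRING as described in the protocol guide: length N + 1 as an
    // unsigned varint, followed by the N UTF-8 bytes.
    static byte[] encodeCompactString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > MAX_STRING_BYTES) {
            throw new RuntimeException("string is too long to be serialized");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUnsignedVarint(utf8.length + 1, out);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // One length byte (value 4) plus three payload bytes.
        System.out.println(encodeCompactString("abc").length);
        // Largest accepted payload: three length bytes plus 32767 payload bytes.
        System.out.println(encodeCompactString("x".repeat(0x7fff)).length);
    }
}
```

Under these assumptions the varint prefix is not the limiting factor: a 32767-byte value already needs a three-byte prefix, and the prefix could describe far larger lengths. The limit comes from the explicit `0x7fff` check, which counts UTF-8 bytes rather than characters.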
