Re: [PR] NIFI-14424 Support for confluent encoded protobuf headers [nifi]

via GitHub Tue, 05 Aug 2025 08:54:12 -0700


awelless commented on code in PR #10105:
URL: https://github.com/apache/nifi/pull/10105#discussion_r2254727311



##########
nifi-extension-bundles/nifi-standard-services/nifi-schema-registry-service-api/src/main/java/org/apache/nifi/schemaregistry/services/SchemaRegistry.java:
##########
@@ -45,4 +45,32 @@ public interface SchemaRegistry extends ControllerService {
      * @return the set of all Schema Fields that are supplied by the RecordSchema that is returned from {@link #retrieveSchema(SchemaIdentifier)}
      */
     Set<SchemaField> getSuppliedSchemaFields();
+
+    /**
+     * Retrieves the raw schema definition including its textual representation and references.
+     * <p>
+     * This method is used to retrieve the complete schema definition structure, including the raw schema text
+     * and any schema references. Unlike {@link #retrieveSchema(SchemaIdentifier)}, which returns a parsed
+     * {@link RecordSchema} ready for immediate use, this method returns a {@link SchemaDefinition} containing
+     * the raw schema content that can be used for custom schema processing, compilation, or when schema
+     * references need to be resolved.
+     * </p>
+     * <p>
+     * This method is particularly useful for:
+     * <ul>
+     *   <li>Processing schemas that reference other schemas (e.g., Protocol Buffers with imports)</li>
+     *   <li>Custom schema compilation workflows where the raw schema text is needed</li>
+     *   <li>Accessing schema metadata and references for advanced schema processing</li>
+     * </ul>
+     * </p>
+     *
+     * @param schemaIdentifier the schema identifier containing id, name, version, and optionally branch information
+     * @return a {@link SchemaDefinition} containing the raw schema text, type, identifier, and references
+     * @throws IOException if unable to communicate with the backing store
+     * @throws SchemaNotFoundException if unable to find the schema based on the given identifier
+     * @throws UnsupportedOperationException if the schema registry implementation does not support raw schema retrieval
+     */
+    default SchemaDefinition retrieveSchemaRaw(SchemaIdentifier schemaIdentifier) throws IOException, SchemaNotFoundException {
+        throw new UnsupportedOperationException("retrieveSchemaRaw is not supported by this SchemaRegistry implementation");

Review Comment:
   The concern is that there is currently no way to check which schema 
registries support `retrieveSchemaRaw` and which don't.
   
   As a result, a user can pass a schema registry that doesn't support 
`retrieveSchemaRaw` into a `ProtobufReader`, and the validation in the processor 
won't flag it. The problem only surfaces at runtime, when the first message 
causes an `UnsupportedOperationException` to be thrown.
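
   One possible way to address this, sketched below, is a capability flag on 
the registry interface that a reader can consult during validation. Everything 
in the sketch is a simplified, hypothetical stand-in: `Registry`, 
`isRawSchemaRetrievalSupported`, `LegacyRegistry`, `RawCapableRegistry`, and 
`validate` are illustrative names and are not part of the PR as quoted or of 
the NiFi API.

```java
import java.util.ArrayList;
import java.util.List;

final class RawSchemaSupportSketch {

    // Simplified, hypothetical stand-in for the SchemaRegistry interface in the diff.
    interface Registry {
        // Hypothetical capability flag (an assumption, not in the quoted PR).
        // Defaults to false so existing implementations report "unsupported".
        default boolean isRawSchemaRetrievalSupported() {
            return false;
        }

        // Same default behaviour as in the diff: throw unless overridden.
        default String retrieveSchemaRaw(final String schemaName) {
            throw new UnsupportedOperationException(
                    "retrieveSchemaRaw is not supported by this SchemaRegistry implementation");
        }
    }

    // A registry that overrides nothing, like an existing implementation would.
    static final class LegacyRegistry implements Registry {
    }

    // A registry that supports raw retrieval and advertises it.
    static final class RawCapableRegistry implements Registry {
        @Override
        public boolean isRawSchemaRetrievalSupported() {
            return true;
        }

        @Override
        public String retrieveSchemaRaw(final String schemaName) {
            return "syntax = \"proto3\";";
        }
    }

    // Mirrors what a reader's validation step could do: report the problem
    // at configuration time instead of failing on the first message.
    static List<String> validate(final Registry registry) {
        final List<String> problems = new ArrayList<>();
        if (!registry.isRawSchemaRetrievalSupported()) {
            problems.add("Configured Schema Registry does not support raw schema retrieval");
        }
        return problems;
    }

    public static void main(final String[] args) {
        System.out.println(validate(new LegacyRegistry()));
        System.out.println(validate(new RawCapableRegistry()));
    }
}
```

   With a flag like this, the legacy registry is rejected during validation 
while the raw-capable one passes, so the exception path is only a last resort.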



##########
nifi-extension-bundles/nifi-confluent-platform-bundle/nifi-confluent-protobuf-message-name-resolver/src/test/java/org/apache/nifi/confluent/schemaregistry/VarintUtilsTest.java:
##########
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.confluent.schemaregistry;
+
+import org.junit.jupiter.api.Test;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import static org.apache.nifi.confluent.schemaregistry.VarintUtils.decodeZigZag;
+import static org.apache.nifi.confluent.schemaregistry.VarintUtils.readVarintFromStream;
+import static org.apache.nifi.confluent.schemaregistry.VarintUtils.readVarintFromStreamAfterFirstByteConsumed;
+import static org.apache.nifi.confluent.schemaregistry.VarintUtils.writeZigZagVarint;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class VarintUtilsTest {
+
+    @Test
+    public void testReadVarintFromStream_SingleByte() throws IOException {
+        byte[] data = {0x08}; // 8 in varint format (0x08 = 00001000)
+        InputStream inputStream = new ByteArrayInputStream(data);
+
+        int result = readVarintFromStream(inputStream);
+        assertEquals(8, result);
+    }
+
+    @Test
+    public void testReadVarintFromStream_MultiByte() throws IOException {
+        byte[] data = {(byte) 0x96, 0x01}; // 150 in varint format (10010110 00000001)
+        InputStream inputStream = new ByteArrayInputStream(data);
+
+        int result = readVarintFromStream(inputStream);
+        assertEquals(150, result);
+    }
+
+    @Test
+    public void testReadVarintFromStream_MaxValue() throws IOException {
+        // Maximum 32-bit value in varint format (11111111 11111111 11111111 11111111 00001111)
+        byte[] data = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 0x0F};
+        InputStream inputStream = new ByteArrayInputStream(data);
+
+        int result = readVarintFromStream(inputStream);
+        assertEquals(0xFFFFFFFF, result);
+    }
+
+    @Test
+    public void testReadVarintFromStream_WithFirstByte() throws IOException {

Review Comment:
   From my understanding, the `data` array is not used during decoding. 
   We pass the first `0x08` byte as an argument to 
`readVarintFromStreamAfterFirstByteConsumed`. Since it doesn't have the 
continuation bit set, decoding finishes without reading from the stream.
   
   I suggest we also test a scenario where the first byte has the 
continuation bit set. In that case `readVarintFromStreamAfterFirstByteConsumed` 
should combine the first byte we pass as an argument with the bytes read from 
the passed `inputStream`. 
   Currently, this is the only test case for `readVarintFromStreamAfterFirstByteConsumed`.
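
   A rough sketch of the suggested scenario follows. The real signature and 
behaviour of `VarintUtils.readVarintFromStreamAfterFirstByteConsumed` are not 
visible in the quoted diff, so `readVarintAfterFirstByte` and `decode` below 
are hypothetical stand-ins (including the argument order and the 5-byte 
limit) that only illustrate the expected arithmetic.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class VarintAfterFirstByteSketch {

    // Hypothetical stand-in for readVarintFromStreamAfterFirstByteConsumed:
    // the caller has already consumed firstByte, the rest is read from the stream.
    static int readVarintAfterFirstByte(final InputStream in, final int firstByte) throws IOException {
        int result = firstByte & 0x7F;   // low 7 bits of the already-consumed byte
        int shift = 7;
        int current = firstByte;
        while ((current & 0x80) != 0) {  // continuation bit set: more bytes follow
            if (shift >= 35) {
                throw new IOException("Varint is longer than 5 bytes");
            }
            current = in.read();
            if (current == -1) {
                throw new EOFException("Stream ended inside a varint");
            }
            result |= (current & 0x7F) << shift;
            shift += 7;
        }
        return result;
    }

    // Convenience wrapper for the examples: remaining bytes come from an in-memory stream.
    static int decode(final int firstByte, final byte... remaining) {
        try {
            return readVarintAfterFirstByte(new ByteArrayInputStream(remaining), firstByte);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        // First byte 0x96 has the continuation bit set, so 0x01 must be read from the stream:
        // (0x96 & 0x7F) | (0x01 << 7) = 22 + 128 = 150
        System.out.println(decode(0x96, (byte) 0x01));
        // First byte 0x08 has no continuation bit, so the stream is never touched.
        System.out.println(decode(0x08));
    }
}
```

   A test along these lines would fail if the implementation ignored either 
the first byte or the stream, which the existing single-byte case cannot detect.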



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

