Yuanhao Zhu created NIFI-14496:
----------------------------------

             Summary: ConvertRecord processor cannot convert Avro bytes typed 
field to string properly
                 Key: NIFI-14496
                 URL: https://issues.apache.org/jira/browse/NIFI-14496
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 2.3.0, 2.2.0, 2.1.0, 2.0.0
            Reporter: Yuanhao Zhu


When using the ConvertRecord processor in 2.x, we found that it cannot 
convert an Avro bytes field into a string properly.

The setup is as follows: ConvertRecord uses an Avro reader that takes the 
schema embedded in the Avro file. The record writer is a JsonRecordSetWriter 
that uses a custom schema, copied from the Avro file's schema except that the 
"Body" field is declared as string (in the Avro file's embedded schema, the 
"Body" field is declared as bytes).

 

In 1.x the "Body" field is converted into a string containing JSON 
objects, which we then process further with EvaluateJsonPath. In 2.x, however, 
the "Body" field always comes out as something like 
"[Ljava.lang.Object;@279aa943", which is the default toString value of an 
Object array.
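
For illustration, the symptom can be reproduced outside NiFi with a minimal 
Java sketch. This is not NiFi code: the class and method names are 
hypothetical, and it assumes the bytes arrive boxed in an Object[], as 
observed above.

```java
public class ObjectArrayToStringDemo {

    // Mimics a converter that falls back to the value's own toString().
    static String defaultToString(Object value) {
        return value.toString();
    }

    public static void main(String[] args) {
        // Assumption: the reader hands over the bytes boxed in an Object[].
        Object[] body = {(byte) '{', (byte) '}'};

        // Arrays inherit Object.toString(): class name + '@' + hex hash code,
        // so the output starts with "[Ljava.lang.Object;@".
        System.out.println(defaultToString(body));
    }
}
```

The hash code after the '@' differs between runs, so only the 
"[Ljava.lang.Object;@" prefix is stable.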

 

After some investigation in the NiFi repository, I think the reason is that 
the 1.x DataTypeUtils conversion handles the case where the incoming value is 
an array of objects in its toString method,

[https://github.com/apache/nifi/blob/883338fe28883733417d10f6ffa9319e75f5ea06/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/record/util/DataTypeUtils.java#L975]

 

where it converts each of the objects into a string. In 2.x, where the 
conversion has moved to ObjectStringFieldConverter.java, 
[https://github.com/apache/nifi/blob/0fde8be07270e41433d07fa1e3f940b1a08674d9/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/record/field/ObjectStringFieldConverter.java#L102]

this scenario is not covered, so the default toString method of the incoming 
object is invoked instead, which explains why we see 
"[Ljava.lang.Object;@279aa943" in 2.x.

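A rough sketch of the kind of handling that seems to be missing is shown 
below. It is only an assumption about a possible approach, not the actual 
1.x or 2.x NiFi code: the class name ByteObjectArrayDecoder and the method 
decode are hypothetical, and it assumes every element of the Object[] is a 
java.lang.Byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteObjectArrayDecoder {

    // Decodes a byte[] or an Object[] of Byte elements into a String.
    static String decode(Object value, Charset charset) {
        if (value instanceof byte[]) {
            return new String((byte[]) value, charset);
        }
        if (value instanceof Object[]) {
            Object[] elements = (Object[]) value;
            byte[] bytes = new byte[elements.length];
            for (int i = 0; i < elements.length; i++) {
                // Assumption: each element is a boxed byte; reject anything else.
                if (!(elements[i] instanceof Byte)) {
                    throw new IllegalArgumentException(
                            "Element " + i + " is not a Byte: " + elements[i]);
                }
                bytes[i] = (Byte) elements[i];
            }
            return new String(bytes, charset);
        }
        // Any other value keeps its default string form.
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        // Box the UTF-8 bytes of a JSON body into an Object[].
        byte[] raw = "{\"id\":1}".getBytes(StandardCharsets.UTF_8);
        Object[] boxed = new Object[raw.length];
        for (int i = 0; i < raw.length; i++) {
            boxed[i] = raw[i];
        }
        System.out.println(decode(boxed, StandardCharsets.UTF_8)); // prints {"id":1}
    }
}
```

Throwing on non-Byte elements is a choice made for this sketch; a real 
converter might prefer a different fallback for mixed arrays.
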
Not sure why the Avro reader reads the byte array in as an Object array though. 

Would you mind taking a look into it? Thanks!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
