Steve Stagg created AVRO-3834:
---------------------------------

             Summary: [Python] Incorrect decimal encoding/decoding
                 Key: AVRO-3834
                 URL: https://issues.apache.org/jira/browse/AVRO-3834
             Project: Apache Avro
          Issue Type: Bug
          Components: logical types, python
    Affects Versions: 1.11.2
         Environment: Python 3.10.3, Avro 1.11.2
            Reporter: Steve Stagg


When encoding `decimal.Decimal` values using the Python avro library, the exponent of the value is largely ignored. This means that incorrect two's-complement values are calculated, and incorrect Avro data is produced.

Here's a reasonably compact reproducer:

```python
import avro
import avro.io
import avro.schema
from decimal import Decimal
from io import BytesIO

TESTS = [
    '314',
    '31',
    '3',
    '3.1',
    '31.4',
    '3.14',
    '3.141',
    '3.1415',
]

if __name__ == '__main__':
    schema_text = '''{
        "type": "bytes",
        "logicalType": "decimal",
        "precision": 8,
        "scale": 4
    }'''
    print(f"AVRO VERSION: {avro.__version__}")
    schema = avro.schema.parse(schema_text)
    writer = avro.io.DatumWriter(schema)
    reader = avro.io.DatumReader(schema)
    for val in TESTS:
        # Round-trip each value through the binary encoder and decoder.
        buf = BytesIO()
        val = Decimal(val)
        writer.write(val, avro.io.BinaryEncoder(buf))
        buf.seek(0)
        decoded_val = reader.read(avro.io.BinaryDecoder(buf))
        match = val == decoded_val
        result = 'PASS' if match else 'FAIL'
        print(f'Encoded: {val} -> {buf.getvalue()} -> {decoded_val} {result}')
```

Which outputs:

```
AVRO VERSION: 1.11.2
Encoded: 314 -> b'\x04\x01:' -> 0.0314 FAIL
Encoded: 31 -> b'\x02\x1f' -> 0.0031 FAIL
Encoded: 3 -> b'\x02\x03' -> 0.0003 FAIL
Encoded: 3.1 -> b'\x02\x1f' -> 0.0031 FAIL
Encoded: 31.4 -> b'\x04\x01:' -> 0.0314 FAIL
Encoded: 3.14 -> b'\x04\x01:' -> 0.0314 FAIL
Encoded: 3.141 -> b'\x04\x0cE' -> 0.3141 FAIL
Encoded: 3.1415 -> b'\x04z\xb7' -> 3.1415 PASS
```

The problem is that the code here: https://github.com/apache/avro/blob/5bd2bc7a492a611382cddc5db3b5bf0b1b7d2b83/lang/py/avro/io.py#L468 does not use `exp` to shift the digits; `exp` is only checked to ensure it's not greater than the scale, for validation purposes.

If you look at the output, the Avro bytes produced for '31.4' and '3.14' are identical, because `exp` is ignored.



--
This message was sent by Atlassian Jira (v8.20.10#820010)
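
For illustration, here is a minimal sketch of how the exponent could be taken into account when computing the unscaled integer. It is not the avro library's code and not a proposed patch: the function names `decimal_to_unscaled_bytes` and `unscaled_bytes_to_decimal` are hypothetical, and the sketch produces only the two's-complement payload, without the length prefix that the `bytes` type adds on the wire.

```python
from decimal import Decimal


def decimal_to_unscaled_bytes(value: Decimal, scale: int) -> bytes:
    """Hypothetical helper: big-endian two's-complement of value * 10**scale."""
    sign, digits, exp = value.as_tuple()
    if not isinstance(exp, int):
        raise ValueError("NaN and Infinity cannot be encoded")
    coefficient = int("".join(map(str, digits)))
    # Shift the digits so that the coefficient is expressed at the schema scale.
    shift = scale + exp
    if shift >= 0:
        unscaled = coefficient * 10 ** shift
    else:
        unscaled, remainder = divmod(coefficient, 10 ** -shift)
        if remainder:
            raise ValueError(f"{value} needs more than {scale} decimal places")
    if sign:
        unscaled = -unscaled
    # One extra bit is reserved for the sign; zero still takes one byte.
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, "big", signed=True)


def unscaled_bytes_to_decimal(data: bytes, scale: int) -> Decimal:
    """Hypothetical helper: inverse of decimal_to_unscaled_bytes."""
    unscaled = int.from_bytes(data, "big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Build the Decimal from its parts so no context rounding is applied.
    return Decimal((sign, digits, -scale))


if __name__ == "__main__":
    for text in ["314", "31", "3", "3.1", "31.4", "3.14", "3.141", "3.1415"]:
        val = Decimal(text)
        raw = decimal_to_unscaled_bytes(val, 4)
        decoded = unscaled_bytes_to_decimal(raw, 4)
        print(f"{val} -> {raw!r} -> {decoded} {'PASS' if val == decoded else 'FAIL'}")
```

With a scale of 4, '3.1415' maps to the unscaled integer 31415, whose payload `b'z\xb7'` matches the bytes after the length prefix in the one passing case above, while '31.4' and '3.14' map to 314000 and 31400 and no longer collide.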