Steve Stagg created AVRO-3834:
---------------------------------

             Summary: [Python] Incorrect decimal encoding/decoding
                 Key: AVRO-3834
                 URL: https://issues.apache.org/jira/browse/AVRO-3834
             Project: Apache Avro
          Issue Type: Bug
          Components: logical types, python
    Affects Versions: 1.11.2
         Environment: Python 3.10.3, Avro 1.11.2

 
            Reporter: Steve Stagg


When encoding `decimal.Decimal` values using the Python Avro library, the 
exponent of the value is largely ignored.

This means that incorrect two's-complement values are calculated, and 
incorrect Avro data is produced.

Here's a reasonably compact reproducer:

```python
import avro
import avro.io
import avro.schema  # imported explicitly, since avro.schema.parse is used below
from decimal import Decimal
from io import BytesIO

TESTS = [
    '314',
    '31',
    '3',
    '3.1',
    '31.4',
    '3.14',
    '3.141',
    '3.1415',
]

if __name__ == '__main__':
    schema_text = '''{
  "type": "bytes",
  "logicalType": "decimal",
  "precision": 8,
  "scale": 4
    }'''
    print(f"AVRO VERSION: {avro.__version__}")
    schema = avro.schema.parse(schema_text)
    writer = avro.io.DatumWriter(schema)
    reader = avro.io.DatumReader(schema)

    for val in TESTS:
        buf = BytesIO()

        val = Decimal(val)
        writer.write(val, avro.io.BinaryEncoder(buf))
        buf.seek(0)
        decoded_val = reader.read(avro.io.BinaryDecoder(buf))
        
        match = val == decoded_val
        result = 'PASS' if match else 'FAIL'
        print(f'Encoded: {val} -> {buf.getvalue()} -> {decoded_val}   {result}')
        
```
Which outputs:
```
AVRO VERSION: 1.11.2
Encoded: 314 -> b'\x04\x01:' -> 0.0314   FAIL
Encoded: 31 -> b'\x02\x1f' -> 0.0031   FAIL
Encoded: 3 -> b'\x02\x03' -> 0.0003   FAIL
Encoded: 3.1 -> b'\x02\x1f' -> 0.0031   FAIL
Encoded: 31.4 -> b'\x04\x01:' -> 0.0314   FAIL
Encoded: 3.14 -> b'\x04\x01:' -> 0.0314   FAIL
Encoded: 3.141 -> b'\x04\x0cE' -> 0.3141   FAIL
Encoded: 3.1415 -> b'\x04z\xb7' -> 3.1415   PASS
```

The problem is that the code here:
https://github.com/apache/avro/blob/5bd2bc7a492a611382cddc5db3b5bf0b1b7d2b83/lang/py/avro/io.py#L468
does not use `exp` to shift the digits; `exp` is only checked to ensure it is 
not greater than the scale, for validation purposes.
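
For illustration, here is a minimal sketch of how the exponent could be taken 
into account when computing the unscaled integer that the decimal logical type 
stores (the value multiplied by 10**scale, as big-endian two's-complement). 
This is not the library's code or a proposed patch; `unscaled_value` and 
`encode_decimal_bytes` are hypothetical helper names, and the length prefix 
that the binary encoder writes before the bytes is left out.

```python
from decimal import Decimal


def unscaled_value(value: Decimal, scale: int) -> int:
    """Return value * 10**scale as an int (hypothetical helper)."""
    # scaleb adds `scale` to the exponent, i.e. shifts the decimal point right.
    shifted = value.scaleb(scale)
    # Reject values that have more fractional digits than the schema's scale.
    if shifted != shifted.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return int(shifted)


def encode_decimal_bytes(value: Decimal, scale: int) -> bytes:
    """Big-endian two's-complement bytes of the unscaled value (hypothetical helper)."""
    unscaled = unscaled_value(value, scale)
    # Bits needed for the magnitude, plus one sign bit, rounded up to whole bytes.
    magnitude_bits = (unscaled if unscaled >= 0 else ~unscaled).bit_length()
    length = (magnitude_bits + 1 + 7) // 8
    return unscaled.to_bytes(length, byteorder="big", signed=True)


if __name__ == "__main__":
    for text in ("314", "31.4", "3.14", "3.1415"):
        print(text, unscaled_value(Decimal(text), 4), encode_decimal_bytes(Decimal(text), 4))
```

With a scale of 4, this maps '3.14' to the unscaled value 31400 and '31.4' to 
314000, so the two no longer collide.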

If you look at the output, the Avro bytes produced for '31.4' and '3.14' are 
identical, because the exponent is ignored.
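
A short standalone check, using only the standard `decimal` module, shows why 
these inputs collide once the exponent is dropped: they share the same digit 
tuple and differ only in exponent. The `parts` dictionary is just an 
illustrative name.

```python
from decimal import Decimal

# Map each input string to its (sign, digits, exponent) tuple.
parts = {text: Decimal(text).as_tuple() for text in ("314", "31.4", "3.14")}

for text, (sign, digits, exponent) in parts.items():
    print(f"{text!r}: digits={digits} exponent={exponent}")
```

All three values carry the digits (3, 1, 4), with exponents 0, -1 and -2 
respectively, which matches the identical bytes seen for '314', '31.4' and 
'3.14' in the output above.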



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
