[ https://issues.apache.org/jira/browse/AVRO-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fokko Driesprong resolved AVRO-2203.
------------------------------------
Resolution: Cannot Reproduce
Can't open the link provided. Please resubmit the issue if it still persists
with Avro 1.8.2.
> avro module in python generates different bytes while writing file to local
> storage and s3
> -------------------------------------------------------------------------------------------
>
> Key: AVRO-2203
> URL: https://issues.apache.org/jira/browse/AVRO-2203
> Project: Apache Avro
> Issue Type: Bug
> Components: python
> Affects Versions: 1.8.0
> Environment: S3. UNIX, HDFS, python
> Reporter: Vinuthna
> Priority: Blocker
>
> Hi,
> I am trying to convert a CSV file to Avro format and store it on S3 storage
> using Python. During this process, I see data loss in the file written to
> S3 storage. I confirmed this by converting both the Avro file on local
> storage and the Avro file on S3 storage to JSON format and comparing the
> content and the total number of lines in each file.
> Further investigation shows that the Avro data generated while writing to
> local storage is not exactly the same as the Avro data generated while
> writing to S3 storage.
> I suspect the issue is in getting a writer object using DatumWriter:
> writer = avro.datafile.DataFileWriter(<fileobject>, avro.io.DatumWriter(),
> schema)
> The exact code is at the GitHub link below:
> https://github.com/mpenkov/smart_open/blob/209/integration-tests/test_209.py
> Could you please help solve this issue?
>
> Thanks
> Vinuthna
>
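One way to check the report's claim that the local and S3 copies differ is to compare the raw bytes directly. The following is a minimal sketch, not code from the issue: compare_bytes is a hypothetical helper, and it assumes both files have already been read fully into memory (for example, the local copy with open(path, "rb").read() and the S3 copy with whichever client is in use).

```python
import hashlib


def compare_bytes(local_bytes, remote_bytes):
    """Summarise how two byte strings differ.

    Hypothetical helper for illustration only. Returns the length and
    SHA-256 digest of each input and the offset of the first differing
    byte (None when the inputs are identical).
    """
    # Scan the overlapping prefix for the first mismatching byte.
    first_diff = next(
        (i for i, (a, b) in enumerate(zip(local_bytes, remote_bytes)) if a != b),
        None,
    )
    # If the overlap matches but the lengths differ, one input is a
    # truncated copy of the other; the difference starts where it ends.
    if first_diff is None and len(local_bytes) != len(remote_bytes):
        first_diff = min(len(local_bytes), len(remote_bytes))

    return {
        "local_len": len(local_bytes),
        "remote_len": len(remote_bytes),
        "local_sha256": hashlib.sha256(local_bytes).hexdigest(),
        "remote_sha256": hashlib.sha256(remote_bytes).hexdigest(),
        "first_diff": first_diff,
        "identical": first_diff is None,
    }
```

When reading the result, keep in mind that every Avro object container file embeds a randomly generated 16-byte sync marker, so two files written in separate runs are expected to differ in those bytes even when their records match. A shorter S3 copy (remote_len smaller than local_len) is a stronger sign of data loss than a byte mismatch on its own.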
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)