[ https://issues.apache.org/jira/browse/ARROW-18228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628957#comment-17628957 ]
Vadym Dytyniak edited comment on ARROW-18228 at 11/4/22 12:54 PM:
------------------------------------------------------------------
[~willjones127] It helped. Thanks. Do you recommend using this strategy, or
does it mean that we are exceeding the rate limit and should review our
implementation?
> AWS Error SLOW_DOWN during PutObject operation
> ----------------------------------------------
>
> Key: ARROW-18228
> URL: https://issues.apache.org/jira/browse/ARROW-18228
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 10.0.0
> Reporter: Vadym Dytyniak
> Priority: Major
>
> We use Dask to parallelise read/write operations and pyarrow to write datasets
> from worker nodes.
> After pyarrow released version 10.0.0, our data flows automatically switched
> to the latest version, and some of them started to fail with the following
> error:
> {code:java}
>   File "/usr/local/lib/python3.10/dist-packages/org/store/storage.py", line 768, in _write_partition
>     ds.write_dataset(
>   File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 988, in write_dataset
>     _filesystemdataset_write(
>   File "pyarrow/_dataset.pyx", line 2859, in pyarrow._dataset._filesystemdataset_write
>     check_status(CFileSystemDataset.Write(c_options, c_scanner))
>   File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: When creating key 'equities.us.level2.by_security/' in bucket 'org-prod': AWS Error SLOW_DOWN during PutObject operation: Please reduce your request rate.
> {code}
> In total, the flow failed many times: most runs failed with the error above,
> but one failed with:
> {code:java}
>   File "/usr/local/lib/python3.10/dist-packages/chronos/store/storage.py", line 857, in _load_partition
>     table = ds.dataset(
>   File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 752, in dataset
>     return _filesystem_dataset(source, **kwargs)
>   File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 444, in _filesystem_dataset
>     fs, paths_or_selector = _ensure_single_source(source, filesystem)
>   File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 411, in _ensure_single_source
>     file_info = filesystem.get_file_info(path)
>   File "pyarrow/_fs.pyx", line 564, in pyarrow._fs.FileSystem.get_file_info
>     info = GetResultValue(self.fs.GetFileInfo(path))
>   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>     return check_status(status)
>   File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: When getting information for key 'ns/date=2022-10-31/channel=4/feed=A/9f41f928eedc431ca695a7ffe5fc60c2-0.parquet' in bucket 'org-poc': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 28, Timeout was reached
> {code}
>
> Do you have any idea what changed in dataset writing between 9.0.0 and
> 10.0.0? Knowing that would help us fix the issue.
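
Editorial sketch (not part of the original report or of the strategy discussed in the comment): the SLOW_DOWN error above asks the client to reduce its request rate, and one generic way to do that on the caller's side is to retry with exponential backoff and jitter. The helper name `retry_on_slow_down` and all of its parameters are hypothetical, not a pyarrow API. The sketch only assumes what the traceback shows: the failure surfaces as an `OSError` whose message contains `SLOW_DOWN`.

```python
import random
import time


def retry_on_slow_down(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on SLOW_DOWN.

    Hypothetical helper for illustration; it is not part of pyarrow.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Only throttling errors are retried; anything else, or the
            # final failed attempt, is re-raised unchanged.
            if "SLOW_DOWN" not in str(exc) or attempt == max_attempts:
                raise
            # Full jitter: sleep a random time up to the capped
            # exponential delay so parallel workers do not retry in step.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

A worker could wrap its write as `retry_on_slow_down(lambda: ds.write_dataset(...))`, keeping its existing arguments. Retrying only masks throttling; if many workers hit the same key prefix at once, the request pattern itself may still need review.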