[ https://issues.apache.org/jira/browse/IMPALA-14144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17974024#comment-17974024 ]
ASF subversion and git services commented on IMPALA-14144:
----------------------------------------------------------
Commit fae42323da6791958ddf5506219012ecd9492bab in impala's branch
refs/heads/master from Laszlo Gaal
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fae42323d ]
IMPALA-14144: Make pip_download.py more tolerant with PEP 503 simple pages
Recent package updates on PyPI have introduced package description
pages that have extra newlines in addition to the newline character
separating the complete URLs for the different package versions.
These extra newlines usually show up before the closing angle bracket
character ('>') of the opening half of the anchor tag.
This broke pip_download.py, because it uses a regex to extract
various data items (file name, download path, hash algorithm and hash
value) from the download page. The regex attempts to match the whole
anchor element up to and including the closing '</a>' tag, which fails
because the '.' in a regex matches any character except a newline. As a
result, every line of such a package descriptor page is rejected as not
matching the search pattern, so a package whose page uses this format
can never be recognized.
This patch works around the formatting issue by adding the re.DOTALL
flag to the regex search call. With this flag the regex '.' character
also matches a newline, so the regex can match the complete anchor
element across a line break.
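The effect of the flag can be illustrated with a minimal sketch. The
pattern, the page fragment, the package name and the helper find_anchor()
below are made up for illustration; they are not the regex or the code
used in pip_download.py, only an anchor element of the general shape
described above with a newline before the closing '>'.

```python
import re

# Hypothetical pattern (an assumption, not the one in pip_download.py):
# captures the download path, hash algorithm, hash value and file name
# from a single anchor element.
ANCHOR_PATTERN = (
    r'<a href="(?P<path>[^"#]+)#(?P<algo>\w+)=(?P<digest>[0-9a-f]+)"'
    r'.*?>(?P<name>[^<]+)</a>'
)

# Made-up page fragment: note the newline before the closing '>' of the
# opening half of the anchor tag.
PAGE = (
    '<a href="https://files.example.invalid/packages/example_pkg-1.0.tar.gz'
    '#sha256=0123abcd" data-requires-python="&gt;=3.6"\n'
    '>example_pkg-1.0.tar.gz</a><br />\n'
)


def find_anchor(page, flags=0):
    """Search the page for one anchor element using the sketch pattern."""
    return re.search(ANCHOR_PATTERN, page, flags)


# Without re.DOTALL, '.' cannot cross the newline, so nothing matches.
without_dotall = find_anchor(PAGE)

# With re.DOTALL, '.' also matches the newline and the anchor is found.
with_dotall = find_anchor(PAGE, re.DOTALL)

if __name__ == "__main__":
    print("without re.DOTALL:", without_dotall)
    print("with re.DOTALL:", with_dotall.groupdict())
```

In this sketch the only difference between the two searches is the flags
argument, which mirrors the one-flag nature of the fix described above.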
Change-Id: Ia56f87c54e0d9cad97b7e0ffbcce8f4c0f715c44
Reviewed-on: http://gerrit.cloudera.org:8080/23026
Reviewed-by: Joe McDonnell <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Joe McDonnell <[email protected]>
> pip_download.py fails to download several packages from pypi.org
> ----------------------------------------------------------------
>
> Key: IMPALA-14144
> URL: https://issues.apache.org/jira/browse/IMPALA-14144
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Reporter: Laszlo Gaal
> Assignee: Laszlo Gaal
> Priority: Blocker
>
> infra/python/deps/pip_download.py runs at the start of buildall.sh to ensure
> that all the Python requirements can be installed into the Impala virtualenv
> used by the test framework. The downloader fetches the packages in multiple
> parallel streams.
> Recently this downloader has started failing: it reports a complete download
> failure for several packages, e.g. {{hdfs}}, {{impyla}}, {{bitarray}} and
> several others.
> The failure is not caused by a network communication problem, as the same
> packages from the same repo can be successfully downloaded with a browser.