[ https://issues.apache.org/jira/browse/IMPALA-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17966293#comment-17966293 ]

Joe McDonnell commented on IMPALA-14129:
----------------------------------------

Merged [https://github.com/cloudera/native-toolchain/commit/a1f257d5b75745670d43af20d254f3f3260e7070] to native-toolchain:
{noformat}
commit a1f257d5b75745670d43af20d254f3f3260e7070
Author: Joe McDonnell <[email protected]>
Date:   Thu Jun 5 16:42:08 2025 -0700

    IMPALA-14129: Patch hadoop-client to disable repository.apache.org
    
    This applies a patch on top of hadoop that disables two
    Maven repositories: repository.jboss.org and repository.apache.org.
    The build does not actually need those repositories. All the
    artifacts needed for this build are available from central.
    Repeated requests to repository.apache.org are discouraged and
    can lead to an IP being banned.
    
    Applying a patch changes some of the directory names, so this
    needed further adjustments to handle that.
    
    Testing:
     - Ran ARM build and used that to build Impala on ARM
     - Verified that the hadoop-client build did not access
       repository.apache.org based on the logs
    
    Change-Id: I2a441c1dc2c43e5fdcd467486b50e531daff62eb
    Reviewed-on: http://gerrit.cloudera.org:8080/22992
    Reviewed-by: Michael Smith <[email protected]>
    Tested-by: Joe McDonnell <[email protected]>
{noformat}
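
The log check mentioned under "Testing" could be automated along the following lines. This is only a sketch and not part of the merged change: the function names `hosts_contacted` and `forbidden_hosts_contacted` are hypothetical, and the regular expression assumes that each download attempt appears on a single line in the `Downloading from <repo-id>: <url>` form shown in the issue description.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed log format, taken from the lines quoted in the issue description:
#   [INFO] Downloading from central: https://repo.maven.apache.org/...
# "Downloaded from ..." lines are deliberately not matched, because only
# attempts matter here, not successful transfers.
DOWNLOADING = re.compile(r"Downloading from \S+:\s*(https?://\S+)")


def hosts_contacted(log_text):
    """Count download attempts per host name in a Maven build log."""
    counts = Counter()
    for match in DOWNLOADING.finditer(log_text):
        counts[urlparse(match.group(1)).hostname] += 1
    return counts


def forbidden_hosts_contacted(
    log_text,
    forbidden=("repository.apache.org", "repository.jboss.org"),
):
    """Return {host: attempts} for each forbidden host that the log mentions."""
    counts = hosts_contacted(log_text)
    return {host: counts[host] for host in forbidden if counts[host] > 0}
```

Calling `forbidden_hosts_contacted(open(path).read())` on a build log, where `path` is wherever the log was saved, returns an empty dict when neither repository was contacted, so a wrapper script could fail the build on any non-empty result.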

> Native-toolchain's hadoop-client build should not contact Apache servers
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-14129
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14129
>             Project: IMPALA
>          Issue Type: Task
>          Components: Infrastructure
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>
> The hadoop-client build (needed for ARM) does not restrict itself to the 
> central repository when resolving dependencies. It also attempts downloads 
> from repository.jboss.org and repository.apache.org:
> {noformat}
> [INFO] Downloading from apache.snapshots.https: https://repository.apache.org/content/repositories/snapshots/org/apache/apache/24/apache-24.pom
> [INFO] Downloading from repository.jboss.org: https://repository.jboss.org/nexus/content/groups/public/org/apache/apache/24/apache-24.pom
> [INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/apache/apache/24/apache-24.pom
> [INFO] Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/apache/24/apache-24.pom (20 kB at 140 kB/s)
> {noformat}
> Everything that it needs to download is available in central, so these extra 
> requests don't do anything. We should find a way to avoid contacting 
> repository.apache.org. One option is to apply a patch to hadoop to 
> specifically disable those repositories so that it only uses central.



