Sorry, yes, INFRA-27071.

The latest status on that bug is infra doesn't have any presence on GCP and
doesn't intend to. This appears to be a non-starter at the moment.

I do still think it could be valuable for S3A, but unfortunately my work
doesn't align well with that. Is there someone else in the community
focused on S3A who would like to take up the collaboration with infra on
that bug?

Chris Nauroth


On Sat, Jul 26, 2025 at 6:35 AM Brahma Reddy Battula <bra...@apache.org>
wrote:

> Hey Chris,
>
> Hope you mean "https://issues.apache.org/jira/browse/INFRA-27071";
>
>
> Regards,
> Brahma
>
> On Fri, Jul 25, 2025 at 4:38 AM Chris Nauroth <cnaur...@apache.org> wrote:
> >
> > Great, thanks everyone! I went ahead and filed an infra ticket to ask for
> > buckets/credentials:
> >
> > https://issues.apache.org/jira/browse/INFRA-24353
> >
> > I'll keep you posted on progress.
> >
> > Steve, yes, I'm planning to start a HADOOP-19343 merge discuss/vote soon.
> >
> > Chris Nauroth
> >
> >
> > On Thu, Jul 24, 2025 at 4:47 AM Steve Loughran
> <ste...@cloudera.com.invalid>
> > wrote:
> >
> > > Didn't know about the ASF credentials. We'd want them to be used somewhere
> > > to generate those session credentials, with those session credentials the
> > > only secrets that a test run would have.
> > >
> > > I'd thought of somehow generating restricted session credentials scoped to
> > > the target bucket only, and with a duration of 60 minutes - loss of those
> > > credentials would only have a marginal effect, primarily one of cost rather
> > > than privilege.
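> > >
> > > Purely as an illustrative sketch of that idea (nothing in the build does
> > > this today, and the role ARN env var name below is made up), minting such
> > > a bucket-scoped, short-lived triple with the AWS SDK v2 STS client could
> > > look roughly like:
> > >
> > > import software.amazon.awssdk.services.sts.StsClient;
> > > import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
> > > import software.amazon.awssdk.services.sts.model.Credentials;
> > >
> > > public class MintSessionCredentials {
> > >   public static void main(String[] args) {
> > >     // Assumes an IAM role that is already restricted to the test bucket.
> > >     String roleArn = System.getenv("HADOOP_TEST_ROLE_ARN"); // hypothetical
> > >     try (StsClient sts = StsClient.create()) {
> > >       Credentials creds = sts.assumeRole(AssumeRoleRequest.builder()
> > >           .roleArn(roleArn)
> > >           .roleSessionName("hadoop-aws-itest")
> > >           .durationSeconds(3600)  // the 60 minute lifetime mentioned above
> > >           .build())
> > >         .credentials();
> > >       // The three values to upload as github secrets for the test run.
> > >       System.out.println("AWS_ACCESS_KEY_ID=" + creds.accessKeyId());
> > >       System.out.println("AWS_SECRET_ACCESS_KEY=" + creds.secretAccessKey());
> > >       System.out.println("AWS_SESSION_TOKEN=" + creds.sessionToken());
> > >     }
> > >   }
> > > }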
> > >
> > >
> > > >
> > > >
> > > > One nice aspect of GitHub Actions is that they can also be run on
> > > > individual forks. Contributors can configure their own AWS credentials
> > > > as secrets in their forks of the Hadoop repo and run the tests there.
> > > > This would help avoid consuming ASF resources directly. If ASF
> > > > credentials aren't available, a link to the successful run on their
> > > > fork can also be included as a comment on the PR to confirm the test
> > > > results.
> > > >
> > > >
> > > +1
> > >
> > >
> > > > This was just an early idea I had back then—feel free to explore it
> > > > further if it seems useful.
> > > >
> > > > -Ayush
> > > >
> > > > [1] https://issues.apache.org/jira/browse/INFRA-24353
> > > >
> > > > On Thu, 24 Jul 2025 at 04:30, Chris Nauroth <cnaur...@apache.org>
> wrote:
> > > > >
> > > > > Hello everyone,
> > > > >
> > > > > For years, we've relied on specific contributors to run and verify the
> > > > > integration tests for object store integrations like S3A, because the
> > > > > tests require credentials for specific cloud providers. I'd like to
> > > > > explore if we have any path forward today to bring those tests into
> > > > > the pre-submit automation. If successful, I'd like to apply that
> > > > > strategy to the GCS integration tests, which are part of HADOOP-19343.
> > > >
> > >
> > > Thinking about this - do you think this stuff should be merged in and
> > > stabilized in place? You've all been working on it for a while.
> > >
> > > >
> > > > > To make this work, we'd need to either 1) run tests in a VM hosted in
> > > > > the cloud provider, where credentials are vended natively from an
> > > > > adjacent metadata server, or
> > >
> > >
> > > Impala does this
> > >
> > >
> > > > > 2) export credentials so that the tests can run in any VM outside the
> > > > > cloud provider (and be really, really, really careful to secure the
> > > > > access to those exported credentials).
> > > >
> > >
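> > > Going back to the first of those options: S3A can already be pointed at
> > > the VM's metadata service for credentials. A minimal sketch of that wiring
> > > (the provider class name below is what recent hadoop-aws releases ship, so
> > > treat the exact name as an assumption):
> > >
> > > import org.apache.hadoop.conf.Configuration;
> > >
> > > public class MetadataCredentialsConfig {
> > >   // Configure S3A to fetch credentials from the instance metadata service,
> > >   // so no long-lived secrets ever need to be present on the test VM.
> > >   public static Configuration withInstanceCredentials() {
> > >     Configuration conf = new Configuration();
> > >     conf.set("fs.s3a.aws.credentials.provider",
> > >         "org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider");
> > >     return conf;
> > >   }
> > > }
> > >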
> > > If I could wire up my own credentials to github secrets/actions, I'd
> > > locally generate a 12 hour session triple and upload them to github
> > > secrets for my own actions only. I'd need to somehow set up the test run
> > > so that
> > >
> > >    1. the binding info (i.e. auth-keys.xml) is picked up or created in the
> > >    right place, or the build is modified to fall back to env vars (it
> > >    probably already does this for aws credentials, so it's only the target
> > >    bucket to be picked up, e.g. HADOOP_AWS_TARGET_BUCKET). Easily done.
> > >    2. maven test runs exclude the root bucket tests and instead pick up a
> > >    run ID to use as the base path for tests. The build is set up for this.
> > >
> > >
> > > Running tests with an env var test target rather than an auth-keys file
> > > could be done with something in core-site.xml which would set the test
> > > target to that of an env var; auth-keys.xml would override it.
> > >
> > >  <!-- why do we have two? -->
> > >   <property>
> > >     <name>test.fs.s3a.name</name>
> > >     <value>${env.HADOOP_AWS_BUCKET:-s3a://none}</value>
> > >   </property>
> > >
> > >   <property>
> > >     <name>fs.contract.test.fs.s3a</name>
> > >     <value>${test.fs.s3a.name}</value>
> > >   </property>
> > >
> > >   <include xmlns="http://www.w3.org/2001/XInclude" href="auth-keys.xml">
> > >     <fallback/>
> > >   </include>
> > >
> > > we'd need some special handling in test setup/S3AContract to recognise that
> > > "s3a://none" is a special marker to indicate there is no target FS. Again,
> > > easily done.
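> > >
> > > As a rough sketch only (class and method names here are invented, not what
> > > is in the tree today), that check could be a small helper using a JUnit
> > > assumption, so suites without a target bucket are skipped rather than
> > > failed:
> > >
> > > import org.apache.hadoop.conf.Configuration;
> > > import org.junit.Assume;
> > >
> > > public final class TargetFsCheck {
> > >   // Marker value meaning "no target FS configured".
> > >   public static final String NO_TARGET_FS = "s3a://none";
> > >
> > >   public static void requireTargetBucket(Configuration conf) {
> > >     String fsName = conf.getTrimmed("test.fs.s3a.name", NO_TARGET_FS);
> > >     // An assumption marks the test as skipped, not failed.
> > >     Assume.assumeFalse("No target S3A bucket configured",
> > >         NO_TARGET_FS.equals(fsName));
> > >   }
> > >
> > >   private TargetFsCheck() {
> > >   }
> > > }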
> > >
> > > Summary of thoughts:
> > >
> > >    1. we put env var binding into core-site.xml with S3AContract support
> > >    2. github action can run the itests without root bucket tests enabled
> > >    (if they ever want to test PRs in parallel)
> > >    3. people can upload their own (session) credentials with very
> > >    restricted roles
> > >    4. document this
> > >    5. let someone bold try it out
> > >
> > >
> > > There are also flaky tests. My Junit5 ITest PR adds a @FlakyTest tag which
> > > could be used to turn off those which are a bit brittle - but it should
> > > only be used if the behavior is unfixable (network buffer overruns in
> > > AbstractContractUnbufferTest is the only legit use I can see).
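> > >
> > > For anyone who hasn't seen the pattern: a tag of that kind is typically
> > > just a JUnit 5 meta-annotation, along these lines (the actual annotation
> > > in that PR may well differ from this sketch):
> > >
> > > import java.lang.annotation.ElementType;
> > > import java.lang.annotation.Retention;
> > > import java.lang.annotation.RetentionPolicy;
> > > import java.lang.annotation.Target;
> > > import org.junit.jupiter.api.Tag;
> > >
> > > // Marks brittle tests so a CI profile can exclude them, e.g. via the
> > > // surefire/failsafe excludedGroups setting with the tag name "flaky".
> > > @Target({ElementType.TYPE, ElementType.METHOD})
> > > @Retention(RetentionPolicy.RUNTIME)
> > > @Tag("flaky")
> > > public @interface FlakyTest {
> > > }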
> > >
> > > >
> > > > > Has anyone else already explored this recently? If not, I was thinking
> > > > > of filing an INFRA ticket to discuss if they already have established
> > > > > patterns for this. This is potentially relevant to other projects. (It
> > > > > was the code review for FLINK-37247 that prompted me to start this
> > > > > conversation.) I think it makes sense to solve it in Hadoop first and
> > > > > then extend it to other projects.
> > > > >
> > > >
> > > Spark and Iceberg use docker and Minio. Good: you only need docker. Bad:
> > > it's still some variant of a mock test, as passing it says very little
> > > about things working with the real stores. I wouldn't trust a PR to go in
> > > with only that.
> > >
> > > Anyway, I like everyone to test in their own setup as that helps find cases
> > > where the connector is brittle to different deployment setups. The more
> > > diverse the test environments are, the more issues get found and fixed
> > > before we ship.
> > >
>
