sollhui opened a new pull request, #64684:
URL: https://github.com/apache/doris/pull/64684
### What problem does this PR solve?
Before this change, S3-compatible glob listing derived the object-store
`ListObjects` prefix by stopping at the first glob metacharacter. For a path
like:
`s3://bucket/asin_trend/sale/month/date=2025-{0[3-9],1[0-2]}-01/mp_id=8/0/0/436/*`
the old behavior listed the broad prefix:
`asin_trend/sale/month/date=2025-`
and then filtered every returned object key in the FE. If many unrelated objects
existed under `date=2025-*` (other dates, other `mp_id` values, or deeper
paths), S3 TVF planning could spend a long time listing and filtering files
before query execution started.
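As a rough illustration of the old prefix derivation described above, here is a minimal Java sketch. The class name `LiteralPrefix`, the method `of`, and the exact set of characters treated as glob metacharacters are assumptions made for this example; this is not the Doris FE code.

```java
public final class LiteralPrefix {
    // Characters treated as glob metacharacters in this sketch (an assumption).
    private static final String META = "*?[{\\";

    /** Returns the part of the key pattern that precedes the first glob metacharacter. */
    public static String of(String keyPattern) {
        for (int i = 0; i < keyPattern.length(); i++) {
            if (META.indexOf(keyPattern.charAt(i)) >= 0) {
                return keyPattern.substring(0, i);
            }
        }
        // No metacharacter at all: the whole key is literal.
        return keyPattern;
    }

    public static void main(String[] args) {
        String pattern =
                "asin_trend/sale/month/date=2025-{0[3-9],1[0-2]}-01/mp_id=8/0/0/436/*";
        // Prints: asin_trend/sale/month/date=2025-
        System.out.println(of(pattern));
    }
}
```

Because the first `{` appears right after `date=2025-`, everything under that broad prefix would be listed and only filtered afterwards.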
After this change, Doris expands safely enumerable glob fragments before
issuing object-store list requests. The same path is now listed through
narrower prefixes such as:
`asin_trend/sale/month/date=2025-03-01/mp_id=8/0/0/436/`
...
`asin_trend/sale/month/date=2025-12-01/mp_id=8/0/0/436/`
Doris still applies the full glob regex after listing, so query results are
unchanged; the optimization only narrows the remote listing scope. Expansion
is limited to bounded brace alternations and positive (non-negated) character
classes, with a hard cap on the number of generated prefixes. Existing
pagination behavior through `startAfter` and `maxFile` is preserved.
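The expansion described above could be sketched roughly as follows. This is a hedged illustration, not the code in this PR: the class `GlobPrefixExpander`, the `maxPrefixes` parameter, and the fallback to the single literal prefix when the cap is exceeded are all assumptions. It expands brace alternations and non-negated character classes, and stops each prefix at the first construct it cannot enumerate (`*`, `?`, a negated class, or an escape).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class GlobPrefixExpander {
    private static final String META = "*?[{\\"; // assumed metacharacter set

    /** Returns list prefixes for the pattern; falls back to one literal prefix if the cap is exceeded. */
    public static List<String> expand(String pattern, int maxPrefixes) {
        Set<String> out = new LinkedHashSet<>();
        if (!expand("", pattern, out, maxPrefixes)) {
            out.clear(); // hypothetical fallback: behave like the old single-prefix listing
            int m = firstMeta(pattern);
            out.add(m < 0 ? pattern : pattern.substring(0, m));
        }
        return new ArrayList<>(out);
    }

    // Returns false as soon as more than `cap` distinct prefixes have been produced.
    private static boolean expand(String literal, String rest, Set<String> out, int cap) {
        int m = firstMeta(rest);
        if (m < 0) return add(out, literal + rest, cap);
        literal += rest.substring(0, m);
        char c = rest.charAt(m);
        if (c == '{') {
            int close = matchingBrace(rest, m);
            if (close < 0) return add(out, literal, cap); // unbalanced: stop expanding here
            String tail = rest.substring(close + 1);
            for (String alt : splitTopLevel(rest.substring(m + 1, close))) {
                if (!expand(literal, alt + tail, out, cap)) return false;
            }
            return true;
        }
        if (c == '[') {
            int close = rest.indexOf(']', m + 1);
            List<Character> members = close < 0 ? null : classMembers(rest.substring(m + 1, close));
            if (members == null) return add(out, literal, cap); // negated or malformed class
            String tail = rest.substring(close + 1);
            for (char member : members) {
                if (!expand(literal + member, tail, out, cap)) return false;
            }
            return true;
        }
        return add(out, literal, cap); // '*', '?' or '\': not enumerable, prefix ends here
    }

    private static boolean add(Set<String> out, String prefix, int cap) {
        out.add(prefix);
        return out.size() <= cap;
    }

    private static int firstMeta(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (META.indexOf(s.charAt(i)) >= 0) return i;
        }
        return -1;
    }

    // Index of the '}' that closes the '{' at `open`, skipping nested braces and classes.
    private static int matchingBrace(String s, int open) {
        int depth = 0;
        boolean inClass = false;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inClass) { if (c == ']') inClass = false; }
            else if (c == '[') inClass = true;
            else if (c == '{') depth++;
            else if (c == '}' && --depth == 0) return i;
        }
        return -1;
    }

    // Splits a brace body on commas that are not inside nested braces or classes.
    private static List<String> splitTopLevel(String body) {
        List<String> parts = new ArrayList<>();
        int depth = 0, start = 0;
        boolean inClass = false;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (inClass) { if (c == ']') inClass = false; }
            else if (c == '[') inClass = true;
            else if (c == '{') depth++;
            else if (c == '}') depth--;
            else if (c == ',' && depth == 0) { parts.add(body.substring(start, i)); start = i + 1; }
        }
        parts.add(body.substring(start));
        return parts;
    }

    // Enumerates a positive class such as "3-9" or "abc"; null means "do not expand".
    private static List<Character> classMembers(String body) {
        if (body.isEmpty() || body.charAt(0) == '!' || body.charAt(0) == '^'
                || body.indexOf('\\') >= 0) return null;
        List<Character> members = new ArrayList<>();
        for (int i = 0; i < body.length(); i++) {
            char lo = body.charAt(i);
            if (i + 2 < body.length() && body.charAt(i + 1) == '-') {
                char hi = body.charAt(i + 2);
                if (lo > hi) return null;
                for (int ch = lo; ch <= hi; ch++) members.add((char) ch);
                i += 2;
            } else {
                members.add(lo);
            }
        }
        return members;
    }

    public static void main(String[] args) {
        String p = "asin_trend/sale/month/date=2025-{0[3-9],1[0-2]}-01/mp_id=8/0/0/436/*";
        expand(p, 64).forEach(System.out::println);
    }
}
```

For the example path this yields ten prefixes, from `date=2025-03-01/...` through `date=2025-12-01/...`. The sketch does not merge overlapping prefixes (for instance `a` and `ab`), which a real implementation would need to handle to avoid listing the same keys twice; the full glob regex would still be applied to every listed key afterwards.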