compasses opened a new pull request, #15839:
URL: https://github.com/apache/doris/pull/15839
# Proposed changes
Issue Number: just one part of #11640
## Problem summary
Describe your changes.
## Checklist(Required)
1. Does it affect the original behavior:
- [ ] Yes
- [x] No
- [ ] I don't know
2. Has unit tests been added:
- [ ] Yes
- [ ] No
- [x] No Need
3. Has document been added or modified:
- [ ] Yes
- [x] No
- [ ] No Need
4. Does it need to update dependencies:
- [ ] Yes
- [x] No
5. Are there any changes that cannot be rolled back:
- [ ] Yes (If Yes, please explain WHY)
- [x] No
## Further comments
This PR is one part of our bulk load implementation. It provides a tool that
builds the segment files of a tablet externally, outside the Doris process.
It supports building from both local files and HDFS. You need to provide the
meta file and the data path, like this:
```
./segment_builder --meta_file=/path/to/hdr/88409.hdr \
    --data_path=/path/to/data/file --format=parquet --is_remote=false
ll /path/to/data/file
xxx1..gz.parquet
xxx2..gz.parquet
...
```
If all the files come from HDFS, the path should be an HDFS path. Currently
only parquet is supported.
Internally we use a privately owned HDFS lib, ***so the HDFS-related code in
this PR may not work***. I don't have an open source HDFS environment to test
it.
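
For illustration, here is a minimal Python sketch of how a caller might assemble the invocation above and check a local data directory before running the tool. The flag names are taken from the example command. The helper names (`build_segment_builder_cmd`, `list_local_parquet_files`) are hypothetical, and spelling the remote case as `--is_remote=true` is an assumption, since only `false` is shown here.

```python
import os
import shlex

# The PR states that only parquet is supported for now.
SUPPORTED_FORMATS = {"parquet"}


def build_segment_builder_cmd(meta_file, data_path, fmt="parquet",
                              is_remote=False, binary="./segment_builder"):
    """Return the argv list for one segment_builder run (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return [
        binary,
        f"--meta_file={meta_file}",
        f"--data_path={data_path}",
        f"--format={fmt}",
        # Assumption: the flag accepts the literals "true" and "false".
        f"--is_remote={'true' if is_remote else 'false'}",
    ]


def list_local_parquet_files(data_path):
    """List the *.parquet files directly under a local directory, sorted by name."""
    return sorted(
        name for name in os.listdir(data_path)
        if name.endswith(".parquet")
        and os.path.isfile(os.path.join(data_path, name))
    )


if __name__ == "__main__":
    cmd = build_segment_builder_cmd("/path/to/hdr/88409.hdr", "/path/to/data/file")
    # Print the command as a single shell-safe line.
    print(shlex.join(cmd))
```

Rejecting unknown formats before launching the tool is just one way to surface the parquet-only limitation early; the tool itself may report it differently.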

As the picture above shows, the final workflow is:
1. Read the hdr file from the meta path, then do some validation and system
initialization.
2. Build the HDFS scanner, read the parquet files from HDFS directly, and
generate the segment files on local disk.
3. Finally, upload the segment files to HDFS, to the same path as the hdr
file. All of these files will be used by the load segment statement.
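
The three steps above can be outlined roughly as follows. This is only a control-flow sketch in Python, not the code in this PR: every function name is hypothetical, the hdr "validation" is a bare non-empty check, the segment files are empty stand-ins (one per input file, an arbitrary choice), and a local copy stands in for the HDFS upload.

```python
import os
import shutil
import tempfile


def read_hdr(meta_file):
    """Step 1: load the hdr file and run a placeholder sanity check."""
    with open(meta_file, "rb") as f:
        raw = f.read()
    if not raw:
        raise ValueError(f"empty hdr file: {meta_file}")
    return raw


def build_segments(tablet_meta, data_files, work_dir):
    """Step 2 placeholder: create one empty stand-in file per input file.

    The real tool scans parquet data and writes segment files; nothing is
    parsed or encoded here, so only the flow of the step is shown.
    """
    segments = []
    for i, _ in enumerate(data_files):
        seg = os.path.join(work_dir, f"segment_{i}.dat")  # hypothetical naming
        open(seg, "wb").close()
        segments.append(seg)
    return segments


def upload_next_to_hdr(segments, meta_file):
    """Step 3: put the segment files in the same directory as the hdr file."""
    dest_dir = os.path.dirname(os.path.abspath(meta_file))
    # A local copy stands in for the HDFS upload.
    return [shutil.copy(seg, dest_dir) for seg in segments]


def run(meta_file, data_files):
    """Run the three steps in order and return the final segment paths."""
    tablet_meta = read_hdr(meta_file)
    with tempfile.TemporaryDirectory() as work_dir:
        segments = build_segments(tablet_meta, data_files, work_dir)
        return upload_next_to_hdr(segments, meta_file)
```

Building in a scratch directory and copying at the end mirrors the description above, where segments are first generated on local disk and only then placed next to the hdr file.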
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]