wenzhenghu opened a new pull request, #64649:
URL: https://github.com/apache/doris/pull/64649

   ### What problem does this PR solve?
   
   Issue Number: None
   
   Related PR: None
   
   Problem Summary:
   
   This PR adds a new workload policy condition, `be_scan_bytes_from_remote_storage`, which lets Doris cancel queries based on the number of bytes that BE scan tasks have read from remote storage. This is useful for limiting external table queries that read too much data from remote HDFS or object storage.
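As a rough illustration of the condition's semantics, here is a sketch that compares a query's remote-read byte count against a threshold using comparison operators. It is written in Python for brevity and is not the Doris BE code, which is C++ and reads the counter from `io_context()->scan_bytes_from_remote_storage()`. The names `OPS` and `should_cancel` and the exact operator set are hypothetical. The sketch also assumes a policy with this single condition and a cancel action.

```python
import operator

# Hypothetical mapping from workload-policy comparison operators to Python
# functions (assumed operator set, for illustration only).
OPS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def should_cancel(remote_scan_bytes: int, op: str, threshold: int) -> bool:
    """Return True when the bytes a query's BE scan tasks have read from
    remote storage satisfy the condition, so a cancel action would fire
    (assuming a single-condition policy)."""
    return OPS[op](remote_scan_bytes, threshold)


GIB = 1 << 30
# A query that has pulled 1.5 GiB from HDFS/object storage trips a "> 1 GiB" rule.
print(should_cancel(3 * GIB // 2, ">", GIB))  # True
```

Because the counter only grows while the query runs, a `>` or `>=` condition is the natural way to express a cap. Each time the condition is re-checked, it fires once the query crosses the threshold.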
   
   Implementation summary:
   
   - Add a new BE-side workload metric type in thrift for remote storage scan bytes.
   - Add FE workload policy parsing, validation, metadata mapping, and replay support for `be_scan_bytes_from_remote_storage`.
   - Add BE workload condition evaluation based on `io_context()->scan_bytes_from_remote_storage()`.
   - Add regression coverage using an existing Hive external `lineitem` table.
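The FE has to reject malformed conditions before it stores a policy. Here is a minimal Python sketch of that validation step. It assumes a condition is written as `<metric> <op> <non-negative integer>`. `parse_condition`, `Condition`, and `SUPPORTED_METRICS` are hypothetical names. The real FE code is Java and covers the full metric list rather than this illustrative subset.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar for one condition: "<metric> <op> <value>".
# Two-character operators are listed first so ">=" is not read as ">".
_CONDITION_RE = re.compile(r"^\s*(\w+)\s*(>=|<=|=|>|<)\s*(\S+)\s*$")

# Illustrative subset; the real metric list is larger.
SUPPORTED_METRICS = {"be_scan_bytes_from_remote_storage"}


@dataclass(frozen=True)
class Condition:
    metric: str
    op: str
    value: int


def parse_condition(text: str) -> Condition:
    """Parse and validate a condition string, raising ValueError if invalid."""
    match = _CONDITION_RE.match(text)
    if match is None:
        raise ValueError(f"malformed condition: {text!r}")
    metric, op, raw_value = match.groups()
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported metric: {metric}")
    # A byte count must be a non-negative integer (ASCII digits only).
    if re.fullmatch(r"[0-9]+", raw_value) is None:
        raise ValueError(f"invalid value for {metric}: {raw_value!r}")
    return Condition(metric, op, int(raw_value))


print(parse_condition("be_scan_bytes_from_remote_storage > 1073741824"))
```

Validating on the FE keeps bad thresholds, such as negative or non-numeric byte counts, from ever being replicated to the BEs.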
   
   ### Release note
   
   Support workload policies that cancel queries based on BE remote storage scan bytes.
   
   ### Check List (For Author)
   
   - Test:
       - FE UT: passed
       - BE UT: passed
       - Regression test: passed, `test_workload_policy_remote_scan_bytes`
       - Manual test: verified existing workload policy behavior and the new remote scan bytes cancellation on a deployed Doris instance
   - Behavior changed: Yes. Adds a new workload policy condition, `be_scan_bytes_from_remote_storage`.
   - Does this need documentation: Yes. The workload policy condition list should be updated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
