milenkovicm commented on code in PR #1374:
URL:
https://github.com/apache/datafusion-ballista/pull/1374#discussion_r2678525417
##########
ballista/core/src/execution_plans/shuffle_reader.rs:
##########
@@ -521,33 +539,44 @@ async fn fetch_partition_remote(
async fn fetch_partition_local(
location: &PartitionLocation,
+ arrow_ipc_reader_skip_validation: bool,
) -> result::Result<SendableRecordBatchStream, BallistaError> {
let path = &location.path;
let metadata = &location.executor_meta;
let partition_id = &location.partition_id;
- let reader = fetch_partition_local_inner(path).map_err(|e| {
- // return BallistaError::FetchFailed may let scheduler retry this task.
- BallistaError::FetchFailed(
- metadata.id.clone(),
- partition_id.stage_id,
- partition_id.partition_id,
- e.to_string(),
- )
- })?;
+ let reader = fetch_partition_local_inner(path,
arrow_ipc_reader_skip_validation)
+ .map_err(|e| {
+ // return BallistaError::FetchFailed may let scheduler retry this
task.
+ BallistaError::FetchFailed(
+ metadata.id.clone(),
+ partition_id.stage_id,
+ partition_id.partition_id,
+ e.to_string(),
+ )
+ })?;
Ok(Box::pin(LocalShuffleStream::new(reader)))
}
fn fetch_partition_local_inner(
path: &str,
+ arrow_ipc_reader_skip_validation: bool,
) -> result::Result<StreamReader<BufReader<File>>, BallistaError> {
let file = File::open(path).map_err(|e| {
BallistaError::General(format!("Failed to open partition file at
{path}: {e:?}"))
})?;
let file = BufReader::new(file);
- let reader = StreamReader::try_new(file, None).map_err(|e| {
- BallistaError::General(format!("Failed to new arrow FileReader at
{path}: {e:?}"))
- })?;
+ // Safety: setting `skip_validation` requires `unsafe`, user assures data
is valid
Review Comment:
I'm not sure if we're introducing security issue with `skip validation`, at
the moment shuffle files use absolutely paths which may bring issues with skip
validation.
as I mentioned already, would it make sense to make this compile time
configuration so users could decide for themself if they want it or not?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]