westhide commented on PR #1216:
URL: 
https://github.com/apache/datafusion-ballista/pull/1216#issuecomment-2777671990

   > > Q1: As the `BallistaFlightService` keeps listening on each 
Executor, it allows a client to send a `do_get` request, and nothing checks that the 
`FetchPartition` action's `path` was created by the shuffle writer, so the client 
can try to read any file on the executor. In this scenario, we can enable 
validation.
   > 
   > I guess we can safely assume that only shuffle files are accessed, not 
random files.
   > 
   > > Q2: I'm not sure. As we currently only read IPC files created by 
`ShuffleWriterExec`, it's safe to skip all validation.
   > 
   > What I had in mind is enabling unsafe mode (without validation) by 
default, but having an executor configuration switch that could revert this. It 
might be easy to cover the Arrow Flight case, but a bit harder when 
ShuffleReader reads a local shuffle file.
   > 
   > > ### Addition
   > > Should we consider the power-down scenario? It may leave the Arrow IPC file 
corrupted if `ShuffleWriterExec` is writing at the time. Maybe we can support job recovery and 
reuse the partition file in the future.
   > 
   > I'm not sure this case can be handled by this validation. How can we be 
sure that the file is written fully? I guess we'd need some kind of checksum for 
this scenario.
   
   All right, I will move the validation config to the Executor.
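
For the Q1 case, a rough sketch of what an executor-side switch could look like. The struct `ExecutorValidationConfig`, its field `validate_fetch_path`, and the function `is_path_allowed` are hypothetical names used only for illustration, not existing Ballista APIs. The check is purely lexical (it does not resolve symlinks), so it is a starting point rather than a complete guard.

```rust
use std::path::{Component, Path};

/// Hypothetical executor-side setting; the name is illustrative only.
pub struct ExecutorValidationConfig {
    /// When true, reject `FetchPartition` paths outside the executor work dir.
    pub validate_fetch_path: bool,
}

/// Returns true if `requested` may be served to a `do_get` client.
/// With validation disabled, every path is accepted (the current behavior).
pub fn is_path_allowed(
    config: &ExecutorValidationConfig,
    work_dir: &Path,
    requested: &Path,
) -> bool {
    if !config.validate_fetch_path {
        return true;
    }
    // Reject any `..` component so the lexical prefix check below
    // cannot be bypassed with a path that climbs out of the work dir.
    let has_parent = requested
        .components()
        .any(|c| matches!(c, Component::ParentDir));
    // `Path::starts_with` compares whole components, not raw string prefixes.
    !has_parent && requested.starts_with(work_dir)
}
```

With the switch on, a path under the work dir passes while something like `/etc/passwd` is refused; with the switch off, nothing is checked.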
   
   And I will try to add checksum validation, with a rerun of the failed job, after I finish 
the wasm UDF.
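
A minimal sketch of the checksum idea, assuming the writer records a checksum once the partition file is complete (for example in partition metadata or a sidecar file) and the reader recomputes it before reuse. The function names are hypothetical, and FNV-1a is used here only because it needs no external crate; a real implementation would more likely use CRC32 or a similar checksum.

```rust
/// FNV-1a 64-bit hash over a byte slice (non-cryptographic).
pub fn fnv1a64(data: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    data.iter().fold(OFFSET_BASIS, |hash, &byte| {
        (hash ^ u64::from(byte)).wrapping_mul(PRIME)
    })
}

/// Hypothetical reader-side check: the file content is accepted only if
/// its checksum matches the value the writer recorded after finishing.
pub fn is_fully_written(file_bytes: &[u8], expected: u64) -> bool {
    fnv1a64(file_bytes) == expected
}
```

A file truncated by a power loss would produce a different checksum than the recorded one, so the partition would be treated as missing and the job rerun.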


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

