westhide commented on PR #1216: URL: https://github.com/apache/datafusion-ballista/pull/1216#issuecomment-2777671990
> > Q1: Because `BallistaFlightService` keeps listening on each Executor, a client can send it a `do_get` request, and nothing checks that the `FetchPartition` action's `path` was created by the shuffle writer, so the client can try to read any file on the executor. In this scenario, we can enable validation.
>
> I guess we can safely assume that only shuffle files are accessed, not random files.
>
> > Q2: I'm not sure. Since we currently only read IPC files created by `ShuffleWriterExec`, it's safe to skip all validation.
>
> What I had in mind is enabling unsafe mode (without validation) by default, but having an executor configuration switch which could revert this. It might be easy to cover the Arrow Flight case, but a bit harder when ShuffleReader reads a local shuffle file.
>
> > ### Addition
> > Should we consider the power-loss scenario? It may leave the Arrow IPC file broken if `ShuffleWriterExec` is writing at that moment. Maybe we can support job recovery and reuse the partition file in the future.
>
> I'm not sure this case can be handled with this validation. How can we be sure that the file is fully written? I guess we'd need some kind of checksum for this scenario.

All right, I will move the validation config to the Executor. I will also try to add checksum validation, with rerun of failed jobs, after finishing the wasm UDF.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
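For the Q1 concern (a `do_get` request naming an arbitrary `path`), one possible guard is to check that the requested path stays under the executor's work directory before opening it. The following is only a minimal sketch of that idea, not what the PR implements: `is_shuffle_path` and the `work_dir` argument are hypothetical names, and the check is purely lexical (it does not resolve symlinks or touch the filesystem).

```rust
use std::path::{Component, Path};

/// Hypothetical helper: returns true only if `requested` lies under
/// `work_dir` and contains no `..` components.
/// This is a lexical check; it does not resolve symlinks.
fn is_shuffle_path(work_dir: &Path, requested: &Path) -> bool {
    // Reject any `..` so the path cannot climb out of the work directory.
    let has_parent_dir = requested
        .components()
        .any(|c| matches!(c, Component::ParentDir));

    // `Path::starts_with` compares whole components, so a sibling
    // directory such as `/work-other` does not match `/work`.
    !has_parent_dir && requested.starts_with(work_dir)
}
```

A stricter variant could canonicalize both paths first, at the cost of extra filesystem calls on every fetch. Neither variant addresses the partially written file case, which would still need something like the checksum discussed above.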