milenkovicm commented on PR #1267: URL: https://github.com/apache/datafusion-ballista/pull/1267#issuecomment-2926982414
There is an issue with tying the job ID to a physical directory: if the scheduler is restarted without restarting the executors, job data directories can overlap (in the current implementation, shuffle readers may then read the wrong spill), so a plain incremental ID may cause problems. I was looking at `ULID`s, which are random but sortable IDs; the downside is that they are too big (IMHO), and their ordering is not the easiest to reason about either. Snowflake-like IDs are sortable and unique; for them, the machine ID should be exposed as a configuration parameter. The current approach is not sortable, and there is a slight chance of a name collision (with very low probability).
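To make the snowflake-like option concrete, here is a minimal sketch of such a generator. It is not Ballista code: the `JobIdGenerator` type, the `job_dir_name` helper, and the 41/10/12 bit split (milliseconds / machine ID / sequence) are all assumptions chosen for illustration, and the machine ID is simply passed in as it would be if it came from configuration.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical bit layout (an assumption, not a Ballista constant):
// timestamp in milliseconds | 10-bit machine id | 12-bit sequence.
const MACHINE_BITS: u64 = 10;
const SEQUENCE_BITS: u64 = 12;
const MAX_MACHINE_ID: u64 = (1 << MACHINE_BITS) - 1;
const MAX_SEQUENCE: u64 = (1 << SEQUENCE_BITS) - 1;

/// Illustrative snowflake-like generator; one instance per scheduler.
pub struct JobIdGenerator {
    machine_id: u64,
    last_millis: u64,
    sequence: u64,
}

impl JobIdGenerator {
    /// `machine_id` would come from configuration and must fit in 10 bits.
    pub fn new(machine_id: u64) -> Result<Self, String> {
        if machine_id > MAX_MACHINE_ID {
            return Err(format!("machine id must be <= {MAX_MACHINE_ID}"));
        }
        Ok(Self { machine_id, last_millis: 0, sequence: 0 })
    }

    /// Next id based on the wall clock (milliseconds since the Unix epoch).
    pub fn next_id(&mut self) -> u64 {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_millis() as u64)
            .unwrap_or(0);
        self.next_id_at(now)
    }

    /// Deterministic core, separated from the clock so it can be tested.
    pub fn next_id_at(&mut self, now_millis: u64) -> u64 {
        // Never move backwards, even if the wall clock does.
        let mut millis = now_millis.max(self.last_millis);
        if millis == self.last_millis {
            if self.sequence == MAX_SEQUENCE {
                // Sequence exhausted: borrow the next millisecond.
                millis += 1;
                self.sequence = 0;
            } else {
                self.sequence += 1;
            }
        } else {
            self.sequence = 0;
        }
        self.last_millis = millis;
        (millis << (MACHINE_BITS + SEQUENCE_BITS))
            | (self.machine_id << SEQUENCE_BITS)
            | self.sequence
    }
}

/// Zero-padded hex, so lexicographic order of names matches numeric order.
pub fn job_dir_name(id: u64) -> String {
    format!("{id:016x}")
}
```

Because the timestamp occupies the high bits, IDs from a restarted scheduler sort after the ones issued before the restart (as long as the clock has not jumped back past the previous run), and two schedulers with different machine IDs cannot produce the same value. With this layout the millisecond timestamp fits in 41 bits only until roughly 2039; a custom epoch would extend that.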