Re: [PR] feat: job id is incremental [datafusion-ballista]

via GitHub Sun, 01 Jun 2025 03:36:00 -0700


milenkovicm commented on PR #1267:
URL: 
https://github.com/apache/datafusion-ballista/pull/1267#issuecomment-2926982414


   There is an issue with tying the job id to a physical directory: if the 
scheduler is restarted without restarting the executors, a restarted scheduler 
could hand out ids that executors already have data directories for, so job 
data directories may overlap (in the current implementation, shuffle readers 
may then read the wrong spill). So just using an incremental id in the current 
implementation may cause problems. 
   
   I was looking at `ULID`s: they are randomly generated but sortable IDs. The 
downside is that they are too big (IMHO), and their ordering is not the easiest 
to reason about either.
   
   Snowflake-like ids are sortable and unique; for them, the machine id would 
have to be exposed as a configuration parameter. 
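
A minimal sketch of what a snowflake-like id could look like, to make the trade-off concrete. Everything here is an assumption for illustration, not Ballista code: the `SnowflakeGen` type, the 41/10/12 bit split, the custom epoch, and the idea that `machine_id` comes from scheduler configuration are all hypothetical.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical layout: [timestamp ms | 10-bit machine id | 12-bit sequence].
const MACHINE_BITS: u64 = 10;
const SEQ_BITS: u64 = 12;
// Arbitrary custom epoch in Unix milliseconds (an assumption of this sketch).
const EPOCH_MS: u64 = 1_700_000_000_000;

/// Hypothetical generator; `machine_id` would be read from configuration.
pub struct SnowflakeGen {
    machine_id: u64,
    last_ms: u64,
    seq: u64,
}

impl SnowflakeGen {
    pub fn new(machine_id: u64) -> Self {
        assert!(machine_id < (1u64 << MACHINE_BITS), "machine id out of range");
        Self { machine_id, last_ms: 0, seq: 0 }
    }

    /// Generate an id for an explicit clock reading (Unix milliseconds).
    pub fn next_at(&mut self, now_ms: u64) -> u64 {
        // Never move backwards, even if the wall clock does.
        let mut ms = now_ms.max(self.last_ms);
        if ms > self.last_ms {
            // New millisecond: restart the sequence.
            self.seq = 0;
        } else {
            // Same millisecond: bump the sequence.
            self.seq += 1;
            if self.seq >= (1u64 << SEQ_BITS) {
                // Sequence exhausted: borrow the next millisecond.
                ms += 1;
                self.seq = 0;
            }
        }
        self.last_ms = ms;
        (ms.saturating_sub(EPOCH_MS) << (MACHINE_BITS + SEQ_BITS))
            | (self.machine_id << SEQ_BITS)
            | self.seq
    }

    /// Generate an id from the system clock.
    pub fn next_id(&mut self) -> u64 {
        let now_ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock is before the Unix epoch")
            .as_millis() as u64;
        self.next_at(now_ms)
    }
}
```

Ids from one generator are strictly increasing, and two schedulers with different `machine_id` values cannot produce the same id. Because the timestamp is part of the id, a scheduler restarted later on the same machine should also not reissue an earlier id, as long as its clock has not been set back across the restart.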
   
   The current approach is not sortable, and there is a slight chance of a name 
collision (with very low probability).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


