Extending the same thought from Stephen: if we are going to add a small delay, it is better to do it via a lease.
So SCM could offer a lease for 60 seconds, with a provision to reacquire
the lease one more time. This does mean that a single container inside the
data node technically could become larger than 5GB (but that is possible
even today).

I do think a lease or a timeout-based approach (as suggested by Stephen)
might be easier than pre-allocating blocks.

Thanks
Anu

On Fri, Sep 9, 2022 at 12:47 AM Stephen O'Donnell
<sodonn...@cloudera.com.invalid> wrote:

> > 4. Datanode wait until no write commands to this container, then close it.
>
> This could be done on SCM, with a simple delay. Ie hold back the close
> commands for a "normal" close for some configurable amount of time. Eg if
> we hold for 60 seconds, it is likely almost all blocks will get written. If
> a very small number fail, it is likely OK.
>
> On Fri, Sep 9, 2022 at 5:01 AM Kaijie Chen <c...@apache.org> wrote:
>
> > Thanks Ethan. Yes this could be a simpler solution.
> > The main idea is allowing container size limit to be exceeded,
> > to ensure all allocated blocks can be finished.
> >
> > We can change it to something like this:
> > 1. Datanode notices the container is near full.
> > 2. Datanode sends close container action to SCM immediately.
> > 3. SCM closes the container and stops allocating new blocks in it.
> > 4. Datanode wait until no write commands to this container, then close it.
> >
> > It's still okay to wait for the next heartbeat in step 2.
> > Step 4 is a little bit tricky, we need a lease or timeout to determine the
> > time.
> >
> > Kaijie
> >
> > ---- On Fri, 09 Sep 2022 08:54:34 +0800 Ethan Rose wrote ---
> > > I believe the flow is:
> > > 1. Datanode notices the container is near full.
> > > 2. Datanode sends close container action to SCM on its next heartbeat.
> > > 3. SCM closes the container and sends a close container command on the
> > > heartbeat response.
> > > 4. Datanodes get the response and close the container. If it is a Ratis
> > > container, the leader will send the close via Ratis.
> > >
> > > There is a "grace period" of sorts between steps 1 and 2, but this does
> > > not help the situation because SCM does not stop issuing blocks to this
> > > container until after step 3. Perhaps some amount of pause between steps 3
> > > and 4 would help, either on the SCM or datanode side. This would provide a
> > > "grace period" between when SCM stops allocating blocks for the container
> > > and when the container is actually closed. I'm not sure exactly how this
> > > would be implemented in the code given the current setup, but it seems like
> > > a simple option we should try before other more complicated solutions.
> > >
> > > Ethan
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@ozone.apache.org
> > For additional commands, e-mail: dev-h...@ozone.apache.org
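
As a rough illustration of the lease idea at the top of this mail (a 60-second term with one reacquire), here is a minimal Java sketch. The class name ContainerCloseLease and its methods are hypothetical and are not existing Ozone APIs. The clock is passed in explicitly so the behaviour is easy to check. The assumption is that SCM would hold back the close command until isExpired() returns true.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of a close lease; not an Ozone class. */
class ContainerCloseLease {
  private final Duration term;     // length of one lease term, e.g. 60 seconds
  private final int maxRenewals;   // how many times the lease may be reacquired
  private Instant expiresAt;
  private int renewalsUsed = 0;

  ContainerCloseLease(Instant grantedAt, Duration term, int maxRenewals) {
    this.term = term;
    this.maxRenewals = maxRenewals;
    this.expiresAt = grantedAt.plus(term);
  }

  /** Extends the lease by one term from now, if it is still active and renewals remain. */
  synchronized boolean renew(Instant now) {
    if (isExpired(now) || renewalsUsed >= maxRenewals) {
      return false;
    }
    renewalsUsed++;
    expiresAt = now.plus(term);
    return true;
  }

  /** True once the lease has run out, i.e. the close command may be sent. */
  synchronized boolean isExpired(Instant now) {
    return !now.isBefore(expiresAt);
  }

  public static void main(String[] args) {
    Instant t0 = Instant.parse("2022-09-09T00:00:00Z");
    ContainerCloseLease lease =
        new ContainerCloseLease(t0, Duration.ofSeconds(60), 1);

    System.out.println(lease.renew(t0.plusSeconds(45)));       // true: first reacquire
    System.out.println(lease.renew(t0.plusSeconds(90)));       // false: only one reacquire allowed
    System.out.println(lease.isExpired(t0.plusSeconds(104)));  // false: writes may still finish
    System.out.println(lease.isExpired(t0.plusSeconds(105)));  // true: close can go out
  }
}
```

With a 60-second term and one reacquire, the close is held back for less than 120 seconds after SCM stops allocating blocks, which bounds how far the container can grow past its size limit.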