Re: [RFC] Proposal: Reserve Space for Allocated Blocks

Kaijie Chen Wed, 28 Sep 2022 05:56:07 -0700

Hi Anu,

Thanks for your suggestions. These are indeed areas where we can
improve the code. I also have more results to share.


I ran more tests today and observed containers larger than 15 GB,
more than 15 times the configured container size limit (1 GB).
This might be related to the pipeline choosing policy and the container
close threshold (99%).

Because we have no control over how many blocks can be allocated in a
container simultaneously, there seems to be a risk of getting abnormally
large containers. What do you think?
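One way to bound this, in line with the subject of this thread, is to count space reserved by allocated-but-uncommitted blocks when deciding whether a container can take another block. Below is a minimal sketch, assuming a hypothetical in-memory ContainerSpaceReserver (not an existing Ozone class) keyed by a numeric container ID:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: admission check that counts space reserved by
// allocated-but-uncommitted blocks, not just bytes already written.
class ContainerSpaceReserver {
    private final long containerSizeLimit; // e.g. the value of ozone.scm.container.size, in bytes
    private final Map<Long, Long> usedBytes = new HashMap<>();     // containerId -> committed bytes
    private final Map<Long, Long> reservedBytes = new HashMap<>(); // containerId -> in-flight bytes

    ContainerSpaceReserver(long containerSizeLimit) {
        this.containerSizeLimit = containerSizeLimit;
    }

    // Reserve room for one block; false means the caller should pick another container.
    synchronized boolean tryReserve(long containerId, long blockSize) {
        long used = usedBytes.getOrDefault(containerId, 0L);
        long reserved = reservedBytes.getOrDefault(containerId, 0L);
        if (used + reserved + blockSize > containerSizeLimit) {
            return false;
        }
        reservedBytes.put(containerId, reserved + blockSize);
        return true;
    }

    // On block commit, release the reservation and count the bytes actually written.
    synchronized void commit(long containerId, long reservedSize, long actualSize) {
        reservedBytes.merge(containerId, -reservedSize, Long::sum);
        usedBytes.merge(containerId, actualSize, Long::sum);
    }
}
```

Reserving the full block size at allocation time is conservative: it caps how many blocks can be in flight per container no matter how many clients pick the same pipeline, at the cost of containers sometimes closing a bit below the limit when blocks end up smaller.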

I have also tested the simple delay proposal. It sometimes works well,
but sometimes it still produces fragmented blocks, which is expected.

Kaijie

 ---- On Wed, 28 Sep 2022 08:00:38 +0800  anu engineer  wrote --- 
 > Thank you for the POC and the numbers from it. They look very good.
 > I know this is a private POC proposal, yet I have two minor questions.
 > 
 > 1. Should we maintain the client ID in the
 > "private final Map<ContainerID, Long> containerLeases" map? Instead of a
 > Long, we could keep a Long + client ID. It might be useful for debugging.
 > 2. Suppose a client keeps renewing a container lease; do we want to
 > enforce a maximum limit? It is not needed per se, more a question
 > that I am asking myself.
 > 
 > Thanks
 > Anu
 > 
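Both questions could be handled by one structure. A minimal sketch, assuming hypothetical names (Lease, ContainerLeaseTable, maxLeaseLifetimeMillis) and millisecond timestamps: the Long in containerLeases becomes a per-client lease record, and a cap limits how long a single client can keep extending its lease on one container.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lease record: replaces the bare Long expiry in containerLeases.
final class Lease {
    final long firstAcquiredMillis; // when this client first leased the container
    final long expireAtMillis;

    Lease(long firstAcquiredMillis, long expireAtMillis) {
        this.firstAcquiredMillis = firstAcquiredMillis;
        this.expireAtMillis = expireAtMillis;
    }
}

class ContainerLeaseTable {
    private final long leaseDurationMillis;    // length of one acquire/renew
    private final long maxLeaseLifetimeMillis; // hypothetical cap on total renewal time
    // containerId -> (clientId -> lease), so we know who holds each lease (question 1)
    private final Map<Long, Map<String, Lease>> leases = new HashMap<>();

    ContainerLeaseTable(long leaseDurationMillis, long maxLeaseLifetimeMillis) {
        this.leaseDurationMillis = leaseDurationMillis;
        this.maxLeaseLifetimeMillis = maxLeaseLifetimeMillis;
    }

    // Acquire or renew; returns false once the client has used up its lifetime (question 2).
    synchronized boolean acquireOrRenew(long containerId, String clientId, long nowMillis) {
        Map<String, Lease> byClient = leases.computeIfAbsent(containerId, id -> new HashMap<>());
        Lease old = byClient.get(clientId);
        long firstAcquired = (old == null) ? nowMillis : old.firstAcquiredMillis;
        long newExpiry = Math.min(nowMillis + leaseDurationMillis,
                firstAcquired + maxLeaseLifetimeMillis);
        if (newExpiry <= nowMillis) {
            return false; // cap reached: the existing lease is left to expire
        }
        byClient.put(clientId, new Lease(firstAcquired, newExpiry));
        return true;
    }

    // Latest expiry across all clients; SCM may close the container after this time.
    synchronized long leaseExpireAt(long containerId) {
        Map<String, Lease> byClient = leases.getOrDefault(containerId, Map.of());
        return byClient.values().stream().mapToLong(l -> l.expireAtMillis).max().orElse(0L);
    }
}
```

Keying by client keeps the debugging information Anu mentions, and the container's effective expiry is simply the latest expiry among its holders.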
 > On Mon, Sep 26, 2022 at 2:42 AM Kaijie Chen <[email protected]> wrote:
 > 
 > > Hi everyone,
 > >
 > > I've implemented a container lease POC [1], and the results look good.
 > >
 > > Here's what's changed in the POC:
 > >
 > > 1. SCM will keep a LeaseExpireAt for each OPEN container. If SCM
 > >     receives a container close command, it will change the container
 > >     state to CLOSING, but it will not send the close container command
 > >     to the DNs until the lease expires.
 > > 2. OM will forward container lease requests from the client to SCM.
 > > 3. The client will acquire a lease when a block is allocated (to be
 > >     improved), and it will renew leases for open blocks before they
 > >     expire. The client will ignore any lease errors and keep writing
 > >     chunks to the DN even if a lease expires, because the worst case
 > >     is simply a ContainerNotOpenException.
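Step 1 can be sketched as follows. This is a minimal illustration assuming hypothetical names (ContainerState, LeaseAwareCloser, shouldSendCloseToDatanodes), not the classes in the POC branch: a close request only moves the container to CLOSING, and a periodic check decides when the close command may actually go to the DNs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of step 1 on the SCM side (names are illustrative,
// not the classes used in the POC branch).
enum ContainerState { OPEN, CLOSING, CLOSED }

class LeaseAwareCloser {
    private final Map<Long, ContainerState> states = new HashMap<>();
    private final Map<Long, Long> leaseExpireAt = new HashMap<>(); // containerId -> expiry (ms)

    void open(long containerId) {
        states.put(containerId, ContainerState.OPEN);
    }

    // Lease acquire/renew forwarded by OM; keep the latest expiry seen.
    void renewLease(long containerId, long expireAtMillis) {
        leaseExpireAt.merge(containerId, expireAtMillis, Long::max);
    }

    // A close request only changes SCM state; no command goes to DNs yet.
    void onCloseRequest(long containerId) {
        states.computeIfPresent(containerId,
                (id, s) -> s == ContainerState.OPEN ? ContainerState.CLOSING : s);
    }

    // Polled periodically: send the close command only after the lease has expired.
    boolean shouldSendCloseToDatanodes(long containerId, long nowMillis) {
        return states.get(containerId) == ContainerState.CLOSING
                && leaseExpireAt.getOrDefault(containerId, 0L) <= nowMillis;
    }
}
```

Because renewals only ever extend the stored expiry, a late renewal with an earlier timestamp cannot shorten a lease that another client is still relying on.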
 > >
 > > Although this POC is not perfect, the results in my tests look good.
 > >
 > > Cluster: 48 datanodes on 4 machines
 > > Client: Ozone freon ockg
 > > Threads: 100
 > > Key count: 1000
 > > Key size: 1000 MB
 > > ReplicationConfig: EC/RS-10-4-1024K
 > >
 > > Under ideal conditions, we should expect 14000 blocks of 100 MB each.
 > > I'm only showing the data from 1 of the 4 machines.
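The 14000 figure follows from the test parameters. A small worked sketch, assuming each 1000 MB key fits in a single RS-10-4 block group (block size of at least 100 MB) and divides evenly into full 10 x 1024K stripes, so the 4 parity blocks are the same size as the data blocks; the per-machine line additionally assumes blocks spread evenly over the 4 machines:

```java
// Worked arithmetic for the expected block count (assumes one EC block group per key).
class ExpectedBlocks {
    // RS-10-4 stripes a key over 10 data blocks.
    static long dataBlockSizeMb(long keySizeMb, int dataBlocks) {
        return keySizeMb / dataBlocks;
    }

    // Each key produces dataBlocks + parityBlocks blocks of equal size.
    static long totalBlocks(long keyCount, int dataBlocks, int parityBlocks) {
        return keyCount * (dataBlocks + parityBlocks);
    }

    public static void main(String[] args) {
        System.out.println(dataBlockSizeMb(1000, 10));     // 100 (MB per block)
        System.out.println(totalBlocks(1000, 10, 4));      // 14000 blocks in total
        System.out.println(totalBlocks(1000, 10, 4) / 4);  // 3500 per machine if spread evenly
    }
}
```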
 > >
 > >
 > > Before the change (commit 1cf5678224bf00dee580ffdb14ab8b650cc1e2e0):
 > >     (The number before each size is the count of blocks of that size.)
 > >
 > >     15 1.0M 48 2.0M 40 3.0M 48 4.0M 37 5.0M 33 6.0M 48 7.0M 51 8.0M
 > >     30 9.0M 49 10M 40 11M 65 12M 33 13M 18 14M 43 15M 46 16M 38 17M
 > >     20 18M 46 19M 32 20M 5 21M 54 22M 58 23M 33 24M 25 25M 39 26M
 > >     44 27M 48 28M 25 29M 18 30M 34 31M 42 32M 22 33M 23 34M 27 35M
 > >     26 36M 33 37M 27 38M 30 39M 60 40M 25 41M 27 42M 26 43M 20 44M
 > >     13 45M 18 46M 40 47M 27 48M 25 49M 15 50M 40 51M 26 52M 41 53M
 > >     41 54M 9 55M 11 56M 11 57M 19 58M 30 59M 28 60M 44 61M 36 62M
 > >     21 63M 14 64M 19 65M 14 66M 23 67M 33 68M 40 69M 34 70M 17 71M
 > >     10 72M 35 73M 28 74M 24 75M 21 76M 34 77M 26 78M 35 79M 18 80M
 > >     27 81M 26 82M 14 83M 19 84M 23 85M 29 86M 4 87M 23 88M 37 89M
 > >     11 90M 23 91M 38 92M 16 93M 12 94M 18 95M 21 96M 27 97M 19 98M
 > >     35 99M 2099 100M
 > >
 > > Container sizes before the change:
 > >
 > >     $ ./ozone admin container list -c 10000 | grep usedBytes | awk '{print $3}' | sort | xargs echo
 > >     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 > >     0, 0, 0, 1001390080,
 > >     1002438656, 1003487232, 1003487232, 1004535808, 1004535808, 1004535808,
 > >     1004535808, 1006632960, 1007681536, 1010827264, 1011875840, 1011875840,
 > >     1011875840, 1013972992, 1016070144, 1016070144, 1016070144, 1019215872,
 > >     1024458752, 1028653056, 1028653056, 1031798784, 1032847360, 1032847360,
 > >     1032847360, 1033895936, 1035993088, 1044381696, 1046478848, 1050673152,
 > >     1062207488, 1092616192, 1096810496, 968884224, 968884224, 970981376,
 > >     970981376, 972029952, 972029952, 973078528, 973078528, 974127104,
 > >     974127104, 975175680, 976224256, 976224256, 976224256, 976224256,
 > >     976224256, 976224256, 976224256, 976224256, 979369984, 980418560,
 > >     980418560, 980418560, 981467136, 981467136, 983564288, 983564288,
 > >     983564288, 984612864, 984612864, 984612864, 985661440, 985661440,
 > >     985661440, 985661440, 986710016, 986710016, 987758592, 987758592,
 > >     988807168, 988807168, 989855744, 989855744, 989855744, 989855744,
 > >     990904320, 990904320, 990904320, 990904320, 990904320, 990904320,
 > >     991952896, 991952896, 993001472, 994050048, 996147200, 997195776,
 > >     998244352, 998244352,
 > >
 > >
 > > After the change (commit 52c903ccc644aba63bbd5354bae98bc8bbe13675):
 > >     (Occasionally, a few blocks are broken into smaller ones.)
 > >
 > >     3571 100M
 > >
 > > Container sizes after the change:
 > >
 > >     **Note: "ozone.scm.container.size" was set to 1G**
 > >     **Note: "hdds.datanode.storage.utilization.critical.threshold" was set to 0.99**
 > >
 > >     $ ./ozone admin container list -c 10000 | grep usedBytes | awk '{print $3}' | sort | xargs echo
 > >     0, 1258291200, 1258291200, 1363148800, 1468006400, 1782579200, 1887436800,
 > >     1887436800, 1992294400, 2306867200, 2621440000, 2621440000, 2726297600,
 > >     2831155200, 2831155200, 2936012800, 2936012800, 3040870400, 3040870400,
 > >     3040870400, 3040870400, 3040870400, 3145728000, 3250585600, 3250585600,
 > >     3355443200, 3355443200, 3460300800, 3565158400, 3565158400, 3670016000,
 > >     3670016000, 3774873600, 3879731200, 3879731200, 4404019200, 4404019200,
 > >
 > > I've also run tests with RATIS/THREE; the results look similar.
 > >
 > >
 > > What I've implemented in the POC basically prevents a DN from closing
 > > a container that was recently written to. This could also be
 > > implemented solely in the DN with a lastUpdated timestamp on each
 > > container, so we wouldn't need extra RPCs. What do you think?
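A minimal sketch of the DN-only variant, assuming a hypothetical DatanodeContainer wrapper and a hypothetical grace period quietPeriodMillis (neither is an existing Ozone class or setting): every write bumps lastUpdated, and a pending close is executed only once the container has been quiet for the grace period.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the DN-only variant: no lease RPCs, just a
// per-container lastUpdated timestamp checked before acting on a close.
class DatanodeContainer {
    private final long quietPeriodMillis; // hypothetical grace period, not an existing Ozone key
    private final AtomicLong lastUpdatedMillis;

    DatanodeContainer(long quietPeriodMillis, long createdAtMillis) {
        this.quietPeriodMillis = quietPeriodMillis;
        this.lastUpdatedMillis = new AtomicLong(createdAtMillis);
    }

    // Call on every write to the container (e.g. chunk writes and block puts).
    void onWrite(long nowMillis) {
        lastUpdatedMillis.accumulateAndGet(nowMillis, Math::max); // never move backwards
    }

    // True if a pending close command may be executed now; otherwise retry later.
    boolean canCloseNow(long nowMillis) {
        return nowMillis - lastUpdatedMillis.get() >= quietPeriodMillis;
    }
}
```

One trade-off: a container under continuous writes would never become quiet, so a DN-side version would probably still need an upper bound, much like the maximum lease limit raised earlier in this thread.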
 > >
 > > Please help verify this and share your feedback and suggestions.
 > >
 > > Thanks,
 > > Kaijie
 > >
 > > ---
 > >
 > > [1]: https://github.com/kaijchen/ozone/tree/container-lease
 > >
 > > ---------------------------------------------------------------------
 > > To unsubscribe, e-mail: [email protected]
 > > For additional commands, e-mail: [email protected]
 > >
 > >
 > 
