Re: [RFC] Proposal: Reserve Space for Allocated Blocks

Kaijie Chen Wed, 28 Sep 2022 06:02:30 -0700

Also: if the QPS of OM and SCM improves someday,
this over-allocation might become a more significant problem.


Kaijie

 ---- On Wed, 28 Sep 2022 20:54:17 +0800  Kaijie Chen  wrote --- 
 > Hi Anu,
 > 
 > Thanks for your suggestions. These are indeed areas where we can
 > improve the code. I have something more to share.
 > 
 > I ran more tests today and observed containers over 15 GB,
 > which is 15 times the configured container size limit (1 GB).
 > It might be related to the pipeline choosing policy and the container
 > close threshold (99%).
 > 
 > Because we have no control over how many blocks can be allocated
 > simultaneously, there seems to be a risk of getting abnormally
 > large containers. What do you think?
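A minimal sketch of the idea in the subject line (reserving space for allocated blocks) shows why container size would then stay bounded: SCM counts each allocated block against its container before any data arrives, and stops handing out blocks once used plus reserved bytes would cross the close threshold. ContainerSpaceTracker, tryAllocate and onBlockCommitted are hypothetical names, not Ozone APIs, and container IDs are plain longs here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical SCM-side bookkeeping: reserve a whole block's worth of space per allocation.
class ContainerSpaceTracker {
  private final long containerSizeBytes; // e.g. the value of ozone.scm.container.size
  private final double closeThreshold;   // e.g. 0.99
  // containerId -> bytes already committed (reported as used)
  private final Map<Long, Long> usedBytes = new HashMap<>();
  // containerId -> bytes reserved for allocated but not yet committed blocks
  private final Map<Long, Long> reservedBytes = new HashMap<>();

  ContainerSpaceTracker(long containerSizeBytes, double closeThreshold) {
    this.containerSizeBytes = containerSizeBytes;
    this.closeThreshold = closeThreshold;
  }

  // Reserves blockBytes and returns true only if the container can still take a whole block.
  synchronized boolean tryAllocate(long containerId, long blockBytes) {
    long used = usedBytes.getOrDefault(containerId, 0L);
    long reserved = reservedBytes.getOrDefault(containerId, 0L);
    long limit = (long) (containerSizeBytes * closeThreshold);
    if (used + reserved + blockBytes > limit) {
      return false; // caller should pick another container
    }
    reservedBytes.put(containerId, reserved + blockBytes);
    return true;
  }

  // Turns a reservation into used space once the block is committed with its real length.
  synchronized void onBlockCommitted(long containerId, long reservedBlockBytes, long actualBytes) {
    reservedBytes.merge(containerId, -reservedBlockBytes, Long::sum);
    usedBytes.merge(containerId, actualBytes, Long::sum);
  }

  synchronized long committedPlusReserved(long containerId) {
    return usedBytes.getOrDefault(containerId, 0L) + reservedBytes.getOrDefault(containerId, 0L);
  }
}
```

Reserving a full block per allocation is conservative: a block that is committed short only releases the difference at commit time, so containers may close somewhat below the limit, but concurrent allocations can no longer push a single container far past it.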
 > 
 > I have also tested the simple delay proposal. It sometimes works well,
 > but sometimes it still produces fragmented blocks. This is expected.
 > 
 > Kaijie
 > 
 >  ---- On Wed, 28 Sep 2022 08:00:38 +0800  anu engineer  wrote --- 
 >  >  Thank you for the POC and the numbers from it. They look very good.
 >  > I know this is a private POC proposal, but I have two minor questions.
 >  > 
 >  > 1. Should we maintain the client ID in the "private final Map<ContainerID,
 >  > Long> containerLeases" map? That is, instead of just a Long we would keep
 >  > a Long plus the client ID. It might be useful for debugging.
 >  > 2. Suppose a client keeps renewing a container lease: do we want to
 >  > enforce a maximum limit? It is not needed per se; it is more a question
 >  > that I am asking myself.
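For the two questions above, a lease value could carry the client ID next to the expiry time, and the table could refuse renewals once a container's lease has been alive for a configured maximum. The sketch below assumes that shape; ContainerLeaseTable, Lease and maxTotalLeaseMillis are hypothetical, and a plain Long stands in for ContainerID so the example is self-contained. Since the map holds one entry per container, clientId records only the most recent acquirer or renewer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lease table; a plain Long stands in for Ozone's ContainerID.
class ContainerLeaseTable {

  // Lease value: expiry plus the ID of the client that last acquired or renewed it.
  static final class Lease {
    final String clientId;
    final long firstAcquiredAtMillis;
    final long expiresAtMillis;

    Lease(String clientId, long firstAcquiredAtMillis, long expiresAtMillis) {
      this.clientId = clientId;
      this.firstAcquiredAtMillis = firstAcquiredAtMillis;
      this.expiresAtMillis = expiresAtMillis;
    }

    @Override
    public String toString() {
      return "Lease{client=" + clientId + ", expiresAt=" + expiresAtMillis + "}";
    }
  }

  private final Map<Long, Lease> containerLeases = new HashMap<>();
  private final long leaseDurationMillis;
  // Upper bound on the total time renewals can keep one container's lease alive.
  private final long maxTotalLeaseMillis;

  ContainerLeaseTable(long leaseDurationMillis, long maxTotalLeaseMillis) {
    this.leaseDurationMillis = leaseDurationMillis;
    this.maxTotalLeaseMillis = maxTotalLeaseMillis;
  }

  // Acquires or renews the lease; returns false once the renewal cap has been reached.
  synchronized boolean acquireOrRenew(long containerId, String clientId, long nowMillis) {
    Lease current = containerLeases.get(containerId);
    long firstAcquired = (current == null) ? nowMillis : current.firstAcquiredAtMillis;
    long capMillis = firstAcquired + maxTotalLeaseMillis;
    if (nowMillis >= capMillis) {
      return false; // let the lease run out so SCM can go ahead with closing
    }
    long expiresAt = Math.min(nowMillis + leaseDurationMillis, capMillis);
    containerLeases.put(containerId, new Lease(clientId, firstAcquired, expiresAt));
    return true;
  }

  synchronized Lease get(long containerId) {
    return containerLeases.get(containerId);
  }
}
```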
 >  > 
 >  > Thanks
 >  > Anu
 >  > 
 >  > 
 >  > 
 >  > 
 >  > On Mon, Sep 26, 2022 at 2:42 AM Kaijie Chen <[email protected]> wrote:
 >  > 
 >  > > Hi everyone,
 >  > >
 >  > > I've implemented a container lease POC [1], and the results look good.
 >  > >
 >  > > Here's what's changed in the POC:
 >  > >
 >  > > 1. SCM will keep a LeaseExpireAt timestamp for each OPEN container. If
 >  > >     SCM receives a close container command, it will change the container
 >  > >     state to CLOSING, but it will not send the close container command
 >  > >     to the DNs until the lease expires.
 >  > > 2. OM will forward container lease requests from the client to SCM.
 >  > > 3. The client will acquire a lease when a block is allocated (to be
 >  > >     improved), and it will renew the leases for open blocks before they
 >  > >     expire. The client ignores any lease errors and keeps writing chunks
 >  > >     to the DNs even if a lease expires, because the worst case is simply
 >  > >     a ContainerNotOpenException.
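The SCM side of items 1 to 3 can be sketched roughly as follows. This is an illustrative model, not the POC code: the class and method names are hypothetical, container IDs are plain longs, and it assumes renewals keep extending the lease even after the container has moved to CLOSING, so blocks already in flight can finish.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical SCM-side sketch: a close request only marks the container CLOSING;
// the close command to datanodes is deferred until the container lease expires.
class LeaseAwareContainerCloser {
  enum State { OPEN, CLOSING }

  private final long leaseDurationMillis;
  private final Map<Long, State> states = new HashMap<>();
  private final Map<Long, Long> leaseExpireAt = new HashMap<>();

  LeaseAwareContainerCloser(long leaseDurationMillis) {
    this.leaseDurationMillis = leaseDurationMillis;
  }

  synchronized void addOpenContainer(long containerId) {
    states.put(containerId, State.OPEN);
    leaseExpireAt.put(containerId, 0L); // no lease held yet
  }

  // Called (via OM) on block allocation or client renewal; extends, never shortens, the lease.
  synchronized boolean acquireOrRenew(long containerId, long nowMillis) {
    if (!states.containsKey(containerId)) {
      return false; // unknown, or the close command was already sent
    }
    leaseExpireAt.merge(containerId, nowMillis + leaseDurationMillis, Long::max);
    return true;
  }

  // A close request moves OPEN -> CLOSING but sends nothing to datanodes yet.
  synchronized void onCloseRequested(long containerId) {
    states.computeIfPresent(containerId, (id, s) -> State.CLOSING);
  }

  // Returns CLOSING containers whose lease has expired; SCM would now send the close command.
  synchronized List<Long> pollContainersToClose(long nowMillis) {
    List<Long> ready = new ArrayList<>();
    for (Map.Entry<Long, State> e : states.entrySet()) {
      if (e.getValue() == State.CLOSING && leaseExpireAt.get(e.getKey()) <= nowMillis) {
        ready.add(e.getKey());
      }
    }
    for (Long id : ready) {
      states.remove(id);
      leaseExpireAt.remove(id);
    }
    return ready;
  }
}
```

In this model a container that never had a lease (expiry 0) is released on the first poll after the close request, while one with an active lease waits until its latest renewal runs out.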
 >  > >
 >  > > Although this POC is not perfect, the results in my tests look good.
 >  > >
 >  > > Cluster: 48 datanodes on 4 machines
 >  > > Client: Ozone freon ockg
 >  > > Threads: 100
 >  > > Key count: 1000
 >  > > Key size: 1000 MB
 >  > > ReplicationConfig: EC/RS-10-4-1024K
 >  > >
 >  > > In ideal conditions, we should expect 14000 blocks of 100 MB each.
 >  > > I'm only showing the data from 1 of the 4 machines.
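The expected count follows from the test parameters. A quick check, assuming each 1000 MB key fits in a single EC block group so that it yields exactly one block per data and parity index (the class name is only for illustration):

```java
// Worked arithmetic for the expected block count in the freon test.
// Assumption: each 1000 MB key fits in one EC block group, so every key
// produces exactly one block per data and parity index.
class ExpectedEcBlocks {
  static long blocksPerKey(int dataBlocks, int parityBlocks) {
    return dataBlocks + parityBlocks;
  }

  // Each data block holds an equal share of the key; parity blocks match that size.
  static long blockSizeMb(long keySizeMb, int dataBlocks) {
    return keySizeMb / dataBlocks;
  }

  static long totalBlocks(long keyCount, int dataBlocks, int parityBlocks) {
    return keyCount * blocksPerKey(dataBlocks, parityBlocks);
  }

  public static void main(String[] args) {
    int data = 10;
    int parity = 4;        // EC/RS-10-4-1024K
    long keys = 1000;      // Key count
    long keySizeMb = 1000; // Key size
    long total = totalBlocks(keys, data, parity);
    System.out.println(total + " blocks of " + blockSizeMb(keySizeMb, data) + " MB");
    // Assumes blocks are spread evenly over the 4 machines.
    System.out.println(total / 4 + " blocks per machine");
  }
}
```

Running main prints "14000 blocks of 100 MB" followed by "3500 blocks per machine"; the even split across machines is itself an assumption, since placement is not guaranteed to be uniform.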
 >  > >
 >  > >
 >  > > Before the change (commit 1cf5678224bf00dee580ffdb14ab8b650cc1e2e0):
 >  > >     (The number before each size is the count of blocks of that size)
 >  > >
 >  > >     15 1.0M 48 2.0M 40 3.0M 48 4.0M 37 5.0M 33 6.0M 48 7.0M 51 8.0M
 >  > >     30 9.0M 49 10M 40 11M 65 12M 33 13M 18 14M 43 15M 46 16M 38 17M
 >  > >     20 18M 46 19M 32 20M 5 21M 54 22M 58 23M 33 24M 25 25M 39 26M
 >  > >     44 27M 48 28M 25 29M 18 30M 34 31M 42 32M 22 33M 23 34M 27 35M
 >  > >     26 36M 33 37M 27 38M 30 39M 60 40M 25 41M 27 42M 26 43M 20 44M
 >  > >     13 45M 18 46M 40 47M 27 48M 25 49M 15 50M 40 51M 26 52M 41 53M
 >  > >     41 54M 9 55M 11 56M 11 57M 19 58M 30 59M 28 60M 44 61M 36 62M
 >  > >     21 63M 14 64M 19 65M 14 66M 23 67M 33 68M 40 69M 34 70M 17 71M
 >  > >     10 72M 35 73M 28 74M 24 75M 21 76M 34 77M 26 78M 35 79M 18 80M
 >  > >     27 81M 26 82M 14 83M 19 84M 23 85M 29 86M 4 87M 23 88M 37 89M
 >  > >     11 90M 23 91M 38 92M 16 93M 12 94M 18 95M 21 96M 27 97M 19 98M
 >  > >     35 99M 2099 100M
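To compare the before and after histograms, the count/size pairs can be parsed mechanically once the quote markers are stripped. A small helper, with hypothetical names, that totals the blocks and reports the fraction that reached the full size:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Parses "count size" pairs such as "15 1.0M 48 2.0M ... 2099 100M".
class BlockSizeHistogram {
  // Returns size label -> block count, preserving input order.
  static Map<String, Long> parse(String text) {
    String[] tokens = text.trim().split("\\s+");
    if (tokens.length % 2 != 0) {
      throw new IllegalArgumentException("expected count/size pairs");
    }
    Map<String, Long> counts = new LinkedHashMap<>();
    for (int i = 0; i < tokens.length; i += 2) {
      counts.merge(tokens[i + 1], Long.parseLong(tokens[i]), Long::sum);
    }
    return counts;
  }

  static long total(Map<String, Long> counts) {
    long sum = 0;
    for (long c : counts.values()) {
      sum += c;
    }
    return sum;
  }

  // Fraction of blocks that reached the given full-size label (e.g. "100M").
  static double fullFraction(Map<String, Long> counts, String fullSizeLabel) {
    long total = total(counts);
    return total == 0 ? 0.0 : (double) counts.getOrDefault(fullSizeLabel, 0L) / total;
  }
}
```

For example, fullFraction(parse(text), "100M") gives the share of unfragmented blocks in either listing.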
 >  > >
 >  > > Container sizes before the change:
 >  > >
 >  > >     $ ./ozone admin container list -c 10000 | grep usedBytes | awk '{print $3}' | sort | xargs echo
 >  > >     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 >  > >     0, 0, 0, 1001390080,
 >  > >     1002438656, 1003487232, 1003487232, 1004535808, 1004535808, 1004535808,
 >  > >     1004535808, 1006632960, 1007681536, 1010827264, 1011875840, 1011875840,
 >  > >     1011875840, 1013972992, 1016070144, 1016070144, 1016070144, 1019215872,
 >  > >     1024458752, 1028653056, 1028653056, 1031798784, 1032847360, 1032847360,
 >  > >     1032847360, 1033895936, 1035993088, 1044381696, 1046478848, 1050673152,
 >  > >     1062207488, 1092616192, 1096810496, 968884224, 968884224, 970981376,
 >  > >     970981376, 972029952, 972029952, 973078528, 973078528, 974127104,
 >  > >     974127104, 975175680, 976224256, 976224256, 976224256, 976224256,
 >  > >     976224256, 976224256, 976224256, 976224256, 979369984, 980418560,
 >  > >     980418560, 980418560, 981467136, 981467136, 983564288, 983564288,
 >  > >     983564288, 984612864, 984612864, 984612864, 985661440, 985661440,
 >  > >     985661440, 985661440, 986710016, 986710016, 987758592, 987758592,
 >  > >     988807168, 988807168, 989855744, 989855744, 989855744, 989855744,
 >  > >     990904320, 990904320, 990904320, 990904320, 990904320, 990904320,
 >  > >     991952896, 991952896, 993001472, 994050048, 996147200, 997195776,
 >  > >     998244352, 998244352,
 >  > >
 >  > >
 >  > > After the change (commit 52c903ccc644aba63bbd5354bae98bc8bbe13675):
 >  > >     (Occasionally, a few blocks are broken into smaller ones)
 >  > >
 >  > >     3571 100M
 >  > >
 >  > > Container sizes after the change:
 >  > >
 >  > >     **Note: "ozone.scm.container.size" was set to 1G**
 >  > >     **Note: "hdds.datanode.storage.utilization.critical.threshold" was set to 0.99**
 >  > >
 >  > >     $ ./ozone admin container list -c 10000 | grep usedBytes | awk '{print $3}' | sort | xargs echo
 >  > >     0, 1258291200, 1258291200, 1363148800, 1468006400, 1782579200, 1887436800,
 >  > >     1887436800, 1992294400, 2306867200, 2621440000, 2621440000, 2726297600,
 >  > >     2831155200, 2831155200, 2936012800, 2936012800, 3040870400, 3040870400,
 >  > >     3040870400, 3040870400, 3040870400, 3145728000, 3250585600, 3250585600,
 >  > >     3355443200, 3355443200, 3460300800, 3565158400, 3565158400, 3670016000,
 >  > >     3670016000, 3774873600, 3879731200, 3879731200, 4404019200, 4404019200,
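Because the pipeline above ends in a plain sort, the values are ordered as strings rather than numbers (which is why 1096810496 is listed before 968884224 in the earlier output; sort -n would order them numerically). A small parser that sorts numerically and measures the values against the configured size makes the two listings easier to compare. The class name is hypothetical, and treating "1G" as 1L << 30 bytes is an assumption about how the setting is interpreted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Summarizes the comma-separated usedBytes values printed by the pipeline.
class ContainerUsageSummary {
  static List<Long> parse(String text) {
    List<Long> values = new ArrayList<>();
    for (String token : text.split("[,\\s]+")) {
      if (!token.isEmpty()) {
        values.add(Long.parseLong(token));
      }
    }
    Collections.sort(values); // numeric order, unlike the lexical sort in the shell pipeline
    return values;
  }

  // Number of containers larger than the configured size.
  static long countAbove(List<Long> values, long limitBytes) {
    long n = 0;
    for (long v : values) {
      if (v > limitBytes) {
        n++;
      }
    }
    return n;
  }

  // Largest container size as a multiple of the configured size.
  static double maxRatio(List<Long> values, long limitBytes) {
    return values.isEmpty() ? 0.0 : (double) values.get(values.size() - 1) / limitBytes;
  }
}
```

For instance, countAbove(parse(text), 1L << 30) counts the containers that exceeded the 1 GB limit, and maxRatio reports how many times that limit the largest container reached.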
 >  > >
 >  > > I've also run tests with RATIS/THREE, and the results look similar.
 >  > >
 >  > >
 >  > > What I've implemented in the POC basically prevents a DN from closing a
 >  > > container that was recently written to. This could also be implemented
 >  > > solely in the DN, using a lastUpdated timestamp on each container,
 >  > > so we wouldn't need extra RPCs to achieve it. What do you think?
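The DN-only alternative could look roughly like this: each container records when it was last written, and a close command is deferred (to be retried later) while writes happened within a quiet period. RecentlyWrittenContainer, recordWrite, tryClose and quietPeriodMillis are hypothetical names; how a real DN would requeue a deferred close is left out.

```java
// Hypothetical DN-side container state: defer closing while the container was written to recently.
class RecentlyWrittenContainer {
  private final long quietPeriodMillis;
  private long lastUpdatedMillis;
  private boolean closed;

  RecentlyWrittenContainer(long quietPeriodMillis, long createdAtMillis) {
    this.quietPeriodMillis = quietPeriodMillis;
    this.lastUpdatedMillis = createdAtMillis;
  }

  // Records a chunk write; returns false if the container is already closed
  // (the client would then see a "container not open" style error).
  synchronized boolean recordWrite(long nowMillis) {
    if (closed) {
      return false;
    }
    lastUpdatedMillis = Math.max(lastUpdatedMillis, nowMillis);
    return true;
  }

  // Handles a close command: closes and returns true if the container has been quiet
  // long enough, otherwise returns false so the command can be retried later.
  synchronized boolean tryClose(long nowMillis) {
    if (closed) {
      return true;
    }
    if (nowMillis - lastUpdatedMillis < quietPeriodMillis) {
      return false;
    }
    closed = true;
    return true;
  }

  synchronized boolean isClosed() {
    return closed;
  }
}
```

One trade-off of this design is that a steady stream of small writes can keep postponing the close, so some upper bound on the delay, similar to the maximum-limit question Anu raised for lease renewals, may still be needed.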
 >  > >
 >  > > Please help verify this and share your feedback and suggestions.
 >  > >
 >  > > Thanks,
 >  > > Kaijie
 >  > >
 >  > > ---
 >  > >
 >  > > [1]: https://github.com/kaijchen/ozone/tree/container-lease
 >  > >
 >  > > ---------------------------------------------------------------------
 >  > > To unsubscribe, e-mail: [email protected]
 >  > > For additional commands, e-mail: [email protected]
 >  > >
 >  > >
 >  > 
 > 
