Hi everyone,

I've implemented a container lease POC [1], and the result looks good.

Here's what's changed in the POC:

1. SCM will keep a LeaseExipreAt for each OPEN container. If SCM
    receives a close container command, it will change the container
    state to CLOSING, but it will not send the close container command
    to DN until the lease expires.
2. OM will forward the container lease request from Client to SCM.
3. Client will acquire a lease when a block is allocated (to be improved),
    and it will renew the leases for open blocks before they expire.
    Client will ignore any lease errors and keep writing chunks
    to DN even if the lease expires, because the worst case is simply
    a ContainerNotOpenException.
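
To make point 1 concrete, here is a minimal sketch of the SCM-side bookkeeping. It is not the POC code: the class and method names are hypothetical, time is passed in explicitly to keep it testable, and refusing renewals once a container is CLOSING is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of SCM-side lease bookkeeping; not the POC code. */
final class ContainerLeaseSketch {
    enum State { OPEN, CLOSING }

    private static final class Entry {
        State state = State.OPEN;
        long leaseExpireAtMillis; // 0 means no lease has been granted yet
    }

    private final Map<Long, Entry> containers = new HashMap<>();
    private final long leaseDurationMillis;

    ContainerLeaseSketch(long leaseDurationMillis) {
        this.leaseDurationMillis = leaseDurationMillis;
    }

    void addOpenContainer(long id) {
        containers.put(id, new Entry());
    }

    /** Acquire or renew a lease. Assumption: only OPEN containers are extended. */
    boolean acquireOrRenew(long id, long nowMillis) {
        Entry e = containers.get(id);
        if (e == null || e.state != State.OPEN) {
            return false;
        }
        e.leaseExpireAtMillis = nowMillis + leaseDurationMillis;
        return true;
    }

    /** A close request only moves the container to CLOSING. */
    void requestClose(long id) {
        Entry e = containers.get(id);
        if (e != null) {
            e.state = State.CLOSING;
        }
    }

    /** The close command goes to DN only after the lease has expired. */
    boolean shouldSendCloseToDatanode(long id, long nowMillis) {
        Entry e = containers.get(id);
        return e != null
            && e.state == State.CLOSING
            && nowMillis >= e.leaseExpireAtMillis;
    }
}
```

With a 1000 ms lease granted at t=0 and a close request right after, shouldSendCloseToDatanode stays false at t=500 and becomes true at t=1000.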

Although this POC is not perfect, the results in my tests look good.

Cluster: 48 datanodes on 4 machines
Client: Ozone freon ockg
Threads: 100
Key count: 1000
Key size: 1000 MB
ReplicationConfig: EC/RS-10-4-1024K

We should expect 14000 blocks of 100 MB each under ideal conditions.
I'm only showing the data from 1 of the 4 machines.
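
For reference, the 14000 figure can be reproduced with a few lines of arithmetic. This is only a sketch with hypothetical names, and it assumes each 1000 MB key fits in a single EC block group, so it is striped into 10 data blocks plus 4 parity blocks of equal size.

```java
/** Hypothetical helper that reproduces the expected block count. */
final class ExpectedBlocksSketch {
    /** Size of each block when one key is striped over the data units. */
    static long blockSizeMb(long keySizeMb, int dataUnits) {
        return keySizeMb / dataUnits;
    }

    /** Each key yields one block per data unit and one per parity unit. */
    static long totalBlocks(long keyCount, int dataUnits, int parityUnits) {
        return keyCount * (dataUnits + parityUnits);
    }

    public static void main(String[] args) {
        // Values from the test setup: 1000 keys, 1000 MB each, RS-10-4.
        System.out.println(blockSizeMb(1000, 10) + " MB per block");
        System.out.println(totalBlocks(1000, 10, 4) + " blocks in total");
    }
}
```

If the blocks are spread evenly over the 4 machines, that is about 3500 blocks per machine.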


Before the change (commit 1cf5678224bf00dee580ffdb14ab8b650cc1e2e0):
    (The number before each size is the count of blocks of that size)

    15 1.0M 48 2.0M 40 3.0M 48 4.0M 37 5.0M 33 6.0M 48 7.0M 51 8.0M
    30 9.0M 49 10M 40 11M 65 12M 33 13M 18 14M 43 15M 46 16M 38 17M
    20 18M 46 19M 32 20M 5 21M 54 22M 58 23M 33 24M 25 25M 39 26M
    44 27M 48 28M 25 29M 18 30M 34 31M 42 32M 22 33M 23 34M 27 35M
    26 36M 33 37M 27 38M 30 39M 60 40M 25 41M 27 42M 26 43M 20 44M
    13 45M 18 46M 40 47M 27 48M 25 49M 15 50M 40 51M 26 52M 41 53M
    41 54M 9 55M 11 56M 11 57M 19 58M 30 59M 28 60M 44 61M 36 62M
    21 63M 14 64M 19 65M 14 66M 23 67M 33 68M 40 69M 34 70M 17 71M
    10 72M 35 73M 28 74M 24 75M 21 76M 34 77M 26 78M 35 79M 18 80M
    27 81M 26 82M 14 83M 19 84M 23 85M 29 86M 4 87M 23 88M 37 89M
    11 90M 23 91M 38 92M 16 93M 12 94M 18 95M 21 96M 27 97M 19 98M
    35 99M 2099 100M

Container sizes before the change:

    $ ./ozone admin container list -c 10000 | grep usedBytes | awk '{print $3}' | sort | xargs echo
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 1001390080,
    1002438656, 1003487232, 1003487232, 1004535808, 1004535808, 1004535808,
    1004535808, 1006632960, 1007681536, 1010827264, 1011875840, 1011875840,
    1011875840, 1013972992, 1016070144, 1016070144, 1016070144, 1019215872,
    1024458752, 1028653056, 1028653056, 1031798784, 1032847360, 1032847360,
    1032847360, 1033895936, 1035993088, 1044381696, 1046478848, 1050673152,
    1062207488, 1092616192, 1096810496, 968884224, 968884224, 970981376,
    970981376, 972029952, 972029952, 973078528, 973078528, 974127104,
    974127104, 975175680, 976224256, 976224256, 976224256, 976224256,
    976224256, 976224256, 976224256, 976224256, 979369984, 980418560,
    980418560, 980418560, 981467136, 981467136, 983564288, 983564288,
    983564288, 984612864, 984612864, 984612864, 985661440, 985661440,
    985661440, 985661440, 986710016, 986710016, 987758592, 987758592,
    988807168, 988807168, 989855744, 989855744, 989855744, 989855744,
    990904320, 990904320, 990904320, 990904320, 990904320, 990904320,
    991952896, 991952896, 993001472, 994050048, 996147200, 997195776,
    998244352, 998244352,


After the change (commit 52c903ccc644aba63bbd5354bae98bc8bbe13675):
    (Occasionally, a few blocks are broken into smaller ones)

    3571 100M

Container sizes after the change:

    **Note: "ozone.scm.container.size" was set to 1G**
    **Note: "hdds.datanode.storage.utilization.critical.threshold" was set to 0.99**

    $ ./ozone admin container list -c 10000 | grep usedBytes | awk '{print $3}' | sort | xargs echo
    0, 1258291200, 1258291200, 1363148800, 1468006400, 1782579200, 1887436800,
    1887436800, 1992294400, 2306867200, 2621440000, 2621440000, 2726297600,
    2831155200, 2831155200, 2936012800, 2936012800, 3040870400, 3040870400,
    3040870400, 3040870400, 3040870400, 3145728000, 3250585600, 3250585600,
    3355443200, 3355443200, 3460300800, 3565158400, 3565158400, 3670016000,
    3670016000, 3774873600, 3879731200, 3879731200, 4404019200, 4404019200,

I've also run tests with RATIS/THREE, and the results look similar.


What I've implemented in the POC basically prevents DN from closing a
container that was recently written to. This could be implemented
solely in DN, using a lastUpdated timestamp on each container, so we
wouldn't need extra RPCs to achieve this. What do you think?
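
A minimal sketch of that DN-only variant could look like the following. Everything here is hypothetical (class name, method names, the quiet period parameter), and it models a single container with time passed in explicitly.

```java
/** Hypothetical sketch of a DN-side container that defers close after recent writes. */
final class LastUpdatedSketch {
    private final long quietPeriodMillis; // assumed tunable, not an existing config key
    private long lastUpdatedMillis;
    private boolean closeRequested;
    private boolean closed;

    LastUpdatedSketch(long quietPeriodMillis, long createdAtMillis) {
        this.quietPeriodMillis = quietPeriodMillis;
        this.lastUpdatedMillis = createdAtMillis;
    }

    /** Record a chunk write; rejected once the container is closed. */
    boolean write(long nowMillis) {
        if (closed) {
            return false; // stands in for ContainerNotOpenException
        }
        lastUpdatedMillis = nowMillis;
        return true;
    }

    /** Remember that a close command arrived, without closing yet. */
    void requestClose() {
        closeRequested = true;
    }

    /** Close only if a close was requested and no write happened recently. */
    boolean tryClose(long nowMillis) {
        if (!closed
            && closeRequested
            && nowMillis - lastUpdatedMillis >= quietPeriodMillis) {
            closed = true;
        }
        return closed;
    }
}
```

In this sketch a steady writer can postpone the close indefinitely, so whether to cap the delay would be a separate decision.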

Please help verify this, and share any feedback and suggestions.

Thanks,
Kaijie

---

[1]: https://github.com/kaijchen/ozone/tree/container-lease
