=====Few Edits Below===================

Dear Ozone Devs,

As you may know, we have been actively developing Ozone Erasure Coding
support in a separate branch HDDS-3816-ec.

We have finished the development of EC key write and read functionality.
Support for offline recovery (recovering replicas after node loss) will be
part of the second phase of work.

Since the branch has grown and we are increasingly running into merge
complications, we would like to merge the current EC branch into master.

We have filed a new JIRA (HDDS-6462) for the second phase of work and are
continuing the offline recovery work there. (We have uploaded the design
doc there.)

Details on Changes:

   - Most of the EC core logic went into newly extended classes. The key
     changes are in the EC*OutputStream and EC*InputStream classes for
     write and read, respectively. Based on the replication type,
     ECPipelineProvider is chosen to create EC pipelines.

   - Since we cannot represent EC replication with the existing
     replication factor, we have introduced ECReplicationConfig. The
     ReplicationConfig interface was already pushed to master, so it is
     not new in this branch merge. What is new here is the
     ECReplicationConfig class, which expresses an EC replication
     configuration.

   - We wanted to support enabling EC at the bucket level. To simplify
     this, we have moved the default replication configuration from the
     client to the server.

   - The client-side replication type and replication factor have been
     removed from the configuration files, and
     ozone.server.default.replication and
     ozone.server.default.replication.type have been introduced. We
     continue to respect replication settings that are configured
     explicitly on the client side or passed through the APIs; otherwise,
     the server-side bucket-level properties or the server-side default
     configuration take effect.

   - Other than this change, the rest of the EC code should not impact any
     of the existing code flows.
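
To illustrate the precedence described in the fourth point, here is a
minimal sketch in Python. It is not the Ozone implementation: the function
name resolve_replication and the string values are hypothetical, and it
assumes that bucket-level properties are consulted before the cluster-wide
server default.

```python
from typing import Optional


def resolve_replication(
    client_explicit: Optional[str],
    bucket_default: Optional[str],
    server_default: str,
) -> str:
    """Pick the effective replication setting for a new key.

    Hypothetical helper, not an Ozone API. Assumed precedence:
    1. a setting given explicitly by the client (config or API call),
    2. the bucket-level property stored on the server,
    3. the server-side default configuration.
    """
    if client_explicit is not None:
        return client_explicit
    if bucket_default is not None:
        return bucket_default
    return server_default
```

For example, a client that passes nothing while writing into a bucket
created with EC options would get the bucket's EC setting, and the same
client writing into a plain bucket would fall back to the server default.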


We have finished the documentation JIRA (HDDS-6172) covering this feature,
and we will continue to improve it in master.

Git Branch Name : HDDS-3816-ec

JIRAs: HDDS-3816 and HDDS-5351

Completed tasks: ~ 142

+ We are also bringing in the following two mandatory JIRAs:

1. HDDS-6209: EC: [Forward compatibility issue] New client to older server
could fail due to the unavailability for client default replication config

2. HDDS-5909: EC: Onboard EC into upgrade framework.

PR reviews are in progress and expected to close in a day or two.

A few other JIRAs under HDDS-3816 are still open, but I believe they are
not blockers for the merge.

In short, here is what you can do now with this feature:

   - You can enable EC at the bucket level and at the cluster level.
     How do you enable it at the bucket level? Just create the bucket,
     passing the EC replication options.
   - You can create EC keys and read them back.
   - You should be able to continue writing even when chosen nodes are
     failing. (Of course, at least Data+Parity live nodes must be
     available in the cluster to complete the write.)
   - You should be able to read the file back even if a few nodes have
     failed in the same EC block group. (The number of failures must not
     exceed the number of parity nodes.)
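
The two conditions in the last two points can be written down as simple
arithmetic. The sketch below is only an illustration in Python; the class
EcScheme and its method names are hypothetical and are not part of the
Ozone code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcScheme:
    """Hypothetical model of an EC layout such as 3:2 (data:parity)."""

    data: int
    parity: int

    @property
    def group_size(self) -> int:
        # One block group spans data + parity nodes.
        return self.data + self.parity

    def can_complete_write(self, live_nodes: int) -> bool:
        # A write needs at least data + parity live nodes in the cluster.
        return live_nodes >= self.group_size

    def can_read(self, failed_in_group: int) -> bool:
        # A read tolerates up to `parity` failed nodes in one block group.
        return 0 <= failed_in_group <= self.parity
```

With a 3:2 layout this means a write needs 5 live nodes and a read can
tolerate 2 failed nodes in the block group; with 6:3 the numbers are 9
and 3.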

What is pending? Offline recovery of lost/missing EC containers. As
mentioned above, this work continues under the separate JIRA (HDDS-6462)
after the merge of this branch.


Automated acceptance test cases have already been added (HDDS-6231).

In addition to that, we have also performed basic acceptance testing on a
physical cluster:

   1. Installed a 10-node cluster and created an EC bucket (3:2).
      Uploaded a 10GB key.
      Downloaded the same key and checked the md5sum.

   2. Uploaded an 8GB key.
      Downloaded the same key and checked the md5sum.

   3. Uploaded a 3MB key.
      Downloaded the same key and verified the md5sum.

   4. Changed the bucket to (6:3).
      Uploaded an 8GB key.
      Downloaded the same key.

Also verified that the new key uses the 6:3 policy and the old keys remain
3:2. Verified with key writes and reads of several different sizes.
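
For anyone who wants to repeat the upload/download comparison, the
checksum step can be sketched as below. This is only an illustration in
Python using the standard hashlib module; the helper names md5_of_file and
same_content are hypothetical, and the upload and download themselves
(done with the regular Ozone client tools) are not shown.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for multi-GB keys.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original_path: str, downloaded_path: str) -> bool:
    """Compare the uploaded source file with the downloaded copy."""
    return md5_of_file(original_path) == md5_of_file(downloaded_path)
```

The result of md5_of_file matches what the md5sum command prints for the
same file, so either can be used for the check.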



Since the merge discussion thread, we have stabilized the code
considerably and fixed several bugs.


Merge checklist items assessment is here:
https://cwiki.apache.org/confluence/display/OZONE/Ozone+EC+Branch%28HDDS-3816-ec%29+Phase-1+%3A+Merge+Checklist

Big shoutout to Stephen O'Donnell <sodonn...@cloudera.com> and Istvan Fajth
<pi...@cloudera.com> for their great efforts in core development, and
thanks a lot to Sammi, Mingchao Zhao, Mark Gui, Kaijie, and Attila for
collaborating on some of the EC tasks.

Thanks to Marton for the design discussions and for some dev tasks as well.

Thanks to the many others who were involved in design discussions: Arpit,
Sidd, Jitendra, Mukul, Sanjay, Karthik, Bharat, Nanda, Shashi, Prashanth,
Rakesh, and Yiqun Lin.
Sorry if I missed anyone here, but your efforts are much appreciated.
Without your tremendous help, we would not have reached this position yet.



To start with, here is my +1

The vote will run for 5 days.

Regards,
Uma



On Tue, Apr 5, 2022 at 10:58 PM Uma gangumalla <umamah...@apache.org> wrote:

> Dear Ozone Devs,
>
> As you may know, we have been actively developing Ozone Erasure Coding
> support in a separate branch HDDS-3816-ec.
>
> We have finished the development of EC key write and read functionality.
> The support of offline recovery( Recovering replica from node loss) will be
> part of second phase work.
>
> Since the code has already grown and increasingly started seeing merge
> complications, we would like to propose to merge the current EC branch into
> master.
>
> We filed the new JIRA(HDDS-6462) for the second phase of work and
> continued the offline recovery work there.
>
> Details on Changes:
>
>    -
>
>    Most of the EC core logic went to newly extended classes. Key changes
>    went into EC*OutputStream and EC*InputStream classes for write and read
>    respectively. Based on replication type, ECPipelineProvider will be chosen
>    for creating EC pipelines.
>
>
>
>    -
>
>    Since we cannot represent the EC replication in the existing
>    replication factor, we have introduced ECReplicationConfig. The
>    ReplicationConfig interface is already pushed to master, so it’s not a new
>    idea coming through this branch merge now. What is newly coming here is the
>    ECReplicationConfig class which can be used to express EC replication
>    configuration.
>
>
>
>    -
>
>    We wanted to provide the support to enable EC at bucket level. To
>    simplify some complications, we have moved the default replication
>    configurations from client to server.
>
>
>
>    -
>
>    Client side replication type and replication factor removed from the
>    configuration files and introduced the ozone.server.default.replication
>    and ozone.server.default.replication.type.We would continue to respect if
>    one configures at client side explicitly or passed through APIs, otherwise
>    server side bucket level properties or server side default configuration
>    would take effect.
>
>
>
>    -
>
>    Other than this change, the rest of EC side code should not impact any
>    of the existing code flows.
>
>
> We have finished documentation JIRA(HDDS-6172) for covering this feature
> and we will continue to improve further in master.
>
> Git Branch Name : HDDS-3816-ec
>
> JIRAs: HDDS-3816 and HDDS-5351
>
> Completed tasks: ~ 142
>
> + We are covering the following two mandatory JIRAs:
>
> 1. HDDS-6209: EC: [Forward compatibility issue] New client to older
> server could fail due to the unavailability for client default replication
> config
>
> 2. HDDS-5909: EC: Onboard EC into upgrade framework.
>
> PRs reviews in-progress and expected to close in a day or two.
>
> Few other JIRAs in HDDS-3816 are still open but I believe they're not
> blockers for merge.
>
> In short what you can do now with this feature:
>
>    -
>
>    You can enable EC at bucket level and cluster level.
>
> How to enable it at bucket level? Just create the bucket by passing the ec
> replication options.
>
>    -
>
>    You can create EC keys and read the same back.
>    -
>
>    You should be able to continue writing even when chosen nodes are
>    failing. (Of Course minimum of Data+Parity live nodes should be available
>    in cluster for complete the write)
>    -
>
>    You should be able to read the file back even if a few nodes failed in
>    the same ec block group(Failures should not be more than parity number of
>    nodes.).
>
> What is pending? Offline recovery of lost/missing EC containers. As
> mentioned above, post merge of this branch, I will create a separate JIRA
> for starting the work for OfflineRecovery.
>
>
> There are automated acceptance test cases already added. HDDS-6231
>
> In addition to that, we have also performed basic Acceptance Testing in
> physical cluster:
>
>    1.
>
>    Installed 10 nodes cluster and created EC bucket (3:2).
>
> Uploaded 10GB key.
>
> Downloaded the same key and checked the md5sum.
>
>    1.
>
>    Uploaded 8GB key.
>
> Downloaded the same key and checked the md5sum.
>
>    1.
>
>    Uploaded 3MB key
>
> Downloaded the same and verified md5sum.
>
>    1.
>
>    Changed bucket to (6:3)
>
> Uploaded 8GB key
>
> Download the same.
>
> Also verified the new key should be in 6:3 policy and old keys must be 
> 3:2.Verified
> with several different size key writes and reads.
>
> Merge checklist items assessment is here:
> https://cwiki.apache.org/confluence/display/OZONE/Ozone+EC+Branch%28HDDS-3816-ec%29+Phase-1+%3A+Merge+Checklist
>
> Big shoutout to Stephen O'Donnell <sodonn...@cloudera.com>, Istvan Fajth
> <pi...@cloudera.com> for great efforts in core development and also
> thanks a lot  to Sammi, Mingchao Zhao, Mark Gui, Kaijie for collaborating
> on some of the EC tasks.
>
> Thanks to Marton for design discussion and on some dev tasks as well.
>
> Thanks to many others who were involved in design discussions, Arpit,
> Sidd, Jitendra, Mukul, Sanjay, Karthik, Bharat, Nanda, Shashi, Prashanth,
> Rakesh, Yiqun Lin.
> Sorry if I miss anyone here, but your efforts are much appreciated.
> Without your tremendous help, we would have not reached this position yet.
>
> If there are no objections for the merge, I will start the official vote
> later.
>
> Regards,
>
> EC Branch Devs
>
