I have been working on the code on this branch for some time, and I believe
it is in a good state to merge now. It is mostly new code, and if nothing
attempts to use EC, none of the EC code paths will be executed.

+1 to merge from me.

Stephen.

On Wed, Apr 6, 2022 at 7:11 AM Uma gangumalla <umamah...@apache.org> wrote:

> =====Few Edits Below===================
>
> Dear Ozone Devs,
>
> As you may know, we have been actively developing Ozone Erasure Coding
> support in a separate branch HDDS-3816-ec.
>
> We have finished the development of EC key write and read functionality.
> Support for offline recovery (recovering replicas after node loss) will be
> part of the second phase of work.
>
> Since the code has grown and we are increasingly seeing merge
> complications, we would like to merge the current EC branch into master.
>
> We filed a new JIRA (HDDS-6462) for the second phase of work and are
> continuing the offline recovery work there. (We have uploaded the design
> doc there.)
>
> Details on Changes:
>
>    - Most of the EC core logic went into newly extended classes. The key
>      changes are in the EC*OutputStream and EC*InputStream classes, for
>      write and read respectively. Based on the replication type,
>      ECPipelineProvider is chosen to create EC pipelines.
>
>
>
>    - Since we cannot represent EC replication with the existing replication
>      factor, we have introduced ECReplicationConfig. The ReplicationConfig
>      interface has already been pushed to master, so it is not a new idea
>      arriving with this branch merge. What is new here is the
>      ECReplicationConfig class, which can be used to express an EC
>      replication configuration.
>
>
>
>    - We wanted to support enabling EC at the bucket level. To simplify
>      some complications, we have moved the default replication
>      configuration from the client to the server.
>
>
>
>    - The client-side replication type and replication factor have been
>      removed from the configuration files, and
>      ozone.server.default.replication and
>      ozone.server.default.replication.type have been introduced. We
>      continue to respect a replication setting that is configured
>      explicitly on the client side or passed through the APIs; otherwise
>      the server-side bucket-level properties or the server-side default
>      configuration take effect.
>
>
>
>    - Other than this change, the rest of the EC code should not impact any
>      of the existing code flows.
>
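
To make the fallback order described above concrete, here is a minimal
sketch. It is not Ozone code: ReplicationSetting and resolve_replication are
hypothetical names, the string values are placeholders, and the ordering of
bucket-level properties ahead of the server default is an assumption (the
text above only says that one of the two applies when the client sets
nothing).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplicationSetting:
    """Hypothetical stand-in for a replication config (not an Ozone class)."""
    replication_type: str  # placeholder value such as "RATIS" or "EC"
    replication: str       # placeholder value such as "THREE" or "3:2"


def resolve_replication(
    client: Optional[ReplicationSetting],
    bucket: Optional[ReplicationSetting],
    server_default: ReplicationSetting,
) -> ReplicationSetting:
    """Return the effective setting for a new key.

    An explicit client setting wins. Otherwise the bucket-level setting is
    used if present (assumed order), and the server default is the last
    resort.
    """
    if client is not None:
        return client
    if bucket is not None:
        return bucket
    return server_default
```

With this model, a client that passes nothing and writes into an EC bucket
gets the bucket's EC setting, while the same client writing into a bucket
without a setting gets the server default.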
>
> We have finished the documentation JIRA (HDDS-6172) covering this feature,
> and we will continue to improve the documentation in master.
>
> Git Branch Name : HDDS-3816-ec
>
> JIRAs: HDDS-3816 and HDDS-5351
>
> Completed tasks: ~ 142
>
> + We are also including the following two mandatory JIRAs in the merge:
>
> 1. HDDS-6209: EC: [Forward compatibility issue] New client to older server
> could fail due to the unavailability for client default replication config
>
> 2. HDDS-5909: EC: Onboard EC into upgrade framework.
>
> PR reviews are in progress and expected to close in a day or two.
>
> A few other JIRAs in HDDS-3816 are still open, but I believe they are not
> blockers for the merge.
>
> In short, here is what you can do now with this feature:
>
>    - You can enable EC at the bucket level and at the cluster level. To
>      enable it at the bucket level, create the bucket with the EC
>      replication options.
>    - You can create EC keys and read them back.
>    - You should be able to continue writing even when the chosen nodes are
>      failing. (Of course, at least Data+Parity live nodes must be
>      available in the cluster to complete the write.)
>    - You should be able to read the file back even if a few nodes have
>      failed in the same EC block group. (The number of failures must not
>      exceed the number of parity nodes.)
>
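
The two limits stated above (Data+Parity live nodes to complete a write, and
at most Parity failures in a block group for a read) can be written down as
a small model. This is a sketch only: EcScheme and its methods are
hypothetical names, not the ECReplicationConfig API, and the overhead figure
is plain arithmetic rather than a measured result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcScheme:
    """Illustrative data:parity pair such as 3:2 or 6:3 (not an Ozone class)."""
    data: int
    parity: int

    @property
    def required_nodes(self) -> int:
        # One node per data block plus one per parity block in a block group.
        return self.data + self.parity

    def can_complete_write(self, live_nodes: int) -> bool:
        # A write needs at least Data+Parity live nodes in the cluster.
        return live_nodes >= self.required_nodes

    def can_read(self, failed_in_group: int) -> bool:
        # A read survives as long as failures do not exceed the parity count.
        return 0 <= failed_in_group <= self.parity

    def storage_overhead(self) -> float:
        # Raw bytes stored per logical byte.
        return (self.data + self.parity) / self.data
```

For 3:2 this gives five required nodes, tolerance of two failures per block
group, and an overhead of 5/3 (about 1.67), compared with 3 for three-way
replication; for 6:3 the overhead is 1.5.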
> What is pending? Offline recovery of lost/missing EC containers. As
> mentioned above, post merge of this branch, I will create a separate JIRA
> for starting the work for OfflineRecovery.
>
>
> Automated acceptance test cases have already been added (HDDS-6231).
>
> In addition to that, we have also performed basic acceptance testing on a
> physical cluster:
>
>    1. Installed a 10-node cluster and created an EC bucket (3:2).
>       Uploaded a 10GB key, downloaded the same key, and checked the md5sum.
>    2. Uploaded an 8GB key, downloaded the same key, and checked the md5sum.
>    3. Uploaded a 3MB key, downloaded the same key, and verified the md5sum.
>    4. Changed the bucket to (6:3), uploaded an 8GB key, and downloaded the
>       same key.
>
> Also verified that the new key uses the 6:3 policy while the old keys
> remain 3:2. Verified with key writes and reads of several different sizes.
>
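
The md5sum check used in the manual tests above can be scripted. The sketch
below covers only the local comparison step and assumes the original file
and the downloaded copy are both on local disk; the upload and download
themselves are left out, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so multi-GB keys fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_matches(original: Path, downloaded: Path) -> bool:
    """True when the downloaded copy has the same MD5 as the original."""
    return md5_of_file(original) == md5_of_file(downloaded)
```

Reading in chunks matters for the 8GB and 10GB cases, where loading the
whole file at once would be impractical.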
>
>
> Since the merge discussion thread, we have stabilized the code and fixed
> several bugs.
>
>
> Merge checklist items assessment is here:
>
> https://cwiki.apache.org/confluence/display/OZONE/Ozone+EC+Branch%28HDDS-3816-ec%29+Phase-1+%3A+Merge+Checklist
>
> Big shoutout to Stephen O'Donnell <sodonn...@cloudera.com> and Istvan Fajth
> <pi...@cloudera.com> for their great efforts in core development, and
> thanks a lot to Sammi, Mingchao Zhao, Mark Gui, Kaijie, and Attila for
> collaborating on some of the EC tasks.
>
> Thanks to Marton for the design discussions and for help with some dev
> tasks as well.
>
> Thanks to the many others who were involved in design discussions: Arpit,
> Sidd, Jitendra, Mukul, Sanjay, Karthik, Bharat, Nanda, Shashi, Prashanth,
> Rakesh, and Yiqun Lin.
> Sorry if I missed anyone here; your efforts are much appreciated. Without
> your tremendous help, we would not have reached this point yet.
>
>
>
> To start with, here is my +1
>
> The vote will run for 5 days.
>
> Regards,
> Uma
>
>
>
> On Tue, Apr 5, 2022 at 10:58 PM Uma gangumalla <umamah...@apache.org>
> wrote:
>
> > Dear Ozone Devs,
> >
> > As you may know, we have been actively developing Ozone Erasure Coding
> > support in a separate branch HDDS-3816-ec.
> >
> > We have finished the development of EC key write and read functionality.
> > Support for offline recovery (recovering replicas after node loss) will
> > be part of the second phase of work.
> >
> > Since the code has grown and we are increasingly seeing merge
> > complications, we would like to propose merging the current EC branch
> > into master.
> >
> > We filed a new JIRA (HDDS-6462) for the second phase of work and are
> > continuing the offline recovery work there.
> >
> > Details on Changes:
> >
> >    - Most of the EC core logic went into newly extended classes. The key
> >      changes are in the EC*OutputStream and EC*InputStream classes, for
> >      write and read respectively. Based on the replication type,
> >      ECPipelineProvider is chosen to create EC pipelines.
> >
> >    - Since we cannot represent EC replication with the existing
> >      replication factor, we have introduced ECReplicationConfig. The
> >      ReplicationConfig interface has already been pushed to master, so
> >      it is not a new idea arriving with this branch merge. What is new
> >      here is the ECReplicationConfig class, which can be used to express
> >      an EC replication configuration.
> >
> >    - We wanted to support enabling EC at the bucket level. To simplify
> >      some complications, we have moved the default replication
> >      configuration from the client to the server.
> >
> >    - The client-side replication type and replication factor have been
> >      removed from the configuration files, and
> >      ozone.server.default.replication and
> >      ozone.server.default.replication.type have been introduced. We
> >      continue to respect a replication setting that is configured
> >      explicitly on the client side or passed through the APIs; otherwise
> >      the server-side bucket-level properties or the server-side default
> >      configuration take effect.
> >
> >    - Other than this change, the rest of the EC code should not impact
> >      any of the existing code flows.
> >
> >
> > We have finished the documentation JIRA (HDDS-6172) covering this
> > feature, and we will continue to improve the documentation in master.
> >
> > Git Branch Name : HDDS-3816-ec
> >
> > JIRAs: HDDS-3816 and HDDS-5351
> >
> > Completed tasks: ~ 142
> >
> > + We are covering the following two mandatory JIRAs:
> >
> > 1. HDDS-6209: EC: [Forward compatibility issue] New client to older
> > server could fail due to the unavailability for client default
> > replication config
> >
> > 2. HDDS-5909: EC: Onboard EC into upgrade framework.
> >
> > PR reviews are in progress and expected to close in a day or two.
> >
> > A few other JIRAs in HDDS-3816 are still open, but I believe they are
> > not blockers for the merge.
> >
> > In short, here is what you can do now with this feature:
> >
> >    - You can enable EC at the bucket level and at the cluster level. To
> >      enable it at the bucket level, create the bucket with the EC
> >      replication options.
> >    - You can create EC keys and read them back.
> >    - You should be able to continue writing even when the chosen nodes
> >      are failing. (Of course, at least Data+Parity live nodes must be
> >      available in the cluster to complete the write.)
> >    - You should be able to read the file back even if a few nodes have
> >      failed in the same EC block group. (The number of failures must not
> >      exceed the number of parity nodes.)
> >
> > What is pending? Offline recovery of lost/missing EC containers. As
> > mentioned above, post merge of this branch, I will create a separate JIRA
> > for starting the work for OfflineRecovery.
> >
> >
> > Automated acceptance test cases have already been added (HDDS-6231).
> >
> > In addition to that, we have also performed basic acceptance testing on
> > a physical cluster:
> >
> >    1. Installed a 10-node cluster and created an EC bucket (3:2).
> >       Uploaded a 10GB key, downloaded the same key, and checked the
> >       md5sum.
> >    2. Uploaded an 8GB key, downloaded the same key, and checked the
> >       md5sum.
> >    3. Uploaded a 3MB key, downloaded the same key, and verified the
> >       md5sum.
> >    4. Changed the bucket to (6:3), uploaded an 8GB key, and downloaded
> >       the same key.
> >
> > Also verified that the new key uses the 6:3 policy while the old keys
> > remain 3:2. Verified with key writes and reads of several different
> > sizes.
> >
> > Merge checklist items assessment is here:
> >
> > https://cwiki.apache.org/confluence/display/OZONE/Ozone+EC+Branch%28HDDS-3816-ec%29+Phase-1+%3A+Merge+Checklist
> >
> > Big shoutout to Stephen O'Donnell <sodonn...@cloudera.com> and Istvan
> > Fajth <pi...@cloudera.com> for their great efforts in core development,
> > and thanks a lot to Sammi, Mingchao Zhao, Mark Gui, and Kaijie for
> > collaborating on some of the EC tasks.
> >
> > Thanks to Marton for the design discussions and for help with some dev
> > tasks as well.
> >
> > Thanks to the many others who were involved in design discussions:
> > Arpit, Sidd, Jitendra, Mukul, Sanjay, Karthik, Bharat, Nanda, Shashi,
> > Prashanth, Rakesh, and Yiqun Lin.
> > Sorry if I missed anyone here; your efforts are much appreciated.
> > Without your tremendous help, we would not have reached this point yet.
> >
> > If there are no objections for the merge, I will start the official vote
> > later.
> >
> > Regards,
> >
> > EC Branch Devs
> >
>
