Great news! +1 to merge.
At 2022-04-06 22:18:31, "Stephen O'Donnell" <sodonn...@cloudera.com.INVALID> wrote:
>I have been working on the code on this branch for some time, and I believe
>it is in a good state to merge now. It is mostly new code, and if nothing
>attempts to use EC, none of the EC code paths will be executed.
>
>+1 to merge from me.
>
>Stephen.
>
>On Wed, Apr 6, 2022 at 7:11 AM Uma gangumalla <umamah...@apache.org> wrote:
>
>> =====Few Edits Below===================
>>
>> Dear Ozone Devs,
>>
>> As you may know, we have been actively developing Ozone Erasure Coding
>> support in a separate branch HDDS-3816-ec.
>>
>> We have finished the development of EC key write and read functionality.
>> The support of offline recovery (recovering replicas after node loss) will
>> be part of the second phase of work.
>>
>> Since the code has already grown and we are increasingly seeing merge
>> complications, we would like to merge the current EC branch into master.
>>
>> We filed a new JIRA (HDDS-6462) for the second phase of work and continued
>> the offline recovery work there. (We have uploaded the design doc there.)
>>
>> Details on Changes:
>>
>> - Most of the EC core logic went into newly extended classes. Key changes
>>   went into the EC*OutputStream and EC*InputStream classes for write and
>>   read respectively. Based on the replication type, ECPipelineProvider
>>   will be chosen for creating EC pipelines.
>>
>> - Since we cannot represent EC replication in the existing replication
>>   factor, we have introduced ECReplicationConfig. The ReplicationConfig
>>   interface is already pushed to master, so it’s not a new idea coming
>>   through this branch merge now. What is newly coming here is the
>>   ECReplicationConfig class, which can be used to express EC replication
>>   configuration.
>>
>> - We wanted to provide the support to enable EC at bucket level. To
>>   simplify some complications, we have moved the default replication
>>   configurations from client to server.
>>
>> - Client side replication type and replication factor were removed from
>>   the configuration files, and ozone.server.default.replication and
>>   ozone.server.default.replication.type were introduced. We will continue
>>   to respect values configured explicitly at the client side or passed
>>   through APIs; otherwise server side bucket level properties or the
>>   server side default configuration take effect.
>>
>> - Other than this change, the rest of the EC side code should not impact
>>   any of the existing code flows.
>>
>> We have finished the documentation JIRA (HDDS-6172) covering this feature
>> and we will continue to improve it further in master.
>>
>> Git Branch Name: HDDS-3816-ec
>>
>> JIRAs: HDDS-3816 and HDDS-5351
>>
>> Completed tasks: ~142
>>
>> + We are covering the following two mandatory JIRAs to come in:
>>
>> 1. HDDS-6209: EC: [Forward compatibility issue] New client to older server
>>    could fail due to the unavailability for client default replication
>>    config
>>
>> 2. HDDS-5909: EC: Onboard EC into upgrade framework.
>>
>> PR reviews are in progress and expected to close in a day or two.
>>
>> A few other JIRAs in HDDS-3816 are still open but I believe they're not
>> blockers for merge.
>>
>> In short, what you can do now with this feature:
>>
>> - You can enable EC at bucket level and cluster level.
>>
>>   How to enable it at bucket level? Just create the bucket by passing the
>>   EC replication options.
>>
>> - You can create EC keys and read the same back.
>>
>> - You should be able to continue writing even when chosen nodes are
>>   failing. (Of course, a minimum of Data+Parity live nodes should be
>>   available in the cluster to complete the write.)
>>
>> - You should be able to read the file back even if a few nodes failed in
>>   the same EC block group. (Failures should not be more than the parity
>>   number of nodes.)
>>
>> What is pending? Offline recovery of lost/missing EC containers. As
>> mentioned above, post merge of this branch, I will create a separate JIRA
>> for starting the work for OfflineRecovery.
>>
>> There are automated acceptance test cases already added (HDDS-6231).
>>
>> In addition to that, we have also performed basic acceptance testing in a
>> physical cluster:
>>
>> 1. Installed a 10-node cluster and created an EC bucket (3:2).
>>    Uploaded a 10GB key.
>>    Downloaded the same key and checked the md5sum.
>>
>> 2. Uploaded an 8GB key.
>>    Downloaded the same key and checked the md5sum.
>>
>> 3. Uploaded a 3MB key.
>>    Downloaded the same and verified the md5sum.
>>
>> 4. Changed the bucket to (6:3).
>>    Uploaded an 8GB key.
>>    Downloaded the same.
>>
>> Also verified that the new key is in the 6:3 policy and old keys remain
>> 3:2. Verified with several different size key writes and reads.
>>
>> Since the merge discussion thread, we have stabilized the code well and
>> fixed several bugs.
>>
>> Merge checklist items assessment is here:
>>
>> https://cwiki.apache.org/confluence/display/OZONE/Ozone+EC+Branch%28HDDS-3816-ec%29+Phase-1+%3A+Merge+Checklist
>>
>> Big shoutout to Stephen O'Donnell <sodonn...@cloudera.com>, Istvan Fajth
>> <pi...@cloudera.com> for great efforts in core development and also thanks
>> a lot to Sammi, Mingchao Zhao, Mark Gui, Kaijie, Attila for collaborating
>> on some of the EC tasks.
>>
>> Thanks to Marton for design discussion and on some dev tasks as well.
>>
>> Thanks to many others who were involved in design discussions: Arpit, Sidd,
>> Jitendra, Mukul, Sanjay, Karthik, Bharat, Nanda, Shashi, Prashanth, Rakesh,
>> Yiqun Lin.
>> Sorry if I missed anyone here, but your efforts are much appreciated.
>> Without your tremendous help, we would not have reached this position yet.
>>
>> To start with, here is my +1
>>
>> The vote will run for 5 days.
>>
>> Regards,
>> Uma
>>
>> On Tue, Apr 5, 2022 at 10:58 PM Uma gangumalla <umamah...@apache.org>
>> wrote:
>>
>> > Dear Ozone Devs,
>> >
>> > As you may know, we have been actively developing Ozone Erasure Coding
>> > support in a separate branch HDDS-3816-ec.
>> >
>> > We have finished the development of EC key write and read functionality.
>> > The support of offline recovery (recovering replicas after node loss)
>> > will be part of the second phase of work.
>> >
>> > Since the code has already grown and we are increasingly seeing merge
>> > complications, we would like to propose to merge the current EC branch
>> > into master.
>> >
>> > We filed a new JIRA (HDDS-6462) for the second phase of work and
>> > continued the offline recovery work there.
>> >
>> > Details on Changes:
>> >
>> > - Most of the EC core logic went into newly extended classes. Key
>> >   changes went into the EC*OutputStream and EC*InputStream classes for
>> >   write and read respectively. Based on the replication type,
>> >   ECPipelineProvider will be chosen for creating EC pipelines.
>> >
>> > - Since we cannot represent EC replication in the existing replication
>> >   factor, we have introduced ECReplicationConfig. The ReplicationConfig
>> >   interface is already pushed to master, so it’s not a new idea coming
>> >   through this branch merge now. What is newly coming here is the
>> >   ECReplicationConfig class, which can be used to express EC
>> >   replication configuration.
>> >
>> > - We wanted to provide the support to enable EC at bucket level. To
>> >   simplify some complications, we have moved the default replication
>> >   configurations from client to server.
>> >
>> > - Client side replication type and replication factor were removed from
>> >   the configuration files, and ozone.server.default.replication and
>> >   ozone.server.default.replication.type were introduced. We will
>> >   continue to respect values configured explicitly at the client side
>> >   or passed through APIs; otherwise server side bucket level properties
>> >   or the server side default configuration take effect.
>> >
>> > - Other than this change, the rest of the EC side code should not
>> >   impact any of the existing code flows.
>> >
>> > We have finished the documentation JIRA (HDDS-6172) covering this
>> > feature and we will continue to improve it further in master.
>> >
>> > Git Branch Name: HDDS-3816-ec
>> >
>> > JIRAs: HDDS-3816 and HDDS-5351
>> >
>> > Completed tasks: ~142
>> >
>> > + We are covering the following two mandatory JIRAs:
>> >
>> > 1. HDDS-6209: EC: [Forward compatibility issue] New client to older
>> >    server could fail due to the unavailability for client default
>> >    replication config
>> >
>> > 2. HDDS-5909: EC: Onboard EC into upgrade framework.
>> >
>> > PR reviews are in progress and expected to close in a day or two.
>> >
>> > A few other JIRAs in HDDS-3816 are still open but I believe they're not
>> > blockers for merge.
>> >
>> > In short, what you can do now with this feature:
>> >
>> > - You can enable EC at bucket level and cluster level.
>> >
>> >   How to enable it at bucket level? Just create the bucket by passing
>> >   the EC replication options.
>> >
>> > - You can create EC keys and read the same back.
>> >
>> > - You should be able to continue writing even when chosen nodes are
>> >   failing. (Of course, a minimum of Data+Parity live nodes should be
>> >   available in the cluster to complete the write.)
>> >
>> > - You should be able to read the file back even if a few nodes failed
>> >   in the same EC block group. (Failures should not be more than the
>> >   parity number of nodes.)
>> >
>> > What is pending? Offline recovery of lost/missing EC containers. As
>> > mentioned above, post merge of this branch, I will create a separate JIRA
>> > for starting the work for OfflineRecovery.
>> >
>> > There are automated acceptance test cases already added (HDDS-6231).
>> >
>> > In addition to that, we have also performed basic acceptance testing in
>> > a physical cluster:
>> >
>> > 1. Installed a 10-node cluster and created an EC bucket (3:2).
>> >    Uploaded a 10GB key.
>> >    Downloaded the same key and checked the md5sum.
>> >
>> > 2. Uploaded an 8GB key.
>> >    Downloaded the same key and checked the md5sum.
>> >
>> > 3. Uploaded a 3MB key.
>> >    Downloaded the same and verified the md5sum.
>> >
>> > 4. Changed the bucket to (6:3).
>> >    Uploaded an 8GB key.
>> >    Downloaded the same.
>> >
>> > Also verified that the new key is in the 6:3 policy and old keys remain
>> > 3:2. Verified with several different size key writes and reads.
>> >
>> > Merge checklist items assessment is here:
>> >
>> > https://cwiki.apache.org/confluence/display/OZONE/Ozone+EC+Branch%28HDDS-3816-ec%29+Phase-1+%3A+Merge+Checklist
>> >
>> > Big shoutout to Stephen O'Donnell <sodonn...@cloudera.com>, Istvan Fajth
>> > <pi...@cloudera.com> for great efforts in core development and also
>> > thanks a lot to Sammi, Mingchao Zhao, Mark Gui, Kaijie for collaborating
>> > on some of the EC tasks.
>> >
>> > Thanks to Marton for design discussion and on some dev tasks as well.
>> >
>> > Thanks to many others who were involved in design discussions: Arpit,
>> > Sidd, Jitendra, Mukul, Sanjay, Karthik, Bharat, Nanda, Shashi, Prashanth,
>> > Rakesh, Yiqun Lin.
>> > Sorry if I missed anyone here, but your efforts are much appreciated.
>> > Without your tremendous help, we would not have reached this position
>> > yet.
>> >
>> > If there are no objections for the merge, I will start the official vote
>> > later.
>> >
>> > Regards,
>> >
>> > EC Branch Devs
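
The write and read conditions described in the thread (Data+Parity live nodes to complete a write, at most parity-many failures within a block group for a read) can be sketched as simple arithmetic. This is a minimal illustration only: ECScheme, can_write and can_read are hypothetical names and not part of the Ozone codebase, and the storage overhead figure is the usual (data + parity) / data ratio for erasure coding rather than something stated in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ECScheme:
    """Hypothetical helper for illustration; not an Ozone class."""

    data: int    # number of data blocks per block group, e.g. 3 in "3:2"
    parity: int  # number of parity blocks per block group, e.g. 2 in "3:2"

    def __post_init__(self) -> None:
        if self.data <= 0 or self.parity < 0:
            raise ValueError("data must be > 0 and parity must be >= 0")

    @property
    def nodes_needed_for_write(self) -> int:
        # The thread states a write needs Data+Parity live nodes.
        return self.data + self.parity

    @property
    def max_read_failures(self) -> int:
        # The thread states read failures must not exceed the parity count.
        return self.parity

    @property
    def storage_overhead(self) -> float:
        # Assumption: standard EC overhead ratio, not taken from the thread.
        return (self.data + self.parity) / self.data


def can_write(scheme: ECScheme, live_nodes: int) -> bool:
    """True if enough live nodes exist to place a full block group."""
    return live_nodes >= scheme.nodes_needed_for_write


def can_read(scheme: ECScheme, failed_in_group: int) -> bool:
    """True if the failures within one block group are recoverable."""
    return failed_in_group <= scheme.max_read_failures


if __name__ == "__main__":
    # The two policies used in the acceptance test on the 10-node cluster.
    for scheme in (ECScheme(3, 2), ECScheme(6, 3)):
        print(
            f"{scheme.data}:{scheme.parity}",
            "write needs", scheme.nodes_needed_for_write, "nodes,",
            "read tolerates", scheme.max_read_failures, "failures,",
            f"overhead {scheme.storage_overhead:.2f}x",
        )
```

Under these assumptions, both policies from the acceptance test fit on the 10-node cluster: 3:2 needs 5 live nodes and 6:3 needs 9.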