+1 Great work from Uma, Stephen, Istvan, Markton and all Ozone EC developers!
We'll keep helping to fix EC bugs to make the feature more robust. We are also really looking forward to starting the detailed design and discussion of the Offline Recovery feature, which is very important for a production-ready EC implementation.

Cheers!

At 2022-02-15 16:17:55, "Uma gangumalla" <umamah...@apache.org> wrote:
>Dear Ozone Devs,
>
>As you may know, we have been actively developing Ozone Erasure Coding
>support in a separate branch HDDS-3816-ec.
>
>We have finished the development of EC key write and read functionality.
>The support of offline recovery( Recovering replica from node loss) will be
>part of second phase work.
>
>Since the code has already grown and increasingly started seeing merge
>complications, we would like to propose to merge the current EC branch into
>master.
>
>We will file the new JIRA for the second phase of work and continue the
>offline recovery work there.
>
>Details on Changes:
>
>   - Most of the EC core logic went to newly extended classes. Key changes
>   went into EC*OutputStream and EC*InputStream classes for write and read
>   respectively. Based on replication type, ECPipelineProvider will be chosen
>   for creating EC pipelines.
>
>   - Since we cannot represent the EC replication in the existing replication
>   factor, we have introduced ECReplicationConfig. The ReplicationConfig
>   interface is already pushed to master, so it’s not a new idea coming
>   through this branch merge now. What is newly coming here is the
>   ECReplicationConfig class which can be used to express EC replication
>   configuration.
>
>   - We wanted to provide the support to enable EC at bucket level. To
>   simplify some complications, we have moved the default replication
>   configurations from client to server.
>
>   - Client side replication type and replication factor removed from the
>   configuration files and introduced the ozone.server.default.replication
>   and ozone.server.default.replication.type. We would continue to respect if
>   one configures at client side explicitly or passed through APIs, otherwise
>   server side bucket level properties or server side default configuration
>   would take effect.
>
>   - Other than this change, the rest of EC side code should not impact any
>   of the existing code flows.
>
>
>We have finished documentation JIRA(HDDS-6172) for covering this feature
>and we will continue to improve further in master.
>
>JIRA: HDDS-3816
>
>Completed tasks: ~ 90
>
>We wanted to cover the following compatibility issue before the merge:
>
>HDDS-6209: EC: [Forward compatibility issue] New client to older server
>could fail due to the unavailability for client default replication config
>
>Few other JIRAs in HDDS-3816 are still open but I believe they're not
>blockers for merge.
>
>In short what you can do now with this feature:
>
>   - You can enable EC at bucket level and cluster level.
>
>How to enable it at bucket level? Just create the bucket by passing the ec
>replication options.
>
>   - You can create EC keys and read the same back.
>   - You should be able to continue writing even when chosen nodes are
>   failing. (Of Course minimum of Data+Parity live nodes should be available
>   in cluster for complete the write)
>   - You should be able to read the file back even if a few nodes failed in
>   the same ec block group(Failures should not be more than parity number of
>   nodes.).
>
>What is pending? Offline recovery of lost/missing EC containers. As
>mentioned above, post merge of this branch, I will create a separate JIRA
>for starting the work for OfflineRecovery.
>
>
>There are automated acceptance test cases already added.
>HDDS-6231
>
>In addition to that, we have also performed basic Acceptance Testing in
>physical cluster:
>
>   1. Installed 10 nodes cluster and created EC bucket (3:2).
>      Uploaded 10GB key.
>      Downloaded the same key and checked the md5sum.
>
>   2. Uploaded 8GB key.
>      Downloaded the same key and checked the md5sum.
>
>   3. Uploaded 3MB key
>      Downloaded the same and verified md5sum.
>
>   4. Changed bucket to (6:3)
>      Uploaded 8GB key
>      Download the same.
>      Also verified the new key should be in 6:3 policy and old keys must be 3:2.
>
>   5. Verified with several different size key writes and reads.
>
>
>Merge checklist items assessment is here:
>https://cwiki.apache.org/confluence/display/OZONE/Ozone+EC+Branch%28HDDS-3816-ec%29+Phase-1+%3A+Merge+Checklist
>
>Big shoutout to Stephen O'Donnell <sodonn...@cloudera.com>, Istvan Fajth
><pi...@cloudera.com> for great efforts in core development and also thanks
>a lot to Sammi, Mingchao Zhao, Mark Gui, Kaijie for collaborating on some
>of the EC tasks.
>
>Thanks to Marton for design discussion and on some dev tasks as well.
>
>Thanks to many others who were involved in design discussions, Arpit, Sidd,
>Jitendra, Mukul, Sanjay, Karthik, Bharat, Nanda, Shashi, Prashanth, Rakesh,
>Yiqun Lin.
>Sorry if I miss anyone here, but your efforts are much appreciated. Without
>your tremendous help, we would have not reached this position yet.
>
>If there are no objections for the merge, I will start the official vote
>later.
>
>Regards,
>
>EC Branch Devs
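
For anyone skimming the thread, here is a rough sketch of two rules stated in the quoted message: how the effective replication setting is chosen, and how many node failures an EC write or read can tolerate. It is only an illustration and not Ozone code. The function names, parameter names, and the string labels such as "ec-3-2" are all hypothetical, and it assumes that a bucket-level setting takes priority over the server-side default when the client sets nothing.

```python
from typing import Optional


def resolve_replication(client: Optional[str],
                        bucket: Optional[str],
                        server_default: str) -> str:
    """Pick the effective replication setting (hypothetical helper).

    Order assumed from the thread: an explicit client setting wins,
    then the bucket-level property, then the server-side default.
    """
    for candidate in (client, bucket):
        if candidate is not None:
            return candidate
    return server_default


def can_write(live_nodes: int, data: int, parity: int) -> bool:
    """A write needs at least data + parity live nodes in the cluster."""
    return live_nodes >= data + parity


def can_read(failed_nodes: int, parity: int) -> bool:
    """A block group stays readable while failures do not exceed parity."""
    return 0 <= failed_nodes <= parity


if __name__ == "__main__":
    # Labels are made up for illustration; they are not Ozone syntax.
    print(resolve_replication(None, "ec-3-2", "replicated-3"))
    print(can_write(live_nodes=10, data=3, parity=2))
    print(can_read(failed_nodes=3, parity=2))
```

With a (3:2) bucket, for example, this model says a read survives up to two failed nodes in a block group and a write needs at least five live nodes.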