Thanks, Sidd, for pointing that out. Yes, I used the FSO doc as a reference, and I think a stale link was left behind; my bad. I will fix it. Thanks, Arpit.
Others, please provide your feedback, if any. Otherwise, we will move to the official vote soon after we track down the compat-related issues.

Regards,
Uma

On Thu, Feb 17, 2022 at 8:48 AM Siddharth Wagle <swa...@apache.org> wrote:

> +1 to merge the EC feature with read/write and online recovery part
> complete.
>
> Thanks, Uma for putting the wiki page with all of the details.
> Minor nit: The documentation link points to the wrong markdown (FSO)
> instead of the EC documentation.
>
> - Sid
>
> On Thu, Feb 17, 2022 at 8:29 AM Arpit Agarwal
> <aagar...@cloudera.com.invalid>
> wrote:
>
> > Thanks for the detailed explanation.
> >
> > +1 to merge with HDDS-6209 addressed.
> >
> >
> > > On Feb 16, 2022, at 10:49 AM, Uma gangumalla <umamah...@apache.org>
> > wrote:
> > >
> > > Thanks a lot Arpit for your feedback.
> > >
> > > [Arpit Wrote] - New client writing to old server with 3-way and 1-way
> > > replication.
> > > [Uma] As mentioned in the proposal mail, we have a forward
> > > compatibility issue (HDDS-6209) as we have removed the client side
> > > default configurations. Once that is in, this should work.
> > > We will make sure to get this in before merge.
> > >
> > > [Arpit Wrote] - Old client writing to new server in bucket without EC
> > > policy [both 1-way and 3-way]
> > > [Uma] Old clients always passed the replication config. Irrespective of
> > > bucket policy, we respect the client-passed replication config, so this
> > > is fine.
> > >
> > > [Arpit Wrote] - Old client writing to new server in bucket with EC
> > > policy [both 1-way and 3-way]
> > > [Uma] As mentioned above, old clients always passed non-EC replication
> > > options while creating keys. Even when a call comes to a bucket with an
> > > EC policy, we allow non-EC keys to be created on EC buckets.
> > >
> > > Also, a newer client writing keys with EC options to an old server would
> > > be rejected. That should be covered as part of HDDS-6209.
> > > We are using a server/client versioning mechanism to detect an old
> > > server that cannot support EC.
> > >
> > > @Pifta, you may want to add your thoughts if any?
> > >
> > > Regards,
> > > Uma
> > >
> > > On Wed, Feb 16, 2022 at 8:23 AM Arpit Agarwal
> > > <aagar...@cloudera.com.invalid>
> > > wrote:
> > >
> > >> Thanks Uma for starting this discussion. Excited to see EC support for
> > >> Ozone coming together at last.
> > >>
> > >> We should verify the compatibility matrix prior to merge:
> > >>
> > >> - New client writing to old server with 3-way and 1-way replication.
> > >> - Old client writing to new server in bucket without EC policy [both
> > >>   1-way and 3-way]
> > >> - Old client writing to new server in bucket with EC policy [both
> > >>   1-way and 3-way]
> > >>
> > >>
> > >> Arpit
> > >>
> > >>
> > >>> On Feb 15, 2022, at 12:17 AM, Uma gangumalla <umamah...@apache.org>
> > >> wrote:
> > >>>
> > >>> Dear Ozone Devs,
> > >>>
> > >>> As you may know, we have been actively developing Ozone Erasure Coding
> > >>> support in a separate branch HDDS-3816-ec.
> > >>>
> > >>> We have finished the development of EC key write and read
> > >>> functionality. The support of offline recovery (recovering replicas
> > >>> after node loss) will be part of the second phase of work.
> > >>>
> > >>> Since the code has already grown and we have increasingly started
> > >>> seeing merge complications, we would like to propose merging the
> > >>> current EC branch into master.
> > >>>
> > >>> We will file a new JIRA for the second phase of work and continue the
> > >>> offline recovery work there.
> > >>>
> > >>> Details on Changes:
> > >>>
> > >>> - Most of the EC core logic went to newly extended classes. Key changes
> > >>>   went into EC*OutputStream and EC*InputStream classes for write and
> > >>>   read respectively.
> > >>>   Based on replication type, ECPipelineProvider will be chosen
> > >>>   for creating EC pipelines.
> > >>>
> > >>> - Since we cannot represent the EC replication in the existing
> > >>>   replication factor, we have introduced ECReplicationConfig. The
> > >>>   ReplicationConfig interface is already pushed to master, so it’s not
> > >>>   a new idea coming through this branch merge now. What is newly coming
> > >>>   here is the ECReplicationConfig class, which can be used to express
> > >>>   EC replication configuration.
> > >>>
> > >>> - We wanted to provide the support to enable EC at bucket level. To
> > >>>   simplify some complications, we have moved the default replication
> > >>>   configurations from client to server.
> > >>>
> > >>> - The client-side replication type and replication factor were removed
> > >>>   from the configuration files, and ozone.server.default.replication
> > >>>   and ozone.server.default.replication.type were introduced. We will
> > >>>   continue to respect a replication config that is set explicitly on
> > >>>   the client side or passed through the APIs; otherwise the server-side
> > >>>   bucket-level properties or the server-side default configuration
> > >>>   take effect.
> > >>>
> > >>> - Other than this change, the rest of the EC-side code should not
> > >>>   impact any of the existing code flows.
> > >>>
> > >>>
> > >>> We have finished the documentation JIRA (HDDS-6172) covering this
> > >>> feature, and we will continue to improve it further in master.
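
The replication precedence described in the proposal above (an explicit client setting first, then the bucket-level property, then the server-side default) can be sketched as follows. This is only a minimal illustration of the ordering as stated in the mail, not the Ozone implementation: `ReplicationSetting`, `resolve_replication`, and the sample values are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplicationSetting:
    """Hypothetical stand-in for a replication config (not an Ozone class)."""
    rep_type: str  # illustrative, e.g. "RATIS" or "EC"
    value: str     # illustrative, e.g. "THREE" or "rs-3-2-1024k"


def resolve_replication(
    client: Optional[ReplicationSetting],
    bucket: Optional[ReplicationSetting],
    server_default: ReplicationSetting,
) -> ReplicationSetting:
    """Pick the effective replication for a new key.

    Order assumed from the proposal text:
    1. a config set explicitly on the client or passed through the API,
    2. the bucket-level property,
    3. the server-side default.
    """
    for candidate in (client, bucket):
        if candidate is not None:
            return candidate
    return server_default


# Sample inputs (all values are placeholders for illustration).
server_default = ReplicationSetting("RATIS", "THREE")
ec_bucket = ReplicationSetting("EC", "rs-3-2-1024k")
old_client = ReplicationSetting("RATIS", "THREE")

# An old client always sends its own config, so it wins even on an EC bucket.
old_client_on_ec_bucket = resolve_replication(old_client, ec_bucket, server_default)

# A client that sends nothing gets the bucket policy, then the server default.
new_client_on_ec_bucket = resolve_replication(None, ec_bucket, server_default)
new_client_on_plain_bucket = resolve_replication(None, None, server_default)
```

Under these assumptions, the "old client writing to an EC bucket" case from the compatibility matrix falls out of step 1: the client-passed non-EC config is honored regardless of the bucket policy.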
> > >>>
> > >>> JIRA: HDDS-3816
> > >>>
> > >>> Completed tasks: ~ 90
> > >>>
> > >>> We wanted to cover the following compatibility issue before the merge:
> > >>>
> > >>> HDDS-6209: EC: [Forward compatibility issue] New client to older server
> > >>> could fail due to the unavailability for client default replication
> > >>> config
> > >>>
> > >>> A few other JIRAs in HDDS-3816 are still open, but I believe they're
> > >>> not blockers for the merge.
> > >>>
> > >>> In short, what you can do now with this feature:
> > >>>
> > >>> - You can enable EC at bucket level and cluster level.
> > >>>
> > >>>   How to enable it at bucket level? Just create the bucket by passing
> > >>>   the EC replication options.
> > >>>
> > >>> - You can create EC keys and read the same back.
> > >>>
> > >>> - You should be able to continue writing even when chosen nodes are
> > >>>   failing. (Of course, a minimum of Data+Parity live nodes should be
> > >>>   available in the cluster to complete the write.)
> > >>>
> > >>> - You should be able to read the file back even if a few nodes failed
> > >>>   in the same EC block group. (Failures should not be more than the
> > >>>   parity number of nodes.)
> > >>>
> > >>> What is pending? Offline recovery of lost/missing EC containers. As
> > >>> mentioned above, post merge of this branch, I will create a separate
> > >>> JIRA for starting the work for OfflineRecovery.
> > >>>
> > >>>
> > >>> There are automated acceptance test cases already added. HDDS-6231
> > >>>
> > >>> In addition to that, we have also performed basic acceptance testing
> > >>> in a physical cluster:
> > >>>
> > >>> 1. Installed a 10-node cluster and created an EC bucket (3:2).
> > >>>
> > >>>    Uploaded 10GB key.
> > >>>
> > >>>    Downloaded the same key and checked the md5sum.
> > >>>
> > >>> 1. Uploaded 8GB key.
> > >>>
> > >>>    Downloaded the same key and checked the md5sum.
> > >>> > > >>> > > >>> 1. > > >>> > > >>> Uploaded 3MB key > > >>> > > >>> Downloaded the same and verified md5sum. > > >>> > > >>> > > >>> 1. > > >>> > > >>> Changed bucket to (6:3) > > >>> > > >>> Uploaded 8GB key > > >>> > > >>> Download the same. > > >>> > > >>> Also verified the new key should be in 6:3 policy and old keys must > be > > >> 3:2. > > >>> > > >>> > > >>> > > >>> 1. > > >>> > > >>> Verified with several different size key writes and reads. > > >>> > > >>> > > >>> Merge checklist items assessment is here: > > >>> > > >> > > > https://cwiki.apache.org/confluence/display/OZONE/Ozone+EC+Branch%28HDDS-3816-ec%29+Phase-1+%3A+Merge+Checklist > > >>> > > >>> Big shoutout to Stephen O'Donnell <sodonn...@cloudera.com>, Istvan > > Fajth > > >>> <pi...@cloudera.com> for great efforts in core development and also > > >> thanks > > >>> a lot to Sammi, Mingchao Zhao, Mark Gui, Kaijie for collaborating on > > >> some > > >>> of the EC tasks. > > >>> > > >>> Thanks to Marton for design discussion and on some dev tasks as well. > > >>> > > >>> Thanks to many others who were involved in design discussions, Arpit, > > >> Sidd, > > >>> Jitendra, Mukul, Sanjay, Karthik, Bharat, Nanda, Shashi, Prashanth, > > >> Rakesh, > > >>> Yiqun Lin. > > >>> Sorry if I miss anyone here, but your efforts are much appreciated. > > >> Without > > >>> your tremendous help, we would have not reached this position yet. > > >>> > > >>> If there are no objections for the merge, I will start the official > > vote > > >>> later. 
> > >>>
> > >>> Regards,
> > >>>
> > >>> EC Branch Devs
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@ozone.apache.org
> > >> For additional commands, e-mail: dev-h...@ozone.apache.org
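
The availability conditions stated in the quoted proposal (a write needs at least Data+Parity live nodes in the cluster; a read tolerates at most Parity failed nodes within one EC block group) can be written down as two small checks. This is a rough sketch of those two stated rules only, with hypothetical function names; it does not model how Ozone actually selects nodes or reconstructs data.

```python
def can_complete_write(live_nodes: int, data: int, parity: int) -> bool:
    """Rule from the proposal: at least data + parity live nodes are needed
    in the cluster for an EC write to complete."""
    return live_nodes >= data + parity


def can_read_block_group(failed_nodes: int, parity: int) -> bool:
    """Rule from the proposal: a block group stays readable as long as the
    number of failed nodes in it does not exceed the parity count."""
    return 0 <= failed_nodes <= parity


# The two policies exercised in the acceptance test, on the 10-node cluster.
write_ok_3_2 = can_complete_write(live_nodes=10, data=3, parity=2)
write_ok_6_3 = can_complete_write(live_nodes=10, data=6, parity=3)

# With 6:3, nine nodes must stay alive, so losing two of ten blocks new writes.
write_ok_6_3_after_two_losses = can_complete_write(live_nodes=8, data=6, parity=3)

# With 3:2, two failures in a block group are readable, three are not.
read_ok_two_failures = can_read_block_group(failed_nodes=2, parity=2)
read_ok_three_failures = can_read_block_group(failed_nodes=3, parity=2)
```

Until the offline recovery work mentioned in the thread lands, lost replicas are not rebuilt, so under this reading the failure budget of a block group is not replenished after nodes are lost.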