Thanks a lot, Arpit, for your feedback.

[Arpit Wrote]
- New client writing to old server with 3-way and 1-way replication.

[Uma] As mentioned in the proposal mail, we have a forward compatibility issue (HDDS-6209) because we removed the client-side default configurations. Once that fix is in, this should work. We will make sure to get it in before the merge.

[Arpit Wrote]
- Old client writing to new server in bucket without EC policy [both 1-way and 3-way]

[Uma] Old clients always passed the replication configs. Irrespective of the bucket policy, we respect the replication config passed by the client, so this is fine.

[Arpit Wrote]
- Old client writing to new server in bucket with EC policy [both 1-way and 3-way]

[Uma] As mentioned above, old clients always passed non-EC replication options while creating keys. Even when a call comes to an EC policy bucket, we allow non-EC keys to be created in it. Also, a newer client writing keys with EC options to an old server would be rejected. That should be covered as part of HDDS-6209. We are using a server/client versioning mechanism to detect an old server that cannot support EC.

@Pifta, you may want to add your thoughts, if any?

Regards,
Uma

On Wed, Feb 16, 2022 at 8:23 AM Arpit Agarwal <aagar...@cloudera.com.invalid> wrote:

> Thanks Uma for starting this discussion. Excited to see EC support for
> Ozone coming together at last.
>
> We should verify the compatibility matrix prior to merge:
>
> - New client writing to old server with 3-way and 1-way replication.
> - Old client writing to new server in bucket without EC policy [both 1-way
>   and 3-way]
> - Old client writing to new server in bucket with EC policy [both 1-way
>   and 3-way]
>
> Arpit
>
> On Feb 15, 2022, at 12:17 AM, Uma gangumalla <umamah...@apache.org> wrote:
> >
> > Dear Ozone Devs,
> >
> > As you may know, we have been actively developing Ozone Erasure Coding
> > support in a separate branch HDDS-3816-ec.
> >
> > We have finished the development of EC key write and read functionality.
> > The support of offline recovery (recovering replicas after node loss)
> > will be part of the second phase of work.
> >
> > Since the code has already grown and we have increasingly started seeing
> > merge complications, we would like to propose merging the current EC
> > branch into master.
> >
> > We will file a new JIRA for the second phase of work and continue the
> > offline recovery work there.
> >
> > Details on Changes:
> >
> > - Most of the EC core logic went into newly extended classes. Key changes
> >   went into the EC*OutputStream and EC*InputStream classes for write and
> >   read respectively. Based on the replication type, ECPipelineProvider
> >   will be chosen for creating EC pipelines.
> >
> > - Since we cannot represent EC replication in the existing replication
> >   factor, we have introduced ECReplicationConfig. The ReplicationConfig
> >   interface is already pushed to master, so it’s not a new idea coming
> >   through this branch merge now. What is new here is the
> >   ECReplicationConfig class, which can be used to express an EC
> >   replication configuration.
> >
> > - We wanted to provide support for enabling EC at the bucket level. To
> >   simplify some complications, we have moved the default replication
> >   configurations from client to server.
> >
> > - The client-side replication type and replication factor were removed
> >   from the configuration files, and ozone.server.default.replication and
> >   ozone.server.default.replication.type were introduced. We will continue
> >   to respect a replication config that is set explicitly on the client
> >   side or passed through the APIs; otherwise the server-side bucket-level
> >   properties or the server-side default configuration take effect.
> >
> > - Other than this change, the rest of the EC code should not impact any
> >   of the existing code flows.
> >
> > We have finished the documentation JIRA (HDDS-6172) covering this
> > feature, and we will continue to improve it further in master.
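
For illustration only: the precedence described above (an explicit client setting first, then the bucket-level property, then the server-side default) can be sketched in Python as below. `ReplicationSetting` and `resolve_replication` are hypothetical names made up for this sketch; they are not Ozone classes or APIs, and the example values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplicationSetting:
    """Hypothetical stand-in for a replication config (not an Ozone class)."""
    type: str   # placeholder values such as "RATIS" or "EC"
    value: str  # placeholder values such as "THREE" or "3:2"


def resolve_replication(
    client: Optional[ReplicationSetting],
    bucket: Optional[ReplicationSetting],
    server_default: ReplicationSetting,
) -> ReplicationSetting:
    """Pick the effective setting: client first, then bucket, then server default."""
    if client is not None:
        # Explicit client configuration or API argument always wins.
        return client
    if bucket is not None:
        # Otherwise the bucket-level property applies.
        return bucket
    # Finally fall back to the server-side default.
    return server_default


if __name__ == "__main__":
    server = ReplicationSetting("RATIS", "THREE")
    ec_bucket = ReplicationSetting("EC", "3:2")
    # A client that passes nothing inherits the bucket's EC setting.
    print(resolve_replication(None, ec_bucket, server))
    # A client that passes a non-EC config keeps it, even in an EC bucket.
    print(resolve_replication(ReplicationSetting("RATIS", "ONE"), ec_bucket, server))
```

Under this reading, an old client that always sends a non-EC replication config still gets a non-EC key in an EC bucket, which matches the behaviour described in the reply at the top of the thread.
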
> >
> > JIRA: HDDS-3816
> >
> > Completed tasks: ~ 90
> >
> > We wanted to cover the following compatibility issue before the merge:
> >
> > HDDS-6209: EC: [Forward compatibility issue] New client to older server
> > could fail due to the unavailability for client default replication config
> >
> > A few other JIRAs in HDDS-3816 are still open, but I believe they are not
> > blockers for the merge.
> >
> > In short, what you can do now with this feature:
> >
> > - You can enable EC at bucket level and cluster level.
> >   How to enable it at bucket level? Just create the bucket by passing the
> >   EC replication options.
> >
> > - You can create EC keys and read them back.
> >
> > - You should be able to continue writing even when chosen nodes are
> >   failing. (Of course, a minimum of Data+Parity live nodes must be
> >   available in the cluster to complete the write.)
> >
> > - You should be able to read the file back even if a few nodes failed in
> >   the same EC block group. (Failures should not exceed the parity number
> >   of nodes.)
> >
> > What is pending? Offline recovery of lost/missing EC containers. As
> > mentioned above, post merge of this branch, I will create a separate JIRA
> > for starting the work on offline recovery.
> >
> > There are automated acceptance test cases already added: HDDS-6231.
> >
> > In addition to that, we have also performed basic acceptance testing on a
> > physical cluster:
> >
> > 1. Installed a 10-node cluster and created an EC bucket (3:2).
> >    Uploaded a 10GB key.
> >    Downloaded the same key and checked the md5sum.
> >
> > 2. Uploaded an 8GB key.
> >    Downloaded the same key and checked the md5sum.
> >
> > 3. Uploaded a 3MB key.
> >    Downloaded the same key and verified the md5sum.
> >
> > 4. Changed the bucket to (6:3).
> >    Uploaded an 8GB key.
> >    Downloaded the same key.
> >    Also verified that the new key is in the 6:3 policy and the old keys
> >    remain 3:2.
> >
> > 5. Verified with several different size key writes and reads.
> >
> > Merge checklist items assessment is here:
> > https://cwiki.apache.org/confluence/display/OZONE/Ozone+EC+Branch%28HDDS-3816-ec%29+Phase-1+%3A+Merge+Checklist
> >
> > Big shoutout to Stephen O'Donnell <sodonn...@cloudera.com> and Istvan
> > Fajth <pi...@cloudera.com> for their great efforts in core development,
> > and also thanks a lot to Sammi, Mingchao Zhao, Mark Gui, and Kaijie for
> > collaborating on some of the EC tasks.
> >
> > Thanks to Marton for design discussions and for some dev tasks as well.
> >
> > Thanks to the many others who were involved in design discussions: Arpit,
> > Sidd, Jitendra, Mukul, Sanjay, Karthik, Bharat, Nanda, Shashi, Prashanth,
> > Rakesh, Yiqun Lin.
> > Sorry if I missed anyone here; your efforts are much appreciated. Without
> > your tremendous help, we would not have reached this position yet.
> >
> > If there are no objections to the merge, I will start the official vote
> > later.
> >
> > Regards,
> >
> > EC Branch Devs
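
As a rough illustration of the write and read conditions listed in the quoted proposal (a write needs at least Data+Parity live nodes in the cluster; a read tolerates at most Parity failed nodes in a block group), here is a small Python sketch. The function names are hypothetical and chosen for this example only; this is arithmetic on the stated rules, not Ozone code.

```python
def can_complete_write(live_nodes: int, data: int, parity: int) -> bool:
    """A write needs at least data + parity live nodes in the cluster."""
    return live_nodes >= data + parity


def can_read_block_group(failed_nodes: int, parity: int) -> bool:
    """A block group stays readable while failures do not exceed parity."""
    return failed_nodes <= parity


if __name__ == "__main__":
    data, parity = 3, 2  # the (3:2) bucket from the acceptance test
    print(can_complete_write(10, data, parity))  # 10-node cluster, 5 needed
    print(can_complete_write(4, data, parity))   # only 4 live nodes, 5 needed
    print(can_read_block_group(2, parity))       # 2 failures, parity is 2
    print(can_read_block_group(3, parity))       # 3 failures, parity is 2
```

For the (6:3) bucket in the same test, the same rules would require at least 9 live nodes to write and would tolerate up to 3 failed nodes per block group on read.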