Thanks a lot Arpit for your feedback.
[Arpit Wrote] - New client writing to old server with 3-way and 1-way
replication.
[Uma] As mentioned in the proposal mail, we have a forward
compatibility issue (HDDS-6209) because we removed the client-side default
configurations. Once that fix is in, this should work.
We will make sure to get this in before merge.
[Arpit Wrote] - Old client writing to new server in bucket without EC
policy [both 1-way and 3-way]
[Uma] Old clients always passed the replication config. Irrespective of
bucket policy, we respect the client-passed replication config, so this is
fine.
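
To make the precedence concrete, here is a minimal sketch of the rule described
above: a client-passed replication config wins, then the bucket-level setting,
then the server-side default. This is an illustrative model only, not the
actual Ozone code; the names ReplicationConfig (as modelled here),
resolve_replication, and the string values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplicationConfig:
    """Hypothetical stand-in for a replication setting (not the Ozone class)."""
    kind: str   # illustrative values: "RATIS" or "EC"
    value: str  # illustrative values: "ONE", "THREE", or an EC scheme like "3:2"


def resolve_replication(
    client: Optional[ReplicationConfig],
    bucket: Optional[ReplicationConfig],
    server_default: ReplicationConfig,
) -> ReplicationConfig:
    """Pick the replication for a new key.

    Order: explicit client value, then bucket-level setting,
    then the server-side default.
    """
    if client is not None:
        return client
    if bucket is not None:
        return bucket
    return server_default


if __name__ == "__main__":
    three_way = ReplicationConfig("RATIS", "THREE")
    ec_bucket = ReplicationConfig("EC", "3:2")
    # An old client always passes its own config, so the bucket policy is ignored.
    print(resolve_replication(three_way, ec_bucket, three_way))
    # A client that passes nothing gets the bucket-level setting.
    print(resolve_replication(None, ec_bucket, three_way))
```

Under this model an old client behaves the same whether or not the bucket has
an EC policy, because its explicit config is always the first match.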
[Arpit Wrote] - Old client writing to new server in bucket with EC policy
[both 1-way and 3-way]
[Uma] As mentioned above, old clients always passed non-EC replication
options while creating keys. Even when a call comes to an EC policy
bucket, we allow non-EC keys to be created in it.
Also, a newer client writing keys with EC options to an old server would be
rejected. That should be covered as part of HDDS-6209. We are using a
client/server versioning mechanism to detect an old server that cannot
support EC.
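
As a rough illustration of the version check, the sketch below rejects an EC
write when the server reports a version older than the one that understands EC.
It is only a model of the idea: the version number, the exception name, and the
function name are all hypothetical, and where exactly the check lives in Ozone
is not specified in this mail.

```python
# Hypothetical version at which a server starts to understand EC keys.
EC_MIN_SERVER_VERSION = 2


class UnsupportedByServerError(Exception):
    """Raised when the server is too old for the requested replication type."""


def check_ec_supported(server_version: int, replication_kind: str) -> None:
    """Reject EC writes against servers older than the assumed EC-capable version.

    Non-EC writes are never rejected by this check.
    """
    if replication_kind == "EC" and server_version < EC_MIN_SERVER_VERSION:
        raise UnsupportedByServerError(
            f"server version {server_version} cannot store EC keys"
        )


if __name__ == "__main__":
    check_ec_supported(1, "RATIS")  # old server, non-EC write: allowed
    check_ec_supported(2, "EC")     # new server, EC write: allowed
    try:
        check_ec_supported(1, "EC")  # old server, EC write: rejected
    except UnsupportedByServerError as err:
        print("rejected:", err)
```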
@Pifta, you may want to add your thoughts if any?
Regards,
Uma
On Wed, Feb 16, 2022 at 8:23 AM Arpit Agarwal <[email protected]>
wrote:
> Thanks Uma for starting this discussion. Excited to see EC support for
> Ozone coming together at last.
>
> We should verify the compatibility matrix prior to merge:
>
> - New client writing to old server with 3-way and 1-way replication.
> - Old client writing to new server in bucket without EC policy [both 1-way
> and 3-way]
> - Old client writing to new server in bucket with EC policy [both 1-way
> and 3-way]
>
>
> Arpit
>
>
> > On Feb 15, 2022, at 12:17 AM, Uma gangumalla <[email protected]> wrote:
> >
> > Dear Ozone Devs,
> >
> > As you may know, we have been actively developing Ozone Erasure Coding
> > support in a separate branch HDDS-3816-ec.
> >
> > We have finished the development of EC key write and read functionality.
> > Support for offline recovery (recovering replicas after node loss) will be
> > part of the second phase of work.
> >
> > Since the code has grown and we are increasingly seeing merge
> > complications, we would like to propose merging the current EC branch into
> > master.
> >
> > We will file a new JIRA for the second phase of work and continue the
> > offline recovery work there.
> >
> > Details on Changes:
> >
> > - Most of the EC core logic went into newly extended classes. Key changes
> >   went into the EC*OutputStream and EC*InputStream classes for write and read
> >   respectively. Based on the replication type, ECPipelineProvider will be
> >   chosen for creating EC pipelines.
> >
> > - Since we cannot represent EC replication with the existing replication
> >   factor, we have introduced ECReplicationConfig. The ReplicationConfig
> >   interface is already in master, so it is not a new idea coming in
> >   through this branch merge. What is new here is the
> >   ECReplicationConfig class, which can be used to express an EC replication
> >   configuration.
> >
> > - We wanted to support enabling EC at the bucket level. To
> >   simplify some complications, we have moved the default replication
> >   configurations from client to server.
> >
> > - The client-side replication type and replication factor have been removed
> >   from the configuration files, and we have introduced
> >   ozone.server.default.replication and ozone.server.default.replication.type.
> >   We continue to respect a replication setting that is configured explicitly
> >   on the client side or passed through the APIs; otherwise the server-side
> >   bucket-level properties or the server-side default configuration take
> >   effect.
> >
> > - Other than this change, the rest of the EC code should not impact any
> >   of the existing code flows.
> >
> >
> > We have finished the documentation JIRA (HDDS-6172) covering this feature,
> > and we will continue to improve it in master.
> >
> > JIRA: HDDS-3816
> >
> > Completed tasks: ~ 90
> >
> > We wanted to cover the following compatibility issue before the merge:
> >
> > HDDS-6209: EC: [Forward compatibility issue] New client to older server
> > could fail due to the unavailability for client default replication config
> >
> > A few other JIRAs in HDDS-3816 are still open but I believe they're not
> > blockers for merge.
> >
> > In short what you can do now with this feature:
> >
> > - You can enable EC at bucket level and cluster level.
> >
> >   How to enable it at bucket level? Just create the bucket by passing the EC
> >   replication options.
> >
> > - You can create EC keys and read them back.
> >
> > - You should be able to continue writing even when chosen nodes are
> >   failing. (Of course, a minimum of data + parity live nodes must be available
> >   in the cluster to complete the write.)
> >
> > - You should be able to read the file back even if a few nodes have failed in
> >   the same EC block group. (The number of failures must not exceed the number
> >   of parity nodes.)
> >
> > What is pending? Offline recovery of lost/missing EC containers. As
> > mentioned above, post merge of this branch, I will create a separate JIRA
> > for starting the work for OfflineRecovery.
> >
> >
> > Automated acceptance test cases have already been added (HDDS-6231).
> >
> > In addition to that, we have also performed basic acceptance testing on a
> > physical cluster:
> >
> > 1. Installed a 10-node cluster and created an EC bucket (3:2).
> >    Uploaded a 10GB key.
> >    Downloaded the same key and checked the md5sum.
> >
> > 2. Uploaded an 8GB key.
> >    Downloaded the same key and checked the md5sum.
> >
> > 3. Uploaded a 3MB key.
> >    Downloaded the same key and verified the md5sum.
> >
> > 4. Changed the bucket to (6:3).
> >    Uploaded an 8GB key.
> >    Downloaded the same key.
> >    Also verified that the new key uses the 6:3 policy and the old keys
> >    remain 3:2.
> >
> > 5. Verified with key writes and reads of several different sizes.
> >
> >
> > Merge checklist items assessment is here:
> >
> > https://cwiki.apache.org/confluence/display/OZONE/Ozone+EC+Branch%28HDDS-3816-ec%29+Phase-1+%3A+Merge+Checklist
> >
> > Big shoutout to Stephen O'Donnell <[email protected]> and Istvan Fajth
> > <[email protected]> for their great efforts in core development, and also
> > thanks a lot to Sammi, Mingchao Zhao, Mark Gui, and Kaijie for collaborating
> > on some of the EC tasks.
> >
> > Thanks to Marton for design discussions and for some dev tasks as well.
> >
> > Thanks to the many others who were involved in design discussions: Arpit,
> > Sidd, Jitendra, Mukul, Sanjay, Karthik, Bharat, Nanda, Shashi, Prashanth,
> > Rakesh, Yiqun Lin.
> > Sorry if I missed anyone here, but your efforts are much appreciated.
> > Without your tremendous help, we would not have reached this position yet.
> >
> > If there are no objections to the merge, I will start the official vote
> > later.
> >
> > Regards,
> >
> > EC Branch Devs
>
>