Thanks for the detailed explanation.

+1 to merge with HDDS-6209 addressed.


> On Feb 16, 2022, at 10:49 AM, Uma gangumalla <umamah...@apache.org> wrote:
> 
> Thanks a lot Arpit for your feedback.
> 
> [Arpit Wrote] - New client writing to old server with 3-way and 1-way
> replication.
> [Uma] As mentioned in the proposal mail, we have a forward
> compatibility issue (HDDS-6209) as we have removed the client side default
> configurations. Once that fix is in, this should work.
> We will make sure to get this in before merge.
> 
> [Arpit Wrote] - Old client writing to new server in bucket without EC
> policy [both 1-way and 3-way]
> [Uma] Old clients always passed the replication configs. Irrespective of
> bucket policy, we respect the client-passed replication config, so this is
> fine.
> 
> [Arpit Wrote] - Old client writing to new server in bucket with EC policy
> [both 1-way and 3-way]
> [Uma] As mentioned above, old clients always passed non-EC replication
> options while creating keys. Even when a call comes to an EC policy
> bucket, we allow non-EC keys to be created on EC buckets.
> 
> Also, when a newer client writes EC keys to an old server, the request
> would be rejected. That should be covered as part of HDDS-6209. We are
> using a server/client versioning mechanism to detect an old server which
> cannot support EC.
> 
> @Pifta, you may want to add your thoughts if any?
> 
> Regards,
> Uma
> 
> On Wed, Feb 16, 2022 at 8:23 AM Arpit Agarwal <aagar...@cloudera.com.invalid>
> wrote:
> 
>> Thanks Uma for starting this discussion. Excited to see EC support for
>> Ozone coming together at last.
>> 
>> We should verify the compatibility matrix prior to merge:
>> 
>> - New client writing to old server with 3-way and 1-way replication.
>> - Old client writing to new server in bucket without EC policy [both 1-way
>> and 3-way]
>> - Old client writing to new server in bucket with EC policy [both 1-way
>> and 3-way]
>> 
>> 
>> Arpit
>> 
>> 
>>> On Feb 15, 2022, at 12:17 AM, Uma gangumalla <umamah...@apache.org> wrote:
>>> 
>>> Dear Ozone Devs,
>>> 
>>> As you may know, we have been actively developing Ozone Erasure Coding
>>> support in a separate branch HDDS-3816-ec.
>>> 
>>> We have finished the development of EC key write and read functionality.
>>> Support for offline recovery (recovering replicas after node loss) will be
>>> part of the second phase of work.
>>> 
>>> Since the code has already grown and we are increasingly seeing merge
>>> complications, we would like to propose merging the current EC branch into
>>> master.
>>> 
>>> We will file the new JIRA for the second phase of work and continue the
>>> offline recovery work there.
>>> 
>>> Details on Changes:
>>> 
>>>  - Most of the EC core logic went into newly extended classes. Key changes
>>>    went into the EC*OutputStream and EC*InputStream classes for write and
>>>    read respectively. Based on the replication type, ECPipelineProvider
>>>    will be chosen for creating EC pipelines.
>>>
>>>  - Since we cannot represent EC replication with the existing replication
>>>    factor, we have introduced ECReplicationConfig. The ReplicationConfig
>>>    interface is already in master, so it is not a new idea coming through
>>>    this branch merge. What is new here is the ECReplicationConfig class,
>>>    which can be used to express an EC replication configuration.
>>>
>>>  - We wanted to support enabling EC at the bucket level. To simplify some
>>>    complications, we have moved the default replication configurations
>>>    from client to server.
>>>
>>>  - The client-side replication type and replication factor have been
>>>    removed from the configuration files, and
>>>    ozone.server.default.replication and
>>>    ozone.server.default.replication.type have been introduced. We continue
>>>    to respect a replication config that is set explicitly on the client
>>>    side or passed through the APIs; otherwise the server-side bucket-level
>>>    properties or the server-side default configuration take effect.
>>>
>>>  - Other than this change, the rest of the EC code should not impact any
>>>    of the existing code flows.
>>> 
>>> 
>>> We have finished the documentation JIRA (HDDS-6172) covering this feature
>>> and we will continue to improve it in master.
>>> 
>>> JIRA: HDDS-3816
>>> 
>>> Completed tasks: ~ 90
>>> 
>>> We wanted to cover the following compatibility issue before the merge:
>>> 
>>> HDDS-6209: EC: [Forward compatibility issue] New client to older server
>>> could fail due to the unavailability for client default replication
>>> config
>>>
>>> A few other JIRAs in HDDS-3816 are still open, but I believe they're not
>>> blockers for the merge.
>>> 
>>> In short, what you can do now with this feature:
>>>
>>>  - You can enable EC at bucket level and cluster level.
>>>    How to enable it at bucket level? Just create the bucket by passing the
>>>    EC replication options.
>>>
>>>  - You can create EC keys and read them back.
>>>
>>>  - You should be able to continue writing even when chosen nodes are
>>>    failing. (Of course, a minimum of Data+Parity live nodes must be
>>>    available in the cluster to complete the write.)
>>>
>>>  - You should be able to read the file back even if a few nodes have
>>>    failed in the same EC block group. (The number of failures must not
>>>    exceed the number of parity nodes.)
>>> 
>>> What is pending? Offline recovery of lost/missing EC containers. As
>>> mentioned above, post merge of this branch, I will create a separate JIRA
>>> for starting the work for OfflineRecovery.
>>> 
>>> 
>>> Automated acceptance test cases have already been added (HDDS-6231).
>>> 
>>> In addition to that, we have also performed basic acceptance testing on a
>>> physical cluster:
>>>
>>>  1. Installed a 10-node cluster and created an EC bucket (3:2).
>>>     Uploaded a 10GB key.
>>>     Downloaded the same key and checked the md5sum.
>>>
>>>  2. Uploaded an 8GB key.
>>>     Downloaded the same key and checked the md5sum.
>>>
>>>  3. Uploaded a 3MB key.
>>>     Downloaded the same key and verified the md5sum.
>>>
>>>  4. Changed the bucket to (6:3).
>>>     Uploaded an 8GB key.
>>>     Downloaded the same key.
>>>     Also verified that the new key uses the 6:3 policy and the old keys
>>>     remain 3:2.
>>>
>>>  5. Verified with key writes and reads of several different sizes.
>>>
>>> 
>>> Merge checklist items assessment is here:
>>> 
>> https://cwiki.apache.org/confluence/display/OZONE/Ozone+EC+Branch%28HDDS-3816-ec%29+Phase-1+%3A+Merge+Checklist
>>> 
>>> Big shoutout to Stephen O'Donnell <sodonn...@cloudera.com> and Istvan Fajth
>>> <pi...@cloudera.com> for great efforts in core development, and also thanks
>>> a lot to Sammi, Mingchao Zhao, Mark Gui, Kaijie for collaborating on some
>>> of the EC tasks.
>>>
>>> Thanks to Marton for design discussion and on some dev tasks as well.
>>>
>>> Thanks to many others who were involved in design discussions: Arpit, Sidd,
>>> Jitendra, Mukul, Sanjay, Karthik, Bharat, Nanda, Shashi, Prashanth, Rakesh,
>>> Yiqun Lin.
>>> Sorry if I missed anyone here, but your efforts are much appreciated.
>>> Without your tremendous help, we would not have reached this position yet.
>>> 
>>> If there are no objections to the merge, I will start the official vote
>>> later.
>>> 
>>> Regards,
>>> 
>>> EC Branch Devs
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@ozone.apache.org
>> For additional commands, e-mail: dev-h...@ozone.apache.org
>> 
>> 

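For readers following the compatibility discussion, the precedence described in the quoted proposal (an explicit client setting wins, then the bucket-level property, then the server-side default) can be sketched roughly as below. This is only an illustration: ReplicationResolver, resolve, and the string labels are hypothetical names and are not taken from the Ozone code base.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not an Ozone class. */
final class ReplicationResolver {
    private ReplicationResolver() { }

    /**
     * Picks the effective replication setting in the order described in the
     * proposal: explicit client value, then bucket-level value, then the
     * server-wide default. The String labels are placeholders (assumption).
     */
    static String resolve(Optional<String> clientConfig,
                          Optional<String> bucketConfig,
                          String serverDefault) {
        if (clientConfig.isPresent()) {
            // Old clients always send a value, so they land here.
            return clientConfig.get();
        }
        // New clients that send nothing fall back to bucket, then server.
        return bucketConfig.orElse(serverDefault);
    }
}
```

Under this sketch, an old client that always passes a non-EC replication config keeps getting non-EC keys even in a bucket with an EC policy, which matches the behaviour described in the thread.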

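The failure-tolerance conditions stated in the quoted proposal (at least Data+Parity live nodes to complete a write, and no more failures in a block group than the parity count for a read) reduce to simple arithmetic. The sketch below only restates those two conditions; EcTolerance and its method names are hypothetical and are not Ozone APIs.

```java
/** Hypothetical helper for illustration; not an Ozone class. */
final class EcTolerance {
    private EcTolerance() { }

    /** A write needs at least data + parity live nodes in the cluster. */
    static boolean canCompleteWrite(int data, int parity, int liveNodes) {
        return liveNodes >= data + parity;
    }

    /** A read survives as long as failures in the group do not exceed parity. */
    static boolean canRead(int parity, int failedInGroup) {
        return failedInGroup >= 0 && failedInGroup <= parity;
    }
}
```

For the policies mentioned in the testing notes, a 3:2 bucket needs 5 live nodes for a write and tolerates up to 2 failed nodes per block group on read, while a 6:3 bucket needs 9 live nodes and tolerates up to 3.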