Yeah, so for the issues we recently resolved on trunk and those we are 
addressing as Phase I follow-on tasks, we would label them with "erasure coding" 
and maybe also set the target version to "2.9" for convenience?

-----Original Message-----
From: Jing Zhao [mailto:ji...@apache.org] 
Sent: Tuesday, November 03, 2015 8:04 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 
(erasure coding) branch to trunk]

+1 for the plan about Phase I & II.

BTW, maybe out of the scope of this thread, just want to mention we should 
either move the JIRA under HDFS-8031 or set the JIRA component to 
"erasure-coding" when making further improvements or fixing bugs in EC. That 
way it will be easier to backport EC to 2.9 later.

On Mon, Nov 2, 2015 at 3:48 PM, Vinayakumar B <vinayakumarb.apa...@gmail.com> wrote:

> +1 for the idea.
> On Nov 3, 2015 07:22, "Zheng, Kai" <kai.zh...@intel.com> wrote:
>
> > Sounds good to me. Once it's decided to include EC in the 2.9 
> > release, it would be good to have a rough release date, as Zhe asked, 
> > so that the scope of EC can be worked out accordingly. We still have 
> > quite a few Phase I follow-on tasks to do before EC can be deployed 
> > in a production system. Phase II, developing non-striping EC for cold 
> > data, would possibly be started after that. We might consider 
> > including only Phase I and leaving Phase II for the next release, 
> > depending on the rough release date.
> >
> > Regards,
> > Kai
> >
> > -----Original Message-----
> > From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
> > Sent: Tuesday, November 03, 2015 5:41 AM
> > To: hdfs-dev@hadoop.apache.org
> > Subject: Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge 
> > HDFS-7285 (erasure coding) branch to trunk]
> >
> > +1 for EC to go into 2.9. Yes, 3.x would be a long way off when we 
> > plan to have 2.8 and 2.9 releases.
> >
> > Regards,
> > Uma
> >
> > On 11/2/15, 11:49 AM, "Vinod Vavilapalli" <vino...@hortonworks.com> wrote:
> >
> > >Forking the thread. Started looking at the 2.8 list, various 
> > >features' status and arrived here.
> > >
> > >While I understand the pervasive nature of EC and a need for a 
> > >significant bake-in, moving this to a 3.x release is not a good idea.
> > >We will surely get a 2.8 out this year and, as needed, I can even 
> > >spend time getting started on a 2.9. OTOH, 3.x is long ways off, 
> > >and given all the incompatibilities there, it would be a while 
> > >before users can get their hands on EC if it were to be only on 
> > >3.x. At best, this may force sites that want EC to backport the 
> > >entire EC feature to older releases, at worst this will repeat 
> > >the mess of the 0.20 security release forks.
> > >
> > >If we think adding this to 2.8 (even if it is switched off) is too 
> > >much risk per our original plan, let's move this to 2.9, thereby 
> > >leaving enough time for stability, integration testing and bake-in, 
> > >and a realistic chance of having it end up on users' clusters soonish.
> > >
> > >+Vinod
> > >
> > >> On Oct 19, 2015, at 1:44 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
> > >>
> > >> I think our plan thus far has been to target this for 3.0. I'm 
> > >> okay with putting it in branch-2 if we've given a hard look at 
> > >> compatibility, but I'll note though that 2.8 is already looking 
> > >> like quite a large release, and our release bandwidth has been 
> > >> focused on the 2.6 and 2.7 maintenance releases. Adding another 
> > >> multi-hundred JIRAs to 2.8 might make it too unwieldy to get out 
> > >> the door. If we bump EC past that, 3.0 might very well be our 
> > >> next release vehicle. I do plan to revive the 3.0 schedule some 
> > >> time next year. With EC and JDK8 in a good spot, the only big 
> > >> feature remaining is classpath isolation.
> > >>
> > >> EC is also a pretty fundamental change to HDFS. Even if it's 
> > >> compatible, in terms of size and impact it might best belong in a 
> > >> new major release.
> > >>
> > >> Best,
> > >> Andrew
> > >>
> > >> On Fri, Oct 16, 2015 at 7:04 PM, Vinayakumar B <vinayakumarb.apa...@gmail.com> wrote:
> > >>
> > >>> Does anyone else also think that the feature is ready to go to 
> > >>> branch-2 as well?
> > >>>
> > >>> It's been > 2 weeks since EC landed on trunk. IMO it looks quite 
> > >>> stable since then and ready to go into branch-2.
> > >>>
> > >>> -Vinay
> > >>> On Oct 6, 2015 12:51 AM, "Zhe Zhang" <zhezh...@cloudera.com> wrote:
> > >>>
> > >>>> Thanks Vinay for capturing the issue and Uma for offering the help.
> > >>>>
> > >>>> ---
> > >>>> Zhe Zhang
> > >>>>
> > >>>> On Mon, Oct 5, 2015 at 12:19 PM, Gangumalla, Uma <uma.ganguma...@intel.com> wrote:
> > >>>>
> > >>>>> Vinay,
> > >>>>>
> > >>>>>
> > >>>>> I would merge them as part of HDFS-9182.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Uma
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 10/5/15, 12:48 AM, "Vinayakumar B" <vinayakum...@apache.org> wrote:
> > >>>>>
> > >>>>>> Hi Andrew,
> > >>>>>> I see CHANGES.txt entries not yet merged from CHANGES-HDFS-EC-7285.txt.
> > >>>>>>
> > >>>>>> Was this intentional?
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Vinay
> > >>>>>>
> > >>>>>> On Wed, Sep 30, 2015 at 9:15 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
> > >>>>>>
> > >>>>>>> Branch has been merged to trunk, thanks again to everyone who
> > >>>>>>> worked on the feature!
> > >>>>>>>
> > >>>>>>> On Tue, Sep 29, 2015 at 10:44 PM, Zhe Zhang <zhezh...@cloudera.com> wrote:
> > >>>>>>>
> > >>>>>>>> Thanks everyone who has participated in this discussion.
> > >>>>>>>>
> > >>>>>>>> With 7 +1's (5 binding and 2 non-binding), and no -1, this vote
> > >>>>>>>> has passed. I will do a final 'git merge' with trunk and work
> > >>>>>>>> with Andrew to merge the branch to trunk. I'll update on this
> > >>>>>>>> thread when the merge is done.
> > >>>>>>>>
> > >>>>>>>> ---
> > >>>>>>>> Zhe Zhang
> > >>>>>>>>
> > >>>>>>>> On Thu, Sep 24, 2015 at 11:08 PM, Liu, Yi A <yi.a....@intel.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> (Change it to binding.)
> > >>>>>>>>>
> > >>>>>>>>> +1
> > >>>>>>>>> I have been involved in the development and code review on the
> > >>>>>>>>> feature branch. It's a great feature and I think it's ready to
> > >>>>>>>>> be merged into trunk.
> > >>>>>>>>>
> > >>>>>>>>> Thanks all for the contribution.
> > >>>>>>>>>
> > >>>>>>>>> Regards,
> > >>>>>>>>> Yi Liu
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> -----Original Message-----
> > >>>>>>>>> From: Liu, Yi A
> > >>>>>>>>> Sent: Friday, September 25, 2015 1:51 PM
> > >>>>>>>>> To: hdfs-dev@hadoop.apache.org
> > >>>>>>>>> Subject: RE: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk
> > >>>>>>>>>
> > >>>>>>>>> +1 (non-binding)
> > >>>>>>>>> I have been involved in the development and code review on the
> > >>>>>>>>> feature branch. It's a great feature and I think it's ready to
> > >>>>>>>>> be merged into trunk.
> > >>>>>>>>>
> > >>>>>>>>> Thanks all for the contribution.
> > >>>>>>>>>
> > >>>>>>>>> Regards,
> > >>>>>>>>> Yi Liu
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> -----Original Message-----
> > >>>>>>>>> From: Vinayakumar B [mailto:vinayakum...@apache.org]
> > >>>>>>>>> Sent: Friday, September 25, 2015 12:21 PM
> > >>>>>>>>> To: hdfs-dev@hadoop.apache.org
> > >>>>>>>>> Subject: Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk
> > >>>>>>>>>
> > >>>>>>>>> +1,
> > >>>>>>>>>
> > >>>>>>>>> I've been involved starting from the design and development of
> > >>>>>>>>> ErasureCoding. I think phase 1 of this development is ready to
> > >>>>>>>>> be merged to trunk. It has come a long way to the current state
> > >>>>>>>>> with significant effort of many Contributors and Reviewers for
> > >>>>>>>>> both design and code.
> > >>>>>>>>>
> > >>>>>>>>> Thanks Everyone for the efforts.
> > >>>>>>>>>
> > >>>>>>>>> Regards,
> > >>>>>>>>> Vinay
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Sep 23, 2015 at 10:53 PM, Jing Zhao <ji...@apache.org> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> +1
> > >>>>>>>>>>
> > >>>>>>>>>> I've been involved in both development and review on the
> > >>>>>>>>>> branch, and I believe it's now ready to get merged into trunk.
> > >>>>>>>>>> Many thanks to all the contributors and reviewers!
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> -Jing
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, Sep 22, 2015 at 6:17 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Non-binding +1
> > >>>>>>>>>>>
> > >>>>>>>>>>> According to our extensive performance tests, striping + ISA-L
> > >>>>>>>>>>> coder based erasure coding not only can save storage, but also
> > >>>>>>>>>>> can increase the throughput of a client or a cluster. It will
> > >>>>>>>>>>> be a great addition to HDFS and its users. Based on the latest
> > >>>>>>>>>>> branch code, we also observed it's very reliable in the
> > >>>>>>>>>>> concurrent tests. We'll provide the perf test report after
> > >>>>>>>>>>> it's sorted out and hope it helps.
> > >>>>>>>>>>> Thanks!
> > >>>>>>>>>>>
> > >>>>>>>>>>> Regards,
> > >>>>>>>>>>> Kai
> > >>>>>>>>>>>
> > >>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>> From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
> > >>>>>>>>>>> Sent: Wednesday, September 23, 2015 8:50 AM
> > >>>>>>>>>>> To: hdfs-dev@hadoop.apache.org; common-...@hadoop.apache.org
> > >>>>>>>>>>> Subject: Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk
> > >>>>>>>>>>>
> > >>>>>>>>>>> +1
> > >>>>>>>>>>>
> > >>>>>>>>>>> Great addition to HDFS. Thanks all contributors for the nice work.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Regards,
> > >>>>>>>>>>> Uma
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 9/22/15, 3:40 PM, "Zhe Zhang" <zhezh...@cloudera.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'd like to propose a vote to merge the HDFS-7285 feature
> > >>>>>>>>>>>> branch back to trunk. Since November 2014 we have been
> > >>>>>>>>>>>> designing and developing this feature under the umbrella
> > >>>>>>>>>>>> JIRAs HDFS-7285 and HADOOP-11264, and have committed
> > >>>>>>>>>>>> approximately 210 patches.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The HDFS-7285 feature branch was created to support the first
> > >>>>>>>>>>>> phase of HDFS erasure coding (HDFS-EC). The objective of
> > >>>>>>>>>>>> HDFS-EC is to significantly reduce storage space usage in
> > >>>>>>>>>>>> HDFS clusters. Instead of always creating 3 replicas of each
> > >>>>>>>>>>>> block with 200% storage space overhead, HDFS-EC provides data
> > >>>>>>>>>>>> durability through parity data blocks. With most EC
> > >>>>>>>>>>>> configurations, the storage overhead is no more than 50%.
> > >>>>>>>>>>>> Based on profiling results of production clusters, we decided
> > >>>>>>>>>>>> to support EC with the striped block layout in the first
> > >>>>>>>>>>>> phase, so that small files can be better handled. This means
> > >>>>>>>>>>>> dividing each logical HDFS file block into smaller units
> > >>>>>>>>>>>> (striping cells) and spreading them on a set of DataNodes in
> > >>>>>>>>>>>> round-robin fashion. Parity cells are generated for each
> > >>>>>>>>>>>> stripe of original data cells. We have made changes to
> > >>>>>>>>>>>> NameNode, client, and DataNode to generalize the block
> > >>>>>>>>>>>> concept and handle the mapping between a logical file block
> > >>>>>>>>>>>> and its internal storage blocks. For further details please
> > >>>>>>>>>>>> see the design doc on HDFS-7285.
> > >>>>>>>>>>>> HADOOP-11264 focuses on providing flexible and
> > >>>>>>>>>>>> high-performance codec calculation support.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The nightly Jenkins job of the branch has reported several
> > >>>>>>>>>>>> successful runs, and doesn't show new flaky tests compared
> > >>>>>>>>>>>> with trunk. We have posted several versions of the test plan
> > >>>>>>>>>>>> including both unit testing and cluster testing, and have
> > >>>>>>>>>>>> executed most tests in the plan. The most basic
> > >>>>>>>>>>>> functionalities have been extensively tested and verified in
> > >>>>>>>>>>>> several real clusters with different hardware configurations;
> > >>>>>>>>>>>> results have been very stable. We have created follow-on
> > >>>>>>>>>>>> tasks for more advanced error handling and optimization under
> > >>>>>>>>>>>> the umbrella HDFS-8031.
> > >>>>>>>>>>>> We also plan to implement or harden the integration of EC
> > >>>>>>>>>>>> with existing features such as WebHDFS, snapshot, append,
> > >>>>>>>>>>>> truncate, hflush, hsync, and so forth.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Development of this feature has been a collaboration across
> > >>>>>>>>>>>> many companies and institutions. I'd like to thank J.
> > >>>>>>>>>>>> Andreina, Takanobu Asanuma, Vinayakumar B, Li Bo, Takuya
> > >>>>>>>>>>>> Fukudome, Uma Maheswara Rao G, Rui Li, Yi Liu, Colin McCabe,
> > >>>>>>>>>>>> Xinwei Qin, Rakesh R, Gao Rui, Kai Sasaki, Walter Su, Tsz Wo
> > >>>>>>>>>>>> Nicholas Sze, Andrew Wang, Yong Zhang, Jing Zhao, Hui Zheng
> > >>>>>>>>>>>> and Kai Zheng for their code contributions and reviews.
> > >>>>>>>>>>>> Andrew and Kai Zheng also made fundamental contributions to
> > >>>>>>>>>>>> the initial design. Rui Li, Gao Rui, Kai Sasaki, Kai Zheng
> > >>>>>>>>>>>> and many other contributors have made great efforts in system
> > >>>>>>>>>>>> testing. Many thanks go to Weihua Jiang for proposing the
> > >>>>>>>>>>>> JIRA, and ATM, Todd Lipcon, Silvius Rus, Suresh, as well as
> > >>>>>>>>>>>> many others for providing helpful feedback.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Following the community convention, this vote will last for 7
> > >>>>>>>>>>>> days (ending September 29th). Votes from Hadoop committers
> > >>>>>>>>>>>> are binding but non-binding votes are very welcome as well.
> > >>>>>>>>>>>> And here's my non-binding +1.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>> ---
> > >>>>>>>>>>>> Zhe Zhang
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >
> >
> >
>
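
For reference, the storage figures quoted in the vote proposal above (200%
overhead for 3-way replication, no more than 50% for most EC configurations)
follow from simple arithmetic, and the round-robin placement of striping cells
can be sketched in a few lines. This is only an illustration, not HDFS code:
the 6 data + 3 parity layout, the 64 KB cell size and the function names are
assumptions picked for the example and are not stated anywhere in this thread.

```python
from typing import Tuple


def storage_overhead(data_units: int, redundant_units: int) -> float:
    """Extra storage needed, as a fraction of the original data size."""
    return redundant_units / data_units


def cell_location(offset: int, cell_size: int, data_nodes: int) -> Tuple[int, int, int]:
    """Map a byte offset in a logical block to (stripe, node index, offset in cell).

    Cells are handed out to the data nodes in round-robin order, so cell k
    lands on node k % data_nodes in stripe k // data_nodes.
    """
    cell = offset // cell_size
    return cell // data_nodes, cell % data_nodes, offset % cell_size


# 3 replicas = 1 original copy + 2 extra copies.
replication_overhead = storage_overhead(1, 2)

# Hypothetical layout: 6 data cells + 3 parity cells per stripe.
ec_overhead = storage_overhead(6, 3)

# Hypothetical 64 KB cells striped over 6 data nodes.
CELL_SIZE = 64 * 1024
location = cell_location(7 * CELL_SIZE + 5, CELL_SIZE, 6)

print(f"replication: {replication_overhead:.0%}, EC: {ec_overhead:.0%}")
print(f"stripe, node, offset in cell: {location}")
```

Under these assumptions, the eighth cell of a block wraps around to the second
data node in the second stripe, which is the round-robin behaviour the proposal
describes; the parity cells for each stripe would be stored in addition to the
data cells.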
