Three things are very clear to me:

1) having an open source iSCSI implementation from a mature and experienced
storage team is a very cool thing, especially if it can be targeted to
non-HDFS storage relatively easily. Building such a thing requires very
high levels of experience and expertise that have generally been lacking in
the open source world.

2) this team is very naive about the negative impacts that Apache processes
will have on their development speed and will need lots of mentoring. Given
their release schedule, I think there are symmetrical risks: first, that
the team will be tempted to JFDI when getting features out the door rather
than communicate and share designs; and second, that if they do build a
proper community, overcoming language, timezone and large-internal-team
dynamics, the internal political costs will be severe due to slower
development.

3) this team is very enthusiastic about making open source work and that
might be enough to allow them to succeed in spite of the difficulties.

The path to success here is, in my opinion, to require strong and engaged
mentorship and to make it very clear before they come in that Apache may not
be a good fit given the pressure they face to deliver on a schedule. If
incubation with a high risk of exit back to a non-Apache form is acceptable
to the project team, then it should be fine for Apache.



On Mon, Mar 9, 2020 at 7:45 PM Sheng Wu <wu.sheng.841...@gmail.com> wrote:

> Hi
>
> Personally, and basically, I feel the team has misunderstood
> the meaning of the incubator and the requirements of building a community.
> As in the last discussion, I still think they will be under a lot of
> pressure, as they will have to handle basic feature development, community
> building and the ASF incubator requirements all at the same time if they
> are accepted into the incubator. At the same time, the team lacks
> experience with open source communities, inside or outside the ASF.
> I am not sure whether this is good for the project. It seems a little
> hurried to join the incubator.
> More comments inline.
>
> Willing to listen to what other IPMC members think.
>
> <zhangguoc...@chinatelecom.cn> 于2020年3月10日周二 上午10:21写道:
>
> > Hi, All,
> >
> > We are China Telecom Corporation Limited Cloud Computing Branch
> > Corporation.
> > We hope to contribute one of our projects named 'HBlock' to Apache.
> > Here is the proposal of HBlock project, please feel free to let me know
> > what
> > the concerns and suggestions from you. Thank you so much.
> >
> > HBlock Proposal
> >
> > 1.Abstract
> > The HBlock project will be an enterprise distributed block storage.
> >
> > 2.Proposal
> > HBlock provides a distributed block storage with the following features:
> > 2.1.User-space iSCSI target: HBlock will implement an iSCSI target that
> > is RFC-7143 (https://tools.ietf.org/html/rfc7143) compliant, written in
> > pure Java, and designed to run on top of any mainstream operating system,
> > including Windows and Linux, as a user-space process.
> > 2.2.Enterprise level features: HBlock will implement comprehensive
> > enterprise level features, such as
> > Asymmetric Logical Unit Access (ALUA, Information technology - SCSI
> > Primary Commands - 4 (SPC-4),
> > https://www.t10.org/cgi-bin/ac.pl?t=f&f=spc4r37.pdf),
> > Persistent Reservations (PR, Information technology - SCSI Primary
> > Commands - 4 (SPC-4),
> > https://www.t10.org/cgi-bin/ac.pl?t=f&f=spc4r37.pdf),
> > VMware vSphere Storage APIs - Array Integration (VAAI,
> > https://www.vmware.com/techpapers/2012/vmware-vsphere-storage-apis-array-integration-10337.html),
> > and Offloaded Data Transfer (ODX,
> > https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-R2-and-2012/hh831628(v=ws.11)),
> > so that it will support
> > session-level fail-over,
> > Oracle Real Application Cluster (Oracle RAC,
> > https://www.oracle.com/database/technologies/rac.html),
> > Cluster File System (CFS), VMware cluster and Windows cluster.
> > 2.3.Low latency: HBlock will implement an in-memory distributed cache to
> > reduce write latency and improve Input/Output Operations Per Second
> > (IOPS), and it will leverage storage-class memory to achieve even higher
> > durability without IOPS loss.
> > 2.4.Smart Compaction and Garbage Collection (GC): HBlock will convert all
> > write operations into sequential append operations to improve random
> > write performance, and it will choose the best timing to compact and
> > collect garbage per Logical Unit (LU). Compared to a Solid State Drive's
> > (SSD's) internal Garbage Collection, such a global GC reduces the need
> > for the SSD's internal GC, which indirectly gives the SSD more usable
> > space, and it can use a better GC strategy because it is closer to the
> > application. In essence, flash writes data sequentially in blocks (32MB).
> > To support random writes, an SSD reserves part of its space for internal
> > GC. Therefore, the more random writes and deletes there are, the more
> > space needs to be reserved. HDFS-based writes are sequential from the
> > SSD's point of view, so the space reserved in the SSD can be small. In
> > short, as long as there is GC, there must be reserved space, either in
> > the HBlock layer or in the controller layer inside the SSD. Because
> > HBlock is closer to the LU, its GC can be more efficient. For example, an
> > LU dedicated to video monitoring data basically writes video data in
> > sequence, and starts writing again from the beginning when the disk is
> > full. This LU does not need any GC at all. If you do GC in the SSD layer,
> > the SSD sees the data of various LUs mixed together, and unnecessary data
> > movement will be done for the LU dedicated to video monitoring.
> > 2.5.Hadoop Distributed File System (HDFS)-based: HBlock leverages HDFS
> > as a persistence layer to avoid reinventing wheels. The iSCSI target will
> > run on the client side of HDFS and directly read or write data from or to
> > Data Nodes.
> > 2.6.Easy to deploy: HBlock will provide easy-to-use utilities to make the
> > installation process extremely easy. Since HBlock does not rely on any
> > particular Operating System, deployment is easy, unlike other storage
> > systems that rely on an in-kernel iSCSI module, such as Linux-IO (LIO) or
> > SCST.
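
As an aside, the write path described in 2.4 above can be sketched roughly as
follows. This is an illustrative toy in Java (the proposal's language), not
HBlock source; the class and method names (AppendLog, write, read,
garbageSlots) are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a block layer that turns random writes into
// sequential appends, with an index mapping each logical block address
// (LBA) to its latest position in the append-only log.
public class AppendLog {
    private final byte[][] log = new byte[1024][];            // append-only log segment
    private int head = 0;                                     // next append slot
    private final Map<Long, Integer> index = new HashMap<>(); // LBA -> live slot

    // A write to any LBA becomes a sequential append; overwriting an LBA
    // just re-points the index, leaving the old slot behind as garbage.
    public void write(long lba, byte[] data) {
        log[head] = data.clone();
        index.put(lba, head);
        head++;
    }

    public byte[] read(long lba) {
        Integer slot = index.get(lba);
        return slot == null ? null : log[slot];
    }

    // Slots that were appended but are no longer referenced: this is what
    // per-LU compaction/GC would reclaim. A purely sequential LU (e.g. the
    // video monitoring example, which never overwrites) accumulates none.
    public int garbageSlots() {
        return head - index.size();
    }

    public static void main(String[] args) {
        AppendLog lu = new AppendLog();
        lu.write(7L, new byte[]{1});
        lu.write(3L, new byte[]{2});
        lu.write(7L, new byte[]{3}); // random overwrite -> sequential append
        System.out.println(lu.read(7L)[0]);    // latest data wins: 3
        System.out.println(lu.garbageSlots()); // one dead slot awaiting GC: 1
    }
}
```

The point of the sketch is that garbage is tracked per LU, so a workload that
never overwrites produces zero garbage and needs no compaction at all.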
> >
>
> I noticed there are a lot of `will`s in the Proposal section describing
> the project's core features.
> Is this a language issue, or are these features not available today?
> Which parts have been implemented?
>
>
> >
> > 3.Background
> > We think block storage is a very general technology.
> > Block storage is the foundation of enterprise IT infrastructure. But
> > unfortunately, there is no mature open source distributed block storage
> > at this moment.
> > Ceph is well known and widely adopted, but it is just a storage engine at
> > the same level as HDFS. Ceph does not cover the need for iSCSI. If you
> > want to use Ceph as block storage, you must use solutions like LIO to
> > handle iSCSI. Unfortunately, LIO lacks many features and thus cannot be
> > directly used in an enterprise production environment. Additionally, LIO
> > is a Linux kernel module while Ceph is a user-space process, which
> > creates problems in letting LIO talk to Ceph processes. TCM in User
> > Space (TCMU) is being worked on
> > (https://www.kernel.org/doc/Documentation/target/tcmu-design.txt),
> > but it is ugly to have an in-kernel module call a user-space process.
> > That is why we want to create HBlock, which will implement comprehensive
> > enterprise level features completely in user space, including High
> > Availability (HA), distributed cache, VAAI, PR, ODX and so on.
> > The HBlock project is based on HDFS and will be an excellent addition to
> > the Apache family of projects.
> >
> > 4.Rationale
> > Block storage is the foundation of enterprise IT infrastructure. But
> > unfortunately, there is no mature open source distributed block storage
> > at this moment.
> > Ceph is well known and widely adopted, but it is just a storage engine at
> > the same level as HDFS. Ceph does not cover the need for iSCSI. If you
> > want to use Ceph as block storage, you must use solutions like LIO to
> > handle iSCSI. Unfortunately, LIO lacks many features and thus cannot be
> > directly used in an enterprise production environment. Additionally, LIO
> > is a Linux kernel module while Ceph is a user-space process, which
> > creates problems in letting LIO talk to Ceph processes. TCM in User
> > Space (TCMU) is being worked on
> > (https://www.kernel.org/doc/Documentation/target/tcmu-design.txt),
> > but it is ugly to have an in-kernel module call a user-space process.
> > That is why we want to create HBlock, which will implement comprehensive
> > enterprise level features completely in user space, including High
> > Availability (HA), distributed cache, VAAI, PR, ODX and so on.
> > The HBlock project is based on HDFS and will be an excellent addition to
> > the Apache family of projects.
> >
> > 5.Initial Goals
> > N/A.
> >
>
> Why is this N/A?
>
>
> >
> > 6.Current Status
> > At present, we have completed the development of a stand-alone version of
> > HBlock. HBlock has been used in the online environments of many
> > customers. This standalone version has implemented advanced SCSI
> > functions including PR, VAAI, ODX, etc. Among these, cross-NAT (Network
> > Address Translation) support is a key feature of HBlock, which allows
> > clients in a LAN to access iSCSI targets located on the Internet. HBlock
> > makes it possible to provide iSCSI as a Service. A version with high
> > availability features is also under testing.
> > 6.1 Meritocracy
> > At present, this project is still an internal private project, operated
> > according to the company's internal development processes, so this issue
> > has not come up yet. But we are willing to follow the rules of the open
> > source community. We will track patch submissions, accept well-intended
> > patches for HBlock and increase the publicity of HBlock. We intend to
> > invite more people who show merit to join the project.
> > 6.2 Community
> > At present, the HBlock project is still an internal private project,
> > operated according to the company's internal development processes, so
> > this issue has not come up yet. But we are willing to follow the rules of
> > the open source community.
> > There are several business customers using HBlock, and we will invite
> > them and their industry partners to join the community. We will
> > communicate with China Telecom Cloud Service customers through forums,
> > e-mail, instant messages and other channels, and update the product
> > information in a timely manner, so as to attract more developers to join
> > the project.
> > 6.3 Core Developers
> > At present, the HBlock project has about 30 people: approximately 20
> > internal developers and 10 test engineers, all of them very experienced.
> >
>
> Are the test engineers internal too? I assume so.
>
>
> > Here is a brief introduction of the key contributors.
> > Dong Changkun is the development team leader, with rich Java development
> > experience, and serves as the architect of HBlock, overseeing the overall
> > design.
> > Wu Zhimin is the R&D expert of the cloud storage product line in our
> > company, with more than 12 years of storage development experience. In
> > HBlock, he is mainly responsible for the architecture design of the
> > protocol module, the implementation of the SCSI module, and research on
> > difficult points.
> > Yu Erdong has rich Java development experience and distributed storage
> > system development experience. He is mainly responsible for the design of
> > the HBlock back-end modules and management tool modules, as well as the
> > development of the back-end cache and master-slave switching.
> > 6.4 Alignment
> > HBlock is the only product in the industry that builds block storage on
> > top of HDFS.
> > With the increase in disk capacity, such as the emergence of Shingled
> > Magnetic Recording (SMR) disks, more and more disks favor sequential
> > writes. Flash memory has the same characteristic. The underlying flash
> > cells are written sequentially in blocks (32MB), but an SSD will reserve
> > about 20% of its space for merging so that the file system appears to
> > support random writes. Because HBlock is based on HDFS, HBlock inherently
> > writes sequentially. Since the random-write IO that reaches the SSD is
> > then very small, HBlock allows the reserved space to be reduced from 20%
> > to only 5%.
> > In addition, with the large adoption of HDFS, HBlock allows HDFS
> > facilities to become highly available, cloud-ready block storage, which
> > is super cool!
> >
> > 7.Known Risks
> > The software is not yet stable and has bugs, which need continuous
> > improvement.
> > More sophisticated strategies are needed to schedule and optimize data
> > merging so as to avoid merging data during business peak hours.
> >
> > 8.Project Name
> > HBlock is named following Hadoop's style: Hadoop is a distributed project
> > in the Apache community, and the database project based on it is called
> > HBase. As a distributed block storage project following this style, we
> > named ours HBlock.
> >
> > 9.Orphaned products
> > Storage is our core business and HBlock is our technical direction. We
> > will continue to invest in it and see value in building a vibrant open
> > source community to improve it. We believe that HBlock, a product based
> > on HDFS, will have more vitality as an open source software project under
> > the Apache Software Foundation.
> > 9.1 Inexperience with Open Source
> > We don't have much experience in open source, but we hope to open source
> > HBlock so that more people can use and develop this project. We are
> willing
> > to learn from Apache's experience in open source and apply it to the
> HBlock
> > project.
> > Jiang Feng, the founder and team leader of the HBlock project, submitted
> > code to Hadoop more than 10 years ago.
> >
>
> Is he already a Hadoop committer or PMC member? Does he have experience
> with the ASF process?
>
>
> > 9.2 Length of Incubation
> > It is expected that the HBlock project will take one year to complete the
> > incubation process.
> >
>
> One year is a short time for most incubator projects. IPMC, please correct
> me if I am wrong.
> How did you arrive at this expectation?
>
>
> > While learning the Apache Way, we have an aggressive release calendar:
> >
>
> Why do the following items relate to the Apache Way?
> They look like a feature roadmap to me. These are development plans, not
> community building.
> This confuses me; could you explain?
>
>
> > In April 2020, we will complete the high availability version of HBlock.
> > In June 2020, we will complete the development of the web portal and a
> > "green" installation that can be installed alongside existing
> > applications and supports x86 and ARM servers.
> > In September 2020, we will complete advanced SCSI functions, including
> > PR, VAAI, ODX, etc.
> > 9.3 Homogenous Developers
> > At present, HBlock has approximately 20 developers, all of whom are very
> > experienced engineers. They work in Beijing, Shanghai, Inner Mongolia and
> > other regions, and they are experienced in working in a distributed
> > environment for the same company.
> > We will expand our existing team through campus and social recruitment,
> > and attract more developers from the community to join the HBlock
> > project. HDFS is a widely used project. We are confident that a block
> > storage project based on HDFS will attract more volunteers.
> > 9.4 Reliance on Salaried Developers
> > HBlock relies on China Telecom's salaried developers. China Telecom will
> > not easily change its market strategy. This is the first time China
> > Telecom has shared a project with the open source community, so it will
> > pay close attention to its investment in this project. At the same time,
> > the project will be widely used inside China Telecom. With the support of
> > China Telecom's resources and validation in real projects, the continuity
> > and quality of the project will be guaranteed. We have also been working
> > in the storage field for seven and a half years and will continue to do
> > so. At the same time, block storage based on HDFS will definitely attract
> > more volunteers. We will support volunteers getting involved, and our
> > developers are committed to doing so.
> > 9.5 Relationships with Other Apache Products
> > HBlock uses Apache HDFS, Apache commons-IO, commons-collections,
> > commons-configuration, commons-email, commons-logging, Apache log4j, and
> > Apache Hadoop-common.
> > 9.6 An Excessive Fascination with the Apache Brand
> > We have chosen the Apache Software Foundation as the home to open source
> > HBlock because HBlock is based on HDFS.  We believe there is a very
> natural
> > synergy with Apache.
> >
> > 10.Documentation
> > For the user guide, please refer to "China Telecom HBlock User
> > Guide_20200121.docx". (Only a doc version is available right now.)
> >
> > 11.Initial Source
> > HBlock has been developed since the second half of 2018. HBlock is based
> on
> > HDFS and the internal source code will be donated to the Foundation.
> China
> > Telecom is prepared to execute the paperwork required for the donation.
> >
> > 12.Source and Intellectual Property Submission Plan
> > The HBlock specification and content on www.ctyun.cn are from China
> > Telecom
> > Co., Ltd. The HBlock library uses the Java language. There is no
> complexity
> > in the code base donation process and we are ready to move the
> repositories
> > over.
> > 12.1 External Dependencies
> > HBlock uses Apache commons-IO, commons-collections,
> > commons-configuration, Apache log4j, commons-email, commons-logging,
> > org.json, jline, pty4j, Apache hadoop-hdfs, hadoop-common, netty-all, and
> > Apache zookeeper. These are all under Apache or BSD licenses.
> >
> > 12.2 Cryptography
> > The HBlock project does not involve encryption code.
> >
> > 13.Required Resources
> > 13.1 Mailing lists:
> > priv...@hblock.incubator.apache.org
> > d...@hblock.incubator.apache.org
> > us...@hblock.incubator.apache.org
>
>
> A user mailing list is not recommended, as you don't have users today. I
> recommend sharing it with the dev list.
>
> Sheng Wu 吴晟
> Twitter, wusheng1108
>
>
> >
> > comm...@hblock.incubator.apache.org
> > 13.2 Subversion Directory
> > https://svn.apache.org/repos/asf/incubator/hblock
> > (According to Apache rules)
> > 13.3 Git Repositories
> > https://gitbox.apache.org/repos/asf/incubator-hblock.git
> > (According to Apache rules)
> > 13.4 Issue Tracking
> > JIRA HBlock(HBLOCK)
> > (According to Apache rules)
> > 13.5 Other Resources
> > N/A.
> >
> > 14.Initial Committers
> > Yu Erdong (yued at chinatelecom dot cn)
> > Wu Zhimin (wuzhimin at chinatelecom dot cn)
> > Yang Chao (yangchao1 at chinatelecom dot cn)
> > Dong Changkun (dongck at chinatelecom dot cn)
> > Guo Yong (guoyong1 at chinatelecom dot cn)
> > Zhao Wentao (zhaowt at chinatelecom dot cn)
> > Cui Meng (cuimeng at chinatelecom dot cn)
> > Wei Wei (weiwei2 at chinatelecom dot cn)
> >
> > 15.Sponsors
> > 15.1 Champion
> > Kevin A. McGrail
> > 15.2 Nominated Mentors
> > Kevin A. McGrail
> > 15.3 Sponsoring Entity
> > The Incubator
> > (END)
> >
> > Best Wishes.
> >
> >
> ----------------------------------------------------------------------------
> > ------------------
> > Zhang Guochen  Project Manager
> > China Telecom Corporation Limited Cloud Computing Branch Corporation
> > Mail: zhangguoc...@chinatelecom.cn
> > Phone: 86-17301021225
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
>
