I have committed to champion and I think the points you make are good, Ted. Do you have the bandwidth to be a mentor?
I will work with them to set expectations about the process. I have also asked them to do some community building now, too.

--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Wed, Mar 25, 2020 at 12:00 PM Ted Dunning <ted.dunn...@gmail.com> wrote:

> Three things are very clear to me:
>
> 1) having an open source iSCSI implementation from a mature and experienced storage stream is a very cool thing, especially if it can be targeted to non-HDFS storage relatively easily. Building such a thing requires very high levels of experience and expertise that have generally been lacking in the open source world.
>
> 2) this team is very naive about the negative impacts that Apache processes will have on their development speed and will need lots of mentoring. Given their release schedule, I think that there are symmetrical risks: first, that the team will be tempted to JFDI when getting features out the door rather than communicate and share designs, and second, that if they build a proper community overcoming language, timezone and large internal team dynamics, the internal political costs will be severe due to slower development.
>
> 3) this team is very enthusiastic about making open source work and that might be enough to allow them to succeed in spite of the difficulties.
>
> The path to success here is, in my opinion, to require strong and engaged mentorship and make it very clear before they come in that Apache may not be a good fit due to the pressures they face to deliver on a schedule. If incubation with a high risk of exit back to a non-Apache form is acceptable to the project team, then it should be fine for Apache.
>
> On Mon, Mar 9, 2020 at 7:45 PM Sheng Wu <wu.sheng.841...@gmail.com> wrote:
>
> > Hi
> >
> > Personally, I feel the team has misunderstood the meaning of the incubator and the requirements of building a community. As in the last discussion, I still think they will be under a lot of pressure, because they would have to handle basic feature development, community building and ASF incubator requirements all at the same time if they are accepted into the incubator. The team also lacks open source community experience, in or out of the ASF.
> > I am not sure whether this is good for the project. It seems a little hurried to join the incubator.
> > More comments inline.
> >
> > I am willing to listen to what other IPMC members think.
> >
> > On Tue, Mar 10, 2020 at 10:21 AM <zhangguoc...@chinatelecom.cn> wrote:
> >
> > > Hi, All,
> > >
> > > We are China Telecom Corporation Limited Cloud Computing Branch Corporation.
> > > We hope to contribute one of our projects, named 'HBlock', to Apache.
> > > Here is the proposal for the HBlock project. Please feel free to let me know your concerns and suggestions. Thank you so much.
> > >
> > > HBlock Proposal
> > >
> > > 1.Abstract
> > > The HBlock project will be an enterprise distributed block storage.
> > >
> > > 2.Proposal
> > > HBlock provides a distributed block storage with the following features:
> > > 2.1.User-space iSCSI target: HBlock will implement an RFC-7143 (https://tools.ietf.org/html/rfc7143) compliant iSCSI target, written in pure Java and designed to run as a user-space process on top of any mainstream operating system, including Windows and Linux.
> > > 2.2.Enterprise level features: HBlock will implement comprehensive enterprise level features, such as
> > > Asymmetric Logical Unit Access (ALUA, Information technology - SCSI Primary Commands - 4 (SPC-4), https://www.t10.org/cgi-bin/ac.pl?t=f&f=spc4r37.pdf),
> > > Persistent Reservations (PR, Information technology - SCSI Primary Commands - 4 (SPC-4), https://www.t10.org/cgi-bin/ac.pl?t=f&f=spc4r37.pdf),
> > > VMware vSphere Storage APIs - Array Integration (VAAI, https://www.vmware.com/techpapers/2012/vmware-vsphere-storage-apis-array-integration-10337.html),
> > > Offloaded Data Transfer (ODX, https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-R2-and-2012/hh831628(v=ws.11)),
> > > so that it will support
> > > session-level fail-over,
> > > Oracle Real Application Cluster (Oracle RAC, https://www.oracle.com/database/technologies/rac.html),
> > > Cluster File System (CFS), VMware cluster and Windows cluster.
> > > 2.3.Low latency: HBlock will implement an in-memory distributed cache to reduce write latency and improve Input / Output Operations Per Second (IOPS), and it will leverage storage-class memory to achieve even higher durability without IOPS loss.
> > > 2.4.Smart Compaction and Garbage Collection (GC): HBlock will convert all write operations into sequential append operations to improve random write performance, and it will choose the best timing to compact and collect the garbage per Logical Unit (LU).
> > > Compared with the internal Garbage Collection of Solid State Drives (SSDs), such a global GC will reduce the need for the SSD's internal GC, which indirectly gives the SSD more usable space, and it allows an even better GC strategy because it is closer to the application. In essence, flash writes data in block (32MB) order. To support random writes, an SSD reserves part of its space for GC. Therefore, the more random writes and deletes there are, the more space needs to be reserved. HDFS-based writes are sequential for the SSD, so the space reserved in the SSD is small. In short, as long as there is GC, there must be reserved space, either in the HBlock layer or in the controller layer inside the SSD. Because HBlock is closer to the LU, it can perform GC more efficiently. For example, an LU dedicated to video monitoring data basically writes video data in sequence, and starts writing again when the disk is full. This LU does not need any GC at all. If GC is done in the SSD layer, the SSD sees the data of various LUs, and unnecessary data movement will be performed for the LU dedicated to video monitoring.
> > > 2.5.Hadoop Distributed File System (HDFS)-based: HBlock leverages HDFS as a persistent layer to avoid reinventing the wheel. The iSCSI target will run on the client side of HDFS and directly read or write data from or to Data Nodes.
> > > 2.6.Easy to deploy: HBlock will provide easy-to-use utilities to make the installation process extremely easy. Since HBlock does not rely on any particular operating system, deployment is easy, unlike other storage systems that rely on an in-kernel iSCSI module such as Linux-IO (LIO) or SCST.
> > >
> >
> > I noticed there are a lot of `will`s in the Proposal section describing the project's core features.
> > Is this a language issue, or are all these features not available today?
> > Which parts have been implemented?
> >
> > >
> > > 3.Background
> > > We think block storage is a very general technology.
> > > Block storage is the foundation of enterprise IT infrastructure. But unfortunately, there is not any open source and mature distributed block storage at this moment.
> > > Ceph is well known and widely adopted, but it is just a storage engine at the same level as HDFS. Ceph does not cover the need for iSCSI. If you want to use Ceph as block storage, you must use solutions like LIO to handle iSCSI. Unfortunately, LIO lacks many features and thus cannot be directly used in an enterprise production environment. Additionally, LIO is a Linux kernel module and Ceph is a user-space process, which creates problems when LIO needs to talk to Ceph processes. TCM in User Space (TCMU) is being worked on (https://www.kernel.org/doc/Documentation/target/tcmu-design.txt), but it looks ugly to make an in-kernel module call a user-space process.
> > > That is why we want to create HBlock, which will implement comprehensive enterprise level features completely in user-space, including High Availability (HA), distributed cache, VAAI, PR, ODX and so on.
> > > The HBlock project is based on HDFS and will be an excellent addition to the Apache family of projects.
> > >
> > > 4.Rationale
> > > Block storage is the foundation of enterprise IT infrastructure. But unfortunately, there is not any open source and mature distributed block storage at this moment.
> > > Ceph is well known and widely adopted, but it is just a storage engine at the same level as HDFS. Ceph does not cover the need for iSCSI. If you want to use Ceph as block storage, you must use solutions like LIO to handle iSCSI. Unfortunately, LIO lacks many features and thus cannot be directly used in an enterprise production environment.
> > > Additionally, LIO is a Linux kernel module and Ceph is a user-space process, which creates problems when LIO needs to talk to Ceph processes. TCM in User Space (TCMU) is being worked on (https://www.kernel.org/doc/Documentation/target/tcmu-design.txt), but it looks ugly to make an in-kernel module call a user-space process.
> > > That is why we want to create HBlock, which will implement comprehensive enterprise level features completely in user-space, including High Availability (HA), distributed cache, VAAI, PR, ODX and so on.
> > > The HBlock project is based on HDFS and will be an excellent addition to the Apache family of projects.
> > >
> > > 5.Initial Goals
> > > N/A.
> > >
> >
> > Why is this N/A?
> >
> > >
> > > 6.Current Status
> > > At present, we have completed development of a stand-alone version of HBlock, which is used in the online environments of many customers. This stand-alone version has implemented advanced SCSI functions including PR, VAAI, ODX, etc. Cross Network Address Translation (NAT) support is a key feature of HBlock: it allows clients in the LAN to access iSCSI targets located on the Internet. HBlock makes it possible to provide iSCSI as a Service. A version with high availability features is also under testing.
> > > 6.1 Meritocracy
> > > At present, this project is still an internal private project, operated according to the enterprise's internal development practices, so this issue has not arisen yet. But we are willing to follow the rules of the open source community. We will track patch submissions, accept patches contributed to HBlock, and increase the publicity of HBlock. We look to invite more people who show merit to join the project.
> > > 6.2 Community
> > > At present, the HBlock project is still an internal private project, operated according to the enterprise's internal development practices, so this issue has not arisen yet. But we are willing to follow the rules of the open source community.
> > > There are several business customers using HBlock, and we will invite them and their industry partners to join the community. We will communicate with China Telecom Cloud Service customers through forums, e-mail, instant messages and other channels, and update the product information in a timely manner, so as to attract more developers to join the project.
> > > 6.3 Core Developers
> > > At present, the HBlock project has about 30 people: approximately 20 internal developers and 10 test engineers, all very experienced engineers.
> > >
> >
> > Are the test engineers internal too? I suppose so.
> >
> > >
> > > Here is a brief introduction of the key contributors.
> > > Dong Changkun is the development team leader, with rich Java development experience. As the architect of HBlock, Dong controls the overall design.
> > > Wu Zhimin is the R&D expert of our company's cloud storage product line, with more than 12 years of storage development experience. In HBlock, he is mainly responsible for the architecture design of the protocol module, the implementation of the SCSI module, and research on difficult points.
> > > Yu Erdong has rich Java development experience and distributed storage system development experience. Yu is mainly responsible for the design of the HBlock back-end modules and management tool modules, as well as the development of the back-end cache and master-slave switching.
> > > 6.4 Alignment
> > > HBlock is the only product in the industry that develops block storage based on HDFS.
> > > As disk capacities increase, for example with the emergence of Shingled Magnetic Recording (SMR) disks, more and more disks show the negative characteristics of sequential writing. Flash memory has the same characteristics. The underlying flash memory is written sequentially in blocks (32MB), but the SSD reserves 20% of its space for merging so that the file system appears to support random writes. Because HBlock is based on HDFS, HBlock inherently supports sequential writes. Combined with the very small amount of random write I/O sent to the SSDs, HBlock allows you to reduce the reserved space from 20% to only 5%.
> > > In addition, with the large adoption of HDFS, HBlock allows HDFS facilities to become highly available, cloud-ready block storage, which is super cool!
> > >
> > > 7.Known Risks
> > > The software is not stable and has bugs, and needs continuous improvement.
> > > More sophisticated strategies are needed to schedule and optimize the timing of data merging, to avoid merging data during business peak hours.
> > >
> > > 8.Project Name
> > > The name HBlock was chosen because Hadoop is a distributed project in the Apache community, and the database project based on it is called HBase. To follow this style for a distributed block storage project, we named it HBlock.
> > >
> > > 9.Orphaned products
> > > Storage is our core business and HBlock is our technical direction. We will continue to invest in it and see value in building a vibrant open source community to improve it. We believe that HBlock, a product based on HDFS, will have more vitality as an open source software project under the Apache Software Foundation.
> > > 9.1 Inexperience with Open Source
> > > We don't have much experience in open source, but we hope to open source HBlock so that more people can use and develop this project. We are willing to learn from Apache's experience in open source and apply it to the HBlock project.
> > > Jiang Feng, who is the founder and team leader of the HBlock project, submitted code to Hadoop more than 10 years ago.
> > >
> >
> > Is he already a Hadoop committer or PMC member? Does he have experience with the ASF process?
> >
> > >
> > > 9.2 Length of Incubation
> > > It is expected that the HBlock project will take one year to complete the incubation process.
> > >
> >
> > One year is a short time for most incubator projects. IPMC, please correct me if I am wrong.
> > How did you arrive at this expectation?
> >
> > >
> > > While learning the Apache Way, we have an aggressive release calendar:
> > >
> >
> > What do the following features have to do with the Apache Way?
> > These look like a feature roadmap only to me. They are development plans, not community building.
> > This confuses me; could you explain?
> >
> > >
> > > In April 2020, we will complete the version of HBlock with high availability.
> > > In June 2020, we will complete the development of the web portal and a "green" installation that can be installed alongside existing applications and supports x86 and ARM servers.
> > > In September 2020, we will complete advanced SCSI functions, including PR, VAAI, ODX, etc.
> > > 9.3 Homogenous Developers
> > > At present, HBlock has approximately 20 developers, all of whom are very experienced engineers. They work in Beijing, Shanghai, Inner Mongolia and other regions, and they are experienced with working in a distributed environment for the same company.
> > > We will expand our existing team through campus recruitment and social recruitment, and attract more developers from the community to join the HBlock project. HDFS is a widely used project. We are confident that a block storage project based on HDFS will attract more volunteers.
> > > 9.4 Reliance on Salaried Developers
> > > HBlock is reliant on China Telecom's salaried developers. China Telecom will not easily change its market strategy. This is the first time China Telecom has shared a project with the open source community, so it will pay close attention to its investment in this project. At the same time, the project will be widely used within China Telecom. With the support of China Telecom's resources and verification in actual deployments, the continuity and quality of the project will be guaranteed. We have also been developing in the storage field for seven and a half years and will continue to work in this field. At the same time, block storage based on HDFS will definitely attract more volunteers to join. We will support volunteers being involved and our developers are committed to doing so.
> > > 9.5 Relationships with Other Apache Products
> > > HBlock uses Apache HDFS, Apache commons-IO, commons-collections, commons-configuration, commons-email, commons-logging, Apache log4j, and Apache Hadoop-common.
> > > 9.6 An Excessive Fascination with the Apache Brand
> > > We have chosen the Apache Software Foundation as the home to open source HBlock because HBlock is based on HDFS. We believe there is a very natural synergy with Apache.
> > >
> > > 10.Documentation
> > > For the user guide, please refer to "China Telecom HBlock User Guide_20200121.docx". (There is only a doc version right now)
> > >
> > > 11.Initial Source
> > > HBlock has been developed since the second half of 2018.
> > > HBlock is based on HDFS and the internal source code will be donated to the Foundation. China Telecom is prepared to execute the paperwork required for the donation.
> > >
> > > 12.Source and Intellectual Property Submission Plan
> > > The HBlock specification and content on www.ctyun.cn are from China Telecom Co., Ltd. The HBlock library uses the Java language. There is no complexity in the code base donation process and we are ready to move the repositories over.
> > > 12.1 External Dependencies
> > > HBlock uses Apache commons-IO, commons-collections, commons-configuration, Apache log4j, commons-email, commons-logging, org.json, jline, pty4j, Apache hadoop-hdfs, hadoop-common, netty-all, and Apache zookeeper. These are all under Apache or BSD licenses.
> > > 12.2 Cryptography
> > > The HBlock project does not involve encryption code.
> > >
> > > 13.Required Resources
> > > 13.1 Mailing lists:
> > > priv...@hblock.incubator.apache.org
> > > d...@hblock.incubator.apache.org
> > > us...@hblock.incubator.apache.org
> > >
> >
> > A user ML is not recommended, as you don't have users today. I recommend sharing it with the dev list.
> >
> > Sheng Wu 吴晟
> > Twitter, wusheng1108
> >
> > >
> > > comm...@hblock.incubator.apache.org
> > > 13.2 Subversion Directory
> > > https://svn.apache.org/repos/asf/incubator/hblock
> > > (According to Apache rules)
> > > 13.3 Git Repositories
> > > https://gitbox.apache.org/repos/asf/incubator-hblock.git
> > > (According to Apache rules)
> > > 13.4 Issue Tracking
> > > JIRA HBlock(HBLOCK)
> > > (According to Apache rules)
> > > 13.5 Other Resources
> > > N/A.
> > >
> > > 14.Initial Committers
> > > Yu Erdong (yued at chinatelecom dot cn)
> > > Wu Zhimin (wuzhimin at chinatelecom dot cn)
> > > Yang Chao (yangchao1 at chinatelecom dot cn)
> > > Dong Changkun (dongck at chinatelecom dot cn)
> > > Guo Yong (guoyong1 at chinatelecom dot cn)
> > > Zhao Wentao (zhaowt at chinatelecom dot cn)
> > > Cui Meng (cuimeng at chinatelecom dot cn)
> > > Wei Wei (weiwei2 at chinatelecom dot cn)
> > >
> > > 15.Sponsors
> > > 15.1 Champion
> > > Kevin A. McGrail
> > > 15.2 Nominated Mentors
> > > Kevin A. McGrail
> > > 15.3 Sponsoring Entity
> > > The Incubator
> > > (END)
> > >
> > > Best Wishes.
> > >
> > > Zhang Guochen Project Manager
> > > China Telecom Corporation Limited Cloud Computing Branch Corporation
> > > Mail: zhangguoc...@chinatelecom.cn
> > > Phone: 86-17301021225