Thanks very much.
------------------ Original message ------------------
From: "general" <wu.sheng.841...@gmail.com>;
Sent: Saturday, February 27, 2021, 10:50
To: "fp"<f...@lucene.cn>; "Incubator"<general@incubator.apache.org>;
Subject: Re: [Proposal] lxdb - proposal for Apache Incubation

I forwarded the private reply to the mailing list, but deleted his cellphone number for privacy protection.

Sheng Wu
Twitter, wusheng1108

On Sat, Feb 27, 2021 at 10:00, fp <f...@lucene.cn> wrote:
> Sorry,
>
> Hi,
> Thank you for your reply. My answers to your questions are as follows.
>
> 1. Since you are proposing a new project to a global foundation, you should at least keep your documentation in English.
> > Of course. If Apache accepts this project, I will complete all the documents and translate them into English. Although my English is not very good, many people at our company have come back from Australia, so this should not be a problem.
> 2. The links you provided are in Chinese, which most IPMC members cannot read.
> > Apart from the source code, what other documents are needed? Would you like me to provide a basic introduction to the project and its usage first?
> 3. And since this project is closed-source, please list its dependencies.
> > The version to be open-sourced is a 100% rewrite. It relies on Hadoop, HBase, Spark, and ZooKeeper, and does not rely on any code from my previous company.
> 4. As you repeatedly mentioned the original projects: was this project created 100% on your own, or does it include anything from Alibaba/Tencent?
> > The current version of lxdb was created 100% on my own. It does not include anything from Alibaba/Tencent.
> > The previous version of lxdb relied on mdrill from Alibaba. I am the author of mdrill, and mdrill is an open-source project.
> > As for Tencent Hermes: it was my work at Tencent, but after starting my own business I did not use the Hermes source code, and I informed Tencent before I started my business.
> 5. As it is not open source, I can't verify this.
> > If you are interested, I can provide the source code to PMC members separately for auditing.
> 6. Because this is closed-source, we also need you to be clear about whether you are going to submit an SGA and open-source it to the public.
> > I haven't open-sourced the project yet, mainly because I want to see whether the PMC is interested in it. If it is, I will open-source it; that way I can persuade my investors. If the PMC is not interested, I may consider open-sourcing it later. At present the project has about 100,000 lines of code, which can be provided to the PMC for review.
> 7. Most importantly, `lucene` is an Apache trademark and an Apache project, which makes me concerned about a branding violation.
> > I just like Lucene. If the name offends the PMC, I can change it to an appropriate name.
> 8. Finally, the Incubator typically expects a project to be open-sourced already, with at least a small community and some first adoption outside your company.
> Our company is a commercial company, so the community around our previous projects may be different from what you describe. We have organized a QQ group of about 1,000 people. Many of its members have been our users for many years, and they are looking forward to the development of our project.
> 9. To join the Incubator, you also need at least 3 IPMC members and 1 Champion (an Apache member or officer) to help you understand the Incubator.
> Can you help me? I really do have language problems, and there is little communication in this area. I have done a lot of sharing in China before. I hope you can help me if you can.
> My phone number is ------
>
> ------------------ Original message ------------------
> *From:* "fp" <f...@lucene.cn>;
> *Sent:* Saturday, February 27, 2021, 9:57
> *To:* "wu.sheng.841108"<wu.sheng.841...@gmail.com>;
> *Subject:* Re: [Proposal] lxdb - proposal for Apache Incubation
>
> ------------------ Original message ------------------
> *From:* "fp" <f...@lucene.cn>;
> *Sent:* Saturday, February 27, 2021, 9:55
> *To:* "Incubator"<general@incubator.apache.org>;
> *Subject:* Re: [Proposal] lxdb - proposal for Apache Incubation
>
> Hi,
> Thank you for your reply. My answers to your questions are as follows.
>
> 1. Since you are proposing a new project to a global foundation, you should at least keep your documentation in English.
> > Of course. If Apache accepts this project, I will complete all the documents and translate them into English. Although my English is not very good, many people at our company have come back from Australia, so this should not be a problem.
> 2. The links you provided are in Chinese, which most IPMC members cannot read.
> > Apart from the source code, what other documents are needed? Would you like me to provide a basic introduction to the project and its usage first?
> 3. And since this project is closed-source, please list its dependencies.
> > The version to be open-sourced is a 100% rewrite. It relies on Hadoop, HBase, Spark, and ZooKeeper, and does not rely on any code from my previous company.
> 4. As you repeatedly mentioned the original projects: was this project created 100% on your own, or does it include anything from Alibaba/Tencent?
> > The current version of lxdb was created 100% on my own. It does not include anything from Alibaba/Tencent.
> > The previous version of lxdb relied on mdrill from Alibaba. I am the author of mdrill, and mdrill is an open-source project.
> > As for Tencent Hermes: it was my work at Tencent, but after starting my own business I did not use the Hermes source code, and I informed Tencent before I started my business.
> 5. As it is not open source, I can't verify this.
> > If you are interested, I can provide the source code to PMC members separately for auditing.
> 6. Because this is closed-source, we also need you to be clear about whether you are going to submit an SGA and open-source it to the public.
> > I haven't open-sourced the project yet, mainly because I want to see whether the PMC is interested in it. If it is, I will open-source it; that way I can persuade my investors. If the PMC is not interested, I may consider open-sourcing it later. At present the project has about 100,000 lines of code, which can be provided to the PMC for review.
> 7. Most importantly, `lucene` is an Apache trademark and an Apache project, which makes me concerned about a branding violation.
> > I just like Lucene. If the name offends the PMC, I can change it to an appropriate name.
> 8. Finally, the Incubator typically expects a project to be open-sourced already, with at least a small community and some first adoption outside your company.
> Our company is a commercial company, so the community around our previous projects may be different from what you describe. We have organized a QQ group of about 1,000 people. Many of its members have been our users for many years, and they are looking forward to the development of our project.
> 9. To join the Incubator, you also need at least 3 IPMC members and 1 Champion (an Apache member or officer) to help you understand the Incubator.
> Can you help me? I really do have language problems, and there is little communication in this area. I have done a lot of sharing in China before. I hope you can help me if you can. If you like this project, you can also join us; it's a very good opportunity in China's database market.
> My phone number is ------
>
> yannian mu
> luxin, muyannian
>
> ------------------ Original message ------------------
> *From:* "general" <wu.sheng.841...@gmail.com>;
> *Sent:* Saturday, February 27, 2021, 9:06
> *To:* "Incubator"<general@incubator.apache.org>;
> *Subject:* Re: [Proposal] lxdb - proposal for Apache Incubation
>
> Hi
>
> Since you are proposing a new project to a global foundation, you should at least keep your documentation in English. The links you provided are in Chinese, which most IPMC members cannot read.
> And since this project is closed-source, please list its dependencies.
> And as you repeatedly mentioned the original projects: was this project created 100% on your own, or does it include anything from Alibaba/Tencent? As it is not open source, I can't verify this.
> Because this is closed-source, we also need you to be clear about whether you are going to submit an SGA and open-source it to the public.
>
> Most importantly, `lucene` is an Apache trademark and an Apache project, which makes me concerned about a branding violation.
>
> Finally, the Incubator typically expects a project to be open-sourced already, with at least a small community and some first adoption outside your company.
>
> To join the Incubator, you also need at least 3 IPMC members and 1 Champion (an Apache member or officer) to help you understand the Incubator.
>
> Sheng Wu
> Twitter, wusheng1108
>
> On Sat, Feb 27, 2021 at 6:40, fp <f...@lucene.cn> wrote:
> >
> > Dear Apache Incubator Community,
> >
> > Please accept the following proposal for presentation and discussion:
> > https://github.com/lucene-cn/lxdb/wiki
> >
> > LXDB is a high-performance OLAP and full-text search database. It is based on HBase, but replaces HFile with Lucene indexes to support more effective secondary indexes. It is also based on Spark SQL, so you can use the SQL API to access data and perform OLAP calculations. The Lucene index is stored on HDFS (not on local disk).
> >
> > In our production systems, LXDB supports 200+ clusters. Some single clusters have 1,000+ nodes and insert 200 billion rows per day (20,000 billion rows in total). One of the biggest single tables has 200 million Lucene indexes on LXDB.
> >
> > Doug Cutting, the father of Hadoop, split Nutch into HBase, MapReduce (Hive), HDFS, and Lucene. We have merged these separate projects again: LXDB = Spark SQL + HBase + Lucene + Parquet + HDFS, a super database. It took me 10 years to complete this merging, but the purpose is no longer a search engine; it is a database.
> >
> > Best regards
> > yannian mu
> >
> > LXDB Proposal
> > == Abstract ==
> > LXDB is a high-performance OLAP and full-text search database.
> >
> > === Based on HBase, replacing HFile with Lucene indexes for more effective secondary indexes ===
> > We modified the HBase RegionServer to replace HFile with Lucene: when data is put, we add a document to Lucene instead of writing the data to an HFile. The Lucene index is stored on the RegionServer, not in a separate cluster as with Elasticsearch + HBase, which keeps two copies of the data.
> >
> > === Based on Spark SQL for OLAP ===
> > We integrated Spark and HBase. Usage looks like this:
> > 1. Unpack lxdb.tar.gz.
> > 2. Configure the hadoop_config path.
> > 3. Run start-all.sh to start the cluster.
> > lxdb starts Spark through Hadoop YARN, and each Spark executor process then starts an embedded HBase RegionServer service.
> >
> > You can operate the lxdb database through the Spark SQL API (Hive) or the MySQL API:
> > 1. SQL queries use Spark RDDs plus HBase scanners to access HBase.
> > 2. The query's conditions (filters, or GROUP BY aggregations) are pushed down to HBase.
> > 3. HBase uses the Lucene index to filter data on the RegionServer.
> > Spark, HBase, and Lucene are all embedded and integrated together rather than deployed as separate clusters; that is the difference from a Solr/ES + HBase + Spark solution.
> >
> > == Background ==
> > === Multiple copies of data ===
> > Apache HBase + Elasticsearch is the most popular solution for full-text search, but it is weak at online analytical processing (OLAP). So production systems often run Spark (or Hive, Impala, or Presto), HBase, and Solr/ES at the same time. Multiple copies of the data are stored in multiple systems, each with a different API, and data consistency is difficult to guarantee. For these reasons we merged Spark, HBase, and Elasticsearch into one project. Its goal is to use one copy of the data, one cluster, and one API to cover OLAP, key-value, full-text, and other database scenarios.
> >
> > === Merging and splitting Lucene indexes (HStores) across machines on HDFS ===
> > As we all know, Solr/ES store files on the local file system, so the number of shards must be fixed. If we store the index on HDFS, however, it becomes splittable like an HBase HStore and can be split or merged across machine nodes. This is very useful for a distributed database: how many resources a table needs depends on its size, which changes over time, so the number of shards constantly needs adjusting. An index stored on local disk cannot be split across machines, but a Lucene index stored on HDFS can. In a distributed database, being able to adjust the number of shards flexibly, that is, to scale elastically, is particularly important.
> >
> > === Solving the shortcomings of secondary indexes ===
> > Some people use HBase secondary indexes, as in the Phoenix project. But those schemes are based on the HBase rowkey and involve a lot of redundancy: you cannot create many indexes because the data inflation rate is too high. So using a Lucene index instead of rowkey-based secondary indexes is the best choice.
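The split/merge argument above can be illustrated with a minimal Python sketch. `RangeShards`, `split`, `merge`, and `moved_by_modulo` are hypothetical names; sorted lists of keys stand in for index files on HDFS, and no lxdb, HBase, or Lucene API is used. Range-partitioned shards, like HBase regions, can be split or merged by moving only the rows of the shards involved, whereas a fixed hash-modulo layout reassigns most keys when the shard count changes (with a uniform hash, going from 4 to 5 shards moves about 80% of keys, since only keys whose hash mod 20 is 0 to 3 keep their shard).

```python
import bisect
import zlib


class RangeShards:
    """Hypothetical range-partitioned index shards in the spirit of HBase
    regions: shard i owns the keys in [starts[i], starts[i + 1]).
    Assumes unique keys (like rowkeys)."""

    def __init__(self, keys):
        self.starts = [""]          # a single shard owns the whole keyspace
        self.data = [sorted(keys)]  # stand-in for the shard's index files

    def shard_of(self, key):
        # Right-most shard whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def split(self, i):
        """Split shard i at its median key; only shard i's rows are touched."""
        rows = self.data[i]
        if len(rows) < 2:
            return
        mid = len(rows) // 2
        self.starts.insert(i + 1, rows[mid])
        self.data[i:i + 1] = [rows[:mid], rows[mid:]]

    def merge(self, i):
        """Merge shard i with its right neighbor (the result stays sorted)."""
        self.data[i:i + 2] = [self.data[i] + self.data[i + 1]]
        del self.starts[i + 1]


def _hash(key):
    return zlib.crc32(key.encode())


def moved_by_modulo(keys, old_n, new_n):
    """Count keys that change shard under fixed hash-modulo routing."""
    return sum(1 for k in keys if _hash(k) % old_n != _hash(k) % new_n)


if __name__ == "__main__":
    keys = [f"row{n:04d}" for n in range(1000)]
    shards = RangeShards(keys)
    shards.split(0)                        # 1 shard -> 2 shards
    shards.split(1)                        # split only the right half again
    print([len(d) for d in shards.data])   # [500, 250, 250]
    print(shards.shard_of("row0700"))      # 1
    print(moved_by_modulo(keys, 4, 5), "of", len(keys), "keys change shard going from 4 to 5 hash shards")
```

The second split rewrites only the 500 rows of the right-hand shard; the left shard's data is untouched, which is the property the text relies on when it says an HDFS-resident index can be split or merged across machines.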
> >
> > === Adding a Lucene index for Spark OLAP ===
> > Most OLAP systems, such as Hive, Spark SQL, Impala, and some MPP databases, suffer from brute-force scanning and poor data freshness.
> > 1. They compute results with brute-force scans. Another option is to add indexes to big data; in many cases an index can greatly improve on the original brute-force scan. I think that, just as in traditional databases, indexing can greatly speed up a database.
> > 2. Most of those systems are offline or batch systems, whereas lxdb targets real-time appends and real-time key-value updates, just like HBase.
> >
> > == Future ==
> > === Lucene on Parquet ===
> > Soon I will change Lucene's .tim and .tip (inverted index) files and .dvd and .dvm (doc values) files to a Parquet- or ORC-like format, in order to:
> > * solve the performance problem of traversing the Lucene index;
> > * avoid having to load files such as .tip into memory when opening a Lucene index, which makes opening slow;
> > * let Lucene store multi-column joint indexes by column, to handle logic such as multi-table joins, materialized views, and multi-field GROUP BY through the inverted index.
> > The current Lucene index has many problems because of its many file pointers and its single-column design. We want to modify Lucene to suit HDFS better, not only for full-text retrieval but also for statistical analysis, making it a true database-level index. We also want Lucene to be splittable, so that storage can be separated from computation.
> >
> > === Supporting all kinds of predicate pushdown computation ===
> > We find that the more closely computation is combined with the data, the more performance the database can deliver. An index is only one form of computation pushdown.
> > Take storage pushdown, for example: we can store the index on SSDs and the data on SATA disks. We can pre-store data that is often grouped together instead of computing row by row, and we can assign important tables or columns to dedicated devices and resources. HBase still lacks these capabilities, and we need to improve it further.
> >
> > === Intervening in data distribution ===
> > We can use the rowkey to steer data to specific nodes, which enables many interesting things.
> >
> > === Resource control and resource isolation ===
> > Lucene currently does not support resource isolation, but on HDFS we can do it: I can control SQL priorities so that higher-priority Lucene queries get faster I/O resources.
> >
> > == Status ==
> > In 2011 I released the first open-source version (mdrill) at Alibaba. At that time, mdrill used 10 machines with 48 GB each to support 400 billion rows of data. This version was the first to store the index on HDFS, one year ahead of the community: https://github.com/alibaba/mdrill
> >
> > In 2014 I stopped updating mdrill because I joined Tencent. There our team developed the Hermes project, which also builds Lucene on HDFS. Hermes now imports 1,000 billion rows of data per day in real time. It is the largest database I have ever developed: https://plus.tencent.com/bigdata/hermes
> >
> > In 2018 I set up my own company, Luxin ("Lu Xin" is the Chinese pronunciation of Lucene). As fans of Lucene, we use lucene.xin as the company's domain and lucene.cn as its mail domain.
> > Luxin's first version of lxdb was called lsql, meaning Lucene SQL. It used Lucene (2.5.3) + HDFS + Spark (1.6.3) and is stable; about 200+ clusters use lsql. It processes about 200 billion rows per day, with 20,000 billion rows in total in one single cluster (1,000 nodes).
> >
> > In 2020, during COVID-19, our team decided to develop the next generation of lsql, called lxdb (lx = Lucene pronunciation). We added HBase to lsql to solve the update problem, and we have now finished the first version of lxdb: https://github.com/lucene-cn/lxdb/wiki
> >
> > == Known Risks ==
> > === Meritocracy ===
> > lxdb has been deployed in production and serves more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way to do reporting and analysis on big data. Still, we look forward to growing a rich user and developer community.
> >
> > === Orphaned products ===
> > The core developers currently work full-time for Luxin. lxdb is widely adopted by many companies and individuals, so there is no realistic chance of it becoming orphaned. We also have a Tencent QQ instant-messaging group of about 1,000 people.
> >
> > === Inexperience with Open Source ===
> > The core developers are all active users and followers of open source. They are already committers and contributors to the lxdb project. The developer yannian mu has ten years of experience with open-source projects, including jstorm (https://github.com/alibaba/jstorm) and mdrill (https://github.com/alibaba/mdrill).
> >
> > === Homogenous Developers ===
> > Most of the core developers are from Luxin because lxdb is a closed-source product, but once lxdb is open-sourced we expect it to receive bug fixes and enhancements from developers outside Luxin. We learned from open source, and we want to give back to it.
> >
> > === Reliance on Salaried Developers ===
> > Luxin invested in lxdb as its solution, and some of its key engineers work full-time on the project. In addition, since there is a growing Big Data need for scalable solutions, we look forward to other Apache developers and researchers contributing to the project. The key to addressing the risk of relying on salaried developers from a single entity is to increase the diversity of contributors and actively lobby for outside contributions; Apache lxdb intends to do this.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > lxdb is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand. The lxdb project is already in production use inside Luxin, but is not expected to be a Luxin product for external customers. As such, the lxdb project is not seeking to use the Apache brand as a marketing tool.
> >
> > === Documentation ===
> > Information about lxdb can be found at https://github.com/lucene-cn/lxdb. The following links provide more information about lxdb in open source:
> > * wiki site: https://github.com/lucene-cn/lxdb/wiki
> > * Issue Tracking: https://github.com/lucene-cn/lxdb/issues
> > * Overview: https://github.com/lucene-cn/lxdb/wiki/intro
> > * Luxin home page: http://www.lucene.xin
> > * lsql document: http://docs.lucene.xin/lsql/v21/
> >
> > === Initial Source ===
> > lxdb will develop its source code under the Apache License at https://github.com/lucene-cn/lxdb.
> >
> > === Core Developers ===
> > Currently most of the core developers of LXDB work on Luxin's research team.
> >
> > - yannian mu (dev)
> > - yu chen (dev)
> > - guangshi hao (dev)
> > - wei sun (dev)
> > - qihua zheng (dev)
> > - xin wang (dev)
> > - qingsong liu (dev)
> > - anxing zhou (Tester)
> > - jiajun duan (Tester)
> >
> > == External Dependencies ==
> > All dependencies are managed using Apache Maven.
> >
> > Dependency   License              Optional?
> > lucene       Apache License 2.0   true
> > zookeeper    Apache License 2.0   true
> > hbase        Apache License 2.0   true
> > spark        Apache License 2.0   true
> > hadoop       Apache License 2.0   true
> > hive         Apache License 2.0   true
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> > * lxdb-private (PMC discussion)
> > * lxdb-dev (developer discussion)
> > * lxdb-user (user discussion)
> > * lxdb-commits (SCM commits)
> > * lxdb-issues (JIRA issue feed)
> >
> > === Subversion Directory ===
> > Instead of Subversion, LXDB prefers Git as its source control management system: git://git.apache.org/lxdb