Re: [Proposal] lxdb - proposal for Apache Incubation

Sheng Wu Sat, 27 Feb 2021 06:50:26 -0800

I forwarded the private reply to the mailing list,
but deleted his cellphone number to protect his privacy.


Sheng Wu 吴晟
Twitter, wusheng1108


fp <f...@lucene.cn> wrote on Sat, 27 Feb 2021 at 10:00 PM:

> Sorry, let me revise this.
>
> Hi 吴晟
> Thank you for your reply. In response to your questions, my answers are as
> follows. (My English is not very good; please bear with me.)
>
> 1. Since you are proposing a new project to a global foundation, you should at
> least keep your documentation in English.
> >Of course. If Apache accepts this project, I will complete all the documents
> >and translate them into English. Although my English is not very good, many
> >of my colleagues have returned from Australia, so this should not be a
> >problem.
> 2. Your provided links are in Chinese, which most IPMC people cannot read.
> >In addition to the source code, what other documents are needed? Do you want
> >me to provide a basic usage guide or an introduction to the project first?
> 3. And since this project is closed-source, please provide the dependencies.
> >The version to be open-sourced is a 100% rewrite. It relies on Hadoop, HBase,
> >Spark, and ZooKeeper, and does not rely on any code from my previous company.
> 4. And as you repeatedly mentioned the original projects: is this project
> created 100% on your own, or does it include something from Alibaba/Tencent?
> >The current version of lxdb was created 100% on my own. It does not include
> >anything from Alibaba/Tencent.
> >The previous version of lxdb relied on Alibaba's mdrill. I am the author of
> >the mdrill project, and mdrill is an open-source project.
> >Tencent Hermes was my work at Tencent, but after I started my business I did
> >not use the Hermes source code, and I informed Tencent before I started my
> >business.
> 5. As the project is not open source, I can't verify this.
> >If you are interested, I can provide the source code to PMC members
> >separately for auditing.
> 6. Because this is closed-source, we also need you to be clear about whether
> you are going to submit an SGA and open the source to the public.
> >I haven't open-sourced the project yet, mainly because I want to see whether
> >the PMC is interested in it. If so, I will open-source it; that way I can
> >persuade my investors. If the PMC is not interested, I may consider
> >open-sourcing it later. At present the project has about 100,000 lines of
> >code, which can be provided to the PMC for review.
> 7. Most importantly, `lucene` is an Apache trademark and an Apache project,
> which makes me concerned about a branding violation.
> >I just like Lucene. If the name offends the PMC, I can change it to an
> >appropriate name.
> 8. Lastly, we (the Incubator) typically expect you to have open-sourced the
> project, and to have at least a small community and first adoption outside
> your company.
> >Our company is a commercial company. The community around our previous
> >projects may be different from what you describe. We have organized a QQ
> >discussion group with about 1000 people. Many of them have been our users
> >for many years, and they are looking forward to the development of our
> >project.
> 9. To join the Incubator, you also need at least 3 IPMC members and 1
> Champion (Apache member or officer) to help you understand the Incubator.
> >Can you help me? I really have language problems, and there has been little
> >communication in this area. I have done a lot of knowledge sharing in China
> >before. I hope you can help me if you can.
> My phone number is ------
>
>
>
> ------------------ Original Message ------------------
> *From:* "fp" <f...@lucene.cn>;
> *Sent:* Saturday, 27 Feb 2021, 9:57 PM
> *To:* "wu.sheng.841108"<wu.sheng.841...@gmail.com>;
> *Subject:* Re: [Proposal] lxdb - proposal for Apache Incubation
>
>
>
>
> ------------------ Original Message ------------------
> *From:* "fp" <f...@lucene.cn>;
> *Sent:* Saturday, 27 Feb 2021, 9:55 PM
> *To:* "Incubator"<general@incubator.apache.org>;
> *Subject:* Re: [Proposal] lxdb - proposal for Apache Incubation
>
> Hi 吴晟
> Thank you for your reply. In response to your questions, my answers are as
> follows. (My English is not very good; please bear with me.)
>
> 1. Since you are proposing a new project to a global foundation, you should at
> least keep your documentation in English.
> >Of course. If Apache accepts this project, I will complete all the documents
> >and translate them into English. Although my English is not very good, many
> >of my colleagues have returned from Australia, so this should not be a
> >problem.
> 2. Your provided links are in Chinese, which most IPMC people cannot read.
> >In addition to the source code, what other documents are needed? Do you want
> >me to provide a basic usage guide or an introduction to the project first?
> 3. And since this project is closed-source, please provide the dependencies.
> >The version to be open-sourced is a 100% rewrite. It relies on Hadoop, HBase,
> >Spark, and ZooKeeper, and does not rely on any code from my previous company.
> 4. And as you repeatedly mentioned the original projects: is this project
> created 100% on your own, or does it include something from Alibaba/Tencent?
> >The current version of lxdb was created 100% on my own. It does not include
> >anything from Alibaba/Tencent.
> >The previous version of lxdb relied on Alibaba's mdrill. I am the author of
> >the mdrill project, and mdrill is an open-source project.
> >Tencent Hermes was my work at Tencent, but after I started my business I did
> >not use the Hermes source code, and I informed Tencent before I started my
> >business.
> 5. As the project is not open source, I can't verify this.
> >If you are interested, I can provide the source code to PMC members
> >separately for auditing.
> 6. Because this is closed-source, we also need you to be clear about whether
> you are going to submit an SGA and open the source to the public.
> >I haven't open-sourced the project yet, mainly because I want to see whether
> >the PMC is interested in it. If so, I will open-source it; that way I can
> >persuade my investors. If the PMC is not interested, I may consider
> >open-sourcing it later. At present the project has about 100,000 lines of
> >code, which can be provided to the PMC for review.
> 7. Most importantly, `lucene` is an Apache trademark and an Apache project,
> which makes me concerned about a branding violation.
> >I just like Lucene. If the name offends the PMC, I can change it to an
> >appropriate name.
> 8. Lastly, we (the Incubator) typically expect you to have open-sourced the
> project, and to have at least a small community and first adoption outside
> your company.
> >Our company is a commercial company. The community around our previous
> >projects may be different from what you describe. We have organized a QQ
> >discussion group with about 1000 people. Many of them have been our users
> >for many years, and they are looking forward to the development of our
> >project.
> 9. To join the Incubator, you also need at least 3 IPMC members and 1
> Champion (Apache member or officer) to help you understand the Incubator.
> >Can you help me? I really have language problems, and there has been little
> >communication in this area. I have done a lot of knowledge sharing in China
> >before. I hope you can help me if you can. If you like this project, you can
> >also join us. It is a very good opportunity in China's database market.
> My phone number is ------
>
>
> yannian mu 母延年
> luxin,muyannian
>
>
> ------------------ Original Message ------------------
> *From:* "general" <wu.sheng.841...@gmail.com>;
> *Sent:* Saturday, 27 Feb 2021, 9:06 PM
> *To:* "Incubator"<general@incubator.apache.org>;
> *Subject:* Re: [Proposal] lxdb - proposal for Apache Incubation
>
> Hi
>
> Since you are proposing a new project to a global foundation, you should at
> least keep your documentation in English. Your provided links are in Chinese,
> which most IPMC people cannot read.
> And since this project is closed-source, please provide the dependencies.
> And as you repeatedly mentioned the original projects: is this project
> created 100% on your own, or does it include something from Alibaba/Tencent?
> As the project is not open source, I can't verify this.
> Because this is closed-source, we also need you to be clear about whether you
> are going to submit an SGA and open the source to the public.
>
> Most importantly, `lucene` is an Apache trademark and an Apache project,
> which makes me concerned about a branding violation.
>
> Lastly, we (the Incubator) typically expect you to have open-sourced the
> project, and to have at least a small community and first adoption outside
> your company.
>
> To join the Incubator, you also need at least 3 IPMC members and 1
> Champion (Apache member or officer) to help you understand the Incubator.
>
> Sheng Wu 吴晟
> Twitter, wusheng1108
>
>
> fp <f...@lucene.cn> wrote on Sat, 27 Feb 2021 at 6:40 PM:
>
> > Dear Apache Incubator Community,
> >
> >
> > Please accept the following proposal for presentation and discussion:
> > https://github.com/lucene-cn/lxdb/wiki
> >
> >
> > LXDB is a high-performance OLAP and full-text search database. It is based
> > on HBase, but replaces HFile with a Lucene index to support more effective
> > secondary indexes. It is also based on Spark SQL, so you can use the SQL
> > API to access data and run OLAP calculations. The Lucene index is stored
> > on HDFS (not on local disk).
> >
> >
> > In our production systems, LXDB supports 200+ clusters. Some single
> > clusters have 1000+ nodes and insert 200 billion rows per day (20,000
> > billion rows in total). One of the biggest single tables has 200 million
> > Lucene indexes on LXDB.
> >
> >
> > Doug Cutting, the father of Hadoop, split Nutch into HBase, MapReduce
> > (Hive), HDFS, and Lucene. We have merged these separate projects again:
> > LXDB equals Spark SQL + HBase + Lucene + Parquet + HDFS. It is a super
> > database. It took me 10 years to complete this merging. The result,
> > however, is no longer a search engine but a database.
> >
> >
> >
> >
> > Best regards
> > yannian mu
> >
> >
> >
> >
> > LXDB Proposal
> > == Abstract ==
> > LXDB is a high-performance OLAP and full-text search database.
> >
> >
> > === It is based on HBase, but replaces HFile with a Lucene index to support
> > more effective secondary indexes ===
> > We modified the HBase region server to replace HFile with Lucene: on a put,
> > the document is written to Lucene instead of to an HFile. The Lucene index
> > is stored on the region server. It is not stored in a separate cluster as
> > with Elasticsearch + HBase, which requires two copies of the data.
> >
> >
> > === It is based on Spark SQL for OLAP ===
> > We integrated Spark and HBase together. Usage is as follows:
> > 1. Unpack lxdb.tar.gz.
> > 2. Configure the hadoop_config path.
> > 3. Run start-all.sh to start the cluster.
> > LXDB starts Spark through Hadoop YARN, and each Spark executor process
> > then starts an embedded HBase region server service.
> >
> >
> > You can operate the LXDB database through the Spark SQL API (Hive) or the
> > MySQL API.
> > 1. The SQL uses Spark RDD + the HBase scanner to access HBase.
> > 2. The SQL conditions (filters or group-by aggregations) are pushed down
> > to HBase.
> > 3. HBase uses the Lucene index to filter data in the region server.
> > Spark, HBase, and Lucene are all embedded and integrated together rather
> > than running as separate clusters. That is the difference from a
> > Solr/ES + HBase + Spark solution.
> >
> >
> > == Background ==
> > === Multiple copies of data ===
> > Apache HBase + Elasticsearch is the most popular solution for full-text
> > search, but it is weak at online analytical processing.
> > So most of the time a production system uses Spark (or Hive, Impala, or
> > Presto), HBase, and Solr/ES at the same time. Multiple copies of the data
> > are stored in multiple systems, each system has a different API, and data
> > consistency is difficult to guarantee. For these reasons we merged Spark,
> > HBase, and Elasticsearch into one project. Its goal is to use one copy of
> > the data, one cluster, and one API to cover OLAP, KV, full-text, and other
> > database scenarios.
> >
> >
> > === Merging and splitting of Lucene indexes (HStore) across different
> > machines on HDFS ===
> > As we all know, Solr/ES stores files in the local file system, so the
> > number of shards must be fixed. If we store the index on HDFS, the index
> > becomes splittable like an HBase HStore and can be split or merged across
> > machine nodes. This is very useful for a distributed database, because it
> > determines how many resources are allocated to a table. Most of the time
> > the number of records in a table changes over time, so the number of
> > shards always needs adjusting. An index stored locally cannot be split
> > across different machines, but a Lucene index stored on HDFS can.
> > Whether the number of shards can be adjusted flexibly, that is, whether
> > the system can scale elastically, is particularly important in a
> > distributed database.
> >
> >
> > === Solving the insufficiency of secondary indexes ===
> > Some people use HBase secondary indexes, as in the Phoenix project. But
> > those schemes are based on the HBase rowkey and carry a lot of redundancy:
> > one cannot create too many indexes, and the data inflation rate is too
> > high. So using a Lucene index instead of secondary indexes is the best
> > choice.
> >
> >
> > === We add a Lucene index for Spark OLAP ===
> > Most OLAP systems, such as Hive, Spark SQL, Impala, and some MPP
> > databases, suffer from brute-force scanning and poor data timeliness.
> > 1. They use brute-force scans to compute over the data. Another choice is
> > to add an index to the big data. In some cases an index can greatly
> > improve on the performance of the original brute-force scan. I think
> > that, just as in a traditional database, indexing technology can greatly
> > improve the speed of the database.
> > 2. Another problem is that most of those databases or systems are offline
> > or batch systems. LXDB's target is real-time append and real-time KV
> > update, just like HBase.
> >
> >
> > == Future ==
> > === Lucene on Parquet ===
> > Soon I will change Lucene's tim and tip (inverted index), dvd, and dvm
> > files to a format like Parquet or ORC.
> > This is to solve the performance problem of traversing a Lucene index; to
> > solve the problem that opening a Lucene file requires loading files such
> > as tip into memory, which makes opening a Lucene index slow; and to enable
> > Lucene to store multi-column joint indexes by column, which is used to
> > handle logic such as multi-table joins, materialized views, and
> > multi-field group-by via the inverted index. The current Lucene index has
> > many problems because of too many file pointers and its single-column
> > design. We want to modify Lucene to make it more suitable for HDFS and
> > good not only at full-text retrieval but also at statistical analysis,
> > making it a real database-level index. We also want Lucene to be
> > splittable, so that storage can be separated from computation.
> >
> >
> > === Supporting all kinds of predicate pushdown calculation ===
> > We find that if we combine the calculation method closely with the data,
> > we can get more performance out of the database. An index is only one way
> > of pushing calculation down. Take storage pushdown, for example: we can
> > store the index on SSD devices and the data on SATA devices. We can store
> > data that is often grouped together in advance, instead of calculating
> > row by row. We can give important tables or columns dedicated devices and
> > resources. HBase still lacks these capabilities, and we need to improve it
> > further.
> >
> >
> > === Intervening in data distribution ===
> > We can use the row key to direct data to different nodes, which enables
> > many interesting things.
> >
> >
> > === Resource control, resource isolation ===
> > Lucene currently does not support resource isolation, but on HDFS we can
> > do it: I can control the priority of SQL so that Lucene queries with
> > higher priority get faster IO resources.
> >
> >
> > == Status ==
> > In 2011 I released the first open-source version at Alibaba. At that
> > time, mdrill used ten 48 GB machines to support 400 billion records.
> > The first index on HDFS came from this version, one year ahead of the
> > community. https://github.com/alibaba/mdrill
> >
> >
> > In 2014 I stopped updating the mdrill project because I joined Tencent.
> > Our team developed the Hermes project, in which we also built Lucene on
> > HDFS. Hermes now imports 1000 billion rows of data per day in real time.
> > It is the largest database I have ever developed.
> > https://plus.tencent.com/bigdata/hermes
> >
> >
> > In 2018 I set up my own company, called Luxin. "Lu Xin" is the Chinese
> > pronunciation of Lucene. As I am a fan of Lucene, the company's domain is
> > lucene.xin and its mail domain is lucene.cn.
> > Luxin's first version of LXDB was called lsql, which means "Lucene SQL".
> > It uses Lucene (2.5.3) + HDFS + Spark (1.6.3) and is stable. About 200+
> > clusters use lsql. One single cluster (1000 nodes) processes about 200
> > billion rows per day and holds 20,000 billion rows in total.
> >
> >
> > In 2020, during the COVID-19 period, our team decided to develop the next
> > generation of lsql, called LXDB (lx stands for the pronunciation of
> > Lucene). We added HBase to lsql to solve the update problem. We have now
> > finished the first version of LXDB. https://github.com/lucene-cn/lxdb/wiki
> >
> >
> >
> >
> > == Known Risks ==
> > === Meritocracy ===
> >
> >
> > LXDB has been deployed in production and serves more than 200 lines of
> > business. It has demonstrated great performance benefits and has proved
> > to be a better way to do reporting and analysis on big data. Still, we
> > look forward to growing a rich user and developer community.
> > === Orphaned products ===
> >
> >
> > The core developers currently work full-time for Luxin.
> > LXDB is widely adopted by many companies and individuals, so there is no
> > realistic chance of it becoming orphaned. We also have a Tencent QQ
> > instant-messaging group of about 1000 people.
> >
> >
> > === Inexperience with Open Source ===
> > The core developers are all active users and followers of open source.
> > They are already committers and contributors to the LXDB project.
> > Developer yannian mu has ten years of experience with open-source
> > projects, including jstorm https://github.com/alibaba/jstorm and mdrill
> > https://github.com/alibaba/mdrill
> >
> >
> >
> >
> > === Homogenous Developers ===
> >
> >
> > Most of the core developers are from Luxin because the product has been
> > closed-source, but once LXDB is open-sourced it will receive many bug
> > fixes and enhancements from developers not working at Luxin. What we
> > learned from open source, we want to give back to it.
> >
> >
> >
> >
> > === Reliance on Salaried Developers ===
> >
> >
> > Luxin invested in LXDB as its solution, and some of its key engineers are
> > working full time on the project. In addition, since there is a growing
> > Big Data need for scalable solutions, we look forward to other Apache
> > developers and researchers contributing to the project. Key to addressing
> > the risk of relying on salaried developers from a single entity is
> > increasing the diversity of contributors and actively lobbying for it.
> > Apache LXDB intends to do this.
> >
> >
> > === An Excessive Fascination with the Apache Brand ===
> >
> >
> > LXDB is proposing to enter incubation at Apache in order to help efforts
> > to diversify the committer base, not so much to capitalize on the Apache
> > brand. The LXDB project is already in production use inside Luxin, but is
> > not expected to be a Luxin product for external customers. As such, the
> > LXDB project is not seeking to use the Apache brand as a marketing tool.
> >
> >
> >
> >
> > === Documentation ===
> >
> >
> > Information about LXDB can be found at https://github.com/lucene-cn/lxdb.
> > The following links provide more information about LXDB in open source:
> >
> >
> > * wiki site: https://github.com/lucene-cn/lxdb/wiki
> > * Issue Tracking: https://github.com/lucene-cn/lxdb/issues
> > * Overview: https://github.com/lucene-cn/lxdb/wiki/intro
> > * Luxin home page: http://www.lucene.xin
> > * lsql document: http://docs.lucene.xin/lsql/v21/
> >
> >
> > === Initial Source ===
> >
> >
> > LXDB will develop its source code under the Apache License at
> > https://github.com/lucene-cn/lxdb.
> >
> >
> >
> >
> > === Core Developers ===
> >
> >
> > Currently most of the core developers of LXDB work in the research team
> > at Luxin.
> >
> >
> > - yannian mu (dev)
> > - yu chen (dev)
> > - guangshi hao (dev)
> > - wei sun (dev)
> > - qihua zheng (dev)
> > - xin wang (dev)
> > - qingsong liu (dev)
> > - anxing zhou (Tester)
> > - jiajun duan (Tester)
> >
> >
> > == External Dependencies ==
> > All dependencies are managed using Apache Maven.
> > Dependency   License              Optional?
> > lucene       Apache License 2.0   true
> > zookeeper    Apache License 2.0   true
> > hbase        Apache License 2.0   true
> > spark        Apache License 2.0   true
> > hadoop       Apache License 2.0   true
> > hive         Apache License 2.0   true
> >
> >
> > == Required Resources ==
> >
> >
> > === Mailing lists ===
> >
> >
> > * lxdb-private (PMC discussion)
> > * lxdb-dev (developer discussion)
> > * lxdb-user (user discussion)
> > * lxdb-commits (SCM commits)
> > * lxdb-issues (JIRA issue feed)
> >
> >
> > === Subversion Directory ===
> >
> >
> > Instead of Subversion, LXDB prefers Git as its source control
> > management system: git://git.apache.org/lxdb
>
>
