Hi, fp,

Your email is hard to read. Please change to a normal mail client first.

Back to your proposal, the key concern is not technology, but that the IPMC cannot evaluate a project when we cannot see anything.

Thanks,
Ming Wen, Apache APISIX PMC Chair
Twitter: _WenMing

f...@lucene.cn <f...@lucene.cn> wrote on Sun, Feb 28, 2021 at 9:02 PM:

> Hi Furkan Kamaci
>
> Thank you for your proposal. I will start to improve and prepare.
>
> > > > 1. Find an experienced mentor to guide you.
> > > >
> > > > todo
> > > >
> > > > 2. Start to translate your documentation to English.
> > > >
> > > > 3. Open source your project. How can we have a comment on your project if we cannot see anything about it?
> > > >
> > > > Give me some time. I discussed this with my team; my English is too poor.
> > > >
> > > > 4) Gain contributors to your project. At least you should show your intention to have committers/contributors out of your company. Eliminate the risk of non-meritocratic management of the project.
> > > >
> > > > That's what I have to do.
> > > >
> > > > 5) Structure your proposal. Explain why people need this project, which problems current projects have and how you managed to handle them. We should understand whether it is a bundle of other projects, a completely new project, or a wrapper around other projects that eliminates their shortcomings.
> > > >
> > > > 6) Find a suitable name for your project so that you do not have to solve trademark problems that may cost you time if you enter incubation.
> > > >
> > > > OK, I will think of a new name, for example hydrogen sql.
> > > >
> > > > f...@lucene.cn
> > > > yannian mu
> > > >
> > > > From: Furkan KAMACI
> > > > Date: 2021-02-28 18:51
> > > > To: general
> > > > Subject: Re: [Proposal] lxdb - proposal for Apache Incubation
> > > >
> > > > Hi,
> > > >
> > > > Actually you have detailed documentation which explains your approach compared to similar systems and the resulting performance metrics, i.e. reducing storage 10 to 100 times or having low latency queries.
> > > >
> > > > My advice is (some points are the same as Sheng's and Liang's):
> > > >
> > > > 1) Find an experienced mentor to guide you.
> > > >
> > > > 2) Start to translate your documentation to English.
> > > >
> > > > 3) Open source your project. How can we have a comment on your project if we cannot see anything about it?
> > > >
> > > > 4) Gain contributors to your project. At least you should show your intention to have committers/contributors out of your company. Eliminate the risk of non-meritocratic management of the project.
> > > >
> > > > 5) Structure your proposal. Explain why people need this project, which problems current projects have and how you managed to handle them. We should understand whether it is a bundle of other projects, a completely new project, or a wrapper around other projects that eliminates their shortcomings.
> > > >
> > > > 6) Find a suitable name for your project so that you do not have to solve trademark problems that may cost you time if you enter incubation.
> > > >
> > > > Kind Regards,
> > > > Furkan KAMACI
> > > >
> > > > On Sun, Feb 28, 2021 at 1:02 PM Liang Chen <chenliang6...@gmail.com> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > It would be better if you could find an experienced IPMC member to help you prepare the proposal.
> > > > > Based on Sheng Wu's input, I have one more comment: can you please explain what the differences are from other similar data analysis DBs? You can consider explaining from a use case perspective.
> > > > >
> > > > > Regards
> > > > > Liang
> > > > >
> > > > > fp wrote
> > > > > > Dear Apache Incubator Community,
> > > > > >
> > > > > > Please accept the following proposal for presentation and discussion:
> > > > > > https://github.com/lucene-cn/lxdb/wiki
> > > > > >
> > > > > > LXDB is a high-performance OLAP and full-text search database. It is based on HBase, but replaces HFile with a Lucene index to support more effective secondary indexes. It is also based on Spark SQL, so you can use the SQL API to access data and run OLAP calculations. The Lucene index is stored on HDFS (not on local disk).
> > > > > >
> > > > > > In our production systems, LXDB supports 200+ clusters. Some single clusters have 1000+ nodes and insert 200 billion rows per day (20000 billion rows in total). One of the biggest single tables has 200 million Lucene indexes on LXDB.
> > > > > >
> > > > > > Doug Cutting, the father of Hadoop, split Nutch into HBase, MapReduce (Hive), HDFS and Lucene. We have merged these separate projects again: LXDB equals Spark SQL + HBase + Lucene + Parquet + HDFS; it is a super database. It took me 10 years to complete this merging. The purpose, however, is no longer a search engine but a database.
> > > > > >
> > > > > > Best regards
> > > > > > yannian mu
> > > > > >
> > > > > > LXDB Proposal
> > > > > > == Abstract ==
> > > > > > LXDB is a high-performance OLAP and full-text search database.
> > > > > >
> > > > > > === It is based on HBase, but replaces HFile with a Lucene index to support more effective secondary indexes ===
> > > > > > We modified the HBase region server to replace HFile with Lucene: on a put, we write a document to Lucene instead of writing data to HFile.
> > > > > > The Lucene index is stored on the region server (it is not stored in a different cluster, as with Elasticsearch + HBase, which takes two copies of the data).
> > > > > >
> > > > > > === It is based on Spark SQL for OLAP ===
> > > > > > We integrated Spark and HBase together. Usage is as follows:
> > > > > > 1. Unpack lxdb.tar.gz.
> > > > > > 2. Configure the hadoop_config path.
> > > > > > 3. Run start-all.sh to start the cluster.
> > > > > > lxdb starts Spark through Hadoop YARN, and each Spark executor process then starts an embedded HBase region server service.
> > > > > >
> > > > > > You can operate the lxdb database through the Spark SQL API (Hive) or the MySQL API.
> > > > > > 1. The SQL uses Spark RDD + the HBase scanner to access HBase.
> > > > > > 2. The SQL's conditions (filter or group-by aggregation) are pushed down to HBase.
> > > > > > 3. HBase uses the Lucene index to filter data in the region server.
> > > > > > Spark, HBase and Lucene are all embedded and integrated together, not separate clusters; that is the difference from a Solr/ES + HBase + Spark solution.
> > > > > >
> > > > > > == Background ==
> > > > > > === Multiple copies of data ===
> > > > > > Apache HBase + Elasticsearch is the most popular solution for full-text search, but it is weak at online analytical processing.
> > > > > > So most of the time a production system uses Spark (or Hive, Impala or Presto), HBase and Solr/ES at the same time. Multiple copies of the data are stored in multiple systems, and the systems have different APIs. Data consistency is difficult to guarantee. For these reasons we merged Spark, HBase and Elasticsearch into one project. Its target is to use one copy of the data, one cluster and one API to cover OLAP, KV, full-text and other database scenarios.
> > > > > >
> > > > > > === Merging and splitting of Lucene indexes (HStore) across different machines on HDFS ===
> > > > > > As we all know, Solr/ES stores files in the local file system, so the number of shards must be fixed. But if we store the index on HDFS, the index becomes splittable like an HBase HStore: it can split or merge across machine nodes. This is very useful for a distributed database, because it determines how many resources are allocated to a table. Most of the time the number of records in a table changes over time, so the number of shards always needs adjusting. If the index is stored locally it cannot be split across different machines, but a Lucene index stored on HDFS can.
> > > > > > Whether the number of shards can be flexibly adjusted, that is, whether the system can scale elastically, is particularly important in a distributed database.
> > > > > >
> > > > > > === Solving the insufficiency of secondary indexes ===
> > > > > > Some people use HBase secondary indexes, as in the Phoenix project. But those approaches are based on the HBase rowkey and have a lot of redundancy: you cannot create too many indexes, and the data inflation rate is too high. So using a Lucene index instead of a secondary index is the best choice.
> > > > > >
> > > > > > === We add a Lucene index for Spark OLAP ===
> > > > > > Most OLAP systems, such as Hive, Spark SQL, Impala or some MPP databases, suffer from brute-force scanning and poor data timeliness.
> > > > > > 1. They use brute-force scans to calculate over the data. Another choice is to add an index to the big data; in some cases an index can greatly improve on the performance of the original brute-force scan. I think that, just as in a traditional database, indexing technology can greatly improve database speed.
> > > > > > 2. Another problem is that most of these databases or systems are offline or batch systems. lxdb's target is real-time append and real-time KV update, just like HBase.
> > > > > >
> > > > > > == Future ==
> > > > > > === Lucene on Parquet ===
> > > > > > Soon I will change the Lucene tim, tip (inverted index), dvd and dvm files to a format like Parquet or ORC.
> > > > > > The goals are: to solve the performance problem of traversing the Lucene index; to solve the problem that opening a Lucene file needs to load files such as tip into memory, which makes opening a Lucene index slow; and to enable Lucene to store multi-column joint indexes by column, which is used to handle logic such as multi-table joins, materialized views and multi-field group-by through the inverted index. The current Lucene index has many problems because of too many file pointers and its single-column design. We want to modify Lucene to make it more suitable for HDFS, not only for full-text retrieval but also better at statistical analysis, so that it is a real database-level index. We want Lucene to be splittable, so that storage can be separated from computation.
> > > > > >
> > > > > > === Supporting all kinds of predicate pushdown calculation ===
> > > > > > We find that if we can combine the calculation method closely with the data, we can get more performance out of the database. An index is only one way of pushing calculation down. Take storage pushdown as an example: we can store the index on an SSD device and the data part on a SATA device. We can store data that is often grouped together in advance, instead of calculating row by row. We can give important tables or columns dedicated devices and resources. HBase still lacks these capabilities, which we need to improve further.
> > > > > >
> > > > > > === Intervention in data distribution ===
> > > > > > We can use the row key to direct data to different nodes, which makes many interesting things possible.
> > > > > >
> > > > > > === Resource control, resource isolation ===
> > > > > > Lucene currently does not support resource isolation, but on HDFS we can do it: I can control the priority of SQL so that Lucene with higher priority gets faster IO resources.
> > > > > >
> > > > > > == Status ==
> > > > > > In 2011 I released the first open source version at Alibaba. At that time, mdrill used 10 nodes with 48 GB machines to support 400 billion data records. The first index on HDFS comes from this version; it was one year ahead of the community. https://github.com/alibaba/mdrill
> > > > > >
> > > > > > In 2014 I stopped updating the mdrill project because I joined Tencent.
> > > > > > In our team we developed the hermes project, where we also built Lucene on HDFS. hermes now imports 1000 billion rows of data per day in real time. It's the largest database I've ever developed. https://plus.tencent.com/bigdata/hermes
> > > > > >
> > > > > > In 2018 I set up my own company called Luxin; Lu Xin is the Chinese pronunciation of Lucene. As a fan of Lucene, I chose lucene.xin as the company's domain and lucene.cn as its mail domain.
> > > > > > The first version of Luxin's lxdb was called lsql, which means Lucene SQL. It uses Lucene (2.5.3) + HDFS + Spark (1.6.3). It is stable, and 200+ clusters use lsql. It processes about 200 billion rows per day, with a total of 20000 billion rows in one single cluster (1000 nodes).
> > > > > >
> > > > > > In 2020, in the context of COVID-19, our team decided to develop the next generation of lsql, called lxdb (lx = the pronunciation of Lucene). We added HBase to lsql to solve the update problem. We have now finished the first version of lxdb. https://github.com/lucene-cn/lxdb/wiki
> > > > > >
> > > > > > == Known Risks ==
> > > > > > === Meritocracy ===
> > > > > >
> > > > > > lxdb has been deployed in production and serves more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way of doing reporting and analysis on big data. Still, we look forward to growing a rich user and developer community.
> > > > > >
> > > > > > === Orphaned products ===
> > > > > >
> > > > > > The core developers currently work full-time for Luxin.
> > > > > > lxdb is widely adopted by many companies and individuals. There's no realistic chance of it becoming orphaned.
> > > > > > We also have a number of 1000-person Tencent QQ instant messaging groups.
> > > > > >
> > > > > > === Inexperience with Open Source ===
> > > > > >
> > > > > > The core developers are all active users and followers of open source. They are already committers and contributors to the lxdb project.
> > > > > > Developer yannian mu has ten years of experience with open source projects: jstorm https://github.com/alibaba/jstorm and mdrill https://github.com/alibaba/mdrill
> > > > > >
> > > > > > === Homogenous Developers ===
> > > > > >
> > > > > > Most of the core developers are from Luxin, because the product has been closed source. Once lxdb is open sourced, it will receive many bug fixes and enhancements from developers not working at Luxin. What you take from the community, you give back to the community.
> > > > > >
> > > > > > === Reliance on Salaried Developers ===
> > > > > >
> > > > > > Luxin invested in lxdb as the solution, and some of its key engineers are working full time on the project. In addition, since there is a growing Big Data need for scalable solutions, we look forward to other Apache developers and researchers contributing to the project. Key to addressing the risk of relying on salaried developers from a single entity is to increase the diversity of contributors and to actively lobby for it; Apache lxdb intends to do this.
> > > > > >
> > > > > > === An Excessive Fascination with the Apache Brand ===
> > > > > >
> > > > > > Lxdb is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand.
> > > > > > The Lxdb project is already in production use inside lxdb, but is not expected to be an lxdb product for external customers. As such, the lxdb project is not seeking to use the Apache brand as a marketing tool.
> > > > > >
> > > > > > === Documentation ===
> > > > > >
> > > > > > Information about lxdb can be found at https://github.com/lucene-cn/lxdb.
> > > > > > The following links provide more information about lxdb in open source:
> > > > > >
> > > > > > * Wiki site: https://github.com/lucene-cn/lxdb/wiki
> > > > > > * Issue tracking: https://github.com/lucene-cn/lxdb/issues
> > > > > > * Overview: https://github.com/lucene-cn/lxdb/wiki/intro
> > > > > > * Luxin home page: http://www.lucene.xin
> > > > > > * lsql documentation: http://docs.lucene.xin/lsql/v21/
> > > > > >
> > > > > > == Initial Source ==
> > > > > >
> > > > > > lxdb will develop its source code under an Apache license at https://github.com/lucene-cn/lxdb.
> > > > > >
> > > > > > === Core Developers ===
> > > > > >
> > > > > > Currently most of the core developers of LXDB work in the research team of Luxin.
> > > > > >
> > > > > > - yannian mu (dev)
> > > > > > - yu chen (dev)
> > > > > > - guangshi hao (dev)
> > > > > > - wei sun (dev)
> > > > > > - qihua zheng (dev)
> > > > > > - xin wang (dev)
> > > > > > - qingsong liu (dev)
> > > > > > - anxing zhou (Tester)
> > > > > > - jiajun duan (Tester)
> > > > > >
> > > > > > == External Dependencies ==
> > > > > >
> > > > > > All dependencies are managed using Apache Maven.
> > > > > >
> > > > > > Dependency    License               Optional?
> > > > > > lucene        Apache License 2.0    true
> > > > > > zookeeper     Apache License 2.0    true
> > > > > > hbase         Apache License 2.0    true
> > > > > > spark         Apache License 2.0    true
> > > > > > hadoop        Apache License 2.0    true
> > > > > > hive          Apache License 2.0    true
> > > > > >
> > > > > > == Required Resources ==
> > > > > >
> > > > > > === Mailing lists ===
> > > > > >
> > > > > > * lxdb-private (PMC discussion)
> > > > > > * lxdb-dev (developer discussion)
> > > > > > * lxdb-user (user discussion)
> > > > > > * lxdb-commits (SCM commits)
> > > > > > * lxdb-issues (JIRA issue feed)
> > > > > >
> > > > > > === Subversion Directory ===
> > > > > >
> > > > > > Instead of Subversion, LXDB prefers Git as its source control management system: git://git.apache.org/lxdb
> > > > >
> > > > > --
> > > > > Sent from: http://apache-incubator-general.996316.n3.nabble.com/
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > > > For additional commands, e-mail: general-h...@incubator.apache.org