Hi fp,
Your email is hard to read.
Please switch to a standard mail client first.
Back to your proposal: the key concern is not the technology, but that the
IPMC cannot evaluate a project when we cannot see anything of it.

Thanks,
Ming Wen, Apache APISIX PMC Chair
Twitter: _WenMing


f...@lucene.cn <f...@lucene.cn> wrote on Sunday, February 28, 2021 at 9:02 PM:

> Hi Furkan Kamaci,
>
> Thank you for your suggestions. I will start to improve the proposal and
> prepare.
>
> 1) Find an experienced mentor to guide you.
>
>      Todo.
>
> 2) Start to translate your documentation to English.
>
> 3) Open source your project. How can we comment on your project if
> we cannot see anything about it?
>
>      Give me some time. I discussed this with my team; my English is too
>      poor.
>
> 4) Gain contributors to your project. At least you should show your
> intention to have committers/contributors from outside your company.
> Eliminate the risk of non-meritocratic management of the project.
>
>      That is what I have to do.
>
> 5) Structure your proposal. Explain why people need this project, which
> problems current projects have, and how you managed to handle them. We
> should understand whether it is a bundle of other projects, a completely
> new project, or a wrapper of other projects that eliminates their
> shortcomings.
>
> 6) Find a suitable name for your project so that you do not have to solve
> trademark problems that may cost you time if you enter incubation.
>
>      OK, I will think of a new name, for example Hydrogen SQL.
>
> f...@lucene.cn  yannian mu
>
> From: Furkan KAMACI
> Date: 2021-02-28 18:51
> To: general
> Subject: Re: [Proposal] lxdb - proposal for Apache Incubation
>
> Hi,
>
> Actually, you have detailed documentation that explains your approach
> compared to similar systems and the resulting performance metrics, i.e.
> reducing storage by 10 to 100 times or having low-latency queries.
>
> My advice is (some points are the same as Sheng's and Liang's):
>
> 1) Find an experienced mentor to guide you.
>
> 2) Start to translate your documentation to English.
>
> 3) Open source your project. How can we comment on your project if
> we cannot see anything about it?
>
> 4) Gain contributors to your project. At least you should show your
> intention to have committers/contributors from outside your company.
> Eliminate the risk of non-meritocratic management of the project.
>
> 5) Structure your proposal. Explain why people need this project, which
> problems current projects have, and how you managed to handle them. We
> should understand whether it is a bundle of other projects, a completely
> new project, or a wrapper of other projects that eliminates their
> shortcomings.
>
> 6) Find a suitable name for your project so that you do not have to solve
> trademark problems that may cost you time if you enter incubation.
>
> Kind Regards,
> Furkan KAMACI
>
> On Sun, Feb 28, 2021 at 1:02 PM Liang Chen <chenliang6...@gmail.com>
> wrote:
>
> > Hi
> >
> > It would be better if you could find an experienced IPMC member to help
> > you prepare the proposal.
> > Based on Sheng Wu's input, I have one more comment: can you please
> > explain how LXDB differs from other similar data analysis DBs? You can
> > consider explaining it from a use-case perspective.
> >
> > Regards
> > Liang
> >
> >
> > fp wrote
> > > Dear Apache Incubator Community,
> > >
> > > Please accept the following proposal for presentation and discussion:
> > > https://github.com/lucene-cn/lxdb/wiki
> > >
> > > LXDB is a high-performance OLAP and full-text search database. It is
> > > based on HBase, but replaces HFile with a Lucene index to support more
> > > effective secondary indexes. It is also based on Spark SQL, so you can
> > > use the SQL API to access data and run OLAP calculations. The Lucene
> > > index is stored on HDFS (not on local disk).
> > >
> > > In our production systems, LXDB supports 200+ clusters, some of which
> > > have 1000+ nodes in a single cluster. It inserts 200 billion rows per
> > > day (20000 billion rows in total), and one of the largest single tables
> > > has 200 million Lucene indexes on LXDB.
> > >
> > > Doug Cutting, the father of Hadoop, split Nutch into HBase, MapReduce
> > > (Hive), HDFS, and Lucene. We have merged these separate projects again:
> > > LXDB equals Spark SQL + HBase + Lucene + Parquet + HDFS. It is a super
> > > database. It took me 10 years to complete this merging work. The
> > > purpose, however, is no longer a search engine but a database.
> > >
> > > Best regards
> > >   yannian mu
> > >
> > > LXDB Proposal
> > > == Abstract ==
> > > LXDB is a high-performance OLAP and full-text search database.
> > >
> > > === It is based on HBase, but replaces HFile with a Lucene index to
> > > support more effective secondary indexes ===
> > > We modified the HBase region server and changed HFile to Lucene. When
> > > data is put, we write a document to Lucene instead of writing the data
> > > to an HFile. The Lucene index is stored on the region server (it is not
> > > stored in a different cluster as with Elasticsearch + HBase, which
> > > takes two copies of the data).
> > >
> > > === It is based on Spark SQL for OLAP ===
> > > We integrated Spark and HBase together. Usage looks like this:
> > > 1. Unpack lxdb.tar.gz.
> > > 2. Configure the hadoop_config path.
> > > 3. Run start-all.sh to start the cluster.
> > > LXDB starts Spark through Hadoop YARN, and each Spark executor process
> > > then starts an embedded HBase region server service.
> > >
> > > You can operate the LXDB database through the Spark SQL API (Hive) or
> > > the MySQL API.
> > > 1. The SQL uses a Spark RDD plus an HBase scanner to access HBase.
> > > 2. The SQL conditions (filter or group-by aggregation) are pushed down
> > >    to HBase as predicates.
> > > 3. HBase uses the Lucene index to filter data in the region server.
> > > Spark, HBase, and Lucene are all embedded and integrated together
> > > rather than running as separate clusters. That is the difference from
> > > a Solr/ES + HBase + Spark solution.
> > >
> > > == Background ==
> > > === Multiple copies of data ===
> > > Apache HBase + Elasticsearch is the most popular solution for full-text
> > > search, but it is weak at online analytical processing (OLAP). So most
> > > of the time a production system uses Spark (or Hive, Impala, or
> > > Presto), HBase, and Solr/ES at the same time. Multiple copies of the
> > > data are stored in multiple systems, the systems have different APIs,
> > > and data consistency is difficult to guarantee. For these reasons we
> > > merged the roles of Spark, HBase, and Elasticsearch into one project.
> > > Its goal is to use one copy of data, one cluster, and one API to serve
> > > OLAP, KV, full-text, and other database scenarios.
> > >
> > > === Merging and splitting of Lucene indexes (HStore) across different
> > > machines on HDFS ===
> > > As we all know, Solr/ES store their files in the local file system, so
> > > the number of shards must be fixed. If we store the index on HDFS, the
> > > index becomes splittable like an HBase HStore: it can split or merge
> > > across machine nodes. This is very useful for a distributed database,
> > > because it determines how many resources are allocated to a table.
> > > Most of the time the number of records in a table changes over time,
> > > so the number of shards always needs adjusting. An index stored
> > > locally cannot be split across different machines, but a Lucene index
> > > stored on HDFS can.
> > > Whether the number of shards can be adjusted flexibly, and whether the
> > > system can scale elastically, is particularly important in a
> > > distributed database.
> > >
> > > === Solving the shortcomings of secondary indexes ===
> > > Some people use HBase secondary indexes, as in the Phoenix project.
> > > But those schemes are based on the HBase row key and carry a lot of
> > > redundancy. You cannot create too many indexes, and the data inflation
> > > rate is too high, so using a Lucene index instead of a secondary index
> > > is the best choice.
> > >
> > > === We add a Lucene index for Spark OLAP ===
> > > Most OLAP systems, such as Hive, Spark SQL, Impala, and some MPP
> > > databases, suffer from brute-force scanning and poor data timeliness.
> > > 1. They use brute-force scans to compute over the data. Another choice
> > >    is to add an index to the big data. In some cases an index can
> > >    greatly improve on the performance of the original brute-force
> > >    scan. I think that, just as in a traditional database, indexing
> > >    technology can greatly improve the speed of the database.
> > > 2. Another problem with those databases and systems is that most of
> > >    them are offline or batch systems. LXDB targets real-time append
> > >    and real-time KV update, just like HBase.
> > >
> > > == Future ==
> > > === Lucene on Parquet ===
> > > In the near future I will change the Lucene tim and tip (inverted
> > > index) files and the dvd and dvm files to a format similar to Parquet
> > > or ORC. The goals are:
> > > * To solve the performance problem of traversing a Lucene index.
> > > * To solve the problem that opening a Lucene index requires loading
> > >   files such as tip into memory, which makes opening the index slow.
> > > * To enable Lucene to store multi-column joint indexes by column, which
> > >   can be used for logic such as multi-table joins, materialized views,
> > >   and multi-field group-by through the inverted index.
> > >
> > > The current Lucene index has many problems caused by too many file
> > > pointers and by its single-column design. We want to modify Lucene to
> > > make it more suitable for HDFS: not only good for full-text retrieval
> > > but also better at statistical analysis, a true database-level index.
> > > We also want Lucene indexes to be splittable, so that storage can be
> > > separated from computation.
> > >
> > > === Supporting all kinds of predicate pushdown calculation ===
> > > We find that if we can combine the calculation method closely with the
> > > data, we can get more performance out of the database. An index is
> > > only one form of pushdown. Another example is storage pushdown: we can
> > > store the index on an SSD device and the data on a SATA device. We can
> > > store data that is often grouped together in advance, instead of
> > > calculating row by row. We can give important tables or columns
> > > dedicated devices and resources. HBase still lacks these capabilities,
> > > and we need to improve them further.
> > >
> > > === Intervening in data distribution ===
> > > We can use the row key to direct data to different nodes, which makes
> > > many interesting things possible.
> > >
> > > === Resource control, resource isolation ===
> > > Lucene currently does not support resource isolation, but on HDFS we
> > > can do it. I can control the priority of SQL so that Lucene queries
> > > with higher priority get faster IO resources.
> > >
> > > == Status ==
> > > In 2011 I released the first open source version at Alibaba. At that
> > > time, mdrill used 10 nodes (48 GB machines) to support 400 billion
> > > data records. The first index on HDFS comes from this version. It was
> > > one year ahead of the community. https://github.com/alibaba/mdrill
> > >
> > > In 2014 I stopped updating the mdrill project because I joined
> > > Tencent. In our team we developed the hermes project, where we also
> > > built Lucene on HDFS. hermes now imports 1000 billion rows of data per
> > > day in real time. It is the largest database I have ever developed.
> > > https://plus.tencent.com/bigdata/hermes
> > >
> > > In 2018 I set up my own company, called Luxin. "Lu Xin" is the Chinese
> > > pronunciation of Lucene. As fans of Lucene, we use lucene.xin as the
> > > company's domain and lucene.cn as its mail domain.
> > > Luxin's first version of lxdb is called lsql, which means Lucene SQL.
> > > It uses lucene(2.5.3) + hdfs + spark(1.6.3) and is stable. About 200+
> > > clusters use lsql. It processes about 200 billion rows per day, with a
> > > total of 20000 billion rows in one single cluster (1000 nodes).
> > >
> > > In 2020, during COVID-19, our team decided to develop the next
> > > generation of lsql, called lxdb (lx = the pronunciation of Lucene). We
> > > added HBase to lsql to solve the update problem. We have now finished
> > > the first version of lxdb. https://github.com/lucene-cn/lxdb/wiki
> > >
> > > == Known Risks ==
> > > === Meritocracy ===
> > >
> > > lxdb has been deployed in production and serves more than 200 lines of
> > > business. It has demonstrated great performance benefits and has
> > > proved to be a better way to do reporting and analysis on big data.
> > > Still, we look forward to growing a rich user and developer community.
> > >
> > > === Orphaned products ===
> > >
> > > The core developers currently work full-time for Luxin.
> > > lxdb is widely adopted by many companies and individuals. There is no
> > > realistic chance of it becoming orphaned. We also have a number of
> > > 1000-person Tencent QQ instant messaging groups.
> > >
> > > === Inexperience with Open Source ===
> > >
> > > The core developers are all active users and followers of open source.
> > > They are already committers and contributors to the lxdb project. The
> > > developer yannian mu has ten years of experience on open source
> > > projects: jstorm https://github.com/alibaba/jstorm and
> > > mdrill https://github.com/alibaba/mdrill
> > >
> > > === Homogenous Developers ===
> > >
> > > Most of the core developers are from Luxin because the product has
> > > been closed source. Once lxdb is open sourced, we expect it to receive
> > > bug fixes and enhancements from developers not working at Luxin. We
> > > intend to give back to the community we learned from.
> > >
> > > === Reliance on Salaried Developers ===
> > >
> > > Lxin invested in lxdb as the solution and some of its key engineers
> > > are working full time on the project. In addition, since there is a
> > > growing Big Data need for scalable solutions, we look forward to other
> > > Apache developers and researchers contributing to the project. The key
> > > to addressing the risk of relying on salaried developers from a single
> > > entity is to increase the diversity of the contributors and to
> > > actively recruit them. Apache lxdb intends to do this.
> > >
> > > === An Excessive Fascination with the Apache Brand ===
> > >
> > > Lxdb is proposing to enter incubation at Apache in order to help
> > > efforts to diversify the committer base, not so much to capitalize on
> > > the Apache brand. The Lxdb project is in production use already inside
> > > lxdb, but is not expected to be an lxdb product for external
> > > customers. As such, the lxdb project is not seeking to use the Apache
> > > brand as a marketing tool.
> > >
> > > === Documentation ===
> > >
> > > Information about lxdb can be found at
> > > https://github.com/lucene-cn/lxdb .
> > > The following links provide more information about lxdb in open source:
> > >
> > > * wiki site: https://github.com/lucene-cn/lxdb/wiki
> > > * Issue Tracking: https://github.com/lucene-cn/lxdb/issues
> > > * Overview: https://github.com/lucene-cn/lxdb/wiki/intro
> > > * lxin home page: http://www.lucene.xin
> > > * lsql document: http://docs.lucene.xin/lsql/v21/
> > >
> > > === Initial Source ===
> > >
> > > lxdb will develop its source code under an Apache license at
> > > https://github.com/lucene-cn/lxdb.
> > >
> > > === Core Developers ===
> > >
> > > Currently most of the core developers of LXDB work in the research
> > > team of Luxin.
> > >
> > > - yannian mu (dev)
> > > - yu chen (dev)
> > > - guangshi hao (dev)
> > > - wei sun (dev)
> > > - qihua zheng (dev)
> > > - xin wang (dev)
> > > - qingsong liu (dev)
> > > - anxing zhou (Tester)
> > > - jiajun duan (Tester)
> > >
> > > == External Dependencies ==
> > >
> > > All dependencies are managed using Apache Maven.
> > >
> > > Dependency    License               Optional?
> > > lucene        Apache License 2.0    true
> > > zookeeper     Apache License 2.0    true
> > > hbase         Apache License 2.0    true
> > > spark         Apache License 2.0    true
> > > hadoop        Apache License 2.0    true
> > > hive          Apache License 2.0    true
> > >
> > > == Required Resources ==
> > >
> > > === Mailing lists ===
> > >
> > >  * lxdb-private (PMC discussion)
> > >  * lxdb-dev (developer discussion)
> > >  * lxdb-user (user discussion)
> > >  * lxdb-commits (SCM commits)
> > >  * lxdb-issues (JIRA issue feed)
> > >
> > > === Subversion Directory ===
> > >
> > > Instead of Subversion, LXDB prefers Git as its source control
> > > management system: git://git.apache.org/lxdb
> >
> > --
> > Sent from: http://apache-incubator-general.996316.n3.nabble.com/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
