I will defer the intake of code to the secretary.

On Thu, Nov 15, 2018, 12:20 黄向东 (Xiangdong Huang) <saint...@gmail.com> wrote:
> > - When you say "open source" repo, do you mean private repo vs. public repo?
>
> Yes.
>
> > - I believe Craig as Secretary will say an SGA never hurts, but isn't everything already licensed ASLv2? It's been a few weeks and a few proposals reviewed, so it could be my memory.
>
> Currently, the licenses of IoTDB's dependency libraries include Apache 2.0, BSD (antlr3), EPL 1.0 (logback), and EPL 2.0 (junit).
> We are checking all the licenses once again to avoid mistakes.
>
> Regards,
> Xiangdong Huang
>
> > On Nov 15, 2018, at 10:43 PM, Kevin A. McGrail <kmcgr...@apache.org> wrote:
> >
> > Well, first, let's ask some questions:
> >
> > - When you say "open source" repo, do you mean private repo vs. public repo?
> >
> > - I believe Craig as Secretary will say an SGA never hurts, but isn't everything already licensed ASLv2? It's been a few weeks and a few proposals reviewed, so it could be my memory.
> >
> > Regards,
> > KAM
> >
> > --
> > Kevin A. McGrail
> > VP Fundraising, Apache Software Foundation
> > Chair Emeritus Apache SpamAssassin Project
> > https://www.linkedin.com/in/kmcgrail - 703.798.0171
> >
> > On Thu, Nov 15, 2018 at 7:27 AM hxd <hxd...@qq.com> wrote:
> >
> >> Currently, there are 6 repositories in total (IoTDB, IoTDB-JDBC, TsFile, Spark-Connector, Hive-Connector, and Grafana-Connector), and we will merge them all into one repository.
> >>
> >> Only the first one is private.
> >>
> >> Actually, we lack experience with how to open-source a project.
> >>
> >> Should we open all the source now, or after all the Apache legal documents are done?
> >>
> >> Best,
> >>
> >> Xiangdong Huang
> >>
> >>> On Nov 15, 2018, at 5:06 PM, Willem Jiang <willem.ji...@gmail.com> wrote:
> >>>
> >>> Here is a question about the source code repository.
> >>>
> >>> The main source git repo[1] is still a private repo. I think we need to open-source the repo before sending the SGA?
> >>>
> >>> [1] https://github.com/thulab/iotdb
> >>>
> >>> Willem Jiang
> >>>
> >>> Twitter: willemjiang
> >>> Weibo: 姜宁willem
> >>>
> >>> On Thu, Nov 15, 2018 at 4:08 PM hxd <hxd...@qq.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> In the proposal discussion process, we got 3 mentors: Justin Mclean, Christofer Dutz, and Willem Ning Jiang.
> >>>>
> >>>> In the vote process, we got a new mentor, Joe Witt.
> >>>>
> >>>> In total, there is one Champion and there are four mentors:
> >>>>
> >>>> Kevin A. McGrail (the Champion),
> >>>> Justin Mclean,
> >>>> Christofer Dutz,
> >>>> Willem Ning Jiang, and
> >>>> Joe Witt
> >>>>
> >>>> I have checked their names on http://people.apache.org/committer-index.html, and they are now accurate.
> >>>> The name list in the proposal (https://wiki.apache.org/incubator/IoTDBProposal) is also correct.
> >>>>
> >>>> Regards,
> >>>> Xiangdong Huang
> >>>>
> >>>> On Nov 15, 2018, at 12:51 AM, Kevin A. McGrail <kmcgr...@apache.org> wrote:
> >>>>
> >>>> Congratulations! As champion, I think the next steps are:
> >>>>
> >>>> 1 - Xiangdong, can you confirm the list of mentors on the proposal is accurate?
> >>>>
> >>>> 2 - Also Xiangdong, is there anyone else who stepped forward as a mentor during the voting process that the project wants the IPMC to approve?
> >>>>
> >>>> 3 - Justin, I think you have to request the creation of the podling, and then I as champion work on things like the metadata file from this page, https://incubator.apache.org/policy/incubation.html, correct?
> >>>>
> >>>> Regards,
> >>>> KAM
> >>>>
> >>>> --
> >>>> Kevin A. McGrail
> >>>> VP Fundraising, Apache Software Foundation
> >>>> Chair Emeritus Apache SpamAssassin Project
> >>>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
> >>>>
> >>>> On Wed, Nov 14, 2018 at 6:29 AM hxd <hxd...@qq.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> With 8 +1 binding votes, 2 +1 non-binding votes, and no +/-0 or -1 votes, this VOTE passes.
> >>>>>
> >>>>> Thanks to everyone who voted!
> >>>>>
> >>>>> Below is the vote tally:
> >>>>>
> >>>>> Binding
> >>>>> Von Gosling
> >>>>> Christofer Dutz
> >>>>> Kevin A. McGrail
> >>>>> Felix Cheung
> >>>>> Matt Sticker
> >>>>> Joe Witt
> >>>>> Justin Mclean
> >>>>> Willem Jiang
> >>>>>
> >>>>> Non-binding
> >>>>> Sheng Wu
> >>>>> Yang Bo
> >>>>>
> >>>>> The vote thread: https://lists.apache.org/thread.html/077f029ab2b52a2b19fc8d41c07438f660a8e93dd87b3895d262263c@%3Cgeneral.incubator.apache.org%3E
> >>>>> The proposal: https://wiki.apache.org/incubator/IoTDBProposal
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Xiangdong Huang
> >>>>>
> >>>>>> On Nov 7, 2018, at 3:46 PM, hxd <hxd...@qq.com> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Sorry for the badly formatted previous mail.
> >>>>>> I'd like to call a VOTE to accept the IoTDB project, a database for managing large amounts of time series data from IoT sensors in industrial applications, into the Apache Incubator.
> >>>>>> The full proposal is available on the wiki: https://wiki.apache.org/incubator/IoTDBProposal
> >>>>>> and it is also attached below for your convenience.
> >>>>>>
> >>>>>> Please cast your vote:
> >>>>>>
> >>>>>> [ ] +1, bring IoTDB into Incubator
> >>>>>> [ ] +0, I don't care either way,
> >>>>>> [ ] -1, do not bring IoTDB into Incubator, because...
> >>>>>>
> >>>>>> The vote will be open for at least 72 hours.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Xiangdong Huang.
> >>>>>>
> >>>>>> = IoTDB Proposal =
> >>>>>> v0.1.1
> >>>>>>
> >>>>>> == Abstract ==
> >>>>>> IoTDB is a data store for managing large amounts of time series data, such as timestamped data from IoT sensors in industrial applications.
> >>>>>>
> >>>>>> == Proposal ==
> >>>>>> IoTDB is a database for managing large amounts of time series data with columnar storage, data encoding, pre-computation, and index techniques. It has an SQL-like interface to write millions of data points per second per node and is optimized to return query results in a few seconds over trillions of data points. It can also be easily integrated with Apache Hadoop MapReduce and Apache Spark for analytics.
> >>>>>>
> >>>>>> == Background ==
> >>>>>>
> >>>>>> A new class of data management system requirements is becoming increasingly important with the rise of the Internet of Things. There are some database systems and technologies aimed at time series data management. For example, Gorilla and InfluxDB are mainly built for data centers and for monitoring application metrics. Other systems, such as OpenTSDB and KairosDB, are built on Apache HBase and Apache Cassandra, respectively.
> >>>>>>
> >>>>>> However, many time series applications, especially industrial ones, have additional requirements:
> >>>>>>
> >>>>>> * Supporting time series data with high data frequency. For example, a turbine engine may generate 1000 points per second (i.e., 1000 Hz), while each CPU reports only 1 data point every 5 seconds in a data center monitoring application.
> >>>>>>
> >>>>>> * Supporting multi-resolution scans of data. For example, aggregation operations are important for time series data.
> >>>>>>
> >>>>>> * Supporting special queries for time series, such as pattern matching, time series segmentation, time-frequency transformation, and frequency queries.
> >>>>>>
> >>>>>> * Supporting a large number of monitoring targets (i.e., time series). An excavator may report more than 1000 time series, for example, the revolving speed of the motor-engine, the speed of the excavator, the acceleration, the temperature of the water tank, and so on, while a CPU or an application monitor has far fewer time series.
> >>>>>>
> >>>>>> * Optimization for out-of-order data points. In the industrial sector, it is common for equipment to send data using the UDP protocol rather than the TCP protocol. Sometimes the network connection is unstable, and parts of the data are buffered for later sending.
> >>>>>>
> >>>>>> * Supporting long-term storage. Historical data is precious for equipment manufacturers, so most industrial applications want to keep it rather than remove or unload it. The database system must not only support fast retrieval of historical data but also guarantee that historical data does not impact the processing speed for "hot" or current data.
> >>>>>>
> >>>>>> * Supporting online transaction processing (OLTP) as well as complex analytics. Analyzing the data files directly with Apache Spark/Apache Hadoop MapReduce is better than transforming them into another file format for Big Data analytics.
> >>>>>>
> >>>>>> * Flexible deployment either on premises or in the cloud. IoTDB is simple enough to be deployed on a Raspberry Pi handling hundreds of time series. Meanwhile, the system can also be deployed in the cloud so that it supports tens of millions of ingestions per second, OLTP queries in milliseconds, and analytics using Apache Spark/Apache Hadoop MapReduce.
> >>>>>>
> >>>>>>    * (1) If users deploy IoTDB on a device, such as a Raspberry Pi, a wind turbine, or a meteorological station, deploying the database is designed to be simple. A device may have hundreds of time series (but fewer than a thousand), and the database needs to handle them.
> >>>>>>    * (2) When IoTDB is deployed in a data center, computational resources (i.e., the hardware configuration of servers) are not a problem compared to a Raspberry Pi. In this deployment, IoTDB can use more computational resources and can handle more time series (e.g., millions of time series).
> >>>>>>
> >>>>>> Based on these requirements, we developed IoTDB, a new data store system for managing time series data.
> >>>>>>
> >>>>>> IoTDB started as a Tsinghua University research project. IoTDB's developer community has since grown to include additional institutions, for example, universities (e.g., Fudan University), research labs (e.g., NEL-BDS lab), and corporations (e.g., K2Data, Tencent). Funding has been provided by various institutions, including the National Natural Science Foundation of China, and by industry sponsors such as Lenovo and K2Data.
> >>>>>>
> >>>>>> == Rationale ==
> >>>>>> Because no existing open-source time series database covers all the above requirements, we developed IoTDB. As the system matures, we are seeking a long-term home for the project. We believe the Apache Software Foundation would be an ideal fit. Joining Apache will also help coordinate and improve the development effort of the growing number of organizations that contribute to IoTDB, improving the diversity of our community.
> >>>>>>
> >>>>>> IoTDB contains multiple modules, which fall into the following categories:
> >>>>>>
> >>>>>> * '''TsFile Format''': TsFile is a new columnar file format.
> >>>>>> * '''Adaptor for Analytics and Visualization''': Integration of TsFile with Apache Hadoop HDFS, Apache Hadoop MapReduce, and Apache Spark. Examples of integrating IoTDB with Apache Kafka, Apache Storm, and Grafana are also provided.
> >>>>>> * '''IoTDB Engine''': An engine consisting of an SQL parser, a query plan generator, a memtable, authentication and authorization, a write-ahead log (WAL), crash recovery, an out-of-order data handler, and indexes for aggregation and pattern matching. The engine stores system data in TsFile format.
> >>>>>> * '''IoTDB JDBC''': An implementation of Java Database Connectivity (JDBC) for clients to connect to IoTDB using Java.
> >>>>>>
> >>>>>> === TsFile Format ===
> >>>>>>
> >>>>>> TsFile is a columnar format, similar to Apache Parquet and Apache CarbonData. It has the concepts of Chunk Group, Column Chunk, Page, and Footer. Compared with Apache Parquet and Apache CarbonData, it is designed and optimized for time series:
> >>>>>>
> >>>>>> ==== Time Series Friendly Encoding ====
> >>>>>> IoTDB currently supports run-length encoding (RLE), delta-of-delta encoding, and Facebook's Gorilla encoding.
> >>>>>>
> >>>>>> Lossy encoding methods (e.g., Piecewise Linear Approximation (PLA) and time-frequency transformation) are works in progress.
> >>>>>>
> >>>>>> ==== Chunk Group ====
> >>>>>> The data part of a TsFile consists of many Chunk Groups. Each Chunk Group stores the data of one device over a time interval. A Chunk Group is similar to a row group in Apache Parquet, with an additional constraint on the time dimension: for each device, the time intervals of different Chunk Groups do not overlap, and a later Chunk Group always has larger timestamps.
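To illustrate why the RLE and delta-of-delta encodings listed above suit regularly sampled timestamps, here is a minimal Python sketch. It is an illustration under stated assumptions, not TsFile's encoder: the function names are hypothetical, values stay as plain integers, and the variable-length bit packing that real delta-of-delta and Gorilla encoders perform is omitted.

```python
from typing import List, Tuple


def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    """Encode timestamps as [first, first_delta, dod_2, dod_3, ...].

    Hypothetical illustration only: no bit packing is performed.
    """
    if len(timestamps) < 2:
        return list(timestamps)
    first_delta = timestamps[1] - timestamps[0]
    encoded = [timestamps[0], first_delta]
    prev_delta = first_delta
    for i in range(2, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        encoded.append(delta - prev_delta)  # 0 whenever the sampling interval is unchanged
        prev_delta = delta
    return encoded


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode."""
    if len(encoded) < 2:
        return list(encoded)
    timestamps = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, count) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


if __name__ == "__main__":
    # 1 ms sampling with one skipped sample, using timestamps like those in the proposal.
    ts = [1521622662000, 1521622662001, 1521622662002, 1521622662003, 1521622662005]
    enc = delta_of_delta_encode(ts)
    print(enc)                         # [1521622662000, 1, 0, 0, 1]
    print(run_length_encode(enc[2:]))  # [(0, 2), (1, 1)]
    assert delta_of_delta_decode(enc) == ts
```

With a steady sampling rate the delta-of-delta stream is mostly zeros, which is exactly the kind of input that RLE compresses well.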
> >>>>>>
> >>>>>> Given a TsFile and a query with a time range filter, the query process can stop scanning data once it reads a data point whose timestamp exceeds the upper bound of the filter. We call this feature ''fast-return'', and it makes time range queries on a TsFile very efficient.
> >>>>>>
> >>>>>> ==== Different Column Chunk Format (No Repetition (R) and Definition (D) Fields) ====
> >>>>>>
> >>>>>> While Apache Parquet and Apache CarbonData support complex data types, e.g., nested data and sparse columns, TsFile is designed exclusively for time series, whose data model is <device_id, series_id, timestamp, value>.
> >>>>>>
> >>>>>> In a `Chunk Group`, each time series is a `Column Chunk`. Even though these time series belong to the same device, the data points of different time series are not originally aligned in the time dimension.
> >>>>>>
> >>>>>> For example, if a device has 2 sensors with the same data collection frequency, sensor 1 may collect data at time 1521622662000 while sensor 2 collects data at time 1521622662001 (delta = 1 ms). Therefore, each Column Chunk stores its own timestamps and values, which is quite different from Apache Parquet and Apache CarbonData. Because different time series may have different data frequencies, we store a time column alongside each value column instead of making different chunks share one time column. As a result, we do not store any null values on disk to align time series, and we do not need to attach `repetition` (R) and `definition` (D) fields to each value. This saves disk space and reduces query latency (because we do not align data by calculating R and D fields).
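The ''fast-return'' behaviour described above follows from the Chunk Group ordering. The Python sketch below models one series of one device as a list of Chunk Groups, each a time-sorted list of (timestamp, value) pairs whose time ranges do not overlap and increase from group to group. This in-memory model and the function name are hypothetical and say nothing about TsFile's on-disk layout; the function also counts the points it examines, to show the work that fast-return saves.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp, value) for one series of one device


def range_query_fast_return(
    chunk_groups: List[List[Point]], start: int, end: int
) -> Tuple[List[Point], int]:
    """Return points with start <= timestamp <= end and the number of points examined.

    Assumes each group is sorted by timestamp and the groups are ordered and
    non-overlapping in time (the Chunk Group constraint in the proposal).
    """
    result: List[Point] = []
    examined = 0
    for group in chunk_groups:
        if not group or group[-1][0] < start:
            continue  # the whole group ends before the range: skip it without scanning
        for timestamp, value in group:
            examined += 1
            if timestamp > end:
                # Every later point is even newer, so stop scanning ("fast-return").
                return result, examined
            if timestamp >= start:
                result.append((timestamp, value))
    return result, examined


if __name__ == "__main__":
    groups = [
        [(1, 1.0), (2, 2.0), (3, 3.0)],
        [(4, 4.0), (5, 5.0), (6, 6.0)],
        [(7, 7.0), (8, 8.0)],
    ]
    points, examined = range_query_fast_return(groups, 2, 5)
    print(points)    # [(2, 2.0), (3, 3.0), (4, 4.0), (5, 5.0)]
    print(examined)  # 6: the third Chunk Group is never read
```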
> >>>>>>
> >>>>>> ==== Domain Specific Information in Each Page ====
> >>>>>> Similar to Apache Parquet and Apache CarbonData, a `Column Chunk` consists of several `Pages`, and each `Page` has a `Page header`. The `Page header` is a summary of the data in the page.
> >>>>>>
> >>>>>> Because TsFile is optimized for time series, the page header contains more domain-specific information, such as the minimum and maximum values, the minimum and maximum timestamps, the frequency, and so on. TsFile can even store a histogram of values in the page header.
> >>>>>>
> >>>>>> This header information helps IoTDB speed up queries by skipping unnecessary pages.
> >>>>>>
> >>>>>> === Adaptor for Analytics ===
> >>>>>> TsFile provides:
> >>>>>>
> >>>>>> * InputFormat/OutputFormat interfaces for reading/writing data.
> >>>>>> * Deep integration with Apache Spark/Hadoop MapReduce, including predicate push-down, column pruning, aggregation push-down, etc., so users can use Apache Spark SQL/HiveQL to connect to and query TsFiles.
> >>>>>>
> >>>>>> === IoTDB Engine ===
> >>>>>> The IoTDB engine is a database engine that uses TsFile as its storage file format. The IoTDB engine supports SQL-like queries plus many useful functions:
> >>>>>>
> >>>>>> * Tree-based time series schema
> >>>>>> * Log-Structured Merge (LSM)-based storage
> >>>>>> * Overflow file for out-of-order data
> >>>>>> * Scalable index framework
> >>>>>> * Special queries for time series
> >>>>>>
> >>>>>> ==== Tree-based Time Series Schema ====
> >>>>>> IoTDB manages all time series definitions using a tree structure. A path from the root of the tree to a leaf node represents a time series. Therefore, the unique ID of a time series is a path, e.g., `root.China.beijing.windFarm1.windTurbine1.speed`.
> >>>>>>
> >>>>>> This kind of schema can express `group by` naturally.
> >>>>>> For example, `root.China.beijing.windFarm1.*.speed` represents the speed of all the wind turbines in wind farm 1 in Beijing, China.
> >>>>>>
> >>>>>> ==== Log-Structured Merge (LSM)-based Storage ====
> >>>>>> In a time series, the data points should be ordered by their timestamps. IoTDB uses a Log-Structured Merge (LSM)-based mechanism: incoming data is first stored in memory, in a structure called the `memtable`. If data points arrive out of order at this stage, we re-sort them in memory. When this part of the data exceeds the configured memory limit, we flush it to disk as a `Chunk Group` in an unclosed TsFile. As a result, a TsFile may contain several Chunk Groups, which reduces the number of small data files; this lowers the I/O load of the storage system and shortens file merges in LSM. Note that the data within one Chunk Group on disk is time-ordered, and this layout enables fast filtering within a Chunk Group during a query.
> >>>>>>
> >>>>>> Rule 1: In a TsFile, the Chunk Groups of one device are ordered by timestamp, which enables fast filtering among Chunk Groups during a query.
> >>>>>>
> >>>>>> Rule 2: When the size of the unclosed TsFile reaches the threshold defined in the configuration file, we close the file and create a new one to store newly arriving data across the entire data set. Like many systems that use LSM-based storage, we never modify a closed TsFile, except during the file-merge process.
> >>>>>>
> >>>>>> Rule 3: To reduce the number of TsFiles involved in a query, we guarantee that, after file merging, the data points in different TsFiles do not overlap in the time dimension.
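A toy model of the write path in Rules 1 and 2 may help. In the Python sketch below, `memtable_limit` (points buffered before a flush) and `file_limit` (the Chunk Group count at which a file is closed) are hypothetical stand-ins for the memory limit and file-size threshold in IoTDB's configuration, and the class is not IoTDB's implementation. It only re-sorts out-of-order points that are still in memory; a point older than data already flushed would go to an UnSequenceTsFile in IoTDB, which this sketch does not model.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, float]             # (timestamp, value)
ChunkGroup = Tuple[str, List[Point]]  # (device id, time-sorted points)


class ToyLsmWriter:
    """Buffers points per device, flushes sorted Chunk Groups, and rolls files."""

    def __init__(self, memtable_limit: int, file_limit: int) -> None:
        self.memtable_limit = memtable_limit  # hypothetical: points held in memory before a flush
        self.file_limit = file_limit          # hypothetical: close a file once it holds this many Chunk Groups
        self.memtable: Dict[str, List[Point]] = {}
        self.buffered = 0
        self.unclosed_file: List[ChunkGroup] = []
        self.closed_files: List[List[ChunkGroup]] = []

    def write(self, device: str, timestamp: int, value: float) -> None:
        self.memtable.setdefault(device, []).append((timestamp, value))
        self.buffered += 1
        if self.buffered >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        for device, points in self.memtable.items():
            points.sort(key=lambda p: p[0])  # re-sort out-of-order arrivals in memory
            self.unclosed_file.append((device, points))  # one Chunk Group per device
        self.memtable = {}
        self.buffered = 0
        if len(self.unclosed_file) >= self.file_limit:
            # Close the file; closed files are never modified afterwards (Rule 2).
            self.closed_files.append(self.unclosed_file)
            self.unclosed_file = []


if __name__ == "__main__":
    writer = ToyLsmWriter(memtable_limit=3, file_limit=2)
    for device, t, v in [("d1", 2, 0.2), ("d1", 1, 0.1), ("d1", 3, 0.3),
                         ("d1", 5, 0.5), ("d2", 4, 0.4), ("d1", 6, 0.6)]:
        writer.write(device, t, v)
    print(len(writer.closed_files))   # 1
    print(writer.closed_files[0][0])  # ('d1', [(1, 0.1), (2, 0.2), (3, 0.3)])
```

Because each flush appends the memtable's sorted points after the device's earlier Chunk Groups, the groups of one device stay time-ordered as long as no point arrives older than data already flushed, which is the situation Rule 1 relies on.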
> >>>>>>
> >>>>>> ==== Overflow File for Out-of-order Data ====
> >>>>>> Once a part of the data has been flushed to disk (forming a `Chunk Group` in a TsFile), newly arriving data points whose timestamps are smaller than the largest timestamp in the TsFile are `out-of-order`.
> >>>>>>
> >>>>>> To store out-of-order data, we collect all the troublesome `out-of-order` insertions into a special TsFile, named `UnSequenceTsFile`. In an UnSequenceTsFile, the Chunk Groups of one device may overlap in the time dimension, which violates Rule 1 and costs additional query filtering time compared to a normal TsFile.
> >>>>>>
> >>>>>> There is another special operation: updating all the data points in a time range, e.g., `update all the speed values of device1 as 0 where the data time is in [1521622000000, 1521622662000]`. This operation is used when (1) a sensor malfunctions and the database receives wrong data for a period, or (2) we want to reset all the records. Many NoSQL time series databases do not support such an operation. To support it in IoTDB, we use a tree-based structure, a Treap, to hold these operations and store them as `Overflow` files.
> >>>>>>
> >>>>>> Therefore, there are 3 kinds of data files: TsFiles, UnSequenceTsFiles, and Overflow files. TsFiles should store most of the data. The volume of UnSequenceTsFiles depends on the workload: if there are many out-of-order points and their time span is large, the volume will be large. Overflow files handle the fewest data operations, but their volume depends on how often these special update operations are used.
> >>>>>>
> >>>>>> ==== LSM-tree ====
> >>>>>> Normally, LSM-based storage engines merge data files level by level, so that the files form a tree structure. In this way, data is well organized.
> >>>>>> The disadvantage is that data is read and written several times: if the tree has 4 levels, each data point is rewritten at least 4 times.
> >>>>>>
> >>>>>> Currently, we do not merge all the TsFiles into one because (1) the number of TsFiles is kept lower than in many LSM storage engines, since a memtable is mapped to several Chunk Groups rather than to a separate file; and (2) different TsFiles do not overlap with each other in the time dimension (because of Rule 3).
> >>>>>>
> >>>>>> As mentioned before, TsFile supports ''fast-return'' to accelerate queries. However, UnSequenceTsFiles and Overflow files do not support this feature. The time spans of UnSequenceTsFiles, Overflow files, and TsFiles may overlap, which brings more files into the query process. To accelerate these queries, a merging process reorganizes files in the background. All three kinds of files (TsFiles, UnSequenceTsFiles, and Overflow files) take part in the merging process. The merging process is multi-threaded, and each thread is responsible for one series family.
> >>>>>> After merging, only TsFiles are left. These files have non-overlapping time spans and support the ''fast-return'' feature.
> >>>>>>
> >>>>>> ==== Scalable Index Framework ====
> >>>>>> We allow users to implement indexes for faster queries. We currently support an index for pattern matching queries (KV-Match index, ICDE 2019). Another index for fast aggregation (PISA index, CIKM 2016) is a work in progress.
> >>>>>>
> >>>>>> ==== Special Queries ====
> >>>>>> We currently support `group by time interval` aggregation queries and `Fill by` operations, which are similar to those of InfluxDB. Time series segmentation operations and frequency queries are works in progress.
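As a rough illustration of what a `group by time interval` aggregation and a fill operation compute, here is a Python sketch. It is not IoTDB's query syntax or implementation: the function names are hypothetical, the aggregate is fixed to the average, buckets are half-open `[start + k*interval, start + (k+1)*interval)`, and the fill shown is a previous-value fill, one common strategy; which fill methods IoTDB offers is not stated here.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]
Bucket = Tuple[int, Optional[float]]  # (bucket start time, aggregate, or None if empty)


def group_by_interval(points: List[Point], start: int, end: int, interval: int) -> List[Bucket]:
    """Average the values in each half-open bucket covering [start, end)."""
    n_buckets = (end - start + interval - 1) // interval  # ceiling division
    sums = [0.0] * n_buckets
    counts = [0] * n_buckets
    for timestamp, value in points:
        if start <= timestamp < end:
            k = (timestamp - start) // interval
            sums[k] += value
            counts[k] += 1
    return [
        (start + k * interval, sums[k] / counts[k] if counts[k] else None)
        for k in range(n_buckets)
    ]


def fill_previous(buckets: List[Bucket]) -> List[Bucket]:
    """Replace empty buckets with the most recent non-empty value (None if there is none yet)."""
    filled: List[Bucket] = []
    last: Optional[float] = None
    for bucket_start, value in buckets:
        if value is None:
            value = last
        else:
            last = value
        filled.append((bucket_start, value))
    return filled


if __name__ == "__main__":
    readings = [(0, 1.0), (5, 3.0), (25, 4.0)]
    buckets = group_by_interval(readings, start=0, end=30, interval=10)
    print(buckets)                 # [(0, 2.0), (10, None), (20, 4.0)]
    print(fill_previous(buckets))  # [(0, 2.0), (10, 2.0), (20, 4.0)]
```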
> >>>>>>
> >>>>>> == Initial Goals ==
> >>>>>> The initial goals are to open-source the project and to integrate with the Apache development process. Furthermore, we plan incremental development and releases following the Apache guidelines.
> >>>>>>
> >>>>>> == Current Status ==
> >>>>>> We have been developing the system for more than 2 years. There are currently 13k lines of code, some of which are generated by Antlr3 and Thrift. 230 issues have been resolved, and there are more than 1500 commits.
> >>>>>>
> >>>>>> The system has been deployed in the staging environment of the State Grid Corporation of China to handle ~3 million time series (i.e., ~30,000 power generation assemblies × ~100 sensors), and at an equipment service company in China managing ~2 million time series (i.e., ~20k devices × 100 sensors). The insertion speed reaches ~2 million points/second/node, which is faster than InfluxDB, OpenTSDB, and Apache Cassandra in our environment.
> >>>>>>
> >>>>>> Many new features are in the works, including those mentioned herein. We will add more analytics functions, improve the data file merge process, and finish the first release of IoTDB.
> >>>>>>
> >>>>>> == Meritocracy ==
> >>>>>> The IoTDB project operates on meritocratic principles. Developers who submit more code of higher quality earn more merit. We have used the `Issues` and `Pull Requests` modules on GitHub to collect users' suggestions and patches. Users who submit issues, pull requests, or documentation, or who help manage the community, are welcome and encouraged to become committers.
> >>>>>>
> >>>>>> == Community ==
> >>>>>>
> >>>>>> IoTDB project users communicate on GitHub (https://github.com/thulab/tsfile).
> >>>>>> Developers communicate on a JIRA-like website (currently, only registered users can apply for access; URL: https://tower.im/projects/36de8571a0ff4833ae9d7f1c5c400c22/). We have also introduced IoTDB at many technical conferences. Next, we will set up mailing lists for more convenient and broader communication and for archived discussions.
> >>>>>>
> >>>>>> If IoTDB is accepted for incubation at the Apache Software Foundation, the primary goal is to build a larger community. We believe that IoTDB will become a key project for time series data management, and so we will rely on a large community of users and developers.
> >>>>>>
> >>>>>> TODO: IoTDB is currently in a private GitHub repository (https://github.com/thulab/iotdb), while its subproject TsFile (a file format for storing time series data) is open-sourced on GitHub (https://github.com/thulab/tsfile).
> >>>>>>
> >>>>>> == Core Developers ==
> >>>>>> IoTDB was initially developed by two dozen students and teachers at Tsinghua University. Since then, more developers have joined from other universities: Fudan University, Northwestern Polytechnical University, and Harbin Institute of Technology in China. Other developers come from companies such as Lenovo and Microsoft. We will keep working to bring more developers into the project to contribute to IoTDB.
> >>>>>>
> >>>>>> == Relationships with Other Apache Products ==
> >>>>>> IoTDB requires some Apache products (Apache Thrift, commons, collections, httpclient).
> >>>>>>
> >>>>>> IoTDB-Spark-connector and IoTDB-Hadoop-connector have been developed to support analyzing time series data with Apache Spark and MapReduce.
> >>>>>>
> >>>>>> Overall, IoTDB is designed as an open architecture, and it can be integrated with many other systems in the future.
> >>>>>>
> >>>>>> As mentioned before, in the IoTDB project we designed a new columnar file format, called TsFile, which is similar to Apache Parquet. However, the new file format is optimized for time series data.
> >>>>>>
> >>>>>> == Known Risks ==
> >>>>>>
> >>>>>> === Orphaned Products ===
> >>>>>> Given the current level of investment in IoTDB, the risk of the project being abandoned is minimal. Time series data is increasingly important, and several constituents are highly motivated to continue development. Tsinghua and the NEL-BDS Lab rely on IoTDB as a platform for a large number of long-term research projects. We have deployed IoTDB in some companies' staging environments for future applications.
> >>>>>>
> >>>>>> === Inexperience with Open Source ===
> >>>>>> Students and researchers at Tsinghua University have been developing and using open source software for a long time. Being guided through a formal open-source process is a wonderful opportunity for the students. Some of our committers have experience contributing to open source, for example:
> >>>>>>
> >>>>>> * druid:
> >>>>>> https://github.com/druid-io/druid/commit/f18cc5df97e5826c2dd8ffafba9fcb69d10a4d44
> >>>>>>
> >>>>>> * druid:
> >>>>>> https://github.com/druid-io/druid/commit/aa7aee53ce524b7887b218333166941654788794
> >>>>>>
> >>>>>> * YCSB:
> >>>>>> https://github.com/brianfrankcooper/YCSB/pull/776
> >>>>>>
> >>>>>> Additionally, several ASF veterans and industry veterans have agreed to mentor the project and are listed in this proposal. The project will rely on their guidance and collective wisdom to quickly transition the entire team of initial committers towards practicing the Apache Way.
> >>>>>>
> >>>>>> === Reliance on Salaried Developers ===
> >>>>>> Most of the current developers are students and researchers/professors at universities, and their research focuses on big data management and analytics. It is unlikely that they will shift their research focus away from big data management. We will work to ensure that the project can continue to be stewarded and move forward independently of salaried developers.
> >>>>>>
> >>>>>> === An Excessive Fascination with the Apache Brand ===
> >>>>>> Most of the initial developers come from Tsinghua University, with no intent to use the Apache brand for profit. We have no plans to use the Apache brand in press releases or to post billboards advertising the acceptance of IoTDB into the Apache Incubator.
> >>>>>>
> >>>>>> == Initial Source ==
> >>>>>> IoTDB's GitHub addresses and some required dependencies:
> >>>>>>
> >>>>>> * The storage file format:
> >>>>>> https://github.com/thulab/tsfile
> >>>>>>
> >>>>>> * Adaptor for Apache Hadoop MapReduce:
> >>>>>> https://github.com/thulab/tsfile-hadoop-connector
> >>>>>>
> >>>>>> * Adaptor for Apache Spark:
> >>>>>> https://github.com/thulab/tsfile-spark-connector
> >>>>>>
> >>>>>> * Adaptor for Grafana:
> >>>>>> https://github.com/thulab/iotdb-grafana
> >>>>>>
> >>>>>> * The database engine:
> >>>>>> https://github.com/thulab/iotdb
> >>>>>> (private project up to now)
> >>>>>>
> >>>>>> * The client driver:
> >>>>>> https://github.com/thulab/iotdb-jdbc
> >>>>>>
> >>>>>> === External Dependencies ===
> >>>>>> To the best of our knowledge, all dependencies of IoTDB are distributed under Apache-compatible licenses. Upon acceptance to the Incubator, we will begin a thorough analysis of all transitive dependencies to verify this and introduce license checking into the build and release process.
> >>>>>>
> >>>>>> == Documentation ==
> >>>>>> * Documentation for TsFile:
> >>>>>> https://github.com/thulab/tsfile/wiki
> >>>>>>
> >>>>>> * Documentation for IoTDB and its JDBC:
> >>>>>> http://tsfile.org/document
> >>>>>> (Chinese only. An English version is in progress.)
> >>>>>>
> >>>>>> == Required Resources ==
> >>>>>> === Mailing Lists ===
> >>>>>> * priv...@iotdb.incubator.apache.org
> >>>>>> * d...@iotdb.incubator.apache.org
> >>>>>> * comm...@iotdb.incubator.apache.org
> >>>>>>
> >>>>>> === Git Repositories ===
> >>>>>> * https://git-wip-us.apache.org/repos/asf/incubator-iotdb.git
> >>>>>>
> >>>>>> === Issue Tracking ===
> >>>>>> * JIRA IoTDB (We currently use the issue management provided by GitHub to track issues.)
> >>>>>>
> >>>>>> == Initial Committers ==
> >>>>>> Tsinghua University, K2Data Company, Lenovo, Microsoft
> >>>>>>
> >>>>>> Jianmin Wang (jimwang at tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Xiangdong Huang (sainthxd at gmail dot com)
> >>>>>>
> >>>>>> Jun Yuan (richard_yuan16 at 163 dot com)
> >>>>>>
> >>>>>> Chen Wang (wang_chen at tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Jialin Qiao (qjl16 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Jinrui Zhang (jinrzhan at microsoft dot com)
> >>>>>>
> >>>>>> Rong Kang (kr11 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Tian Jiang (jiangtia18 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Shuo Zhang (zhangshuo at k2data dot com dot cn)
> >>>>>>
> >>>>>> Lei Rui (rl18 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Rui Liu (liur17 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Kun Liu (liukun16 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Gaofei Cao (cgf16 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Xinyi Zhao (xyzhao16 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Dongfang Mao (maodf17 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Tianan Li (lta18 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Yue Su (suy18 at mails dot tsinghua dot edu dot cn)
> >>>>>>
> >>>>>> Hui Dai (daihui_iot at lenovo dot com, yuct_iot at lenovo dot com)
> >>>>>>
> >>>>>> == Sponsors ==
> >>>>>> === Champion ===
> >>>>>> Kevin A. McGrail (kmcgr...@apache.org)
> >>>>>>
> >>>>>> === Nominated Mentors ===
> >>>>>> Justin Mclean (justin at classsoftware dot com)
> >>>>>>
> >>>>>> Christofer Dutz (christofer.dutz at c-ware dot de)
> >>>>>>
> >>>>>> Willem Jiang (willem.jiang at gmail dot com)
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >>> For additional commands, e-mail: general-h...@incubator.apache.org