[DISCUSS] Ready to release version 0.14.0

寒江雪 Mon, 08 Feb 2021 18:00:38 -0800

Hi all:
     Since the release of 0.13, Apache Doris (incubating) has gained around
390 new features, bug fixes, performance enhancements, documentation
improvements and code refactors from 60+ contributors.
     Now we are ready to release Apache Doris (incubating) 0.14.0. I will
be the release manager of this version. This release is expected to
include the following content:


# New Feature

### Import and delete

Support deleting multiple rows at one time through the import method
(batch delete), avoiding the performance degradation caused by many
separate deletions. For tables of the Unique Key model, a Sequence column
can be specified at import time; Doris uses the value of the Sequence
column to determine the order of the data, so that rows are applied in the
correct order even if they arrive out of order [#4310] [#4256]
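To illustrate the Sequence column semantics described above, here is a minimal Python sketch of how rows with the same key could be resolved. It assumes that an incoming row replaces the stored one when its sequence value is greater than or equal to the stored value (so later arrivals win ties); the `Row` type and the `apply_with_sequence` helper are hypothetical and are not part of Doris.

```python
from dataclasses import dataclass


@dataclass
class Row:
    key: str
    value: str
    seq: int  # value of the Sequence column for this row


def apply_with_sequence(rows):
    """Merge rows of a Unique Key table in arrival order, keeping for each
    key the row with the largest Sequence value (assumed tie rule: the
    later-arriving row wins)."""
    table = {}
    for row in rows:
        current = table.get(row.key)
        if current is None or row.seq >= current.seq:
            table[row.key] = row
    return table


# The newer version of k1 (seq=2) arrives before the older one (seq=1).
rows = [Row("k1", "v2", 2), Row("k1", "v1", 1), Row("k2", "a", 5)]
merged = apply_with_sequence(rows)
print(merged["k1"].value)  # v2
```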

### Support database backup

The BACKUP statement supports specifying which content to back up
(metadata and data).
The BACKUP and RESTORE statements support excluding some tables. When
backing up the entire database, you can exclude very large and unimportant
tables.
Backing up and restoring an entire database is supported without declaring
each table name in the BACKUP and RESTORE statements.

 [#5314]

### ODBC external table support

Support accessing external tables in MySQL, PostgreSQL, Oracle and other
databases through the ODBC protocol

 [#4798] [#4438] [#4559] [#4699]

### SQL-level and partition-level result cache

Support caching query results at both SQL level and partition level to
speed up repeated queries [#4330]
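A rough sketch of the idea behind a partition-level cache, assuming each partition carries a data version that changes on every import: cached per-partition results are reused while the version is unchanged, and only partitions with new data are recomputed. `PartitionResultCache` is a hypothetical illustration, not the Doris implementation.

```python
class PartitionResultCache:
    """Toy partition-level result cache: one entry per (sql, partition),
    valid only while the partition's data version is unchanged."""

    def __init__(self):
        self._entries = {}  # (sql, partition) -> (version, result)

    def query(self, sql, partition_versions, compute):
        """partition_versions maps partition name -> current data version;
        compute(name) evaluates the query on one partition.
        Returns (results by partition, partitions that were recomputed)."""
        results, recomputed = {}, []
        for name, version in partition_versions.items():
            entry = self._entries.get((sql, name))
            if entry is not None and entry[0] == version:
                results[name] = entry[1]  # cache hit
            else:
                results[name] = compute(name)  # miss or stale entry
                self._entries[(sql, name)] = (version, results[name])
                recomputed.append(name)
        return results, recomputed


cache = PartitionResultCache()
sql = "SELECT dt, count(*) FROM t GROUP BY dt"
partition_counts = {"p1": 10, "p2": 20}
_, first = cache.query(sql, {"p1": 1, "p2": 1}, partition_counts.get)
_, second = cache.query(sql, {"p1": 1, "p2": 2}, partition_counts.get)  # p2 got new data
print(first, second)  # ['p1', 'p2'] ['p2']
```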

### Built-in functions

- Support bitmap_xor function [#5098]
- Add replace() function [#4347]
- Add the time_round function to support time alignment according to
multiple time granularities [#4640]
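For readers unfamiliar with these functions, the following Python sketch shows the intended semantics under stated assumptions: bitmap_xor is modelled as the symmetric difference of two sets of ids, and time alignment is modelled as flooring a timestamp to a fixed-length window counted from an assumed origin (1970-01-01 here). Both helpers are hypothetical stand-ins, not the SQL functions themselves.

```python
from datetime import datetime, timedelta


def bitmap_xor(a, b):
    """Symmetric difference: ids present in exactly one of the two bitmaps."""
    return set(a) ^ set(b)


def floor_to_period(ts, period, origin=datetime(1970, 1, 1)):
    """Align ts down to the start of its window; windows of length `period`
    are counted from `origin` (an assumed default)."""
    return origin + ((ts - origin) // period) * period


print(sorted(bitmap_xor({1, 2, 3}, {2, 3, 4})))  # [1, 4]
print(floor_to_period(datetime(2021, 2, 8, 18, 7, 38), timedelta(minutes=5)))
# 2021-02-08 18:05:00
```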

### FE interface and HTTP interface

- The new FE UI can be enabled by setting the FE configuration item
enable_http_server_v2 [#4684]
- BE adds an HTTP interface to show the distribution of all tablets of a
partition across the disks of a BE [#5096]
- BE adds an HTTP interface to manually migrate a tablet to another disk on
the same node [#5101]
- Support modifying FE and BE configuration items through HTTP and
persisting these modifications [#4704]

### Compatibility with MySQL

- Added the views table to the information_schema database [#4778]
- Added the table_privileges, schema_privileges and user_privileges tables
to the information_schema database for compatibility with certain MySQL
applications [#4899]
- Added a statistics table to the information_schema database for
compatibility with some MySQL tools [#4991]

### Monitoring

- BE added tablet-level metrics, including scanned data volume and row
count and written data volume and row count, to help locate hot tablets
[#4428]
- BE added metrics to view the usage of the various LRU caches [#4688]

### Table creation

- Added the CREATE TABLE LIKE statement to create a new table with the same
schema as an existing table [#4705]
- Support atomically replacing one table with another through the REPLACE
statement [#4669]

### Other

- Support Optimizer Hints of type SET_VAR in the SELECT statement to set
session variables [#4504]
- Support repairing damaged tablets by filling in empty tablets [#4255]
- Support Bucket Shuffle Join (when the join condition covers the bucket
columns of the left table, the right table is shuffled to the nodes where
the matching left-table data is located, which can significantly reduce
the network overhead of a Shuffle Join and improve query speed) [#4677]
- Support canceling import tasks in batches through the CANCEL LOAD
statement [#4515]
- Add a session variable that controls whether the partition column is
allowed to be NULL [#5013]
- Support the TopN aggregate function [#4803]
- Support a new data balancing logic based on the number of partitions and
buckets [#5010]
- Support creating indexes on value columns of Unique Key tables [#5305]
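The routing step of Bucket Shuffle Join can be sketched as follows. This assumes the right-table join columns line up with the left table's bucket columns, and uses CRC32 as a stand-in bucket function (the hash Doris actually uses may differ); `bucket_of`, `bucket_shuffle` and the bucket-to-node map are hypothetical.

```python
import zlib


def bucket_of(key_values, num_buckets):
    """Stand-in bucket function (CRC32 of the key values); any function
    shared by both sides of the join works for this illustration."""
    data = "|".join(map(str, key_values)).encode()
    return zlib.crc32(data) % num_buckets


def bucket_shuffle(right_rows, join_cols, num_buckets, bucket_to_node):
    """Send each right-table row to the node storing the left-table bucket
    that rows with the same join-key values hash to. join_cols must list
    the right-table columns matched against the left table's bucket
    columns, in bucket-column order."""
    routed = {}
    for row in right_rows:
        key = tuple(row[c] for c in join_cols)
        node = bucket_to_node[bucket_of(key, num_buckets)]
        routed.setdefault(node, []).append(row)
    return routed


# Left table: bucketed by `id` into 4 buckets spread over 3 BEs.
bucket_to_node = {0: "be1", 1: "be2", 2: "be1", 3: "be3"}
right = [{"uid": i, "amount": i * 10} for i in range(8)]
for node, rows in sorted(bucket_shuffle(right, ["uid"], 4, bucket_to_node).items()):
    print(node, [r["uid"] for r in rows])
```

Because only the right table moves, and only to the nodes that already hold the matching left buckets, the left table is never sent over the network.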

# Enhancement

### Performance improvement

- Implemented a new compaction selection algorithm, providing lower write
amplification and a more reasonable compaction strategy [#4212]
- Optimize bit operation efficiency in variable length coding [#4366]
- Improve the execution efficiency of the money_format function [#4672]
- Optimize query execution plan: When the bucket column of the table is a
subset of the GroupBy column in SQL, reduce the data shuffle step [#4482]
- Improve the efficiency of column name search on BE [#4779]
- Improve the performance of the BE side LRU Cache [#4781]
- Optimized the tablet selection strategy of Compaction, reducing the
number of invalid selections [#4964]
- Optimized the reading efficiency of Unique Key table [#4958]
- Reduced the memory usage of LoadJob on the FE side [#4993]
- Reduce the lock granularity in FE metadata from Database level to Table
level to support more fine-grained concurrent access to metadata [#3775]
- Avoid unnecessary memory copy when creating hash table [#5301]
- Remove the path check at BE startup to speed up BE startup [#5268]
- Optimize the import performance of JSON data [#5114]
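The GROUP BY optimization listed above rests on a simple argument, sketched here as a Python predicate. It ignores any other conditions the planner may check; `group_by_needs_shuffle` is a hypothetical helper.

```python
def group_by_needs_shuffle(bucket_cols, group_by_cols):
    """Rows in one group share all GROUP BY values; if the bucket columns
    are among them, the rows also share a bucket, so each node can finish
    the aggregation locally and the shuffle can be skipped."""
    return not set(bucket_cols) <= set(group_by_cols)


print(group_by_needs_shuffle(["user_id"], ["user_id", "dt"]))  # False
print(group_by_needs_shuffle(["user_id"], ["dt"]))  # True
```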

### Functional improvements

- SQL supports the COLLATE utf8_general_ci syntax to improve MySQL
compatibility [#4365]
- Improve the batch delete feature and optimize the related compaction
process [#4425]
- Enhance the parse_url() function: support lowercase arguments and support
parsing the port [#4429]
- When a query specifies the join method with a Join Hint, Colocation Join
is disabled by default [#4497]
- Dynamic partitioning supports hour-level granularity [#4514]
- HTTP interfaces on the BE side support gzip compression [#4533]
- Optimized the use of threads on the BE side [#4440]
- Optimize the checking process and error message of the rand() function in
the query analysis stage [#4439]
- Optimize the compaction triggering and execution logic to better limit
the resource overhead (mainly memory overhead) of the compaction operation,
and trigger the compaction operation more reasonably [#4670]
- Support pushing Limit conditions to ODBC/MySQL external tables [#4707]
- Add a limit on the number of tablet versions on the BE side to prevent
excessive data versions from causing abnormal cluster load [#4687]
- When an RPC error occurs during a query, the specific error message is
returned quickly instead of leaving the query stuck [#4702]
- Support automatic mapping of count(distinct if(bool, bitmap, null)) to
bitmap_union_count function [#4201]
- Support set sql_mode = concat(@@sql_mode, "STRICT_TRANS_TABLES")
statement [#4359]
- Support all stream load features in multiload [#4717]
- Optimize BE's strategy for selecting disks when creating tablets, using
the "two random choices" algorithm so that tablet replicas are distributed
more evenly [#4373]
- When creating a materialized view, the bitmap_union aggregation method
only supports integer columns, and hll_union does not support decimal
columns [#4432]
- Adjust the log level of some FE logs to avoid log writing becoming a
bottleneck [#4766]
- In the DESCRIBE table statement, display the definition expression of the
aggregate columns of the materialized view [#4446]
- Support the convert() function [#4364]
    - Support the cast(expr as signed/unsigned int) syntax for compatibility
with the MySQL ecosystem
    - Add more columns to the information_schema.columns table for
compatibility with the MySQL ecosystem
- In Spark Load function, use yarn command line instead of yarn-client API
to kill job or get job status [#4383]
- Persist stale rowset metadata so that it is not lost after BE restarts
[#4454]
- Return an error code in the schema change result to more clearly inform
the user of the specific error [#4388]
- Optimize the rowset selection logic of some compactions to make the
selection strategy more accurate [#5152]
- Optimize the Page Cache on the BE side by splitting it into a data page
cache and an index page cache [#5008]
- Improved the accuracy of functions such as variance and standard
deviation on the Decimal type [#4959]
- Optimized the processing logic of predicates pushed down to ScanNode to
avoid repeated filtering of predicate conditions at the query layer and
improve query efficiency [#4999]
- Optimized the predicate push-down logic of Unique Key tables, supporting
push-down of conditions on non-key columns [#5022]
- Support pushing down "not in" and "!=" to the storage layer to improve
query efficiency [#5207]
- Support writing multiple memtables of a tablet in parallel during import,
improving import efficiency [#5163]
- Optimize the creation logic of ZoneMap: no ZoneMap is created for a page
with too few rows [#5260]
- Added a histogram metric class on BE [#5148]
- When importing Parquet files, if there is a parsing error, the specific
file name will be displayed in the error message [#4954]
- Optimize the creation logic of dynamic partitions: creating a table
directly triggers the creation of its dynamic partitions [#5209]
- In the result of the SHOW BACKENDS command, display the real start time
of BE [#4872]
- Support column names starting with the @ symbol, mainly used for mapping
ES tables [#5006]
- Optimize the mapping and conversion logic of the declared columns in the
import statement to make usage clearer [#5140]
- Optimize the execution logic of colocation join to make the query plan
more evenly executed on multiple BE nodes [#5104]
- Optimize the predicate push-down logic to support pushing down IS NULL
and IS NOT NULL to the storage engine [#5092]
- Optimize the BE node selection logic in bucket join [#5133]
- Support UDF in import operation [#4863]
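The "two random choices" disk selection mentioned above can be sketched in a few lines of Python, assuming the load signal is the current tablet count per disk; `pick_disk` and the disk names are hypothetical, and the real BE logic may weigh other factors.

```python
import random


def pick_disk(tablet_counts, rng=random):
    """Power-of-two-choices: sample two distinct disks at random and return
    the one currently holding fewer tablets."""
    disks = list(tablet_counts)
    if len(disks) == 1:
        return disks[0]
    a, b = rng.sample(disks, 2)
    return a if tablet_counts[a] <= tablet_counts[b] else b


# Place 1000 new tablets on four disks and look at the resulting spread.
rng = random.Random(42)
counts = {"disk1": 0, "disk2": 0, "disk3": 0, "disk4": 0}
for _ in range(1000):
    counts[pick_disk(counts, rng)] += 1
print(counts)
```

Sampling only two disks keeps each decision cheap while still steering new tablets away from the most loaded disks, which is the usual motivation for this technique over a purely random choice.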

### Other

- Added support for IN Predicate in delete statement [#4404]
- Update the Dockerfile of the development image and add some new
dependencies [#4474]
- Fix various spelling errors in the code and documentation [#4714] [#4712]
[#4722] [#4723] [#4724] [#4725] [#4726] [#4727]
- Added two segment-related indicators in the OlapScanNode of the query
profile to display the total number of segments and the number of filtered
segments [#4348]
- Add documentation for the batch delete feature [#4435]
- Added Spark Load syntax manual [#4463]
- Added the display of cumulative compaction strategy name and rowset data
size in BE's /api/compaction/show API [#4466]
- Redirect the Spark Launcher log in Spark Load to a separate log file for
easy viewing [#4470]
- The BE configuration item streaming_load_max_batch_size_mb was renamed to
streaming_load_json_max_mb to make its meaning clearer [#4791]
- Adjust the default value of the FE configuration item
thrift_client_timeout_ms to fix overly slow access to the
information_schema database [#4808]
- The BE web page supports CPU and memory sampling of the BE process to
facilitate performance debugging [#4632]
- Refactor the tablet balancing class on the FE side so that more balancing
logic can be added [#4771]
- Reorganized the OLAP_SCAN_NODE profile information to make it clearer and
easier to read [#4825]
- Added metrics on the BE side to monitor cancelled query fragments [#4862]
- Reorganized the profile information of HASH_JOIN_NODE, CROSS_JOIN_NODE,
UNION_NODE, ANALYTIC_EVAL_NODE to make the Profile more clear and easy to
read [#4878]
- Modify the default value of
query_colocate_join_memory_limit_penalty_factor to 1 to ensure that the
default memory limit of the execution plan fragment is consistent with the
user setting during the colocation join operation [#4895]
- Added consideration of tablet scanning frequency in the selection of
compaction strategy on the BE side [#4837]
- Optimize the strategy of sending query fragments, reducing repeated
sending of common attributes to improve query plan scheduling performance
[#4904]
- Improved the accuracy of load statistics for unavailable nodes when the
query scheduler schedules query plans [#4914]
- Add the code version information of the FE node in the result of the SHOW
FRONTENDS statement [#4943]
- Support more column type conversion, such as support conversion from CHAR
to numeric type, etc. [#4938]
- The import function can recognize complex types in Parquet files [#4968]
- Add BE metrics for the used permits and waiting permits in the compaction
logic [#4893]
- Reduce the execution time of BE unit tests [#5131]
- Added more JVM-related monitoring items on the FE side [#5112]
- Add a session variable to control the timeout period for the transaction
to take effect in the insert operation [#5170]
- Optimize the logic of selecting scan nodes for query execution plans, and
consider all ScanNode nodes in a query [#4984]
- Add more system monitoring indicators for FE nodes [#5149]
- Unify the use of VLOG in BE code [#5264]

# Other

- Add license declarations for some non-Apache-licensed code to the NOTICE
file [#4831]

- Reformatted the code of BE using clang-format [#4965]

- Added clang-format checking and formatting scripts to unify the C++ code
style of BE before submission [#4934]

- Added the AWS S3 SDK as a third-party library, which can be used to read
data directly from object storage [#5234]

- Fixed some issues related to License: [#4371]

    1. The two third-party dependencies, MySQL client and LZO, are no longer
enabled in the default compilation options. Users who need the MySQL
external table function must enable them explicitly

    2. Removed the JS and CSS code from the code repository and introduced
it as a third-party library dependency

- Updated the Docker development environment image build-env-1.2

- Updated the compilation method of the UnixODBC third-party library, so
that the BE process no longer depends on the system's libltdl.so dynamic
library at runtime

- Added third-party UDF to support more efficient set calculation of
orthogonal bitmap data [#4198]

- Added UnixODBC third-party library dependency to support ODBC external
table function [#4377]

# API Change

- Prohibit the creation of segment v1 tables [#4913]
- Rename the configuration item `streaming_load_max_batch_size_mb` to
`streaming_load_json_max_mb` [#4791]
- Support column reference passing in column definition of load statement
[#5140]
- Support creating indexes on the value column of unique table [#5305]
- Support atomic replacement of two tables through replace statement [#4669]
- Support CREATE TABLE LIKE statement

     For more details, please refer to the issue
https://github.com/apache/incubator-doris/issues/5374
     If you have any important features related to version 0.14 that are
still in progress or not yet merged into master, please reply to me by
email.
