Hi all:

Since the release of 0.13, Apache Doris (incubating) has accumulated around 390 new features, bug fixes, performance enhancements, documentation improvements, and code refactors from 60+ contributors. We are now ready to release Apache Doris (incubating) 0.14.0, and I will be the release manager for this version. The release is expected to include the following content:

# New Feature

### Import and delete

Support deleting multiple rows at one time through an import, which avoids the performance degradation caused by many separate deletions. For tables of the UniqueKey model, a Sequence column can be specified when importing. Doris uses the value of the Sequence column to determine the order of the data, so that the data is imported in the correct order. [#4310] [#4256]

### Support database backup

The backup statement can specify the backup content (metadata and data). The backup and restore statements can exclude some tables, so when backing up an entire database you can leave out very large and unimportant tables. An entire database can be backed up and restored without declaring each table name in the backup and restore statement. [#5314]

### ODBC external table support

Support access to external tables such as MySQL, PostgreSQL, and Oracle through the ODBC protocol. [#4798] [#4438] [#4559] [#4699]

### Support SQL level and Partition level result Cache

Support caching query results to improve the efficiency of repeated queries, at both the SQL level and the Partition level. [#4330]

### Built-in functions

- Support bitmap_xor function [#5098]
- Add replace() function [#4347]
- Add the time_round function to support time alignment according to multiple time granularities [#4640]

### FE interface and HTTP interface

- The new FE UI interface can be enabled by setting the FE configuration item enable_http_server_v2 [#4684]
- BE adds an HTTP interface to show the distribution of all tablets in a partition among the different disks of a BE [#5096]
- BE adds an HTTP interface to manually migrate a tablet to another disk on the same node [#5101]
- Support modifying the configuration items of FE and BE through HTTP, and persisting these modifications [#4704]

### Compatibility with MySQL

- Added support for the views table in the information_schema database [#4778]
- Added table_privileges, schema_privileges and
user_privileges to the information_schema database for compatibility with certain MySQL applications [#4899]
- A new statistics table is added to the information_schema database for compatibility with some MySQL tools [#4991]

### Monitoring

- BE added tablet-level monitoring metrics, including scanned data volume and row count and written data volume and row count, to help locate hot tablets [#4428]
- BE added metrics to view the usage of various LRU caches [#4688]

### Table creation

- Added CREATE TABLE LIKE statement to facilitate creating a copy of a table's metadata [#4705]
- Support atomic replacement of two tables through the replace statement [#4669]

### Other

- Support adding Optimizer Hints of type SET_VAR in the Select statement to set session variables [#4504]
- Support repairing damaged tablets by filling in empty tablets [#4255]
- Support Bucket Shuffle Join (when the Join condition column is a subset of the table bucket column, the right table is shuffled to the node where the data of the left table is located, which can significantly reduce the network overhead caused by Shuffle Join and improve query speed) [#4677]
- Support cancelling import tasks in batches through the cancel load statement [#4515]
- Add a session variable to set whether the partition column is allowed to be NULL [#5013]
- Support TopN aggregation function [#4803]
- Support a new data balancing logic based on the number of partitions and buckets [#5010]
- Support creating indexes on the value column of a unique table [#5305]

# Enhancement

### Performance improvement

- Implemented a new compaction selection algorithm, providing lower write amplification and a more reasonable compaction strategy [#4212]
- Optimize bit operation efficiency in variable length coding [#4366]
- Improve the execution efficiency of the money_format function [#4672]
- Optimize query execution plan: when the bucket column of the table is a subset of the GroupBy column in SQL, reduce the data shuffle
step [#4482]
- Improve the efficiency of column name search on BE [#4779]
- Improve the performance of the LRU Cache on the BE side [#4781]
- Optimized the tablet selection strategy of Compaction, reducing the number of invalid selections [#4964]
- Optimized the reading efficiency of Unique Key tables [#4958]
- Optimized the memory usage of LoadJob, reducing the memory overhead on the FE side [#4993]
- Reduce the lock granularity in FE metadata from Database level to Table level to support more fine-grained concurrent access to metadata [#3775]
- Avoid unnecessary memory copies when creating hash tables [#5301]
- Remove the path check at BE startup to speed up BE startup [#5268]
- Optimize the import performance of JSON data [#5114]

### Functional improvements

- SQL supports collate utf8_general_ci syntax to improve MySQL syntax compatibility [#4365]
- Improve the Batch delete function, and improve and optimize the related compaction process [#4425]
- Enhance the parse_url() function: support lowercase, support parsing the port [#4429]
- When a SQL statement specifies the execution mode of a join (Join Hint), the Colocation Join function is disabled by default [#4497]
- Dynamic partitions support hour-level granularity [#4514]
- The HTTP interface on the BE side supports gzip compression [#4533]
- Optimized the use of threads on the BE side [#4440]
- Optimize the checking process and error message of the rand() function in the query analysis stage [#4439]
- Optimize the compaction triggering and execution logic to better limit the resource overhead (mainly memory overhead) of compaction, and trigger compaction more reasonably [#4670]
- Support pushing Limit conditions down to ODBC/MySQL external tables [#4707]
- Add a limit on the number of tablet versions on the BE side to prevent excessive data versions from causing abnormal cluster load [#4687]
- When an RPC error occurs in a query, it can quickly return specific error information
to prevent the query from being stuck [#4702]
- Support automatic mapping of count(distinct if(bool, bitmap, null)) to the bitmap_union_count function [#4201]
- Support the set sql_mode = concat(@@sql_mode, "STRICT_TRANS_TABLES") statement [#4359]
- Support all stream load features in multiload [#4717]
- Optimize BE's strategy for selecting disks when creating tablets, using the "two random choices" algorithm to distribute tablet replicas more evenly [#4373]
- When creating a materialized view, the bitmap_union aggregation method only supports integer columns, and hll_union does not support decimal columns [#4432]
- Optimize the log level of some FE logs to avoid log writing becoming a bottleneck [#4766]
- In the describe table statement, display the definition expression of the aggregate column of the materialized view [#4446]
- Support convert() function [#4364]
- Support cast(expr as signed/unsigned int) syntax for compatibility with the MySQL ecosystem
- Add more columns to the information_schema.columns table for compatibility with the MySQL ecosystem
- In the Spark Load function, use the yarn command line instead of the yarn-client API to kill a job or get job status [#4383]
- Persist stale rowset meta-information to ensure that it is not lost after BE restarts [#4454]
- Return an error code in the schema change result to inform the user of the specific error more clearly [#4388]
- Optimize the rowset selection logic of some compactions to make the selection strategy more accurate [#5152]
- Optimize the Page Cache on the BE side, dividing it into a data cache and an index cache [#5008]
- Optimized the accuracy of functions such as variance and standard deviation on the Decimal type [#4959]
- Optimized the processing logic of predicates pushed down to ScanNode to avoid repeated filtering of predicate conditions at the query layer and improve query efficiency [#4999]
- Optimized the predicate push-down logic of Unique Key tables, and support pushing down conditions on non-primary
key columns [#5022]
- Support pushing down "not in" and "!=" to the storage layer to improve query efficiency [#5207]
- Support writing multiple memtables of a tablet in parallel during import to improve import efficiency [#5163]
- Optimize the creation logic of ZoneMap: when the number of rows on a page is too small, a ZoneMap is no longer created [#5260]
- Added a histogram monitoring metric class on BE [#5148]
- When importing Parquet files, if there is a parsing error, the specific file name is displayed in the error message [#4954]
- Optimize the creation logic of dynamic partitions: creating a table directly triggers the creation of its dynamic partitions [#5209]
- In the result of the SHOW BACKENDS command, display the real start time of BE [#4872]
- Support column names starting with the @ symbol, mainly used to support mapping ES tables [#5006]
- Optimize the logic of the mapping and conversion relationship of the columns declared in the import statement to make its use clearer [#5140]
- Optimize the execution logic of colocation join so that the query plan is executed more evenly across multiple BE nodes [#5104]
- Optimize the predicate pushdown logic, and support pushing down is null and is not null to the storage engine [#5092]
- Optimize the BE node selection logic in bucket join [#5133]
- Support UDF in import operations [#4863]

### Other

- Added support for the IN Predicate in the delete statement [#4404]
- Update the Dockerfile of the development image and add some new dependencies [#4474]
- Fix various spelling errors in the code and documentation [#4714] [#4712] [#4722] [#4723] [#4724] [#4725] [#4726] [#4727]
- Added two segment-related metrics to the OlapScanNode of the query profile to display the total number of segments and the number of filtered segments [#4348]
- Add a document describing the batch delete function [#4435]
- Added the Spark Load syntax manual [#4463]
- Added the display of cumulative compaction strategy name and rowset data size in
BE's /api/compaction/show API [#4466]
- Redirect the Spark Launcher log in Spark Load to a separate log file for easy viewing [#4470]
- The BE configuration item streaming_load_max_batch_size_mb was renamed to streaming_load_json_max_mb to make its meaning clearer [#4791]
- Adjust the default value of the FE configuration item thrift_client_timeout_ms to solve the problem of access to the information_schema database taking too long [#4808]
- CPU or memory sampling of the BE process is supported on the BE web page to facilitate performance debugging [#4632]
- Extend the data shard balancing class on the FE side so that more balancing logic can be added [#4771]
- Reorganized the OLAP_SCAN_NODE profile information to make the profile clearer and easier to read [#4825]
- Added monitoring metrics on the BE side to monitor cancelled Query Fragments [#4862]
- Reorganized the profile information of HASH_JOIN_NODE, CROSS_JOIN_NODE, UNION_NODE, and ANALYTIC_EVAL_NODE to make the profile clearer and easier to read [#4878]
- Modify the default value of query_colocate_join_memory_limit_penalty_factor to 1 to ensure that the default memory limit of the execution plan fragment is consistent with the user setting during the colocation join operation [#4895]
- Added consideration of tablet scanning frequency in the selection of compaction strategy on the BE side [#4837]
- Optimize the strategy of sending Query Fragments and reduce how often common attributes are sent, to improve query plan scheduling performance [#4904]
- Optimized the accuracy of load statistics for unavailable nodes when the query scheduler is scheduling query plans [#4914]
- Add the code version information of the FE node to the result of the SHOW FRONTENDS statement [#4943]
- Support more column type conversions, such as conversion from CHAR to numeric types, etc.
[#4938]
- The import function can identify complex types in Parquet files [#4968]
- In the BE monitoring metrics, add monitoring of used permits and waiting permits in the compaction logic [#4893]
- Optimize the execution time of BE unit tests [#5131]
- Added more JVM-related monitoring items on the FE side [#5112]
- Add a session variable to control the timeout period for the transaction to take effect in the insert operation [#5170]
- Optimize the logic of selecting scan nodes for query execution plans, and consider all ScanNode nodes in a query [#4984]
- Add more system monitoring metrics for FE nodes [#5149]
- Unify the use of VLOG in BE code [#5264]

# Other

- Add license declarations for some non-Apache-licensed code to the NOTICE file [#4831]
- Reformatted the BE code using clang-format [#4965]
- Added clang-format checking and formatting scripts to unify the C++ code style of BE before submission [#4934]
- Added the AWS S3 SDK as a third-party library, which can be used to read data in object storage directly through the SDK [#5234]
- Fixed some issues related to License: [#4371]
  1. The dependencies on two third-party libraries, MySQL client and LZO, are no longer enabled in the default compilation options. Users who need the MySQL external table function must turn them on.
  2. Removed the js and css code from the code repository and introduced it as a third-party library dependency.
- Updated the Docker development environment image build-env-1.2
- Updated the compilation method of the UnixODBC third-party library, so that the BE process no longer depends on the system's libltdl.so dynamic library at runtime
- Added a third-party UDF to support more efficient set calculation of orthogonal bitmap data [#4198]
- Added the UnixODBC third-party library dependency to support the ODBC external table function [#4377]

# API Change

- Prohibit the creation of segment v1 tables [#4913]
- Rename the configuration item `streaming_load_max_batch_size_mb` to `streaming_load_json_max_mb` [#4791]
- Support column reference passing in the column definition of the load statement [#5140]
- Support creating indexes on the value column of a unique table [#5305]
- Support atomic replacement of two tables through the replace statement [#4669]
- Support CREATE TABLE LIKE statement

For more details, please refer to the issue: https://github.com/apache/incubator-doris/issues/5374

If you have any important features that are in progress or not yet merged into master and are related to version 0.14, please reply to me by email.
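
P.S. For readers unfamiliar with the "two random choices" strategy named in the disk selection item [#4373], here is a minimal Python sketch of the general technique: pick two candidate disks at random and place the tablet on the less loaded one. This is only an illustration and not the BE code. The function names (`pick_disk`, `place_tablets`) and the use of the tablet count as the load measure are assumptions made for the example.

```python
import random


def pick_disk(tablet_counts, rng):
    """Return the index of the disk chosen by 'two random choices'.

    tablet_counts is a list with the current number of tablets on each
    disk (a hypothetical load measure chosen for this sketch).
    """
    if not tablet_counts:
        raise ValueError("no disks available")
    if len(tablet_counts) == 1:
        return 0
    # Sample two distinct candidate disks uniformly at random.
    a, b = rng.sample(range(len(tablet_counts)), 2)
    # Keep the candidate that currently holds fewer tablets.
    return a if tablet_counts[a] <= tablet_counts[b] else b


def place_tablets(num_disks, num_tablets, seed=0):
    """Place tablets one by one and return the final per-disk counts."""
    rng = random.Random(seed)
    counts = [0] * num_disks
    for _ in range(num_tablets):
        counts[pick_disk(counts, rng)] += 1
    return counts


if __name__ == "__main__":
    print(place_tablets(num_disks=8, num_tablets=8000))
```

Compared with picking a single disk at random, comparing two candidates keeps the per-disk counts close to each other while only ever inspecting two disks per placement.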