morrySnow opened a new issue, #55502:
URL: https://github.com/apache/doris/issues/55502

   Thanks to our devoted developers and supportive community users, the 
much-anticipated Apache Doris 3.1.0 is now available!
   
   ## VARIANT
   
   ### Sparse Columns and Sub-columns With Vertical Compaction
   
   Traditional OLAP systems often encounter metadata bloat, compaction 
amplification, and query degradation when dealing with "extremely wide 
tables/excessive columns" (ranging from thousands to tens of thousands). Doris 
3.1 leverages the sparsity of VARIANT sub-columns and sub-column-level Vertical 
Compaction to increase the manageable column limit to the order of tens of 
thousands.
   
   Through in-depth optimizations at the storage layer, VARIANT delivers the 
following benefits to users:
   
   - Stable support for "thousands to tens of thousands" of sub-columns 
(columnar storage), with smoother query and compaction latencies.
   - Controllable metadata and indexes, avoiding exponential growth.
   - Proven capability to extract over 10,000 sub-columns (columnar storage) 
with efficient Compaction performance.
   
   ### Schema Template
   
   Using Schema Template provides the following benefits when working with the 
VARIANT data type:
   
   - Type Stability: Critical sub-paths can have their types fixed in the DDL, 
preventing query errors, index invalidation, and overhead from implicit 
conversions caused by type drift.
   - Faster and More Accurate Retrieval: Inverted indexing strategies 
(tokenized/non-tokenized, parsers, phrase search, etc.) can be customized for 
different sub-paths, resulting in lower latency and more stable hit rates for 
common queries.
   - Controllable Indexing and Costs: Moves away from "uniform column-wide 
index inheritance" (an approach in 2.1 that easily leads to bloat) to 
"fine-grained configuration by sub-path," significantly reducing the number of 
indexes, write amplification, and storage costs.
   - Improved Maintainability and Collaboration: Equivalent to adding a "data 
contract" to JSON, ensuring semantic consistency across teams; type and index 
states are more observable, making issues easier to diagnose.
   - Evolution-Friendly: Core high-frequency paths can be templated with 
optional indexing, while long-tail fields retain flexible extensibility, 
preserving scalability.
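
   The type-stability idea can be illustrated with a small conceptual sketch. 
It is not Doris DDL or a Doris API: the `TEMPLATE` mapping, the dotted path 
names, and `apply_template` are hypothetical names chosen for illustration. 
Templated sub-paths are coerced to a fixed type, while long-tail fields pass 
through unchanged.

```python
from typing import Any, Callable, Dict

# Hypothetical template: sub-path -> fixed type (illustration only, not Doris syntax).
TEMPLATE: Dict[str, Callable[[Any], Any]] = {
    "user.id": int,
    "user.name": str,
}


def apply_template(record: Dict[str, Any]) -> Dict[str, Any]:
    """Coerce templated sub-paths to their fixed type; leave other paths as-is."""
    result: Dict[str, Any] = {}
    for path, value in record.items():
        caster = TEMPLATE.get(path)
        # Paths without a template entry keep whatever type they arrived with.
        result[path] = caster(value) if caster is not None else value
    return result


row = apply_template({"user.id": "42", "user.name": "ann", "extra.tag": 3.5})
print(row)
```

Here `user.id` arrives as a string but is stored as an integer, so later 
comparisons on that path never see a drifting type.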
   
   ## Inverted Index
   
   ### Inverted Index Storage Format V3
   
   V3 brings further storage optimizations over V2.
   
   Index files are smaller, reducing disk usage and I/O overhead. Based on test 
results from the httplogs and logsbench datasets, storage space can be reduced 
by up to 20% with V3, making it ideal for large-scale text data and log 
analytics scenarios.
   
   ### New Tokenizers
   
   - ICU (International Components for Unicode) Tokenizer: handles 
internationalized text with complex writing systems; particularly suitable for 
mixed multilingual documents.
   - IK Tokenizer: a Chinese tokenizer that uses advanced algorithms, combining 
dictionary-based and statistical models.
   - Basic Tokenizer: basic tokenization that segments text by character type.
   
   ### Custom Tokenizer
   Custom tokenization lets users combine character filters, tokenizers, and 
token filters to match their specific needs, overcoming the limitations of the 
built-in tokenizers and further improving text retrieval recall. Because this 
combination precisely defines how text is segmented into searchable terms, it 
directly determines the relevance of search results and the accuracy of data 
analysis.
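
   The three-stage combination described above (character filters, then a 
tokenizer, then token filters) can be sketched generically as follows. This is 
a minimal illustration of the pipeline concept, not the Doris implementation 
or its configuration syntax; `make_analyzer` and the individual filter 
functions are hypothetical names.

```python
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def make_analyzer(
    char_filters: List[CharFilter],
    tokenizer: Tokenizer,
    token_filters: List[TokenFilter],
) -> Callable[[str], List[str]]:
    """Compose the three stages into a single text -> terms function."""

    def analyze(text: str) -> List[str]:
        for char_filter in char_filters:      # stage 1: rewrite raw characters
            text = char_filter(text)
        tokens = tokenizer(text)              # stage 2: split into tokens
        for token_filter in token_filters:    # stage 3: normalize or drop tokens
            tokens = token_filter(tokens)
        return tokens

    return analyze


def dots_to_spaces(text: str) -> str:
    return text.replace(".", " ")


def split_on_whitespace(text: str) -> List[str]:
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def drop_stopwords(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in {"the", "a", "an"}]


analyzer = make_analyzer([dots_to_spaces], split_on_whitespace, [lowercase, drop_stopwords])
print(analyzer("The Server.Name is DORIS-01"))  # ['server', 'name', 'is', 'doris-01']
```

Swapping any one stage (for example, a different tokenizer) changes which 
terms end up searchable, which is why the combination affects recall.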
   
   ## LakeHouse
   
   ### Asynchronous Materialized Views Fully Support Data Lakes
   In version 3.1, asynchronous materialized views fully support partitioned 
incremental building and partition transparent rewriting for Paimon, Iceberg, 
and Hudi.
   
   ### Iceberg
   Version 3.1.0 introduces multiple optimizations and enhanced capabilities 
for the Iceberg table format, keeping pace with Iceberg's latest features.
   
   - Supports full lifecycle management of Branches and Tags
   - Supports querying Iceberg system tables
   - Supports querying Iceberg views
   - Supports modifying Iceberg table schema via ALTER statements
   
   ### Paimon
   Version 3.1.0 introduces multiple feature updates and capability 
enhancements for the Paimon table format, based on real user scenarios.
   
   - Supports Paimon Batch Incremental Query
   - Supports reading Branches and Tags
   - Supports querying Paimon system tables
   
   ### Data Lake Query Performance
   Version 3.1.0 introduces multiple deep optimizations for query performance 
on data lake table formats, aiming to provide users with more stable and 
efficient data lake analytics capabilities in real production environments.
   
   - Dynamic Partition Pruning
   - Batch Shard Execution
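
   The release notes do not describe how dynamic partition pruning is 
implemented in Doris. As a general illustration of the technique, the sketch 
below uses join keys that are only known at run time to skip partitions of a 
larger table. The table contents and the `prune_partitions` helper are 
hypothetical.

```python
from typing import Dict, List, Set


def prune_partitions(
    partitions: Dict[str, List[dict]], runtime_keys: Set[str]
) -> Dict[str, List[dict]]:
    """Keep only the partitions whose key appears in the run-time key set."""
    return {key: rows for key, rows in partitions.items() if key in runtime_keys}


# Hypothetical fact table, partitioned by date.
fact_partitions = {
    "2024-01-01": [{"order_id": 1}],
    "2024-01-02": [{"order_id": 2}, {"order_id": 3}],
    "2024-01-03": [{"order_id": 4}],
}

# Hypothetical dimension table; the filter result is only known at run time.
dim_rows = [
    {"dt": "2024-01-01", "is_holiday": False},
    {"dt": "2024-01-02", "is_holiday": True},
]
holiday_dates = {row["dt"] for row in dim_rows if row["is_holiday"]}

scanned = prune_partitions(fact_partitions, holiday_dates)
print(sorted(scanned))  # ['2024-01-02']
```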
   
   ## Storage
   
   - Flexible Column Updates
   - Optimized MOW (Merge-on-Write) performance in high-concurrency scenarios 
in Compute-Storage Decoupled Mode
   
   ## Query Performance
   
   - Enhanced partition pruning performance and expanded applicability
   - Added the ability to optimize queries by leveraging data characteristics
   
   ## Behavior Changes
   
   ### VARIANT
   
   - variant_max_subcolumns_count constraint. Within the same table, the 
variant_max_subcolumns_count setting for all Variant columns must be either all 
0 or all greater than 0. Mixing these values will result in an error during 
table creation or schema change.
   - The new VARIANT read/write/serde and Compaction paths are compatible with 
existing data. However, queries on VARIANT data upgraded from older versions 
may return results in a slightly different format (e.g., additional 
whitespace, or extra nesting levels created unintentionally when '.' is 
treated as a path delimiter).
   - When creating an Inverted Index on a VARIANT column, if no fields in the 
data meet the indexing criteria, an empty index file will still be generated. 
This is the expected behavior.
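
   The variant_max_subcolumns_count rule in the first item above (all 0, or 
all greater than 0, within one table) can be expressed as a small check. This 
is an illustrative sketch of the stated rule only, not Doris code; the 
function name and the column-to-setting mapping are hypothetical.

```python
from typing import Dict


def validate_variant_settings(max_subcolumns_by_column: Dict[str, int]) -> None:
    """Raise ValueError if a table mixes 0 and positive settings across Variant columns."""
    zero_columns = [name for name, v in max_subcolumns_by_column.items() if v == 0]
    positive_columns = [name for name, v in max_subcolumns_by_column.items() if v > 0]
    if zero_columns and positive_columns:
        raise ValueError(
            "variant_max_subcolumns_count must be all 0 or all > 0 within one table; "
            f"zero: {zero_columns}, positive: {positive_columns}"
        )


# Consistent settings pass silently.
validate_variant_settings({"v1": 0, "v2": 0})
validate_variant_settings({"v1": 2048, "v2": 100})
```

A mixed table such as `{"v1": 0, "v2": 2048}` would raise `ValueError` in 
this sketch, mirroring the error Doris reports at table creation or schema 
change.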
   
   ### Permissions
   - "SHOW TRANSACTION" now requires LOAD_PRIV (the import privilege) on the 
corresponding database instead of ADMIN_PRIV.
   - The permissions for SHOW FRONTENDS / BACKENDS and the NODE RESTful API 
have been unified. Access to these interfaces now requires SELECT_PRIV on the 
information_schema database.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

