TeslaCN commented on code in PR #19784: URL: https://github.com/apache/shardingsphere/pull/19784#discussion_r935205605
########## docs/blog/content/material/2022_03_09_SQL_Parse_Format_Function_A _Technical_Deep_Dive_by_Apache_ShardingSphere.en.md: ########## @@ -0,0 +1,282 @@ ++++ +title = "SQL Parse Format Function — A Technical Deep Dive by Apache ShardingSphere +" +chapter = true ++++ + +Complicted SQL statements are some of the most common problems that data scientists and engineers encounter. For example, can you comprehend at first glance the complex SQL statement below? + +``` +select a.order_id,a.status,sum(b.money) as money from t_order a inner join (select c.order_id as order_id, c.number * d.price as money from t_order_detail c inner join t_order_price d on c.s_id = d.s_id) b on a.order_id = b.order_id where b.money > 100 group by a.order_id +``` + +How about formatting it? Is it easier to understand the formatted formatted version below? + +``` +SELECT a . order_id , a . status , SUM(b . money) AS money +FROM t_order a INNER JOIN +( +SELECT c . order_id AS order_id, c . number * d . price AS money +FROM t_order_detail c INNER JOIN t_order_price d ON c . s_id = d . s_id +) b ON a . order_id = b . order_id +WHERE +b . money > 100 +GROUP BY a . order_id; +``` + +The first step to parse such a complex SQL is always formatting, and then its SQL semantics can be parsed based on the formatted content. SQL Formatter is, therefore, one of the essential functions of for any database software. + +Accordingly, [Apache ShardingSphere](https://shardingsphere.apache.org/) now offers a SQL formatting tool called SQL Parse Format that depends on ShardingSphere’s SQL dialect parser. + +**SQL Parse Format is an important function of the ShardingSphere Parser Engine, and also lays the foundation for ShardingSphere’s SQL Audit (TODO).** This article offers a deep dive into the SQL Parse Format function: + +- What’s its core concept? +- How you can use it? +- How can you develop SQL Parse Format? + +## Parser Engine + +To begin, we need to introduce more about Apache ShardingSphere’s Parser Engine because SQL Parse Format is a unique and relatively independent function of the parser engine. + +Apache ShardingSphere developed the parser engine to extract key information in SQL, such as fields of data shards and rewritten columns for data encryption. So far, Apache ShardingSphere’s parser engine has undergone three iterations. + +The initial parser engine leveraged [Druid](https://druid.apache.org/) as its SQL parser and performed quite well before ShardingSphere Version 1.4.x. + +Later the ShardingSphere community decided to develop its second-generation parser engine on its own. Since the use purpose was changed, ShardingSphere adopted another approach to comprehend SQL: only the contextual information that data sharding needs was extracted, without generating a parse tree or a secondary traversal, to improve performance and compatibility. + +Currently, the third generation of ShardingSphere Parser Engine uses [ANTLR](https://www.antlr.org/) as the parse tree generator and then extracts the contextual information by doing a secondary tree traversal. It is substantially compatible with more SQL dialects, which further accelerates developing other functions in Apache ShardingSphere. + +In version 5.0.x, ShardingSphere developers further enhanced the performance of the newest parser engine by changing its tree traversal method from Listener to Visitor and adding parsing results cache for pre-compiled SQL statements. + +The implementation of SQL Parse Format is attributable to the new parser engine. Next, let’s take a look at SQL Parse Format function. + +## SQL Parser Format +SQL Parse Format is used to format SQL statements. Additionally, SQL Parse Format function will be used in SQL Audit in the future to provide users with viewing SQL history, displaying formatted SQL with reports, or further analyzing or processing SQL. + +For instance, each part of the following SQL formatted by SQL Parse Format becomes clearer with wrapping and keywords in all caps: + +``` Review Comment: ```suggestion ```sql ``` ########## docs/blog/content/material/2022_03_09_SQL_Parse_Format_Function_A _Technical_Deep_Dive_by_Apache_ShardingSphere.en.md: ########## @@ -0,0 +1,282 @@ ++++ +title = "SQL Parse Format Function — A Technical Deep Dive by Apache ShardingSphere +" +chapter = true ++++ + +Complicted SQL statements are some of the most common problems that data scientists and engineers encounter. For example, can you comprehend at first glance the complex SQL statement below? + +``` +select a.order_id,a.status,sum(b.money) as money from t_order a inner join (select c.order_id as order_id, c.number * d.price as money from t_order_detail c inner join t_order_price d on c.s_id = d.s_id) b on a.order_id = b.order_id where b.money > 100 group by a.order_id +``` + +How about formatting it? Is it easier to understand the formatted formatted version below? + +``` +SELECT a . order_id , a . status , SUM(b . money) AS money +FROM t_order a INNER JOIN +( +SELECT c . order_id AS order_id, c . number * d . price AS money +FROM t_order_detail c INNER JOIN t_order_price d ON c . s_id = d . s_id +) b ON a . order_id = b . order_id +WHERE +b . money > 100 +GROUP BY a . order_id; +``` + +The first step to parse such a complex SQL is always formatting, and then its SQL semantics can be parsed based on the formatted content. SQL Formatter is, therefore, one of the essential functions of for any database software. + +Accordingly, [Apache ShardingSphere](https://shardingsphere.apache.org/) now offers a SQL formatting tool called SQL Parse Format that depends on ShardingSphere’s SQL dialect parser. + +**SQL Parse Format is an important function of the ShardingSphere Parser Engine, and also lays the foundation for ShardingSphere’s SQL Audit (TODO).** This article offers a deep dive into the SQL Parse Format function: + +- What’s its core concept? +- How you can use it? +- How can you develop SQL Parse Format? + +## Parser Engine + +To begin, we need to introduce more about Apache ShardingSphere’s Parser Engine because SQL Parse Format is a unique and relatively independent function of the parser engine. + +Apache ShardingSphere developed the parser engine to extract key information in SQL, such as fields of data shards and rewritten columns for data encryption. So far, Apache ShardingSphere’s parser engine has undergone three iterations. + +The initial parser engine leveraged [Druid](https://druid.apache.org/) as its SQL parser and performed quite well before ShardingSphere Version 1.4.x. + +Later the ShardingSphere community decided to develop its second-generation parser engine on its own. Since the use purpose was changed, ShardingSphere adopted another approach to comprehend SQL: only the contextual information that data sharding needs was extracted, without generating a parse tree or a secondary traversal, to improve performance and compatibility. + +Currently, the third generation of ShardingSphere Parser Engine uses [ANTLR](https://www.antlr.org/) as the parse tree generator and then extracts the contextual information by doing a secondary tree traversal. It is substantially compatible with more SQL dialects, which further accelerates developing other functions in Apache ShardingSphere. + +In version 5.0.x, ShardingSphere developers further enhanced the performance of the newest parser engine by changing its tree traversal method from Listener to Visitor and adding parsing results cache for pre-compiled SQL statements. + +The implementation of SQL Parse Format is attributable to the new parser engine. Next, let’s take a look at SQL Parse Format function. + +## SQL Parser Format +SQL Parse Format is used to format SQL statements. Additionally, SQL Parse Format function will be used in SQL Audit in the future to provide users with viewing SQL history, displaying formatted SQL with reports, or further analyzing or processing SQL. + +For instance, each part of the following SQL formatted by SQL Parse Format becomes clearer with wrapping and keywords in all caps: + +``` +select age as b, name as n from table1 join table2 where id = 1 and name = 'lu'; +-- After Formatting +SELECT age AS b, name AS n +FROM table1 JOIN table2 +WHERE + id = 1 + and name = 'lu'; +``` + +So far, we have covered the basics of the SQL Parse Format. + +> Next, let’s answer the question: what is the concept of SQL Parse Format? + +How a SQL statement is formatted in Apache ShardingSphere? Take the following SQL as an example: + +``` Review Comment: ```suggestion ```sql ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
