LucaCappelletti94 opened a new issue, #2237:
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2237
While building a parser correctness benchmark using libpg_query
(`pg_query.rs`) as the PostgreSQL ground truth, we measured how often
`PostgreSqlDialect` accepts SQL that real PostgreSQL rejects. The numbers are
surprisingly high.
Against SQL extracted from the sqlparser-rs test suite itself:
- **28.7%** of statements rejected by pg_query are silently accepted by
`PostgreSqlDialect` (37/129, PostgreSQL-specific test file)
- **30.0%** in the broader common-dialect test file (141/470)
We understand sqlparser-rs is intentionally permissive. The question is:
**is this level of permissiveness intentional for `PostgreSqlDialect`, or is it
leakage that would be worth tightening?**
## Examples of what `PostgreSqlDialect` currently accepts
A selection from the 141 cases found, grouped by the dialect the syntax
originates from:
```sql
-- Oracle
FETCH NEXT IN my_cursor INTO result_table -- INTO clause on FETCH
-- SQL Server / T-SQL
SELECT TOP 3 * FROM tbl
EXEC my_proc N'param'
MERGE … OUTPUT inserted.* INTO log_target
EXECUTE FUNCTION f -- trigger EXECUTE without ()
-- MySQL / MariaDB
INSERT customer VALUES (1, 2, 3) -- missing INTO
INSERT OR REPLACE INTO t (id) VALUES(1)
DROP FUNCTION IF EXISTS f(a INTEGER, IN b INTEGER = 1) -- defaults in DROP
-- Snowflake / BigQuery
SELECT i FROM qt QUALIFY ROW_NUMBER() OVER (...) = 1
CREATE OR REPLACE TABLE t (a INT)
CREATE OR REPLACE USER IF NOT EXISTS u1 PASSWORD='secret'
-- ClickHouse
ALTER TABLE t ON CLUSTER my_cluster ADD CONSTRAINT bar PRIMARY KEY (baz)
-- HiveQL
ALTER TABLE t SET TBLPROPERTIES('classification' = 'parquet')
-- Unclear origin / possibly over-permissive parsing
ALTER TABLE t ALTER COLUMN id ADD GENERATED AS IDENTITY -- missing
ALWAYS/BY DEFAULT
COPY t FROM 'f.csv' BINARY DELIMITER ',' CSV HEADER -- mutually
exclusive formats
SHOW search_path search_path -- duplicate
trailing token
```
Happy to help with PRs if the direction is clear.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]