LucaCappelletti94 opened a new issue, #2237:
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2237

   While building a parser correctness benchmark using libpg_query 
(`pg_query.rs`) as the PostgreSQL ground truth, we measured how often 
`PostgreSqlDialect` accepts SQL that real PostgreSQL rejects. The numbers are 
surprisingly high.
   
   Against SQL extracted from the sqlparser-rs test suite itself:
   
   - **28.7%** of statements rejected by pg_query are silently accepted by 
`PostgreSqlDialect` (37/129, PostgreSQL-specific test file)
   - **30.0%** in the broader common-dialect test file (141/470)
   
   We understand sqlparser-rs is intentionally permissive. The question is: 
**is this level of permissiveness intentional for `PostgreSqlDialect`, or is it 
leakage that would be worth tightening?**
   
   ## Examples of what `PostgreSqlDialect` currently accepts
   
   A selection from the 141 cases found, grouped by the dialect the syntax 
originates from:
   
   ```sql
   -- Oracle
   FETCH NEXT IN my_cursor INTO result_table   -- INTO clause on FETCH
   
   -- SQL Server / T-SQL
   SELECT TOP 3 * FROM tbl
   EXEC my_proc N'param'
   MERGE … OUTPUT inserted.* INTO log_target
   EXECUTE FUNCTION f                          -- trigger EXECUTE without ()
   
   -- MySQL / MariaDB
   INSERT customer VALUES (1, 2, 3)            -- missing INTO
   INSERT OR REPLACE INTO t (id) VALUES(1)
   DROP FUNCTION IF EXISTS f(a INTEGER, IN b INTEGER = 1)  -- defaults in DROP
   
   -- Snowflake / BigQuery
   SELECT i FROM qt QUALIFY ROW_NUMBER() OVER (...) = 1
   CREATE OR REPLACE TABLE t (a INT)
   CREATE OR REPLACE USER IF NOT EXISTS u1 PASSWORD='secret'
   
   -- ClickHouse
   ALTER TABLE t ON CLUSTER my_cluster ADD CONSTRAINT bar PRIMARY KEY (baz)
   
   -- HiveQL
   ALTER TABLE t SET TBLPROPERTIES('classification' = 'parquet')
   
   -- Unclear origin / possibly over-permissive parsing
   ALTER TABLE t ALTER COLUMN id ADD GENERATED AS IDENTITY  -- missing 
ALWAYS/BY DEFAULT
   COPY t FROM 'f.csv' BINARY DELIMITER ',' CSV HEADER      -- mutually 
exclusive formats
   SHOW search_path search_path                             -- duplicate 
trailing token
   ```
   
   Happy to help with PRs if the direction is clear.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to