Yu Xu created CALCITE-6988:
------------------------------

             Summary: Support DuckDB Dialect
                 Key: CALCITE-6988
                 URL: https://issues.apache.org/jira/browse/CALCITE-6988
             Project: Calcite
          Issue Type: New Feature
          Components: core
    Affects Versions: 1.39.0
            Reporter: Yu Xu
            Assignee: Yu Xu
             Fix For: 1.40.0


DuckDB is a very popular high-performance analytical database 
system(https://github.com/duckdb/duckdb), many users are using DuckDB in their 
work but Calcite had not supported with it yet.


I had relevant materials about mainly syntax difference between default dialect 
and DuckDB (from document : https://duckdb.org/docs/stable/ ):

*1. NullCollation is LAST mode, Nulls last for DESC and ASC, test as follow:*
D select * from t3 order by data asc;
┌───────┬─────────┐
│  id   │  data   │
│ int32 │ varchar │
├───────┼─────────┤
│     1 │ aa      │
│     1 │ bb      │
│     1 │         │
└───────┴─────────┘
D select * from t3 order by data desc;
┌───────┬─────────┐
│  id   │  data   │
│ int32 │ varchar │
├───────┼─────────┤
│     1 │ bb      │
│     1 │ aa      │
│     1 │         │
└───────┴─────────┘

 

*2. max precision of decimal is 38(we can refer to 
https://duckdb.org/docs/stable/sql/data_types/numeric#fixed-point-decimals)* 

 

*3. IdentifierQuoteString is ", test for it:*
test is OK:
D SELECT DATETRUNC('second', "d") from t5;
┌────────────────────────┐
│ datetrunc('second', d) │
│       timestamp        │
├────────────────────────┤
│ 2025-01-22 12:21:23    │
│ 2025-01-22 12:21:25    │
└────────────────────────┘

ERROR result:
D SELECT DATETRUNC('second', 'd') from t5;
Binder Error: Could not choose a best candidate function for the function call 
"datetrunc(STRING_LITERAL, STRING_LITERAL)". In order to select one, please add 
explicit type casts.
        Candidate functions:
        datetrunc(VARCHAR, DATE) -> TIMESTAMP
        datetrunc(VARCHAR, INTERVAL) -> INTERVAL
        datetrunc(VARCHAR, TIMESTAMP WITH TIME ZONE) -> TIMESTAMP WITH TIME ZONE
        datetrunc(VARCHAR, TIMESTAMP) -> TIMESTAMP

LINE 1: SELECT DATETRUNC('second', 'd') from t5;
               ^
D SELECT DATETRUNC('second', `d`) from t5;
Catalog Error: Scalar Function with name `__postfix does not exist!
Did you mean "!__postfix"?
LINE 1: SELECT DATETRUNC('second', `d`) from t5;

 

*4. support approx_count_distinct function:*
D select approx_count_distinct("id") from t3 ;
┌───────────────────────────┐
│ approx_count_distinct(id) │
│           int64           │
├───────────────────────────┤
│                         1 │
└───────────────────────────┘

*5. Not support character_length/char_length, need use length function to 
support it:*
D select length('aa');
┌──────────────┐
│ length('aa') │
│    int64     │
├──────────────┤
│            2 │
└──────────────┘

D select character_length('aa');
Catalog Error: Scalar Function with name character_length does not exist!
Did you mean "array_length"?
LINE 1: select character_length('aa');
               ^
D select char_length('aa');
Catalog Error: Scalar Function with name char_length does not exist!
Did you mean "array_length"?
LINE 1: select char_length('aa');
               ^


*6. Map construncor method need convert () to {}, test for it:*
result is OK:

D select MAP \{'a':1, 'b':2};
┌────────────────────────────────────────────────────────────┐
│ main.map(main.list_value('a', 'b'), main.list_value(1, 2)) │
│                   map(varchar, integer)                    │
├────────────────────────────────────────────────────────────┤
│ \{a=1, b=2}                                                 │
└────────────────────────────────────────────────────────────┘

Error result:

D select MAP ('a':1, 'b':2);
Parser Error: syntax error at or near ":"
LINE 1: select MAP ('a':1, 'b':2);

 

*7. floor date should convert datetrunc function(can refer to: 
https://duckdb.org/docs/stable/sql/functions/date#date_truncpart-date)*
D SELECT DATETRUNC('year', "d") from t5;
┌──────────────────────┐
│ datetrunc('year', d) │
│         date         │
├──────────────────────┤
│ 2025-01-01           │
│ 2025-01-01           │
└──────────────────────┘
D SELECT DATETRUNC('month', "d") from t5;
┌───────────────────────┐
│ datetrunc('month', d) │
│         date          │
├───────────────────────┤
│ 2025-01-01            │
│ 2025-01-01            │
└───────────────────────┘
D SELECT DATETRUNC('day', "d") from t5;
┌─────────────────────┐
│ datetrunc('day', d) │
│        date         │
├─────────────────────┤
│ 2025-01-22          │
│ 2025-01-22          │
└─────────────────────┘
D SELECT DATETRUNC('week', "d") from t5;
┌──────────────────────┐
│ datetrunc('week', d) │
│         date         │
├──────────────────────┤
│ 2025-01-20           │
│ 2025-01-20           │
└──────────────────────┘
D SELECT DATETRUNC('millisecond', "d") from t5;
┌─────────────────────────────┐
│ datetrunc('millisecond', d) │
│          timestamp          │
├─────────────────────────────┤
│ 2025-01-22 12:21:23.123     │
│ 2025-01-22 12:21:25.123     │
└─────────────────────────────┘
D SELECT DATETRUNC('microseconds', "d") from t5;
┌──────────────────────────────┐
│ datetrunc('microseconds', d) │
│          timestamp           │
├──────────────────────────────┤
│ 2025-01-22 12:21:23.123      │
│ 2025-01-22 12:21:25.123223   │
└──────────────────────────────┘

for these differences we should make a new DuckDB dialect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to