[ 
https://issues.apache.org/jira/browse/FLINK-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597434#comment-16597434
 ] 

Piotr Nowojski commented on FLINK-10257:
----------------------------------------

I can think of about 3 solutions to this problem:
 # Drop {{CHAR}} support, and convert all string literals to {{VARCHAR}} in 
SQL. Any attempt to cast a column to {{CHAR}} should return an exception.
 # Provide internal {{CHAR}} support in SQL, while keeping it unsupported in 
Table API
 # Provide internal {{CHAR}} support both for SQL and Table API, potentially 
changing type of string literals in Table API to {{CHAR}} as well to make it 
consistent with SQL

For option 1., we could either:
 * convince Calcite to do this
 * or we would have to rewrite all {{CHAR}} types on our side in all RexNodes 
and RelNodes (similarly, but a bit more complicated to 
{{RelTimeIndicatorConverter}}

For option 2., we would need to properly support {{CHAR}} in all string 
functions and comparisons, with respect to padding. Probably to make things 
more consistent, we should make a contract that either we internally store 
{{CHAR}} always padded or never padded (now it's semi random). For writing to 
connectors we would need to cast all {{CHAR}} columns to {{VARCHAR}} which 
would require trimming. 

For option 3. we would additionally need to add support for {{CHAR}} in Table 
API.

> Incorrect CHAR type support in Flink SQL and Table API
> ------------------------------------------------------
>
>                 Key: FLINK-10257
>                 URL: https://issues.apache.org/jira/browse/FLINK-10257
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>            Reporter: Piotr Nowojski
>            Priority: Critical
>
> Despite that we officially do not support CHAR type, this type is visible and 
> accessible for the users. First of all, string literals have default type of 
> CHAR in SQL. Secondly users can always cast expressions/columns to CHAR.
> Problem is that we do not support CHAR correctly. We mishandle it in:
>  # comparisons and functions
>  # writing values to sinks
> According to SQL standard (and as almost all of the other databases do), CHAR 
> comparisons should ignore white spaces. On the other hand, functions like 
> {{CONCAT}} or {{LENGTH}} shouldn't: 
> [http://troels.arvin.dk/db/rdbms/#data_types-char] .
> Currently in In Flink we completely ignore those rules. Sometimes we store 
> internally CHAR with padded spaces sometimes without. This results with semi 
> random behaviour with respect to comparisons/functions/writing to sinks. For 
> example following query:
>  
> {code:java}
> tEnv.sqlQuery("SELECT CAST(s AS CHAR(10)) FROM 
> sourceTable").insertInto("targetTable")
> env.execute()
> {code}
> Where `sourceTable` has single {{VARCHAR(10)}} column with values: "Hi", 
> "Hello", "Hello world", writes to sink not padded strings (correctly), but 
> following query incorrectly
>  
> {code:java}
> tEnv.sqlQuery("SELECT * FROM (SELECT CAST(s AS CHAR(10)) c FROM sourceTable) 
> WHERE c = 'Hi'")
>   .insertInto("targetTable")
> env.execute(){code}
> Incorrectly filters out all of the results, because {{CAST(s AS CHAR(10))}} 
> is a NOOP in Flink, while 'Hi' constant handled by Calcite to us will be 
> padded with 8 spaces.
> On the other hand following query produces strings padded with spaces:
>  
> {code:java}
> tEnv.sqlQuery("SELECT CASE l WHEN 1 THEN 'GERMANY' WHEN 2 THEN 'POLAND' ELSE 
> 'this should not happen' END FROM sourceTable")
>   .insertInto("targetTable")
> env.execute()
> val expected = Seq(
>   "GERMANY",
>   "POLAND",
>   "POLAND").mkString("\n")
> org.junit.ComparisonFailure: Different elements in arrays: expected 3 
> elements and received 3
> expected: [GERMANY, POLAND, POLAND]
> received: [GERMANY , POLAND , POLAND ]
> {code}
>  
> To make matter even worse, Calcite's constant folding correctly performs 
> comparisons, while if same comparisons are performed by Flink, they yield 
> different results. In other words in SQL:
> {code:java}
> SELECT 'POLAND' = 'POLAND  '
> {code}
> return true, but same expression performed on columns
> {code:java}
> SELECT CAST(country as CHAR(10)) = CAST(country_padded as CHAR(10)) FROM 
> countries{code}
> returns false.
> To further complicated things, in SQL our string literals have {{CHAR}} type, 
> while in Table API our literals have String type (effectively {{VARCHAR}}) 
> making results inconsistent between those two APIs.
>  
> CC [~twalthr] [~fhueske] [~hequn8128]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to