Re: [DISCUSS] String literal behavior in Flink

Jark Wu Sun, 05 Mar 2023 23:11:29 -0800

Hi Aitozi,

I would suggest trying to contribute it to the upstream project Calcite first.


Best,
Jark

> On Mar 6, 2023, at 11:51, Aitozi <gjying1...@gmail.com> wrote:
> 
> Hi Jark,
> 
> Thank you for your helpful suggestion. It appears that E'foo\n' is a more
> versatile and widely accepted option. To assess its feasibility, I have
> reviewed the existing Unicode support and concluded that it may
> require modifications to the Parser.jj file to accommodate this new
> syntax.
> 
> 
> I am unsure whether we should first make this change in Calcite or
> whether we can directly override the StringLiteral behavior within
> the Flink project. Either way, I believe supporting this change is
> achievable.
> 
> 
> 
> Thanks,
> Aitozi.
> 
> On Mon, Mar 6, 2023 at 10:16, Jark Wu <imj...@gmail.com> wrote:
> 
>> Hi Aitozi,
>> 
>> I think improving backslash escapes in strings is a good idea.
>> However, I lean a bit more toward the Postgres approach[1],
>> which is more standard-compliant. PG supports backslash escape
>> strings, written with the letter E (upper or lower case) just before the
>> opening single quote, e.g., E'foo\n'.
>> 
>> Recognizing backslash escapes in both regular and escape string constants
>> is not backward compatible in Flink, and is also deprecated in PG.
>> 
>> In addition, Flink already supports Unicode escape string constants,
>> written with U& before the opening quote[2], which work in the same way as
>> backslash escape strings.
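>> 
>> As a rough sketch of the Unicode escape form, assuming the default backslash
>> escape character described in the Flink docs and a hypothetical table t with
>> a STRING column col, splitting on a newline could look like:
>> 
>> ```sql
>> -- U&'\000A' denotes the newline character (code point U+000A)
>> SELECT SPLIT_INDEX(col, U&'\000A', 0) FROM t;
>> ```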
>> 
>> Best,
>> Jark
>> 
>> [1]: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS
>> [2]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/overview/
>> 
>> On Sat, 4 Mar 2023 at 23:31, Aitozi <gjying1...@gmail.com> wrote:
>> 
>>> Hi,
>>> I encountered a problem when using string literals in Flink. Currently,
>>> Flink escapes string literals during codegen, so the query
>>> 
>>> SELECT 'a\nb';
>>> 
>>> prints a\nb (a literal backslash followed by n, not a newline).
>>> 
>>> As a result, in the query
>>> 
>>> SELECT SPLIT_INDEX(col, '\n', 0);
>>> 
>>> col is not split on the newline. To split on the newline, we have to
>>> use either
>>> 
>>> SELECT SPLIT_INDEX(col, '
>>> ', 0)
>>> 
>>> or
>>> 
>>> SELECT SPLIT_INDEX(col, CHR(10), 0)
>>> 
>>> Neither workaround is very intuitive. Some other databases, such as
>>> MySQL, support "Special Character Escape Sequences"[1].
>>> 
>>> With that support, we could directly write
>>> SELECT SPLIT_INDEX(col, '\n', 0); for the query.
>>> 
>>> I know this is not standard behavior in ANSI SQL. I'm opening this thread
>>> to gather opinions from the community.
>>> 
>>> [1]: https://dev.mysql.com/doc/refman/8.0/en/string-literals.html#character-escape-sequences
>>> 
>>> Thanks,
>>> Aitozi
>>> 
>>
