Re: [DISCUSS] String literal behavior in Flink

Aitozi Sun, 05 Mar 2023 19:51:37 -0800

Hi Jark,

Thank you for the helpful suggestion. E'foo\n' does look like the more
versatile and widely accepted option. To assess its feasibility, I reviewed
the existing Unicode escape support and concluded that the new syntax may
require modifications to the Parser.jj file.

I am not sure whether we should first make this change in Calcite or
whether we can override the StringLiteral behavior directly in the Flink
project. Either way, I believe supporting this change is achievable.
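
For illustration, here is a minimal Java sketch of the unescape step that
would run on the body of an E'...' literal, wherever it ends up living
(Calcite or Flink). The class and method names are hypothetical and exist in
neither project. It covers only a subset of the escapes PG documents (no
octal, hex, or Unicode forms) and assumes any other escaped character stands
for itself.

```java
// Hypothetical helper, not part of Flink or Calcite.
final class EscapeStringSketch {

    /** Decodes backslash escapes in the body of an E'...' literal (quotes already stripped). */
    static String unescape(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\' || i + 1 == body.length()) {
                // Ordinary character, or a lone backslash at the very end.
                out.append(c);
                continue;
            }
            char next = body.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                // Assumption: \\ and \' and any other escaped character map to themselves.
                default: out.append(next);
            }
        }
        return out.toString();
    }
}
```

With something like this, E'a\nb' would yield a real newline, while a
regular 'a\nb' literal would keep its current behavior.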



Thanks,
Aitozi.

On Mon, 6 Mar 2023 at 10:16, Jark Wu <imj...@gmail.com> wrote:

> Hi Aitozi,
>
> I think improving backslash escape strings is a good idea.
> However, I lean a bit more toward the Postgres approach[1],
> which is more standard-compliant. PG allows backslash escape
> strings by writing the letter E (upper or lower case) just before the
> opening single quote, e.g., E'foo\n'.
>
> Recognizing backslash escapes in both regular and escape string constants
> is not backward compatible in Flink, and is also deprecated in PG.
>
> In addition, Flink already supports Unicode escape string constants by
> writing U& before the opening quote[2], which works the same way as a
> backslash escape string.
>
> Best,
> Jark
>
> [1]: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS
> [2]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/overview/
>
> On Sat, 4 Mar 2023 at 23:31, Aitozi <gjying1...@gmail.com> wrote:
>
> > Hi,
> >   I encountered a problem when using string literals in Flink. Currently,
> > Flink escapes the string literal during codegen, so the query below:
> >
> > SELECT 'a\nb';
> >
> > prints a\nb (a literal backslash followed by n, not a newline).
> >
> > Likewise, the query
> >
> > SELECT SPLIT_INDEX(col, '\n', 0);
> >
> > cannot split col on the newline. To split on the newline, we have to
> > use
> >
> > SELECT SPLIT_INDEX(col, '
> > ', 0)
> >
> > or
> >
> > SELECT SPLIT_INDEX(col, CHR(10), 0)
> >
> > Neither of these approaches is very intuitive. Some other databases
> > support "Special Character Escape Sequences"[1].
> >
> > With such support, we could directly use
> > SELECT SPLIT_INDEX(col, '\n', 0); for the query.
> >
> > I know this is not standard behavior in ANSI SQL. I'm opening this thread
> > to gather opinions from the community.
> >
> > [1]: https://dev.mysql.com/doc/refman/8.0/en/string-literals.html#character-escape-sequences
> >
> > Thanks,
> > Aitozi
> >
>
