[
https://issues.apache.org/jira/browse/FLINK-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133958#comment-17133958
]
Jark Wu commented on FLINK-17752:
---------------------------------
First of all, after some investigation, I would like to summarize the
differences between [ISO-8601|https://en.wikipedia.org/wiki/ISO_8601] and
[RFC-3339|https://www.ietf.org/rfc/rfc3339.txt] and SQL standard.
- ISO-8601 requires "T" between date and time. But RFC-3339 allows a space
character instead, see [1][2].
- The time zone suffix is optional in ISO-8601 [3] (interpreted as local zone
by default [4]), but is required in RFC-3339 [1].
- SQL standard uses the format "YYYY-MM-DD HH:MM:SS[.fractional]", I didn't
find which section mentions this in standard specifiction, but several
documentation stand with this, e.g. {{TimestampType}} and
[this|https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.esqlc.doc/ids_esqlc_0190.htm].
- ISO-8601 has basic format and extended extended. "1981-04-05" is the extended
format and "19810405" is the basic format.
I have discussed with [~fsk119], our proposal is introducing a new config
option {{'json.timestamp-format'}} and {{'json.time-format'}}, all the valid
values are (default is "SQL"):
The timestamp format behavior works with {{TIMESTAMP}} type:
||{{timestamp-format}} || Data Example || TIMESTAMP [WITHOUT TIME ZONE] ||
| {{SQL}} | "2020-06-12 11:46:22.123456789" | Reading: the same to
{{CAST(string AS TIMESTAMP)}}, Writing: the same to {{CAST(timestamp AS
VARCHAR)}}|
| {{ISO-8601}} | "2020-06-12T11:46:22.123456789" or
"2020-06-12T11:46:22.123456789Z" | Reading: if data contains time zone suffix,
drop the time zone information. Writing: always no time zone suffix |
| {{RFC-3339}} | "2020-06-12T11:46:22.123456789Z" or "2020-06-12
11:46:22.123456789Z" | Not Supported |
| {{milliseconds-since-epoch}} | 1591940560123 | Reading: interprets the data
as number of milliseconds since the epoch {{1970-01-01 00:00:00.000}}, e.g. the
example data is interprets as {{2020-06-12 05:42:40.123}}. Writing: convert
this date-time to the number of milliseconds from the epoch of {{1970-01-01
00:00:00.000}}. |
| {{seconds-since-epoch}} | 1591940560 | Reading: interprets the data as number
of seconds since the epoch {{1970-01-01 00:00:00}}, e.g. the example data is
interprets as {{2020-06-12 05:42:40}}. Writing: convert this date-time to the
number of seconds from the epoch of {{1970-01-01 00:00:00}}. |
The timestamp format behavior works with {{TIMESTAMP WITH LOCAL TIME ZONE}}
type:
||{{timestamp-format}} || Data Example || TIMESTAMP WITH LOCAL TIME ZONE] ||
| {{SQL}} | "2020-06-12 11:46:22.123456789 +08:00" | Reading: the same to
{{CAST(string AS TIMESTAMP WITH LOCAL TIME ZONE)}}, Writing: the same to
{{CAST(timestamp AS VARCHAR)}}|
| {{ISO-8601}} | "2020-06-12T11:46:22.123456789" or
"2020-06-12T11:46:22.123456789Z" |Reading: if data doesn't contain time zone,
do nothing in the data, interpret the time in local timezone, otherwise,
convert the timestamp to local timezone and drop the time zone information.
Writing: always print in UTC zone. |
| {{RFC-3339}} | "2020-06-12T11:46:22.123456789Z" or "2020-06-12
11:46:22.123456789Z" | Reading: if data doesn't contain time zone, do nothing
in the data, interpret the time in local timezone, otherwise, convert the
timestamp to local timezone and drop the time zone information. Writing: always
print in UTC zone with "T" separator. |
| {{milliseconds-since-epoch}} | 1591940560123 | Reading: interprets the data
as number of milliseconds since the epoch {{1970-01-01T00:00:00.000Z}}, e.g.
the example data is interprets as {{2020-06-12T05:42:40.123Z}}. Writing:
convert this date-time to the number of milliseconds from the epoch of
{{1970-01-01T00:00:00.000Z}}. This is the |
| {{seconds-since-epoch}} | 1591940560 | Reading: interprets the data as number
of seconds since the epoch {{1970-01-01T00:00:00Z}}, e.g. the example data is
interprets as {{2020-06-12T05:42:40Z}}. Writing: convert this date-time to the
number of seconds from the epoch of {{1970-01-01T00:00:00Z}}. |
Note: the {{milliseconds-since-epoch}} and {{seconds-since-epoch}} are not
supported in {{'json.time-format'}}, but the others are the same.
What do you think? [~dwysakowicz] [~lzljs3620320]
[1]:
https://medium.com/easyread/understanding-about-rfc-3339-for-datetime-formatting-in-software-engineering-940aa5d5f68a
[2]: https://en.wikipedia.org/wiki/ISO_8601#RFCs
[3]: https://www.cryptosys.net/pki/manpki/pki_iso8601datetime.html
[4]: https://en.wikipedia.org/wiki/ISO_8601#Local_time_(unqualified)
> Align the timestamp format with Flink SQL types in JSON format
> --------------------------------------------------------------
>
> Key: FLINK-17752
> URL: https://issues.apache.org/jira/browse/FLINK-17752
> Project: Flink
> Issue Type: Bug
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table
> SQL / Ecosystem
> Reporter: Jark Wu
> Assignee: Shengkai Fang
> Priority: Critical
> Fix For: 1.11.0
>
>
> Currently, we are using RFC3339_TIMESTAMP_FORMAT (which will add timezone at
> the end of string) to as the timestamp format in JSON. However, the string
> representation fo {{TIMESTAMP (WITHOUT TIME ZONE)}} shoudn't adding 'Z' at
> the end.
> Previous discussion:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/TIME-TIMESTAMP-parse-in-Flink-TABLE-SQL-API-td33061.html#a33095
--
This message was sent by Atlassian Jira
(v8.3.4#803005)