[ 
https://issues.apache.org/jira/browse/FLINK-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133958#comment-17133958
 ] 

Jark Wu commented on FLINK-17752:
---------------------------------

First of all, after some investigation, I would like to summarize the 
differences between [ISO-8601|https://en.wikipedia.org/wiki/ISO_8601] and 
[RFC-3339|https://www.ietf.org/rfc/rfc3339.txt] and SQL standard. 

- ISO-8601 requires "T" between date and time. But RFC-3339 allows a space 
character instead, see [1][2].
- The time zone suffix is optional in ISO-8601 [3] (interpreted as local zone 
by default [4]), but is required in RFC-3339 [1].
- SQL standard uses the format "YYYY-MM-DD HH:MM:SS[.fractional]", I didn't 
find which section mentions this in standard specifiction, but several 
documentation stand with this, e.g. {{TimestampType}} and 
[this|https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.esqlc.doc/ids_esqlc_0190.htm].
 
- ISO-8601 has basic format and extended extended. "1981-04-05" is the extended 
format and "19810405" is the basic format.

I have discussed with [~fsk119], our proposal is introducing a new config 
option {{'json.timestamp-format'}} and {{'json.time-format'}}, all the valid 
values are (default is "SQL"):

The timestamp format behavior works with {{TIMESTAMP}} type:

||{{timestamp-format}} || Data Example || TIMESTAMP [WITHOUT TIME ZONE] || 
| {{SQL}} | "2020-06-12 11:46:22.123456789" | Reading: the same to 
{{CAST(string AS TIMESTAMP)}}, Writing: the same to {{CAST(timestamp AS 
VARCHAR)}}| 
| {{ISO-8601}} | "2020-06-12T11:46:22.123456789" or 
"2020-06-12T11:46:22.123456789Z" | Reading: if data contains time zone suffix, 
drop the time zone information. Writing: always no time zone suffix   | 
| {{RFC-3339}} | "2020-06-12T11:46:22.123456789Z" or  "2020-06-12 
11:46:22.123456789Z" | Not Supported | 
| {{milliseconds-since-epoch}} | 1591940560123 | Reading: interprets the data 
as number of milliseconds since the epoch {{1970-01-01 00:00:00.000}}, e.g. the 
example data is interprets as {{2020-06-12 05:42:40.123}}. Writing: convert 
this date-time to the number of milliseconds from the epoch of {{1970-01-01 
00:00:00.000}}. |
| {{seconds-since-epoch}} | 1591940560 | Reading: interprets the data as number 
of seconds since the epoch {{1970-01-01 00:00:00}}, e.g. the example data is 
interprets as {{2020-06-12 05:42:40}}. Writing: convert this date-time to the 
number of seconds from the epoch of {{1970-01-01 00:00:00}}. |

The timestamp format behavior works with {{TIMESTAMP WITH LOCAL TIME ZONE}} 
type:

||{{timestamp-format}} || Data Example || TIMESTAMP WITH LOCAL TIME ZONE] || 
| {{SQL}} | "2020-06-12 11:46:22.123456789 +08:00" | Reading: the same to 
{{CAST(string AS TIMESTAMP WITH LOCAL TIME ZONE)}}, Writing: the same to 
{{CAST(timestamp AS VARCHAR)}}| 
| {{ISO-8601}} | "2020-06-12T11:46:22.123456789" or 
"2020-06-12T11:46:22.123456789Z" |Reading: if data doesn't contain time zone, 
do nothing in the data, interpret the time in local timezone, otherwise, 
convert the timestamp to local timezone and drop the time zone information. 
Writing: always print in UTC zone. |
| {{RFC-3339}} | "2020-06-12T11:46:22.123456789Z" or  "2020-06-12 
11:46:22.123456789Z" | Reading: if data doesn't contain time zone, do nothing 
in the data, interpret the time in local timezone, otherwise, convert the 
timestamp to local timezone and drop the time zone information. Writing: always 
print in UTC zone with "T" separator.  | 
| {{milliseconds-since-epoch}} | 1591940560123 |  Reading: interprets the data 
as number of milliseconds since the epoch {{1970-01-01T00:00:00.000Z}}, e.g. 
the example data is interprets as {{2020-06-12T05:42:40.123Z}}. Writing: 
convert this date-time to the number of milliseconds from the epoch of 
{{1970-01-01T00:00:00.000Z}}. This is the  |
| {{seconds-since-epoch}} | 1591940560 | Reading: interprets the data as number 
of seconds since the epoch {{1970-01-01T00:00:00Z}}, e.g. the example data is 
interprets as {{2020-06-12T05:42:40Z}}. Writing: convert this date-time to the 
number of seconds from the epoch of {{1970-01-01T00:00:00Z}}. |

Note: the {{milliseconds-since-epoch}} and {{seconds-since-epoch}} are not 
supported in {{'json.time-format'}}, but the others are the same. 

What do you think? [~dwysakowicz] [~lzljs3620320]

[1]: 
https://medium.com/easyread/understanding-about-rfc-3339-for-datetime-formatting-in-software-engineering-940aa5d5f68a
[2]: https://en.wikipedia.org/wiki/ISO_8601#RFCs
[3]: https://www.cryptosys.net/pki/manpki/pki_iso8601datetime.html
[4]: https://en.wikipedia.org/wiki/ISO_8601#Local_time_(unqualified)

> Align the timestamp format with Flink SQL types in JSON format
> --------------------------------------------------------------
>
>                 Key: FLINK-17752
>                 URL: https://issues.apache.org/jira/browse/FLINK-17752
>             Project: Flink
>          Issue Type: Bug
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Ecosystem
>            Reporter: Jark Wu
>            Assignee: Shengkai Fang
>            Priority: Critical
>             Fix For: 1.11.0
>
>
> Currently, we are using RFC3339_TIMESTAMP_FORMAT (which will add timezone at 
> the end of string) to as the timestamp format in JSON. However, the string 
> representation fo {{TIMESTAMP (WITHOUT TIME ZONE)}} shoudn't adding 'Z' at 
> the end. 
> Previous discussion: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/TIME-TIMESTAMP-parse-in-Flink-TABLE-SQL-API-td33061.html#a33095



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to