dawidwys commented on a change in pull request #7777: [FLINK-9964][table] Add a full RFC-compliant CSV table format factory
URL: https://github.com/apache/flink/pull/7777#discussion_r258561739
########## File path: docs/dev/table/connect.md ##########

@@ -731,24 +757,73 @@ The CSV format allows to read and write comma-separated rows.
 {% highlight yaml %}
 format:
   type: csv
-  fields:                    # required: ordered format fields
-    - name: field1
-      type: VARCHAR
-    - name: field2
-      type: TIMESTAMP
-  field-delimiter: ","       # optional: string delimiter "," by default
-  line-delimiter: "\n"       # optional: string delimiter "\n" by default
-  quote-character: '"'       # optional: single character for string values, empty by default
-  comment-prefix: '#'        # optional: string to indicate comments, empty by default
-  ignore-first-line: false   # optional: boolean flag to ignore the first line, by default it is not skipped
-  ignore-parse-errors: true  # optional: skip records with parse error instead of failing by default
+
+  # required: define the schema either by using type information
+  schema: "ROW(lon FLOAT, rideTime TIMESTAMP)"
+
+  # or use the table's schema
+  derive-schema: true
+
+  field-delimiter: ";"          # optional: field delimiter character (',' by default)
+  line-delimiter: "\r\n"        # optional: line delimiter ("\n" by default; otherwise "\r" or "\r\n" are allowed)
+  quote-character: "'"          # optional: quote character for enclosing field values ('"' by default)
+  allow-comments: true          # optional: ignores comment lines that start with "#" (disabled by default)
+  ignore-parse-errors: true     # optional: skip fields and rows with parse errors instead of failing;
+                                #   fields are set to null in case of errors
+  array-element-delimiter: "|"  # optional: the array element delimiter string for separating
+                                #   array and row element values (";" by default)
+  escape-character: "\\"        # optional: escape character for escaping values (disabled by default)
+  null-literal: "n/a"           # optional: null literal string that is interpreted as a
+                                #   null value (disabled by default)
 {% endhighlight %}
 </div>
 </div>
 
-The CSV format is included in Flink and does not require additional dependencies.
+The following table lists supported types that can be read and written:
+
+| Supported Flink SQL Types |
+| :------------------------ |
+| `ROW`                     |
+| `VARCHAR`                 |
+| `ARRAY[_]`                |
+| `INT`                     |
+| `BIGINT`                  |
+| `FLOAT`                   |
+| `DOUBLE`                  |
+| `BOOLEAN`                 |
+| `DATE`                    |
+| `TIME`                    |
+| `TIMESTAMP`               |
+| `DECIMAL`                 |
+| `NULL` (unsupported yet)  |
+
+**Numeric types:** Value should be a number but the literal `"null"` can also be understood. An empty string is
+considered `null`. Values are also trimmed (leading/trailing white space). Numbers are parsed using
+Java's `valueOf` semantics. Other non-numeric strings may cause a parsing exception.
+
+**String and time types:** Value is not trimmed. The literal `"null"` can also be understood.

Review comment:
How about we add info about what the format for time types is? This pops up in users' questions often. I believe we can either point to `Timestamp/Time.valueOf` or copy the expected formats.
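For example, if we decide to point to `Timestamp/Time.valueOf`, the expected literals are the JDBC escape formats. A minimal sketch of what we could show in the docs (this assumes the format really accepts the `java.sql` `valueOf` formats; I have not verified that the deserialization schema in this PR delegates to these methods):

```java
import java.sql.Date;
import java.sql.Time;
import java.sql.Timestamp;

public class CsvTimeLiteralExamples {
    public static void main(String[] args) {
        // DATE values use "yyyy-[m]m-[d]d", e.g. "2019-02-20"
        Date date = Date.valueOf("2019-02-20");
        // TIME values use "hh:mm:ss", e.g. "13:45:30"
        Time time = Time.valueOf("13:45:30");
        // TIMESTAMP values use "yyyy-[m]m-[d]d hh:mm:ss[.f...]", e.g. "2019-02-20 13:45:30.123"
        Timestamp timestamp = Timestamp.valueOf("2019-02-20 13:45:30.123");
        System.out.println(date + " | " + time + " | " + timestamp);
    }
}
```

If we prefer to copy the formats into the docs instead, the three patterns in the comments above would be enough.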
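On a related note, the page could also show the Java/Scala descriptor next to the YAML example. A rough sketch, assuming the `Csv` descriptor introduced by this PR exposes setters that mirror the YAML keys (the exact method names here are my guess and would need to be checked against the final API):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.descriptors.Csv;

public class CsvDescriptorSketch {
    public static void main(String[] args) {
        // Rough Java equivalent of the YAML example above; the method names are
        // assumptions and not taken verbatim from this PR.
        Csv csvFormat = new Csv()
            // required: define the schema by type information
            // (or derive it from the table's schema via the derive-schema option)
            .schema(Types.ROW_NAMED(
                new String[] {"lon", "rideTime"},
                Types.FLOAT, Types.SQL_TIMESTAMP))
            .fieldDelimiter(';')          // ',' by default
            .lineDelimiter("\r\n")        // "\n" by default
            .quoteCharacter('\'')         // '"' by default
            .allowComments()              // disabled by default
            .ignoreParseErrors()          // failing on parse errors by default
            .arrayElementDelimiter("|")   // ";" by default
            .escapeCharacter('\\')        // disabled by default
            .nullLiteral("n/a");          // disabled by default

        System.out.println(csvFormat);
    }
}
```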