zhilinli123 commented on code in PR #4871:
URL: https://github.com/apache/seatunnel/pull/4871#discussion_r1234991366


##########
docs/en/connector-v2/source/HdfsFile.md:
##########
@@ -33,233 +27,77 @@ Read all the data in a split in a pollNext call. What splits are read will be sa
   - [x] json
   - [x] excel
 
-## Options
-
-|           name            |  type   | required |    default value    |
-|---------------------------|---------|----------|---------------------|
-| path                      | string  | yes      | -                   |
-| file_format_type          | string  | yes      | -                   |
-| fs.defaultFS              | string  | yes      | -                   |
-| read_columns              | list    | yes      | -                   |
-| hdfs_site_path            | string  | no       | -                   |
-| delimiter                 | string  | no       | \001                |
-| parse_partition_from_path | boolean | no       | true                |
-| date_format               | string  | no       | yyyy-MM-dd          |
-| datetime_format           | string  | no       | yyyy-MM-dd HH:mm:ss |
-| time_format               | string  | no       | HH:mm:ss            |
-| kerberos_principal        | string  | no       | -                   |
-| kerberos_keytab_path      | string  | no       | -                   |
-| skip_header_row_number    | long    | no       | 0                   |
-| schema                    | config  | no       | -                   |
-| common-options            |         | no       | -                   |
-| sheet_name                | string  | no       | -                   |
-
-### path [string]
-
-The source file path.
-
-### delimiter [string]
-
-Field delimiter, used to tell connector how to slice and dice fields when reading text files
-
-default `\001`, the same as hive's default delimiter
-
-### parse_partition_from_path [boolean]
-
-Control whether parse the partition keys and values from file path
-
-For example if you read a file from path `hdfs://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
-
-Every record data from file will be added these two fields:
-
-|     name      | age |
-|---------------|-----|
-| tyrantlucifer | 26  |
-
-Tips: **Do not define partition fields in schema option**
-
-### date_format [string]
-
-Date type format, used to tell connector how to convert string to date, supported as the following formats:
-
-`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
-
-default `yyyy-MM-dd`
-
-### datetime_format [string]
-
-Datetime type format, used to tell connector how to convert string to datetime, supported as the following formats:
-
-`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`
-
-default `yyyy-MM-dd HH:mm:ss`
-
-### time_format [string]
-
-Time type format, used to tell connector how to convert string to time, supported as the following formats:
-
-`HH:mm:ss` `HH:mm:ss.SSS`
-
-default `HH:mm:ss`
-
-### skip_header_row_number [long]
-
-Skip the first few lines, but only for the txt and csv.
-
-For example, set like following:
-
-`skip_header_row_number = 2`
-
-then Seatunnel will skip the first 2 lines from source files
-
-### file_format_type [string]
-
-File type, supported as the following file types:
-
-`text` `csv` `parquet` `orc` `json` `excel`
-
-If you assign file type to `json`, you should also assign schema option to tell connector how to parse data to the row you want.
-
-For example:
-
-upstream data is the following:
-
-```json
-
-{"code":  200, "data":  "get success", "success":  true}
-
-```
-
-You can also save multiple pieces of data in one file and split them by newline:
-
-```json lines
-
-{"code":  200, "data":  "get success", "success":  true}
-{"code":  300, "data":  "get failed", "success":  false}
-
-```
-
-you should assign schema as the following:
-
-```hocon
-
-schema {
-    fields {
-        code = int
-        data = string
-        success = boolean
-    }
-}
-
-```
-
-connector will generate data as the following:
-
-| code |    data     | success |
-|------|-------------|---------|
-| 200  | get success | true    |
-
-If you assign file type to `parquet` `orc`, schema option not required, connector can find the schema of upstream data automatically.
-
-If you assign file type to `text` `csv`, you can choose to specify the schema information or not.
-
-For example, upstream data is the following:
+## Description
 
-```text
+Read data from hdfs file system.
 
-tyrantlucifer#26#male
+## Supported DataSource Info
+
+| Datasource |  Supported Versions |
+|------------|---------------------|
+| HdfsFile   | hadoop 2.x and 3.x  |
+
+## Data Type Mapping
+
+| Mysql Data type | Seatunnel Data type |

Review Comment:
   `SeaTunnel Data type`
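   For context, a minimal `HdfsFile` source config assembled from the options table removed in this hunk might look like the sketch below. The option names come from the removed table; the path, cluster name, and schema fields are hypothetical placeholder values, not part of this PR:

   ```hocon
   # Hypothetical example only: path, fs.defaultFS, and schema fields are placeholders.
   source {
     HdfsFile {
       path = "/tmp/seatunnel/json"
       file_format_type = "json"
       fs.defaultFS = "hdfs://hadoop-cluster"
       # schema is required for json, since the connector cannot infer it
       schema {
         fields {
           code = int
           data = string
           success = boolean
         }
       }
     }
   }
   ```

   For `parquet` and `orc` the `schema` block would be omitted, since (per the removed text) the connector reads the schema from the files themselves.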



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to