[
https://issues.apache.org/jira/browse/FLINK-20746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086446#comment-18086446
]
Vishal Kamlapure commented on FLINK-20746:
------------------------------------------
Hi [~leonard],
I investigated this issue and reproduced the behavior locally.
The legacy {{CsvTableSource}} supported skipping the first line via
{{ignoreFirstLine}}, but the current filesystem CSV connector does not
expose an equivalent option.
Looking through the implementation, the filesystem CSV path already builds a
Jackson {{CsvSchema}} in {{CsvFileFormatFactory.buildCsvSchema()}}. The
underlying Jackson schema builder supports {{setSkipFirstDataRow(true)}}.
A possible implementation would be:
* Add a new option {{csv.ignore-first-line}} (default {{false}})
* Register it in {{CsvFormatOptions}} / {{CsvCommons}}
* Apply it in {{CsvFileFormatFactory.buildCsvSchema()}}
* Call {{csvBuilder.setSkipFirstDataRow(true)}} when enabled
This would preserve the legacy {{ignoreFirstLine}} semantics without
introducing header-based column mapping ({{setUseHeader(true)}}).
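To make the intended semantics concrete, here is a minimal stdlib-only sketch. It does not use Flink or Jackson; {{IgnoreFirstLineSketch}} and {{readRows}} are hypothetical names for illustration only, and the naive comma split ignores quoting and escaping. It shows that the first line is simply discarded and the remaining rows are still mapped by position, even when the header names appear in a different order than the declared columns.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: a stand-in for the proposed csv.ignore-first-line behavior. */
public class IgnoreFirstLineSketch {

    /**
     * Splits CSV text into rows of fields. When ignoreFirstLine is true, the
     * first line is dropped without being interpreted as a header, so fields
     * keep their positional meaning.
     */
    public static List<String[]> readRows(String csv, boolean ignoreFirstLine) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = csv.split("\n");
        // Start at index 1 to discard the first line; start at 0 otherwise.
        for (int i = ignoreFirstLine ? 1 : 0; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // skip blank lines
            }
            // Naive split: no support for quoted or escaped separators.
            rows.add(lines[i].split(",", -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // The header says "name,id", but the table is assumed to be declared
        // as (id, name). With positional mapping the header text is irrelevant.
        String csv = "name,id\n1,alice\n2,bob\n";
        for (String[] row : readRows(csv, true)) {
            System.out.println("id=" + row[0] + ", name=" + row[1]);
        }
    }
}
```

With header-based mapping, the same file would instead bind the first field to {{name}}, which is the behavior change this proposal avoids.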
If this approach sounds reasonable, I'd be happy to work on this issue and
submit a PR.
> Support ignore-first-line option for CSV format
> -----------------------------------------------
>
> Key: FLINK-20746
> URL: https://issues.apache.org/jira/browse/FLINK-20746
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem, Formats (JSON, Avro, Parquet,
> ORC, SequenceFile), Table SQL / Ecosystem
> Affects Versions: 1.13.0
> Reporter: Leonard Xu
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> An ignore-first-line option would be a useful feature for the CSV format in
> the filesystem connector, and I found users asking about this feature on
> Stack Overflow[1].
>
> [1]https://stackoverflow.com/questions/65359382/apache-flink-sql-reference-guide-for-table-properties
--
This message was sent by Atlassian Jira
(v8.20.10#820010)