[ 
https://issues.apache.org/jira/browse/FLINK-20746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086446#comment-18086446
 ] 

Vishal Kamlapure commented on FLINK-20746:
------------------------------------------

Hi [~leonard],

I investigated this issue and reproduced the behavior locally.

The legacy {{CsvTableSource}} supported skipping the first line via
{{ignoreFirstLine}}, but the current filesystem CSV connector does not
expose an equivalent option.

Looking through the implementation, the filesystem CSV path already builds a
Jackson {{CsvSchema}} in {{CsvFileFormatFactory.buildCsvSchema()}}. The
underlying Jackson schema builder supports {{setSkipFirstDataRow(true)}}.

A possible implementation would be:
 * Add a new option {{csv.ignore-first-line}} (default {{false}})
 * Register it in {{CsvFormatOptions}} / {{CsvCommons}}
 * Apply it in {{CsvFileFormatFactory.buildCsvSchema()}}
 * Call {{csvBuilder.setSkipFirstDataRow(true)}} when enabled

This would preserve the legacy {{ignoreFirstLine}} semantics without
introducing header-based column mapping ({{setUseHeader(true)}}).
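
To make the intended semantics concrete, here is a minimal, dependency-free
sketch. It is not Flink or Jackson code: the class {{IgnoreFirstLineSketch}}
and its {{parse}} method are hypothetical names used only for illustration,
and the naive comma split stands in for a real CSV parser. It assumes the
option simply drops the first line and keeps mapping fields to the declared
columns by position.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: models the proposed option's semantics, not Flink or Jackson code. */
public class IgnoreFirstLineSketch {

    /**
     * Splits each line on commas and returns the fields in file order.
     * When ignoreFirstLine is true, the first line is dropped without being
     * inspected, so its contents never influence column mapping.
     */
    static List<String[]> parse(List<String> lines, boolean ignoreFirstLine) {
        List<String[]> rows = new ArrayList<>();
        int start = ignoreFirstLine ? 1 : 0;
        for (int i = start; i < lines.size(); i++) {
            // A limit of -1 keeps trailing empty fields.
            rows.add(lines.get(i).split(",", -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // The header order (name,id) deliberately differs from the declared
        // column order (id, name) to show that the header is not consulted.
        List<String> lines = List.of("name,id", "1,alice", "2,bob");
        for (String[] row : parse(lines, true)) {
            System.out.println("id=" + row[0] + ", name=" + row[1]);
        }
    }
}
```

In this sketch the first field of each data row is still read as {{id}} even
though the skipped header lists {{name}} first. Header-based mapping would
instead reorder the fields according to the header names.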

If this approach sounds reasonable, I'd be happy to work on this issue and 
submit a PR.

> Support ignore-first-line option for CSV format
> -----------------------------------------------
>
>                 Key: FLINK-20746
>                 URL: https://issues.apache.org/jira/browse/FLINK-20746
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, Formats (JSON, Avro, Parquet, 
> ORC, SequenceFile), Table SQL / Ecosystem
>    Affects Versions: 1.13.0
>            Reporter: Leonard Xu
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> An ignore-first-line option would be a useful feature for the CSV format in
> the filesystem connector, and I found users asking about it on
> Stack Overflow [1].
>  
> [1]https://stackoverflow.com/questions/65359382/apache-flink-sql-reference-guide-for-table-properties



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
