Re: Handling Schema Variability and Applying Regex Patterns in Flink Job Configuration

arjun s Mon, 06 Nov 2023 10:30:19 -0800

Thanks for your response.
How should we address the issue of dealing with the unpredictable file
schema(Table API)  in the source directory, as I previously mentioned in my
email?


Thanks and regards,
Arjun

On Mon, 6 Nov 2023 at 20:56, Chen Yu <yuchen.e...@gmail.com> wrote:

> Hi Arjun,
>
> If you can filter files by a regex pattern, I think the config
> `source.path.regex-pattern`[1] maybe what you want.
>
>   'source.path.regex-pattern' = '...',  -- optional: regex pattern to filter 
> files to read under the                                         -- directory 
> of `path` option. This regex pattern should be                                
>         -- matched with the absolute file path. If this option is set,        
>                                 -- the connector  will recursive all files 
> under the directory                                        -- of `path` option
>
>
> Best,
> Yu Chen
>
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/filesystem/
>
> ------------------------------
> *发件人:* arjun s <arjunjoice...@gmail.com>
> *发送时间:* 2023年11月6日 20:50
> *收件人:* user@flink.apache.org <user@flink.apache.org>
> *主题:* Handling Schema Variability and Applying Regex Patterns in Flink
> Job Configuration
>
> Hi team,
> I'm currently utilizing the Table API function within my Flink job, with
> the objective of reading records from CSV files located in a source
> directory. To obtain the file names, I'm creating a table and specifying
> the schema using the Table API in Flink. Consequently, when the schema
> matches, my Flink job successfully submits and executes as intended.
> However, in cases where the schema does not match, the job fails to submit.
> Given that the schema of the files in the source directory is
> unpredictable, I'm seeking a method to handle this situation.
> Create table query
> =============
> CREATE TABLE sample (col1 STRING,col2 STRING,col3 STRING,col4
> STRING,file.path` STRING NOT NULL METADATA) WITH ('connector' =
> 'filesystem','path' = 'file:///home/techuser/inputdata','format' =
> 'csv','source.monitor-interval' = '10000')
> =============
>
> Furthermore, I have a question about whether there's a way to read files
> from the source directory based on a specific regex pattern. This is
> relevant in our situation because only file names that match a particular
> pattern need to be processed by the Flink job.
>
> Thanks and Regards,
> Arjun
>

Re: Handling Schema Variability and Applying Regex Patterns in Flink Job Configuration

Reply via email to