Hi team,

Thank you for your response. Could you please provide a sample regex
('source.path.regex-pattern') for the following scenarios:
1. Matching filenames that start with "flink", e.g. flink_2023_11_08.csv
2. Matching filenames that end with "flink.csv", e.g. customer_2023_11_08_flink.csv

Thanks and regards,
Arjun

On Tue, 7 Nov 2023 at 16:00, Yu Chen <yuchen.e...@gmail.com> wrote:
> Hi Arjun,
>
> As stated in the document, 'This regex pattern should be matched with the
> absolute file path.' Therefore, you should adjust your regular expression
> to match absolute paths.
>
> Please let me know if there are any other problems.
>
> Best,
> Yu Chen
>
> > On 7 Nov 2023, at 18:11, arjun s <arjunjoice...@gmail.com> wrote:
> >
> > Hi Chen,
> > I attempted to configure the 'source.path.regex-pattern' property in the
> > table settings as '^customer.*' to ensure that the Flink job only
> > processes file names starting with "customer" in the specified directory.
> > However, this configuration is not producing the expected results. Are
> > there any additional configurations or adjustments that need to be made?
> > The table script I used is as follows:
> >
> > CREATE TABLE sample (
> >   col1 STRING,
> >   col2 STRING,
> >   col3 STRING,
> >   col4 STRING,
> >   `file.path` STRING NOT NULL METADATA
> > ) WITH (
> >   'connector' = 'filesystem',
> >   'path' = 'file:///home/techuser/inputdata',
> >   'format' = 'csv',
> >   'source.path.regex-pattern' = '^customer.*',
> >   'source.monitor-interval' = '10000'
> > )
> >
> > Thanks in advance,
> > Arjun
> >
> > On Mon, 6 Nov 2023 at 20:56, Chen Yu <yuchen.e...@gmail.com> wrote:
> > Hi Arjun,
> >
> > If you want to filter files by a regex pattern, I think the config
> > `source.path.regex-pattern` [1] may be what you want.
> >
> >   'source.path.regex-pattern' = '...',  -- optional: regex pattern to
> >                                         -- filter files to read under the
> >                                         -- directory of the `path` option.
> >                                         -- This regex pattern should be
> >                                         -- matched with the absolute
> >                                         -- file path.
> >                                         -- If this option is set, the
> >                                         -- connector will recursively scan
> >                                         -- all files under the directory
> >                                         -- of the `path` option.
> >
> > Best,
> > Yu Chen
> >
> > [1] https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/filesystem/
> >
> > From: arjun s <arjunjoice...@gmail.com>
> > Sent: 6 Nov 2023 20:50
> > To: user@flink.apache.org <user@flink.apache.org>
> > Subject: Handling Schema Variability and Applying Regex Patterns in Flink Job Configuration
> >
> > Hi team,
> > I'm currently using the Table API in my Flink job to read records from
> > CSV files located in a source directory. To obtain the file names, I'm
> > creating a table and specifying the schema with the Table API. When the
> > schema matches, my Flink job submits and executes as intended; when the
> > schema does not match, the job fails to submit. Given that the schema of
> > the files in the source directory is unpredictable, I'm looking for a way
> > to handle this situation.
> >
> > Create table query
> > =============
> > CREATE TABLE sample (
> >   col1 STRING,
> >   col2 STRING,
> >   col3 STRING,
> >   col4 STRING,
> >   `file.path` STRING NOT NULL METADATA
> > ) WITH (
> >   'connector' = 'filesystem',
> >   'path' = 'file:///home/techuser/inputdata',
> >   'format' = 'csv',
> >   'source.monitor-interval' = '10000'
> > )
> > =============
> >
> > Furthermore, I have a question about whether there's a way to read files
> > from the source directory based on a specific regex pattern. This is
> > relevant in our situation because only file names that match a particular
> > pattern need to be processed by the Flink job.
> >
> > Thanks and Regards,
> > Arjun
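For the two scenarios asked about at the top of the thread, here is a minimal sketch of candidate patterns. These are my assumption, not taken from the Flink docs: since the thread says the pattern is matched against the absolute file path, each pattern starts with `.*/` to cover the directory portion, and I assume whole-path (full-match) semantics like Java's `Pattern.matches`. The snippet below checks them with Python's `re.fullmatch` against the example paths from the thread:

```python
import re

# Candidate patterns (assumptions to verify against the connector docs):
# they are matched against the ABSOLUTE file path, so ".*/" covers the
# directory prefix and "[^/]*" stays within the filename itself.
STARTS_WITH_FLINK = r".*/flink[^/]*\.csv"    # filename begins with "flink"
ENDS_WITH_FLINK_CSV = r".*/[^/]*flink\.csv"  # filename ends with "flink.csv"

def path_matches(pattern: str, absolute_path: str) -> bool:
    """Whole-path check, mirroring Java's Pattern.matches semantics."""
    return re.fullmatch(pattern, absolute_path) is not None

flink_prefixed = "/home/techuser/inputdata/flink_2023_11_08.csv"
flink_suffixed = "/home/techuser/inputdata/customer_2023_11_08_flink.csv"

print(path_matches(STARTS_WITH_FLINK, flink_prefixed))    # True
print(path_matches(STARTS_WITH_FLINK, flink_suffixed))    # False
print(path_matches(ENDS_WITH_FLINK_CSV, flink_suffixed))  # True
print(path_matches(ENDS_WITH_FLINK_CSV, flink_prefixed))  # False
```

If this holds, the first scenario would translate to 'source.path.regex-pattern' = '.*/flink[^/]*\.csv' in the WITH clause of the table above, and the second to '.*/[^/]*flink\.csv'; this is untested against the connector itself, so please verify on a small directory first.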