I wanted to bump this request from last year. As I understand it, the new FileSource is not suitable for my existing use case.
The TLDR is that we have extensively customized/extended the file splits and input formats so that a single ContinuousFileReaderOperator can handle non-homogeneous files and act as a source in the middle of our stream. The alternative is to re-implement this operator on our own, which seems reasonable so long as access to the MailboxExecutor won't also be deprecated.

If needed, I can create a ticket for this.

Cheers
Darin

On Tue, Nov 21, 2023 at 1:30 PM Darin Amos <darin.a...@instacart.com> wrote:

> Hi All!
>
> I posted on the community slack channel and was referred to this mailing
> list. I think it would be helpful if the ContinuousFileReaderOperator was
> made a public class and not removed in Flink 2.0 (or to have an equivalent
> created). I have a use case for it where FileSource isn't sufficient, at
> least not to my knowledge.
>
> I think our use case is rather unique and I'm not sure who else would
> benefit. Essentially this operator acts as a source in the middle of our
> stream. Our application processes non-homogeneous files which are generally,
> but not limited to, CSV files. In our case each CSV file has varying
> headers (both values and number of headers), delimiters and quote
> characters.
>
> Our application will receive a Kafka message with sufficient metadata to
> parse a file (path, delimiter, quote char - configured by supplier) and
> uses an Async operator to pre-download the headers. Afterwards we are able
> to generate custom file splits (which contain the parsing instructions and
> headers) paired with a custom format class to create name-value-pair records
> with the ContinuousFileReaderOperator.
>
> I'm more than happy to share more details about our customization if
> required.
>
> Thanks!
>
> Darin Amos