I wanted to bump this request from last year. As I understand it, the new
FileSource is not suitable for my existing use case.

The TLDR is that we have extensively customized and extended the file splits
and input formats so that a single ContinuousFileReaderOperator can handle
non-homogeneous files and act as a source in the middle of our stream. The
alternative is to re-implement this operator ourselves, which seems
reasonable as long as access to the MailboxExecutor is not also going to be
deprecated.
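
For anyone unfamiliar with the setup, below is a minimal, Flink-free sketch
of the idea that each split carries its own parsing instructions (delimiter,
quote character, headers) and that records come out as name-value pairs. It
is for illustration only and is not the actual implementation: ParseSpec,
splitLine, parseLine and the file paths are hypothetical names, and no Flink
APIs are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SplitParsingSketch {

    /** Hypothetical per-file parsing instructions that would travel with a file split. */
    static final class ParseSpec {
        final String path;
        final char delimiter;
        final char quote;
        final List<String> headers;

        ParseSpec(String path, char delimiter, char quote, List<String> headers) {
            this.path = path;
            this.delimiter = delimiter;
            this.quote = quote;
            this.headers = headers;
        }
    }

    /** Splits one line using the given delimiter and quote character. */
    static List<String> splitLine(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    current.append(quote); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Pairs each header from the spec with the matching value; missing values become null. */
    static Map<String, String> parseLine(ParseSpec spec, String line) {
        List<String> values = splitLine(line, spec.delimiter, spec.quote);
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < spec.headers.size(); i++) {
            record.put(spec.headers.get(i), i < values.size() ? values.get(i) : null);
        }
        return record;
    }

    public static void main(String[] args) {
        // Two files with different headers, delimiters and quote characters (made-up paths).
        ParseSpec a = new ParseSpec("/data/supplier-a.csv", ',', '"',
                Arrays.asList("id", "name"));
        ParseSpec b = new ParseSpec("/data/supplier-b.txt", '|', '\'',
                Arrays.asList("sku", "price", "note"));
        System.out.println(parseLine(a, "1,\"Smith, J\""));
        System.out.println(parseLine(b, "X9|4.50|'a|b'"));
    }
}
```

The point of the sketch is only that the parsing instructions live with the
split rather than with the source configuration, which is why one reader can
handle files that differ from each other.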

If needed, I can create this as a ticket.

Cheers

Darin

On Tue, Nov 21, 2023 at 1:30 PM Darin Amos <darin.a...@instacart.com> wrote:

> Hi All!
>
> I posted on the community Slack channel and was referred to this mailing
> list. I think it would be helpful if the ContinuousFileReaderOperator were
> made a public class and not removed in Flink 2.0 (or if an equivalent were
> created). I have a use case for it where FileSource isn't sufficient, at
> least not to my knowledge.
>
> I think our use case is rather unique and I'm not sure who else would
> benefit. Essentially this operator acts as a source in the middle of our
> stream. Our application processes non-homogeneous files which are generally,
> but not limited to, CSV files. In our case each CSV file has varying
> headers (both the values and the number of headers), delimiters and quote
> characters.
>
> Our application receives a Kafka message with sufficient metadata to
> parse a file (path, delimiter, quote char - configured by supplier) and
> uses an Async operator to pre-download the headers. Afterwards we are able
> to generate custom file splits (which contain the parsing instructions and
> headers) paired with a custom format class to create name-value-pair
> records with the ContinuousFileReaderOperator.
>
> I'm more than happy to share more details about our customization if
> required.
>
> Thanks!
>
> Darin Amos
>
>
>
