Hi All!

I posted on the community Slack channel and was referred to this mailing
list. I think it would be helpful if the ContinuousFileReaderOperator were
made a public class and not removed in Flink 2.0 (or if an equivalent were
created). I have a use case for it where FileSource isn't sufficient, at
least not to my knowledge.

I think our use case is rather unique and I'm not sure who else would
benefit. Essentially this operator acts as a source in the middle of our
stream. Our application processes non-homogeneous files, which are usually,
but not always, CSV files. In our case each CSV file has varying headers
(both the values and the number of headers), delimiters and quote
characters.

Our application receives a Kafka message with sufficient metadata to
parse a file (path, delimiter, quote char - configured by supplier) and
uses an Async operator to pre-download the headers. Afterwards we are able
to generate custom file splits (which contain the parsing instructions and
headers), paired with a custom format class, to create name-value-pair
records with the ContinuousFileReaderOperator.
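
As a rough illustration of the idea (not our actual code), here is a minimal
plain-Java sketch of a split that carries its own parsing instructions and
headers, plus the step that turns one line into a name-value-pair record.
The names HeaderAwareSplitSketch, ParsingSplit, splitLine and toRecord, as
well as the sample path and data, are hypothetical, and no Flink classes are
used; in the real job this logic would sit behind the custom split and format
classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeaderAwareSplitSketch {

    /** Hypothetical split payload: where the file is, plus how to parse it. */
    public record ParsingSplit(String path, char delimiter, char quoteChar, List<String> headers) {}

    /** Splits one line on the delimiter, honouring quoted fields (a doubled quote is a literal quote). */
    public static List<String> splitLine(String line, char delimiter, char quoteChar) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != quoteChar) {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == quoteChar) {
                    current.append(quoteChar); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
            } else if (c == quoteChar) {
                inQuotes = true; // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Pairs each header carried by the split with the value at the same position (null if missing). */
    public static Map<String, String> toRecord(ParsingSplit split, String line) {
        List<String> values = splitLine(line, split.delimiter(), split.quoteChar());
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < split.headers().size(); i++) {
            record.put(split.headers().get(i), i < values.size() ? values.get(i) : null);
        }
        return record;
    }

    public static void main(String[] args) {
        // In our pipeline the path, delimiter and quote char would come from the
        // Kafka message, and the headers from the async pre-download step.
        ParsingSplit split = new ParsingSplit(
                "/data/supplier-a.csv", ';', '\'', List.of("id", "name", "city"));
        System.out.println(toRecord(split, "42;'Smith; J';Ottawa"));
    }
}
```

Because every split carries its own delimiter, quote char and headers, files
with completely different layouts can flow through the same reader.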

I'm more than happy to share more details about our customization if
required.

Thanks!

Darin Amos
