Impossible to get pending file names/paths on checkpoint?

Preston Price Fri, 08 Oct 2021 12:28:57 -0700

I am trying to implement a File Sink that persists files to Azure Data
Lake, and then on commit I want to ingest these files to Azure Data
Explorer. Persisting the files is pretty trivial using the ABFS connector.


However, it does not appear to be possible to get the names or paths of
the pending files when they are committed. FileSinkCommittable exposes very
few details, so I am currently blocked. The paths to the files are needed
when issuing ingest commands to the Azure Data Explorer API. I have
considered using automated ingestion for Azure Data Explorer with EventHub,
but I need more control over the ingestion commands for my use case.

I'm finding it very difficult to extend the functionality of the FileSink,
as many public classes and interfaces have private constructors or
package-private return types, so I have to re-implement a significant
amount of this functionality to make minor changes.

Perhaps I'm pursuing this solution in the wrong way?
Thanks for any clues or guidance.
