Hello,
StreamingFileSink's part file naming convention is not adjustable. It has
the form part-<integer>-<integer>.

My use case for StreamingFileSink is a Kafka -> S3 pipeline, and files are
read and processed from S3 using Spark. In almost all cases, I want to
compress raw data before writing to S3 using the BulkFormat.

Spark relies on filename extensions to infer compression, so with the
current naming scheme the compressed files are read as if they were
uncompressed and come out as gibberish. I see that 1.10 currently provides
the ability to customize the prefix/suffix, but I really need a solution
as soon as possible. Can this be backported to 1.9, or are there
alternatives?
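
To illustrate the problem, here is a rough sketch of extension-based codec
inference. It is a simplified stand-in, not Spark's or Hadoop's actual code:
the class name CodecInference, the inferCodec helper and the suffix table are
all made up for this example. The point is only that a name like part-0-3
carries no suffix to match on, while part-0-3.gz does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class CodecInference {
    // Hypothetical suffix-to-codec table (an assumption for illustration);
    // a real reader registers its own list of codecs and extensions.
    private static final Map<String, String> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".gz", "gzip");
        CODECS.put(".bz2", "bzip2");
        CODECS.put(".snappy", "snappy");
    }

    // Returns the codec whose suffix the file name ends with, if any.
    public static Optional<String> inferCodec(String fileName) {
        for (Map.Entry<String, String> entry : CODECS.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        // No known suffix: the reader would treat the bytes as uncompressed.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Name as produced by the current scheme: no codec can be inferred.
        System.out.println(inferCodec("part-0-3"));     // prints Optional.empty
        // Same name with a suffix: the codec is recognised.
        System.out.println(inferCodec("part-0-3.gz"));  // prints Optional[gzip]
    }
}
```

So being able to append a suffix such as .gz to the part file name would be
enough for my case.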
