Hi,
Yeah, we generally read files from HDFS or object stores like S3, GCS, etc.,
where files cannot be updated.
Regards
Gourav
On Sun, 7 Jun 2020, 22:36 Jungtaek Lim wrote:
> Hi Nick,
>
> I guess that's by design - Spark assumes the input file will not be
> modified once it is placed on the inpu
Hi Nick,
I guess that's by design - Spark assumes the input file will not be
modified once it is placed on the input path. This makes it easy for Spark
to track the list of processed files vs. unprocessed files. If input files
could be modified, Spark would have to enumerate all of the files and track
h
We were trying to use Structured Streaming with a file source, but had
problems getting Spark to read the files properly. We have another
process generating the data files in the Spark data source directory on
a continuous basis. What we observed was that the moment a data
file is created
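
For what it's worth, one common way for the producing process to fit the assumption Jungtaek describes (a file is not modified once it is on the input path) is to write each file somewhere else first and then move it into the watched directory in a single step. The sketch below is only an illustration under stated assumptions: `publish_atomically`, the staging directory, and the file names are hypothetical, and it assumes a local or POSIX-like file system where the staging and input directories sit on the same volume, so the move is a rename rather than a copy. Object stores behave differently, so this is not meant to cover that case.

```python
import os
import tempfile


def publish_atomically(data: bytes, input_dir: str, name: str, staging_dir: str) -> str:
    """Write data in staging_dir, then rename it into input_dir.

    Hypothetical helper for illustration. staging_dir is assumed to be on
    the same file system as input_dir, so os.replace is a rename and the
    file shows up in input_dir already complete.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(input_dir, exist_ok=True)

    # Write the full contents outside the directory the stream is watching.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Move the finished file into the watched directory in one step.
    final_path = os.path.join(input_dir, name)
    os.replace(tmp_path, final_path)
    return final_path
```

With something like this, the producer would call `publish_atomically(payload, "/data/input", "part-0001.csv", "/data/staging")` (paths made up here) and never touch the file again after the move.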