Our online services running in GCP collect data from our clients and write
it to GCS under time-partitioned folders such as /mm/dd/hh/mm
(based on the current time) or similar. We need these files to be processed
in real time by Spark. As for the runtime, we plan to run it either on
Dataproc or on Kubernetes.
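
Roughly, we imagine the consuming side looking something like the sketch
below (a sketch only: the bucket name my-bucket, the newline-delimited JSON
format, and the schema fields are placeholders, and the gs:// paths assume
the GCS connector is on the classpath, which Dataproc bundles):

// Sketch only (Scala). Assumed names: bucket "my-bucket" and the
// clientId/payload/eventTime fields; adjust to the real layout.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object GcsFileStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("gcs-file-stream")
      .getOrCreate()

    // Streaming file sources require an explicit schema.
    val schema = new StructType()
      .add("clientId", StringType)
      .add("payload", StringType)
      .add("eventTime", TimestampType)

    val events = spark.readStream
      .schema(schema)
      .option("maxFilesPerTrigger", "100") // bound each micro-batch
      .json("gs://my-bucket/*/*/*/*/")     // four glob levels span /mm/dd/hh/mm

    // Console sink for demonstration only; a real job would write elsewhere.
    val query = events.writeStream
      .format("console")
      .option("checkpointLocation", "gs://my-bucket/checkpoints/demo")
      .start()

    query.awaitTermination()
  }
}

One caveat we are aware of: the file source re-lists everything the glob
matches on every trigger, so listing time grows as the time-partitioned
folders accumulate.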
-
Hi,
I looked at the Stack Overflow reference.
The first question that comes to mind is: how are you populating these
GCS buckets? Are you shifting data from on-prem, landing it in the
buckets, and creating a new folder at the given interval?
Where will you be running your Spark Structured Streaming job?