I am not sure if you've understood the question. Here's how we're saving the DataFrame:
df.coalesce(numFiles)
  .write
  .partitionBy(partitionDate)
  .mode("overwrite")
  .format("parquet")
  .save(someDirectory)

Now where would I add a 'prefix' in this one?

On Sat, Jul 17, 2021 at 10:54 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Try it and see if it works:
>
> fullyQualifiedTableName = appName + '_' + tableName
>
> On Sat, 17 Jul 2021 at 18:02, Eric Beabes <mailinglist...@gmail.com> wrote:
>
>> I don't think Spark allows adding a 'prefix' to the file name, does it?
>> If it does, please tell me how. Thanks.
>>
>> On Sat, Jul 17, 2021 at 9:47 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Jobs have names in Spark. You could prefix the job name to the file name
>>> when writing to the directory, I guess:
>>>
>>> val sparkConf = new SparkConf().
>>>   setAppName(sparkAppName).
>>>
>>> On Sat, 17 Jul 2021 at 17:40, Eric Beabes <mailinglist...@gmail.com> wrote:
>>>
>>>> The reason we have two jobs writing to the same directory is that the
>>>> data is partitioned by 'day' (yyyymmdd) but the job runs hourly. Maybe
>>>> the only way to do this is to create an hourly partition (/yyyymmdd/hh).
>>>> Is that the only way to solve this?
>>>>
>>>> On Fri, Jul 16, 2021 at 5:45 PM ayan guha <guha.a...@gmail.com> wrote:
>>>>
>>>>> IMHO this is a bad idea, especially in failure scenarios.
>>>>>
>>>>> How about creating a subfolder for each of the jobs?
>>>>>
>>>>> On Sat, 17 Jul 2021 at 9:11 am, Eric Beabes <mailinglist...@gmail.com> wrote:
>>>>>
>>>>>> We have two (or more) jobs that write data into the same directory
>>>>>> via the DataFrame.save method. We need to be able to figure out which
>>>>>> job wrote which file, maybe by providing a 'prefix' for the file
>>>>>> names. I was wondering if there's an 'option' that allows us to do
>>>>>> this. Googling didn't come up with any solution, so I thought of
>>>>>> asking the Spark experts on this mailing list.
>>>>>>
>>>>>> Thanks in advance.
>>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Ayan Guha
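
[Editor's note] Spark's DataFrameWriter does not expose an option to prefix the part-file names, so a common workaround is to rename the files on the driver after the write completes, using the Hadoop FileSystem API. A minimal sketch, assuming an existing SparkSession named spark and reusing someDirectory from the snippet above; using the app name as the prefix is illustrative, per Mich's suggestion:

import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative choice: prefix each part file with the Spark app name.
val prefix = spark.sparkContext.appName + "_"
val outputDir = new Path(someDirectory)

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// Recurse into the date partitions and rename every part file in place.
val files = fs.listFiles(outputDir, true)
while (files.hasNext) {
  val src = files.next().getPath
  if (src.getName.startsWith("part-")) {
    fs.rename(src, new Path(src.getParent, prefix + src.getName))
  }
}

On HDFS the rename is a cheap metadata operation; on object stores such as S3 it is a copy-and-delete, so it can be slow and is not atomic.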
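Ayan's subfolder suggestion amounts to a one-line change: give each job its own output directory, for example keyed by the app name (the directory layout here is illustrative):

// Each job writes under its own subdirectory, e.g. <someDirectory>/<appName>/...
val jobOutputDir = s"$someDirectory/${spark.sparkContext.appName}"

df.coalesce(numFiles)
  .write
  .partitionBy(partitionDate)
  .mode("overwrite")
  .format("parquet")
  .save(jobOutputDir)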
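Eric's hourly-partition idea can also be combined with Spark's dynamic partition overwrite mode (available since Spark 2.3), so that each hourly run replaces only the partitions it actually writes instead of wiping the whole day. A sketch, assuming df carries (or can derive) an "hour" column alongside the existing date column:

// Overwrite only the partitions present in this run's data,
// not everything under someDirectory (requires Spark 2.3+).
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

df.coalesce(numFiles)
  .write
  .partitionBy(partitionDate, "hour")   // layout: .../<date>=yyyymmdd/hour=hh/
  .mode("overwrite")
  .format("parquet")
  .save(someDirectory)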