Why s3a?

On Thu, Jan 9, 2020 at 2:20 AM anbutech <anbutec...@outlook.com> wrote:
> Hello,
>
> Version: Spark 2.4.3
>
> I have JSON log data from 3 different sources that share the same schema
> (same column order) in the raw data. I want to add one new column,
> "src_category", to distinguish the source category, and merge all 3
> sources into a single DataFrame for processing. What is the best way to
> handle this case?
>
> df = spark.read.json(merged_3sourcesraw_data)
>
> Input:
>
> s3a://my-bucket/ingestion/source1/y=2019/m=12/d=12/logs1.json
> s3a://my-bucket/ingestion/source2/y=2019/m=12/d=12/logs1.json
> s3a://my-bucket/ingestion/source3/y=2019/m=12/d=12/logs1.json
>
> Output:
>
> s3a://my-bucket/ingestion/processed/y=2019/m=12/d=12/src_category=other
> s3a://my-bucket/ingestion/processed/y=2019/m=12/d=12/src_category=windows-new
> s3a://my-bucket/ingestion/processed/y=2019/m=12/d=12/src_category=windows
>
> Thanks
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
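One possible approach, sketched under assumptions: read each source separately, tag it with a literal `src_category` column, union the three DataFrames by column name, and let `partitionBy("src_category")` produce the `src_category=...` directories shown in the desired output. The thread does not say which source maps to which category, so the `SOURCE_CATEGORY` mapping below is hypothetical, as are the helper names `input_path`, `output_path` and `merge_and_write`. Writing JSON is also an assumption; the question does not name an output format.

```python
from functools import reduce

# Hypothetical mapping from source directory to src_category value;
# the original question does not say which source is which category.
SOURCE_CATEGORY = {
    "source1": "other",
    "source2": "windows-new",
    "source3": "windows",
}

ROOT = "s3a://my-bucket/ingestion"


def input_path(source, y, m, d):
    """Raw JSON path for one source and one day (layout from the question)."""
    return f"{ROOT}/{source}/y={y}/m={m}/d={d}/logs1.json"


def output_path(y, m, d):
    """Day-level output directory; partitionBy adds src_category=... below it."""
    return f"{ROOT}/processed/y={y}/m={m}/d={d}"


def merge_and_write(spark, y, m, d):
    # Imported here so the path helpers above work without PySpark installed.
    from pyspark.sql import functions as F

    # Read each source on its own and tag every row with its category.
    frames = [
        spark.read.json(input_path(src, y, m, d))
        .withColumn("src_category", F.lit(cat))
        for src, cat in SOURCE_CATEGORY.items()
    ]
    # All sources share one schema, so a name-based union is safe
    # (unionByName is available since Spark 2.3).
    merged = reduce(lambda a, b: a.unionByName(b), frames)
    # "overwrite" replaces the whole day directory, which is fine here
    # because all three categories are written in the same job.
    (merged.write
        .mode("overwrite")
        .partitionBy("src_category")
        .json(output_path(y, m, d)))
```

Called as `merge_and_write(spark, "2019", "12", "12")`, this would write under `s3a://my-bucket/ingestion/processed/y=2019/m=12/d=12/` with one `src_category=...` subdirectory per source. Tagging each read with `lit()` keeps the category independent of file naming; an alternative is a single read over all paths combined with `input_file_name()` to derive the category, at the cost of parsing paths.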