I can't get around this error when performing a union of two datasets
(ds1.union(ds2)) that have complex data types (struct, list):

18/06/02 15:12:00 INFO ApplicationMaster: Final app status: FAILED,
exitCode: 15, (reason: User class threw exception:
org.apache.spark.sql.AnalysisException: Union can only be performed on
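One likely cause, offered as a hedged sketch rather than a confirmed diagnosis: union() resolves columns by position, so two Datasets with the same columns in a different order (or structs whose fields differ in order or type) fail with this AnalysisException. A minimal Scala workaround, assuming the mismatch is only in top-level column order (alignedUnion is an illustrative helper, not from this thread):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    // Re-select the second Dataset's columns in the first one's order so
    // that union(), which matches columns by position, lines them up.
    // On Spark 2.3+, df1.unionByName(df2) does the same thing by name.
    def alignedUnion(df1: DataFrame, df2: DataFrame): DataFrame =
      df1.union(df2.select(df1.columns.map(col): _*))

If the incompatibility is inside a struct or array column, the struct has to be rebuilt with its fields in a common order; aligning the top-level columns alone won't fix that.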
As Jay correctly suggested, if you're joining then overwrite; otherwise only
append, as it removes dups.
I think in this scenario you can just change it to write.mode('overwrite'),
because you're already reading the old data, and your job would be done.
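A rough sketch of that read-dedup-overwrite pattern, with the paths and the event_id key as placeholder assumptions rather than details from this thread:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder().appName("dedup-overwrite").getOrCreate()

    // newBatch stands in for the freshly arrived data.
    def dedupOverwrite(newBatch: DataFrame): Unit = {
      val existing = spark.read.parquet("/data/events")
      val merged = existing.union(newBatch).dropDuplicates("event_id")
      // Spark refuses to overwrite a path it is still reading from in the
      // same job, so write to a staging directory (or cache/checkpoint the
      // merged result) and swap it in afterwards.
      merged.write.mode("overwrite").parquet("/data/events_staging")
    }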
On Sat, 2 Jun 2018 at 10:27 PM, Benjamin Kim wrote:
Hi Jay,
Thanks for your response. Are you saying to append the new data, then
remove the duplicates from the whole data set afterwards, overwriting the
existing data set with the new data set containing the appended values? I
will give that a try.
Cheers,
Ben
On Fri, Jun 1, 2018 at 11:49 PM Jay wrote:
> Ben,
> Did you use RDDs or DataFrames?
> What is the Spark version?
On Mon, May 28, 2018 at 10:32 PM, Saulo Sobreiro wrote:
> Hi,
> I ran a few more tests and found that even with a lot more operations on
> the Scala side, Python is outperformed...
>
> Dataset Stream duration: ~3 minutes (csv formatted d
Structured Streaming can provide idempotent, exactly-once writes to
parquet, but I don't know how it does it under the hood.
Without this, you need to load your whole dataset, dedup it, and then write
the entire dataset back. This overhead can be minimized by partitioning the
output files.
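For what it's worth, a hedged sketch of a partitioned Structured Streaming parquet sink (the source, paths, and date column are assumptions): the exactly-once behavior comes from the mandatory checkpoint location, which tracks source offsets plus a log of committed output files, and partitioning the output keeps any later dedup-and-rewrite pass from touching the whole dataset:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, to_date}

    val spark = SparkSession.builder().appName("stream-to-parquet").getOrCreate()

    // Placeholder source; any streaming DataFrame works here.
    val events = spark.readStream.format("rate").load()
      .withColumn("event_date", to_date(col("timestamp")))

    // The checkpoint directory records offsets and committed files, which
    // is what makes the parquet sink effectively exactly-once on restart.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/out")
      .option("checkpointLocation", "/data/chk/out")
      .partitionBy("event_date")
      .start()

    query.awaitTermination()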
On Fri, Jun 1