Can you please quantify the difference and provide the query code?

On Fri, Mar 29, 2019 at 9:11 AM neeraj bhadani <bhadani.neeraj...@gmail.com> wrote:
> Hi Team,
>
> I am executing the same Spark code using the Spark SQL API and the DataFrame
> API; however, Spark SQL is taking longer than expected.
>
> Please find the pseudocode below.
>
> -----------------------------------------------------------------------
> Case 1 : Spark SQL
> -----------------------------------------------------------------------
> %sql
> CREATE TABLE <tbl_name>
> AS
> WITH <table_1> AS (
>   <qry1>
> )
> ,<table_2> AS (
>   <qry2>
> )
> SELECT * FROM <table_1>
> UNION ALL
> SELECT * FROM <table_2>
>
> -----------------------------------------------------------------------
> Case 2 : DataFrame API
> -----------------------------------------------------------------------
> df1 = spark.sql(<qry1>)
> df2 = spark.sql(<qry2>)
> df3 = df1.union(df2)
> df3.write.saveAsTable(<table_name>)
> -----------------------------------------------------------------------
>
> As per my understanding, both Spark SQL and the DataFrame API generate the
> same code under the hood, so the execution time should be similar.
>
> Regards,
> Neeraj

--
Thanks,
Jason
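
One way to quantify the difference is to time each case several times and compare medians rather than single runs. Below is a minimal sketch in plain Python, not a definitive benchmark. The names `time_runs`, `summarize`, `run_sql_case` and `run_df_case` are hypothetical and not part of Spark; the last two stand for zero-argument functions that you would write to execute Case 1 and Case 2 end to end, each writing to a fresh table name.

```python
import statistics
import time


def time_runs(action, repeats=3):
    """Call a zero-argument function `repeats` times.

    Returns a list with the wall-clock duration of each call, in seconds.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(label, durations):
    """Format the median and the fastest run for one case."""
    median = statistics.median(durations)
    fastest = min(durations)
    return f"{label}: median {median:.3f}s, fastest {fastest:.3f}s"


# Hypothetical usage (run_sql_case / run_df_case are assumed wrappers
# around the two cases in the quoted message):
#
#   print(summarize("Spark SQL", time_runs(run_sql_case)))
#   print(summarize("DataFrame API", time_runs(run_df_case)))
```

Alongside the timings, the output of `df3.explain(True)` for the DataFrame case would help show whether the two cases actually produce the same plan.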