Hi Users,
I have a general question about DataFrames in SparkR.
I am trying to read a file from Hive, and it gets created as a DataFrame.
sqlContext <- sparkRHive.init(sc)
# Read the CSV file into a DataFrame
sales <- read.df(sqlContext, "hdfs://sample.csv", header = 'true',
                 source = "com.databricks.spark.csv", inferSchema = 'true')
registerTempTable(sales, "Sales")
Do I need to create a new DataFrame for every update to the DataFrame, such as
adding a new column, or should I update the original sales DataFrame?
sales1 <- SparkR::sql(sqlContext, "SELECT a.*, 607 AS C1 FROM Sales AS a")
Please help me with this. The original file is only 20 MB, but this throws an
out-of-memory exception on a cluster with a 4 GB master and two workers of
4 GB each.
Also, how are DataFrames meant to be used here: do I need to register and drop
the temp table after every update?
Thanks,
Vipul