>>> names(SALES)[which(names(SALES)=="div_no")]<-"DIV_NO"
This line only creates a new DataFrame. The memory overhead is just the new DataFrame object itself, not its data, so the memory consumption is very small. Which line causes the OOM in your case?
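For what it's worth, here is a minimal sketch of the same renames done through the DataFrame API instead of names()<- (this assumes the SparkR 1.5 API from the 1.5.1 docs linked below in the thread: withColumnRenamed, withColumn, lit):

# Rename columns: each call returns a new DataFrame handle over the same
# underlying data, so the superseded handles cost almost nothing.
SALES <- withColumnRenamed(SALES, "div_no", "DIV_NO")
SALES <- withColumnRenamed(SALES, "store_no", "STORE_NO")

# Add a constant column, same effect as "select a.*, 607 as C1"
SALES <- withColumn(SALES, "C1", lit(607))

# Nothing is computed until an action such as head() or collect() runs.
head(SALES)

All of these are lazy, so the OOM will surface at whichever action actually executes the query; that is the line worth looking at.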
On Mon, Nov 23, 2015 at 5:33 PM, Vipul Rai <vipulrai8...@gmail.com> wrote:

> Hi Jeff,
>
> This is only part of the actual code.
>
> My questions are mentioned in comments near the code.
>
> SALES <- SparkR::sql(hiveContext, "select * from sales")
> PRICING <- SparkR::sql(hiveContext, "select * from pricing")
>
> ## renaming of columns ##
> # sales file #
>
> # Is this right? Do we have to create a new DF for every column
> # addition to the original DF?
>
> # And if we do that, then what about the older DFs? Will they also
> # take memory?
>
> names(SALES)[which(names(SALES) == "div_no")] <- "DIV_NO"
> names(SALES)[which(names(SALES) == "store_no")] <- "STORE_NO"
>
> # pricing file #
> names(PRICING)[which(names(PRICING) == "price_type_cd")] <- "PRICE_TYPE"
> names(PRICING)[which(names(PRICING) == "price_amt")] <- "PRICE_AMT"
>
> registerTempTable(SALES, "sales")
> registerTempTable(PRICING, "pricing")
>
> # merging sales and pricing files #
> merg_sales_pricing <- SparkR::sql(hiveContext, "select .....................")
>
> head(merg_sales_pricing)
>
> Thanks,
> Vipul
>
> On 23 November 2015 at 14:52, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> If possible, could you share your code? What kind of operation are you
>> doing on the DataFrame?
>>
>> On Mon, Nov 23, 2015 at 5:10 PM, Vipul Rai <vipulrai8...@gmail.com> wrote:
>>
>>> Hi Jeff,
>>>
>>> Thanks for the reply, but could you tell me why it is taking so much
>>> time? What could be wrong? Also, when I remove the DataFrame from
>>> memory using rm(), the memory is not freed, although the object is
>>> deleted.
>>>
>>> Also, what about the R functions which are not supported in SparkR,
>>> like ddply?
>>>
>>> How do I access the nth row of a SparkR DataFrame?
>>>
>>> Regards,
>>> Vipul
>>>
>>> On 23 November 2015 at 14:25, Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> >>> Do I need to create a new DataFrame for every update to the
>>>> >>> DataFrame, like addition of a new column, or do I need to update
>>>> >>> the original sales DataFrame?
>>>>
>>>> Yes, DataFrame is immutable, and every mutation of a DataFrame will
>>>> produce a new DataFrame.
>>>>
>>>> On Mon, Nov 23, 2015 at 4:44 PM, Vipul Rai <vipulrai8...@gmail.com> wrote:
>>>>
>>>>> Hello Rui,
>>>>>
>>>>> Sorry, what I meant was that adding a new column to the original
>>>>> DataFrame gives a new DataFrame as the result.
>>>>>
>>>>> Please check this for more:
>>>>> https://spark.apache.org/docs/1.5.1/api/R/index.html
>>>>> Check for withColumn.
>>>>>
>>>>> Thanks,
>>>>> Vipul
>>>>>
>>>>> On 23 November 2015 at 12:42, Sun, Rui <rui....@intel.com> wrote:
>>>>>
>>>>>> Vipul,
>>>>>>
>>>>>> Not sure if I understand your question. DataFrame is immutable. You
>>>>>> can't update a DataFrame.
>>>>>>
>>>>>> Could you paste some log info for the OOM error?
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: vipulrai [mailto:vipulrai8...@gmail.com]
>>>>>> Sent: Friday, November 20, 2015 12:11 PM
>>>>>> To: user@spark.apache.org
>>>>>> Subject: SparkR DataFrame, Out of memory exception for very small file.
>>>>>>
>>>>>> Hi Users,
>>>>>>
>>>>>> I have a general doubt regarding DataFrames in SparkR.
>>>>>>
>>>>>> I am trying to read a file from Hive, and it gets created as a DataFrame:
>>>>>>
>>>>>> sqlContext <- sparkRHive.init(sc)
>>>>>>
>>>>>> # DF
>>>>>> sales <- read.df(sqlContext, "hdfs://sample.csv", header = 'true',
>>>>>>                  source = "com.databricks.spark.csv", inferSchema = 'true')
>>>>>>
>>>>>> registerTempTable(sales, "Sales")
>>>>>>
>>>>>> Do I need to create a new DataFrame for every update to the
>>>>>> DataFrame, like addition of a new column, or do I need to update the
>>>>>> original sales DataFrame?
>>>>>>
>>>>>> sales1 <- SparkR::sql(sqlContext, "select a.*, 607 as C1 from Sales as a")
>>>>>>
>>>>>> Please help me with this, as the original file is only 20MB but it
>>>>>> throws an out-of-memory exception on a cluster with a 4GB master and
>>>>>> two 4GB workers.
>>>>>>
>>>>>> Also, what is the logic with DataFrames? Do I need to register and
>>>>>> drop the tempTable after every update?
>>>>>>
>>>>>> Thanks,
>>>>>> Vipul


--
Best Regards

Jeff Zhang