I guess the problem is:

    
    dummy.df<-withColumn(dataframe,paste0(colnames(cat.column),j),
                         ifelse(column[[1]]==levels(as.factor(unlist(cat.column)))[j],1,0))


    dataframe<-dummy.df

Once dataframe is re-assigned to reference a new DataFrame in each iteration, 
the column variable has to be re-assigned to reference a column in the new 
DataFrame.
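A minimal, untested sketch of that fix follows. It assumes the function is changed to take the column name as a string (a hypothetical signature, not the one in the original post), so that the column can be looked up again on the current DataFrame in every iteration:

```r
library(SparkR)

# Hypothetical rewrite: colName is the categorical column's name, e.g. "Species".
BDADummies <- function(dataframe, colName) {
  # Collect the categorical column to the driver once and compute its levels.
  cat.column <- collect(select(dataframe, colName))
  lev <- levels(as.factor(unlist(cat.column)))
  for (j in seq_along(lev)) {
    # Re-resolve the column against the DataFrame that exists in this iteration.
    column <- dataframe[[colName]]
    dataframe <- withColumn(dataframe, paste0(colName, j),
                            ifelse(column == lev[j], 1, 0))
  }
  dataframe
}
```

Under that assumption the call would become `newdummy.df <- BDADummies(df1, "Species")`.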

From: Devesh Raj Singh [mailto:raj.deves...@gmail.com]
Sent: Saturday, February 6, 2016 8:31 PM
To: Sun, Rui <rui....@intel.com>
Cc: user@spark.apache.org
Subject: Re: different behavior while using createDataFrame and read.df in 
SparkR

Thank you, Rui Sun, for the observation! It helped.

I have a new problem. It arises when I create a small function that generates 
dummy variables for a categorical column:

BDADummies<-function(dataframe,column){
  cat.column<-vector(mode="character",length=nrow(dataframe))
  cat.column<-collect(column)
  lev<-length(levels(as.factor(unlist(cat.column))))
  for (j in 1:lev){
    dummy.df<-withColumn(dataframe,paste0(colnames(cat.column),j),
                         ifelse(column[[1]]==levels(as.factor(unlist(cat.column)))[j],1,0))
    dataframe<-dummy.df
  }
  return(dataframe)
}

and when I call the function using

newdummy.df<-BDADummies(df1,column=select(df1,df1$Species))


I get the below error

Error in withColumn(dataframe, paste0(colnames(cat.column), j), 
ifelse(column[[1]] ==  :
  error in evaluating the argument 'col' in selecting a method for function 
'withColumn': Error in if (le > 0) paste0("[1:", paste(le), "]") else "(0)" :
  argument is not interpretable as logical


But when I run it outside of a function, the statement

dummy.df<-withColumn(dataframe,paste0(colnames(cat.column),j),
                     ifelse(column[[1]]==levels(as.factor(unlist(cat.column)))[j],1,0))

gives me the new columns generating column names as desired.

Warm regards,
Devesh.

On Sat, Feb 6, 2016 at 7:09 AM, Sun, Rui <rui....@intel.com> wrote:
I guess this is related to https://issues.apache.org/jira/browse/SPARK-11976

When createDataFrame is called on iris, the “.” character in column names is 
replaced with “_”.
It seems that when you create a DataFrame from the CSV file, the “.” characters 
in the column names are still there.
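One possible workaround, sketched here and not tested against Spark 1.5.1, is to rename the columns right after loading so that they match what createDataFrame produces. It assumes `names<-` is available for DataFrames in the installed SparkR version and that `df` is the DataFrame returned by read.df:

```r
library(SparkR)

# Replace every literal "." in the column names with "_",
# e.g. "Sepal.Length" becomes "Sepal_Length".
names(df) <- gsub(".", "_", names(df), fixed = TRUE)
```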

From: Devesh Raj Singh [mailto:raj.deves...@gmail.com]
Sent: Friday, February 5, 2016 2:44 PM
To: user@spark.apache.org
Cc: Sun, Rui
Subject: different behavior while using createDataFrame and read.df in SparkR


Hi,

I am using Spark 1.5.1

When I do this

df <- createDataFrame(sqlContext, iris)

#creating a new column for category "Setosa"

df$Species1<-ifelse((df)[[5]]=="setosa",1,0)

head(df)

output: new column created

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

But when I save the iris dataset as a CSV file and then read it back into a 
SparkR DataFrame

df <- read.df(sqlContext,"/Users/devesh/Github/deveshgit2/bdaml/data/iris/",
              source = "com.databricks.spark.csv",header = "true",
              inferSchema = "true")

and then try to create the new column

df$Species1<-ifelse((df)[[5]]=="setosa",1,0)
I get the below error:

16/02/05 12:11:01 ERROR RBackendHandler: col on 922 failed
Error in select(x, x$"*", alias(col, colName)) :
  error in evaluating the argument 'col' in selecting a method for function 
'select': Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Cannot resolve column name 
"Sepal.Length" among (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, 
Species);
            at org.apache.spark.s
--
Warm regards,
Devesh.


