overwriting a spark output using pyspark

2016-03-07 Thread Devesh Raj Singh
I am trying to overwrite a Spark DataFrame's output using the following option, but I am not successful: spark_df.write.format('com.databricks.spark.csv').option("header", "true",mode='overwrite').save(self.output_file_path) The mode='overwrite' setting does not take effect. -- Warm regards, Devesh.
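
A minimal sketch of how the overwrite mode is normally supplied in PySpark: mode goes on the DataFrameWriter itself (or as a keyword to save()), not as an extra argument to option(). The spark-csv package is taken from the post; output_path is a placeholder.

```python
# Sketch: set the write mode via .mode() rather than inside .option().
# Assumes the Spark 1.x spark-csv package from the post; output_path is a placeholder.
(spark_df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .mode("overwrite")
    .save(output_path))

# On Spark 2.x and later the built-in CSV writer does the same job:
# spark_df.write.csv(output_path, header=True, mode="overwrite")
```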

can we create dummy variables from categorical variables, using sparkR

2016-01-19 Thread Devesh Raj Singh
Hi, Can we create dummy variables for categorical variables in SparkR, the way the "dummies" package does in R? -- Warm regards, Devesh.
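
SparkR at the time had no direct counterpart to R's dummies package; a hedged sketch of the analogous operation in PySpark using the ML feature transformers (the Spark 3.x API is assumed, and the column names are illustrative).

```python
# Sketch: index the categorical column, then one-hot encode it.
# Assumes Spark 3.x, where OneHotEncoder is an estimator with fit().
from pyspark.ml.feature import OneHotEncoder, StringIndexer

indexer = StringIndexer(inputCol="Species", outputCol="Species_idx")
encoder = OneHotEncoder(inputCols=["Species_idx"], outputCols=["Species_vec"])

indexed = indexer.fit(df).transform(df)
encoded = encoder.fit(indexed).transform(indexed)
```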

avg(df$column) not returning a value but just the text "Column avg"

2016-01-21 Thread Devesh Raj Singh
Hi, I want to compute the average of the numerical columns of the iris dataset using SparkR. Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.3.0" "sparkr-shell"') library(SparkR) sc=sparkR.init(master="local",sparkHome = "/Users/devesh/Downloads/spark-1.4.1-bin-hadoop2.6",sparkP
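
The likely explanation is that avg() only builds a Column expression rather than computing a number; the expression has to be run through an aggregation and collected. A sketch in PySpark (the column name is illustrative):

```python
# Sketch: avg() only describes the aggregation; agg()/collect() actually runs it.
from pyspark.sql import functions as F

mean_value = df.agg(F.avg("Sepal_Length")).collect()[0][0]
# or, to just display it: df.select(F.avg(df["Sepal_Length"])).show()
```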

NA value handling in sparkR

2016-01-24 Thread Devesh Raj Singh
Hi, I have applied the following code to the airquality dataset available in R, which has some missing values. I want to omit the rows that have NAs. library(SparkR) Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"') sc <- sparkR.init("local",sparkHo
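
A minimal PySpark sketch of the same operation, assuming df already holds the airquality data with proper nulls:

```python
# Sketch: drop rows containing nulls, either in any column or only in selected ones.
cleaned_all = df.dropna()                              # any null in the row
cleaned_some = df.dropna(subset=["Ozone", "Solar_R"])  # nulls only in given columns
```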

Re: NA value handling in sparkR

2016-01-25 Thread Devesh Raj Singh
sible that createDataFrame converts R's NA values to null, so dropna() > works with that. But perhaps read.df() does not convert R NAs to null, as > those are most likely interpreted as strings when they come in from the > csv. Just a guess, can anyone confirm? > > Deb

Re: NA value handling in sparkR

2016-01-26 Thread Devesh Raj Singh
filtered_aq <- filter(aq, aq$Ozone != "NA" & aq$Solar_R != "NA") >> head(filtered_aq) >> >> Perhaps it would be better to have an option for read.df to convert any >> "NA" it encounters into null types, like createDataFrame does for NA, and
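
For reference, the PySpark CSV reader exposes a nullValue option that maps a literal string such as "NA" to null at read time, after which dropna() works as expected. A sketch assuming a Spark 2.x+ SparkSession named spark and a placeholder path:

```python
# Sketch: map the literal string "NA" to null while reading, then drop those rows.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("nullValue", "NA")
      .csv("/path/to/airquality.csv"))

clean = df.dropna()
```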

Re: NA value handling in sparkR

2016-01-27 Thread Devesh Raj Singh
e does for NA, and > then one would be able to use dropna() etc. > > > > On Mon, Jan 25, 2016 at 3:24 AM, Devesh Raj Singh > wrote: > >> Hi, >> >> Yes you are right. >> >> I think the problem is with the reading of csv files. read.df is not >> c

can we do column bind of 2 dataframes in spark R? similar to cbind in R?

2016-02-01 Thread Devesh Raj Singh
Hi, I want to merge 2 dataframes in SparkR column-wise, similar to cbind in R. We have "unionAll" for rbind, but I could not find anything for cbind in SparkR -- Warm regards, Devesh.
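
Spark has no positional cbind; one common workaround is to attach a row index to each DataFrame and join on it. A hedged sketch (spark is an assumed SparkSession, and df1/df2 are placeholders with the same number of rows):

```python
# Sketch: emulate cbind by pairing rows via an explicit row index.
from pyspark.sql.types import LongType, StructField, StructType

def with_row_index(df):
    # zipWithIndex assigns each row its 0-based position
    indexed = df.rdd.zipWithIndex().map(lambda pair: tuple(pair[0]) + (pair[1],))
    schema = StructType(df.schema.fields + [StructField("row_idx", LongType(), False)])
    return spark.createDataFrame(indexed, schema)

combined = (with_row_index(df1)
            .join(with_row_index(df2), on="row_idx")
            .drop("row_idx"))
```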

sparkR not able to create/append new columns

2016-02-03 Thread Devesh Raj Singh
Hi, I am trying to create dummy variables in sparkR by creating new columns for categorical variables, but it is not appending the columns. df <- createDataFrame(sqlContext, iris) class(dtypes(df)) cat.column<-vector(mode="character",length=nrow(df)) cat.column<-collect(select(df,df$Species)) le

Re: sparkR not able to create/append new columns

2016-02-03 Thread Devesh Raj Singh
ter wrote: > > I had problems doing this as well - I ended up using 'withColumn', it's > not particularly graceful but it worked (1.5.2 on AWS EMR) > > cheers > > On 3 February 2016 at 22:06, Devesh Raj Singh > wrote: > >> Hi, >> >> i am try
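
A short PySpark sketch of the withColumn approach mentioned above (the column and level names are illustrative):

```python
# Sketch: withColumn returns a new DataFrame with one extra derived 0/1 column.
from pyspark.sql import functions as F

df = df.withColumn("Species_setosa",
                   F.when(F.col("Species") == "setosa", 1).otherwise(0))
```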

Re: sparkR not able to create/append new columns

2016-02-04 Thread Devesh Raj Singh
12225) which is still under > discussion. If you desire this feature, you could comment on it. > > > > *From:* Franc Carter [mailto:franc.car...@gmail.com] > *Sent:* Wednesday, February 3, 2016 7:40 PM > *To:* Devesh Raj Singh > *Cc:* user@spark.apache.org > *Subject:* Re: sparkR not abl

problem in creating function in sparkR for dummy handling

2016-02-04 Thread Devesh Raj Singh
Hi, I have written code to create dummy variables in sparkR df <- createDataFrame(sqlContext, iris) class(dtypes(df)) cat.column<-vector(mode="character",length=nrow(df)) cat.column<-collect(select(df,df$Species)) lev<-length(levels(as.factor(unlist(cat.column)))) for (j in 1:lev){ dummy.df
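
A hedged sketch of the same idea as a small PySpark helper: one 0/1 column per distinct level of a categorical column. The function name and usage line are illustrative, not from the original post.

```python
# Sketch: add one dummy (0/1) column per level of a categorical column.
from pyspark.sql import functions as F

def add_dummy_columns(df, col_name):
    levels = [row[0] for row in df.select(col_name).distinct().collect()]
    for level in levels:
        df = df.withColumn(col_name + "_" + str(level),
                           F.when(F.col(col_name) == level, 1).otherwise(0))
    return df

# illustrative usage: iris_df = add_dummy_columns(iris_df, "Species")
```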

different behavior while using createDataFrame and read.df in SparkR

2016-02-04 Thread Devesh Raj Singh
Hi, I am using Spark 1.5.1. When I do this: df <- createDataFrame(sqlContext, iris) #creating a new column for category "Setosa" df$Species1<-ifelse((df)[[5]]=="setosa",1,0) head(df) output: new column created Sepal.Length Sepal.Width Petal.Length Petal.Width Species 1 5.1

Re: different behavior while using createDataFrame and read.df in SparkR

2016-02-06 Thread Devesh Raj Singh
> > When calling createDataFrame on iris, the “.” character in column names > will be replaced with “_”. > > It seems that when you create a DataFrame from the CSV file, the “.” > character in column names is still there. > > > > *From:* Devesh Raj Singh [mailto:raj.dev
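
One way to reconcile the two behaviours is to normalise the column names after reading the CSV so they match what createDataFrame produces; a sketch in PySpark (df is a placeholder for the DataFrame returned by the CSV read):

```python
# Sketch: replace "." with "_" in every column name, mirroring createDataFrame.
df_renamed = df.toDF(*[c.replace(".", "_") for c in df.columns])
```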

reading spark dataframe in python

2016-02-16 Thread Devesh Raj Singh
Hi, I want to read a Spark dataframe using python, convert the Spark dataframe to a pandas dataframe, and then convert the pandas dataframe back to a Spark dataframe (after doing some data analysis). Please suggest. -- Warm regards, Devesh.
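
A minimal sketch of that round trip, assuming the data fits in driver memory (toPandas() collects everything locally) and a Spark 2.x+ SparkSession named spark; with Spark 1.x the last step would go through sqlContext instead:

```python
# Sketch: Spark -> pandas for local analysis, then back to Spark.
pandas_df = spark_df.toPandas()

# ... pandas-side analysis happens here ...

spark_df_again = spark.createDataFrame(pandas_df)
```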

Reading CSV file using pyspark

2016-02-18 Thread Devesh Raj Singh
Hi, I want to read a CSV file in pyspark. I am running pyspark on PyCharm and am trying to load a csv: import os import sys os.environ['SPARK_HOME']="/Users/devesh/Downloads/spark-1.5.1-bin-hadoop2.6" sys.path.append("/Users/devesh/Downloads/spark-1.5.1-bin-hadoop2.6/python/") # Now we
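
A sketch of the load step with the Spark 1.x spark-csv package that matches the setup in the post (the path is a placeholder); the Spark 2.x+ equivalent is shown in the trailing comment:

```python
# Sketch: read a CSV through the spark-csv package on Spark 1.x.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/path/to/file.csv"))

# Spark 2.x and later:
# df = spark.read.csv("/path/to/file.csv", header=True, inferSchema=True)
```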

pandas dataframe to spark csv

2016-02-23 Thread Devesh Raj Singh
Hi, I have imported a Spark csv dataframe in python, read the Spark data, then converted the dataframe to a pandas dataframe using toPandas(). I want to convert the pandas dataframe back to a Spark dataframe and write the csv to a location. Please suggest. -- Warm regards, Devesh.
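
A minimal sketch of that last step, assuming a Spark 2.x+ SparkSession named spark and a placeholder output path:

```python
# Sketch: pandas back to Spark, then write the result out as CSV.
spark_df = spark.createDataFrame(pandas_df)
spark_df.write.csv("/path/to/output", header=True, mode="overwrite")
```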

Re: pandas dataframe to spark csv

2016-02-23 Thread Devesh Raj Singh
above solution you can read CSV directly into a dataframe as > well. > > Regards, > Gourav > > On Tue, Feb 23, 2016 at 12:03 PM, Devesh Raj Singh > wrote: > >> Hi, >> >> I have imported spark csv dataframe in python and read the spark data the >> conve