checkpoint

2017-04-13 Thread issues solution
Hi, I am new to Spark and I want to ask you what is wrong with checkpoint. On PySpark 1.6.0 I don't understand what happens after I try to use it on a DataFrame: dfTotaleNormalize24 = dfTotaleNormalize23.select([i if i not in listrapcot else udf_Grappra(F.col(i)).alias(i) for i in dfTotaleN

Why can't we apply a UDF on an RDD?
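
A minimal sketch of how a Python UDF is normally applied (the DataFrame, listrapcot and udf_Grappra names are taken from the snippet above; the cleanup logic is a made-up stand-in): a UDF is a Column expression, so it goes through select/withColumn on a DataFrame, while on an RDD you simply map the plain Python function.

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    def grappra(x):
        # stand-in cleanup logic for the original udf_Grappra
        return x.strip() if x is not None else x

    udf_Grappra = F.udf(grappra, StringType())

    # DataFrame side: the UDF is used as a Column expression inside select.
    dfTotaleNormalize24 = dfTotaleNormalize23.select(
        [udf_Grappra(F.col(i)).alias(i) if i in listrapcot else i
         for i in dfTotaleNormalize23.columns])

    # RDD side: no UDF wrapper is needed, a plain function is mapped over rows.
    rdd2 = dfTotaleNormalize23.rdd.map(lambda row: grappra(row[0]))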

2017-04-13 Thread issues solution
Hi, what is the origin of this error? java.lang.UnsupportedOperationException: Cannot evaluate expression: PythonUDF#Grappra(input[410, StringType]) Regards

checkpoint: how to use checkpoint correctly with UDFs

2017-04-13 Thread issues solution
Hi, can someone explain to me how I can use checkpoint in PySpark (not in Scala)? Because I have a lot of UDFs to apply to a large DataFrame and I don't understand how I can use checkpoint to break the lineage to prevent java.lang.StackOverflowError. Regards
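
DataFrame.checkpoint() only appeared in Spark 2.1, so on 1.6 one workaround is to checkpoint the underlying RDD and rebuild the DataFrame from it. A minimal sketch (the checkpoint directory is a hypothetical path; sc and sqlContext are the usual shell objects):

    # Set a checkpoint directory once per application (hypothetical HDFS path).
    sc.setCheckpointDir("hdfs:///tmp/checkpoints")

    rdd = df.rdd                     # take the RDD underneath the DataFrame
    rdd.checkpoint()                 # mark it for checkpointing
    rdd.count()                      # an action is required to materialize the checkpoint
    df = sqlContext.createDataFrame(rdd, df.schema)   # rebuild the DataFrame; lineage now starts here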

How to correct code after java.lang.StackOverflowError

2017-04-13 Thread issues solution
Hi, I wonder if there is a solution to correct code after getting a StackOverflowError. I mean, you have df <- transformation 1, df <- transformation 2, df <- transformation 3, df <- transformation 4, ..., df <- transformation n, and then df <- transformation n+1 raises a StackOverflowError. How
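
One common way out of that situation is to stop the plan from growing in the first place: every few transformations, write the DataFrame to Parquet and read it back, which gives a fresh, short lineage. A minimal sketch (the paths, the batch size of 10 and the transformations list are assumptions):

    for i, transform in enumerate(transformations):    # transformations: list of functions df -> df
        df = transform(df)
        if (i + 1) % 10 == 0:
            path = "hdfs:///tmp/stage_%d.parquet" % i
            df.write.mode("overwrite").parquet(path)
            df = sqlContext.read.parquet(path)         # lineage now starts from the Parquet files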

Number of columns in a DataFrame

2017-04-13 Thread issues solution
Hi, what is the number of columns that Spark can handle without fuss? Regards

How to master cache and checkpoint for PySpark

2017-04-13 Thread issues solution
Hi, can I ask you to give me a complete example where you use UDFs multiple times, one after another, and then cache your DataFrame, or where you checkpoint the DataFrame at the appropriate steps (cache or checkpoint)? Thanks
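
A minimal, made-up example of the cache side of the question (the UDFs and the column name are invented): apply several UDFs one after another, then cache before the DataFrame is reused, so the UDF chain is only evaluated once.

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    udf_upper = F.udf(lambda s: s.upper() if s is not None else s, StringType())
    udf_strip = F.udf(lambda s: s.strip() if s is not None else s, StringType())

    df2 = df.withColumn("col1", udf_upper(F.col("col1")))
    df2 = df2.withColumn("col1", udf_strip(F.col("col1")))

    df2.cache()      # keep the result in memory once it is computed
    df2.count()      # the first action materializes the cache
    df2.show(5)      # later actions reuse the cached data instead of re-running the UDFs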

checkpoint

2017-04-14 Thread issues solution
Hi, can someone give me a complete example of working with checkpoint under PySpark 1.6? Thanks, regards

Create a column with a map function applied to a DataFrame

2017-04-14 Thread issues solution
Hi, how can you create a column inside a map function, like this: df.map(lambda l: len(l)), but instead of returning an RDD, we create a column inside the DataFrame?
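
A minimal sketch of the usual way to do this (the column names "value" and "length" are assumptions): wrap the function in a UDF and attach the result with withColumn instead of map.

    from pyspark.sql import functions as F
    from pyspark.sql.types import IntegerType

    udf_len = F.udf(lambda s: len(s) if s is not None else None, IntegerType())
    df2 = df.withColumn("length", udf_len(F.col("value")))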

java.lang.UnsupportedOperationException

2017-04-19 Thread issues solution
Hi, can someone tell me why I get the following error with a UDF applied like this: def replaceCempty(x): if x is None: return "" else: return x.encode('utf-8') udf_replaceCempty = F.udf(replaceCempty, StringType()) dfTotaleNormalize53 = dfTotaleNormalize52.select([i if i not
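
For reference, a cleaned-up version of the UDF from the snippet (the column list listCols is hypothetical, since the message is cut off); the UnsupportedOperationException usually comes from where the UDF ends up in the query plan, not from the function body itself.

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    def replaceCempty(x):
        if x is None:
            return ""
        return x.encode('utf-8')

    udf_replaceCempty = F.udf(replaceCempty, StringType())

    # listCols: hypothetical list of the string columns to clean
    dfTotaleNormalize53 = dfTotaleNormalize52.select(
        [udf_replaceCempty(F.col(i)).alias(i) if i in listCols else i
         for i in dfTotaleNormalize52.columns])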

Re: java.lang.UnsupportedOperationException

2017-04-19 Thread issues solution
PySpark 1.6 on Cloudera 5.5 (YARN). 2017-04-19 13:42 GMT+02:00 issues solution: > Hi, > can someone tell me why I get the following error with a UDF applied like this: > > def replaceCempty(x): > if x is None: > return "" > else:

Spark 1.6.0 and GridSearchCV

2017-05-03 Thread issues solution
Hi, I wonder if we have a method under PySpark 1.6 to perform GridSearchCV? If yes, can I ask for an example please? Thanks
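
A minimal sketch of the PySpark 1.6 equivalent of scikit-learn's GridSearchCV, namely ParamGridBuilder plus CrossValidator (the estimator, column names, grid values and fold count are assumptions, not taken from the thread):

    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

    rf = RandomForestClassifier(labelCol="label", featuresCol="features")
    grid = (ParamGridBuilder()
            .addGrid(rf.numTrees, [20, 50])
            .addGrid(rf.maxDepth, [5, 10])
            .build())
    evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")

    cv = CrossValidator(estimator=rf, estimatorParamMaps=grid,
                        evaluator=evaluator, numFolds=3)
    cv_model = cv.fit(train_df)       # train_df: a prepared training DataFrame
    best_model = cv_model.bestModel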

Create multiple columns in PySpark in one shot

2017-05-04 Thread issues solution
Hi, how can we create multiple columns iteratively? I mean, how can you create empty columns inside a loop? Because with: for i in listl: df = df.withColumn(i, F.lit(0)) we get a StackOverflowError. How can we do that with a list of columns, something like: df.select([F.col(i).lit(0) for i in df.columns
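
A minimal sketch of the one-shot version (listl comes from the message; F.col(i).lit(0) is not a valid call, a literal column is built with F.lit(0).alias(...)): build all the new columns in a single select instead of looping over withColumn, so the plan does not grow with every column.

    from pyspark.sql import functions as F

    df2 = df.select(df.columns + [F.lit(0).alias(name) for name in listl])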

Normalize column items for OneHotEncoder

2017-05-04 Thread issues solution
Hi, I have 3 DataFrames that do not have the same items inside the labelled column. I mean: DataFrame 1 has collabled values a, b, c; DataFrame 2 has collabled values a, w, z. When I encode the first DataFrame I get the columns a, b, c with a -> 1 0 0, b -> 0 1 0, c
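
A minimal sketch of one way to get consistent encodings across several DataFrames (the column name collabled is taken from the message, the rest is assumed): fit a single StringIndexer on the union of all the frames, then reuse the fitted model and a OneHotEncoder on each frame, so every frame gets the same index-to-label mapping.

    from pyspark.ml.feature import StringIndexer, OneHotEncoder

    all_labels = df1.select("collabled").unionAll(df2.select("collabled"))
    indexer_model = StringIndexer(inputCol="collabled",
                                  outputCol="collabled_idx").fit(all_labels)
    encoder = OneHotEncoder(inputCol="collabled_idx", outputCol="collabled_vec")

    df1_enc = encoder.transform(indexer_model.transform(df1))
    df2_enc = encoder.transform(indexer_model.transform(df2))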

Imbalanced classes inside RandomForestClassifier

2017-05-05 Thread issues solution
Hi, in scikit-learn we have the sample_weight option that allows us to pass an array to balance the class categories, by calling rf.fit(X, Y, sample_weight=[10, 10, 10, ..., 1, 1, 10]). I am wondering if an equivalent exists inside the ml or mllib classes? If yes, can I ask for a reference or an example? Thanks in advance
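
RandomForestClassifier in PySpark 1.6 has no sample_weight or weightCol parameter, so the usual workaround is to rebalance the data itself before calling fit. A minimal sketch using stratified sampling (the label values 0.0/1.0 and the 10% fraction are assumptions):

    from pyspark.ml.classification import RandomForestClassifier

    fractions = {0.0: 0.1, 1.0: 1.0}     # down-sample the majority class 0, keep all of class 1
    balanced_df = train_df.sampleBy("label", fractions, seed=42)

    rf = RandomForestClassifier(labelCol="label", featuresCol="features")
    model = rf.fit(balanced_df)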

CrossValidator after fit

2017-05-05 Thread issues solution
Hi, I get the following error after trying to perform grid search and cross-validation on a RandomForest estimator for classification: rf = RandomForestClassifier(labelCol="Labeld", featuresCol="features") evaluator = BinaryClassificationEvaluator(metricName="F1 Score") rf_cv = CrossValidator(estimator=r
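
One visible problem in the snippet is metricName="F1 Score": BinaryClassificationEvaluator only accepts "areaUnderROC" and "areaUnderPR" (F1 belongs to MulticlassClassificationEvaluator). A minimal corrected sketch, keeping the column names from the snippet (grid values and fold count are assumptions):

    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

    rf = RandomForestClassifier(labelCol="Labeld", featuresCol="features")
    evaluator = BinaryClassificationEvaluator(labelCol="Labeld", metricName="areaUnderROC")
    grid = ParamGridBuilder().addGrid(rf.numTrees, [20, 50]).build()
    rf_cv = CrossValidator(estimator=rf, estimatorParamMaps=grid,
                           evaluator=evaluator, numFolds=3)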

Spark RandomForestClassifier and balancing classes

2017-05-09 Thread issues solution
Hi, I have already asked this question but I am still without an answer. Can someone help me figure out how I can balance my classes when I use the fit method of RandomForestClassifier? Thanks in advance.

Feature importances

2017-05-10 Thread issues solution
Hi, can someone tell me if we have feature importances inside PySpark 1.6.0? Thanks
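
As far as I know, featureImportances on RandomForestClassificationModel is only exposed to Python starting with Spark 2.0; a minimal sketch assuming Spark >= 2.0 (on 1.6 the attribute is not reachable from the PySpark API):

    from pyspark.ml.classification import RandomForestClassifier

    rf = RandomForestClassifier(labelCol="label", featuresCol="features")
    model = rf.fit(train_df)
    print(model.featureImportances)    # a SparseVector with one weight per feature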

URGENT:

2017-05-10 Thread issues solution
Hi, I know you are busy with questions but I don't understand: 1- why don't we have feature importances inside the PySpark features? 2- why can't we use a cached DataFrame with cross-validation? 3- why is the documentation not clear when we talk about PySpark? You can understand when

CrossValidator and StackOverflowError

2017-05-10 Thread issues solution
Hi, when I try to perform CrossValidator I get the StackOverflowError. I have already performed all the necessary transformations (StringIndexer, vector assembly) and saved the DataFrame to HDFS as Parquet; after that I load everything into a new DataFrame and split it into train and test. When I try fit(train_set) I get the StackOverflowError.

Cross-validation and hypothetical failure

2017-05-11 Thread issues solution
Hi, often we perform a grid search and cross-validation under PySpark to find the best parameters, but what if you get an error not related to the computation but to the network or anything else? How can we save intermediate results, particularly when you have a large process running over 3 or 4 days?
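
CrossValidator keeps all of its work in memory, so a multi-day grid search loses everything when the job dies. One hand-rolled alternative is to evaluate one parameter combination at a time and persist each result as it finishes; a minimal sketch (the grid values, file path, helper column names and the train/validation DataFrames are assumptions):

    import json
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    evaluator = BinaryClassificationEvaluator(labelCol="label")
    results = []
    for num_trees in [20, 50, 100]:
        rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                                    numTrees=num_trees)
        model = rf.fit(train_df)
        score = evaluator.evaluate(model.transform(validation_df))
        results.append({"numTrees": num_trees, "score": score})
        with open("/tmp/grid_results.json", "w") as f:   # saved after every combination
            json.dump(results, f)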

Save Spark ML model

2017-05-15 Thread issues solution
Hi, I am on PySpark 1.6 and I want to save my model to an HDFS file, like Parquet. How can I do this? My model is a RandomForestClassifier trained with cross-validation like this: rf_cv2 = CrossValidator(). How can I save it? Thanks in advance
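
PySpark 1.6 does not expose save() on the ml CrossValidatorModel or on the RandomForestClassificationModel inside it (Python-side ML persistence arrived in Spark 2.0), so the choices are roughly: upgrade, or fall back to the RDD-based mllib API, whose model can be saved on 1.6. A minimal sketch of both options (paths and training data are assumptions):

    # Option 1 (Spark >= 2.0): save the best model found by cross-validation.
    # rf_cv2_model.bestModel.save("hdfs:///models/rf_best")

    # Option 2 (works on 1.6): the mllib RandomForest model supports save/load directly.
    from pyspark.mllib.tree import RandomForest, RandomForestModel

    mllib_model = RandomForest.trainClassifier(labeled_point_rdd, numClasses=2,
                                               categoricalFeaturesInfo={}, numTrees=50)
    mllib_model.save(sc, "hdfs:///models/rf_mllib")
    loaded = RandomForestModel.load(sc, "hdfs:///models/rf_mllib")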

Re: Save Spark ML model

2017-05-15 Thread issues solution
Hi, please, I need help with that question. 2017-05-15 10:32 GMT+02:00 issues solution: > Hi, > I am on PySpark 1.6 and I want to save my model to an HDFS file, like Parquet. > > How can I do this? > > > My model is a RandomForestClassifier trained with cross-val

Cloudera 5.8.0 and Spark 2.1.1

2017-05-17 Thread issues solution
Hi, is it possible to use a prebuilt version of Spark 2.1 inside Cloudera 5.8, which ships Scala 2.10 (not Scala 2.11) and Java 1.7 (not Java 1.8)? Why? I am in a corporate environment and I want to test the latest version of Spark, but my problem is that I don't know whether the 2.1.1 version of Spark can work with this version