I am on CentOS 7 and use Spark 2.3.0. My code is posted below. Logistic 
regression took 85 minutes and linear regression 127 seconds.

As I said, my dataset is 128 MB; it contains 1000 features and ~100 classes. 
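
For a sense of scale, here is a rough sketch of how many parameters each model has to fit. It assumes exactly 1000 features and 100 classes (the real class count is only approximate) and that logistic regression is fit as a multinomial model with one weight vector per class; the variable names are illustrative only.

```python
# Rough parameter counts for the dataset described above.
# Assumption: exactly 1000 features and 100 classes (the post says "~100").
num_features = 1000
num_classes = 100

# Linear regression: one weight per feature plus one intercept.
linear_params = num_features + 1

# Multinomial logistic regression: one weight vector and one intercept per class.
logistic_params = num_classes * (num_features + 1)

print(linear_params)                    # 1001
print(logistic_params)                  # 100100
print(logistic_params / linear_params)  # 100.0
```

Under these assumptions the logistic model has about 100 times as many parameters as the linear one, which may account for part of the gap, though probably not all of it.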


import time

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# SparkSession
ss = SparkSession.builder.getOrCreate()

start = time.time()

# Read data; `file` is the path to the training CSV (set elsewhere).
# Without a header, columns are named _c0, _c1, ...; _c0 holds the label.
trainData = ss.read.format("csv").option("inferSchema", "true").load(file)

# Assemble every column except the first (the label) into one feature vector
assembler = VectorAssembler(inputCols=trainData.columns[1:],
                            outputCol="features")
trainData = assembler.transform(trainData)

# Keep only the label and features columns, renaming _c0 to label
trainData = trainData.select(trainData["_c0"].alias("label"), "features")

# Logistic regression
lr = LogisticRegression(maxIter=500, regParam=0.3, elasticNetParam=0.8)
lrModel = lr.fit(trainData)

# Output coefficients
print("Coefficients: " + str(lrModel.coefficientMatrix))

# Report wall-clock time since `start`
print("Elapsed: %.1f s" % (time.time() - start))
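
To see where the time actually goes, each stage could be timed separately instead of using a single start timestamp. The following is only a minimal sketch: `timed` and `timings` are hypothetical helper names, and the workloads are stand-ins for the read, transform and fit steps. Because Spark evaluates lazily, most of the cost would show up under the fit stage unless an action such as `count()` is run inside the earlier blocks.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> elapsed seconds


@contextmanager
def timed(stage):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    begin = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - begin


# Stand-in workloads; in the script above these blocks would wrap the
# CSV read, assembler.transform(...) and lr.fit(...) steps respectively.
with timed("read"):
    rows = [list(range(10)) for _ in range(100)]
with timed("fit"):
    total = sum(sum(r) for r in rows)

for stage, seconds in timings.items():
    print("%s: %.6f s" % (stage, seconds))
```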



- Thodoris


> On 27 Apr 2018, at 22:50, Irving Duran <irving.du...@gmail.com> wrote:
> 
> Are you reformatting the data correctly for logistic regression (meaning 0s 
> and 1s) before modeling? Which OS and Spark version are you using?
> 
> Thank You,
> 
> Irving Duran
> 
> 
> On Fri, Apr 27, 2018 at 2:34 PM Thodoris Zois <z...@ics.forth.gr> wrote:
> Hello,
> 
> I am running an experiment to test logistic and linear regression on Spark 
> using MLlib.
> 
> My dataset is only 128 MB and something weird happens. Linear regression takes 
> about 127 seconds with either 1 or 500 iterations. Logistic regression, on the 
> other hand, usually fails to finish even with 1 iteration; I typically get a 
> heap memory error.
> 
> In both cases I use the default cores and memory for the driver and I spawn 1 
> executor with 1 core and 2 GB of memory.
> 
> Apart from that, I get a warning about NativeBLAS. I searched the Internet and 
> found that I have to install libgfortran. Even though I did, the warning 
> remains.
> 
> Any ideas for the above?
> 
> Thank you,
> - Thodoris
> 
