Multilabel classification with Spark MLlib

2016-11-25 Thread Md. Rezaul Karim
appreciated. Regards, _ *Md. Rezaul Karim,* BSc, MSc PhD Researcher, INSIGHT Centre for Data Analytics National University of Ireland, Galway IDA Business Park, Dangan, Galway, Ireland Web: http://www.reza-analytics.eu/index.html <http://139.59.184.114/index.html>

Multilabel classification with Spark MLlib

2016-11-29 Thread Md. Rezaul Karim
appreciated.

How to compute the recall and F1-score in Linear Regression based model

2016-12-06 Thread Md. Rezaul Karim
List()) { count++; } System.out.println("precision: " + (double) (count * 100) / predictions.count()); Now, I would like to compute other evaluation metrics like *Recall* and *F1-score*, etc. How could I do that?
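For reference, recall and F1 both follow from the four confusion-matrix counts. The snippet above is truncated, so the following is only a minimal sketch in plain Java (no Spark dependency) under stated assumptions: labels and predictions are already collected into arrays, the task is binary, and 1.0 marks the positive class. The class name `BinaryMetrics` and its methods are hypothetical and are not part of Spark. Note that the share of predictions that match their labels, which the quoted code appears to print, is usually called accuracy rather than precision.

```java
import java.util.Locale;

/** Confusion-matrix counts and derived metrics for a binary classifier (hypothetical helper). */
final class BinaryMetrics {
    final long tp; // predicted positive, actually positive
    final long fp; // predicted positive, actually negative
    final long fn; // predicted negative, actually positive
    final long tn; // predicted negative, actually negative

    private BinaryMetrics(long tp, long fp, long fn, long tn) {
        this.tp = tp;
        this.fp = fp;
        this.fn = fn;
        this.tn = tn;
    }

    /** Counts the four outcomes, treating 1.0 as the positive class (an assumption). */
    static BinaryMetrics of(double[] labels, double[] predictions) {
        if (labels.length != predictions.length) {
            throw new IllegalArgumentException("labels and predictions differ in length");
        }
        long tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < labels.length; i++) {
            boolean actual = labels[i] == 1.0;
            boolean predicted = predictions[i] == 1.0;
            if (actual && predicted) tp++;
            else if (!actual && predicted) fp++;
            else if (actual) fn++;
            else tn++;
        }
        return new BinaryMetrics(tp, fp, fn, tn);
    }

    /** Share of all predictions that match their label. */
    double accuracy() {
        long total = tp + fp + fn + tn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }

    /** Share of predicted positives that are truly positive. */
    double precision() {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Share of true positives that were found. */
    double recall() {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall. */
    double f1() {
        double p = precision();
        double r = recall();
        return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Made-up toy data, for illustration only.
        double[] labels = {1, 1, 1, 1, 0, 0};
        double[] predictions = {1, 1, 0, 0, 1, 0};
        BinaryMetrics m = BinaryMetrics.of(labels, predictions);
        System.out.printf(Locale.ROOT,
                "accuracy=%.4f precision=%.4f recall=%.4f f1=%.4f%n",
                m.accuracy(), m.precision(), m.recall(), m.f1());
    }
}
```

Inside a Spark job it is probably simpler to rely on the evaluation classes that ship with MLlib, such as `MulticlassMetrics` or `BinaryClassificationMetrics` in `org.apache.spark.mllib.evaluation`, rather than counting by hand; the sketch only shows what those metrics compute.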

Re: How to compute the recall and F1-score in Linear Regression based model

2016-12-06 Thread Md. Rezaul Karim
or binary class dataset.

Pruning decision tree to create an optimal tree

2016-12-07 Thread Md. Rezaul Karim

Re: Running spark from Eclipse and then Jar

2016-12-07 Thread Md. Rezaul Karim
single An example pom.xml file has been attached for your reference. Feel free to reuse it.

Re: Running spark from Eclipse and then Jar

2016-12-07 Thread Md. Rezaul Karim
tion: Failed to find data source: libsvm. * The application works fine in Eclipse. However, while packaging the corresponding jar file, I am getting the above error, which is really weird!

"Failed to find data source: libsvm" while running Spark application with jar

2016-12-08 Thread Md. Rezaul Karim
at of the input file. Any kind of help is appreciated.

Re: Random Forest hangs without trace of error

2016-12-09 Thread Md. Rezaul Karim
I had a similar experience last week, and I could not find any error trace either. Later on, I did the following to get rid of the problem: i) downgraded to Spark 2.0.0; ii) decreased the values of maxBins and maxDepth. Additionally, make sure that you set featureSubsetStrategy to "auto" to let the al

Re: Running spark from Eclipse and then Jar

2016-12-10 Thread Md. Rezaul Karim
;*db.lck*" file which was preventing the jar from being executed from the command line. I just deleted that file, packaged my project as a jar again, and the problem was finally resolved.

Issue with SparkR setup on RStudio

2016-12-29 Thread Md. Rezaul Karim
Like.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.Traversabl Any kind of help would be appreciated.

Re: Issue with SparkR setup on RStudio

2017-01-02 Thread Md. Rezaul Karim
Hello Cheung, Happy New Year! No, I did not configure Hive on my machine. I have even tried not setting HADOOP_HOME, but I am getting the same error.

RBackendHandler Error while running ML algorithms with SparkR on RStudio

2017-01-03 Thread Md. Rezaul Karim
titanicDF nbModel <- spark.naiveBayes(nbDF, Survived ~ Class + Sex + Age) # Model summary summary(nbModel) # Prediction nbPredictions <- predict(nbModel, nbTestDF) showDF(nbPredictions) Someone please help me to get rid of this error.

Re: Issue with SparkR setup on RStudio

2017-01-04 Thread Md. Rezaul Karim
Cheung, The problem was solved after switching from Windows to a Linux environment. Thanks.

Re: Machine Learning in Spark 1.6 vs Spark 2.0

2017-01-09 Thread Md. Rezaul Karim
These features will help make your machine learning scalable and easy too.

Re: Machine Learning in Spark 1.6 vs Spark 2.0

2017-01-09 Thread Md. Rezaul Karim
Hi, I have been using Spark 2.1.0 for ML and so far have not experienced any critical issues. It's much more stable than Spark 2.0.1/2.0.2, I would say.

Re: How to save spark-ML model in Java?

2017-01-12 Thread Md. Rezaul Karim
rwrite().save("output/NBModel") Hope that helps.

H2O DataFrame to Spark RDD/DataFrame

2017-01-12 Thread Md. Rezaul Karim
docs/booklets/SparklingWaterVignette.pdf> However, it discusses how to convert a Spark RDD or DataFrame to an H2O DataFrame, but not vice versa.

Old version of Spark [v1.2.0]

2017-01-15 Thread Md. Rezaul Karim
Hi, I am looking for Spark version 1.2.0. I tried to download it from the Spark website, but it's no longer available. Any suggestions?

Re: Old version of Spark [v1.2.0]

2017-01-15 Thread Md. Rezaul Karim
Hi Ayan, Thanks a million.

Parsing RDF data with Spark

2017-01-18 Thread Md. Rezaul Karim
Hi All, Is there any way to parse Linked Data in RDF (.n3, .ttl, .nq, .nt) format with Spark? Kind regards, Reza

"Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
brary.path=$HADOOP_HOME/lib/native" My Spark job executes successfully and writes the results to a file at the end. However, I am not getting any logs to track the progress. Could someone help me solve this problem?

Re: "Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
Thanks, Sean. I will explore online more.

How to reduce number of tasks and partitions in Spark job?

2017-01-26 Thread Md. Rezaul Karim

Re: How to tune number of tasks

2017-01-26 Thread Md. Rezaul Karim
argument as TRUE. yourRDD.coalesce(1).saveAsTextFile("data/output") Hope that helps.

Re: Text

2017-01-27 Thread Md. Rezaul Karim
Some operations like map, filter, flatMap and coalesce (with shuffle=false) usually preserve the order. However, sortBy, reduceBy, partitionBy, join etc. do not.

DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim
Hi All, I am running a Spark job, written in Scala with Spark 2.1.0, on my local machine. However, at http://localhost:4040/jobs/ I do not see any "*DAG Visualization*" option. Any suggestions, please?

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim
that I am experiencing the same issue with Spark 2.x (i.e. 2.0.0, 2.0.1, 2.0.2 and 2.1.0). Refer to the attached screenshot of the UI that I am seeing on my machine. Please suggest.

Pruning decision tree in Spark

2017-01-30 Thread Md. Rezaul Karim

Re: DAG Visualization option is missing on Spark Web UI

2017-01-30 Thread Md. Rezaul Karim
Hi Mark, That worked for me! Thanks a million.

How to specify "verbose GC" in Spark submit?

2017-02-06 Thread Md. Rezaul Karim
Dear All, Is there any way to specify verbose GC, i.e. “-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps”, in Spark submit?

Re: How to specify "verbose GC" in Spark submit?

2017-02-06 Thread Md. Rezaul Karim
Thanks, Bryan. Got your point.

EC2 script is missing in Spark 2.0.0~2.1.0

2017-02-11 Thread Md. Rezaul Karim

Re: EC2 script is missing in Spark 2.0.0~2.1.0

2017-02-11 Thread Md. Rezaul Karim
Hi Takeshi, Now I understand that the spark-ec2 script was moved to AMPLab. How can I use that one, i.e. what is the new location/URL, please? Alternatively, can I use the script provided with prior Spark releases?

Re: EC2 script is missing in Spark 2.0.0~2.1.0

2017-02-11 Thread Md. Rezaul Karim
Thanks for the great help. Appreciated!

Debugging Spark application

2017-02-16 Thread Md. Rezaul Karim
Hi, I was looking for some URLs/documents for getting started on debugging Spark applications. I prefer developing Spark applications with Scala in Eclipse and then packaging the application jar before submitting. Kind regards, Reza

Re: Debugging Spark application

2017-02-16 Thread Md. Rezaul Karim
nk* Regards, Sam On Thu, 16 Feb 2017 at 22:00, Md. Rezaul Karim <rezaul.ka...@insight-centre.org> wrote: > Hi, I was looking for some URLs/documents for getting started on debugging Spark applications.

Re: Question on Spark's graph libraries

2017-03-10 Thread Md. Rezaul Karim
+1

Research paper used in GraphX

2017-03-31 Thread Md. Rezaul Karim
Hi All, Could anyone please tell me which research paper(s) was/were used to implement metrics like strongly connected components, PageRank, triangle count, closeness centrality, clustering coefficient, etc. in Spark GraphX?

How to convert Spark MLlib vector to ML Vector?

2017-04-09 Thread Md. Rezaul Karim
= new PCA() .setInputCol("features") .setOutputCol("pcaFeatures") .setK(100) .fit(trainingDF) /// GETTING EXCEPTION HERE Please, someone, help me to solve the problem. Kind regards, *Md. Rezaul Karim*

Re: How to convert Spark MLlib vector to ML Vector?

2017-04-10 Thread Md. Rezaul Karim
Hi Yan, Ryan, and Nick, Actually, for a special use case, I had to use the RDD-based Spark MLlib, which did not work in the end. Therefore, I had to switch to Spark ML later on. Thanks for your support, guys.

Could you please add a book info on Spark website?

2017-06-25 Thread Md. Rezaul Karim
Hi Sean, Last time, you helped me add a book's info (in the books section) to this page: https://spark.apache.org/documentation.html. Could you please add another book? Here is the necessary information about the book: *Title*: Scala and Spark for Big Data Analytics *Authors*: Md. Rezaul

Re: Could you please add a book info on Spark website?

2017-06-25 Thread Md. Rezaul Karim
Thanks, Sean. I will ask them to do so.

RE: IDE for python

2017-06-28 Thread Md. Rezaul Karim
By the way, PyCharm from JetBrains also has a community edition, which is free and open source. Moreover, if you are a student, you can use the professional edition as well. For more, see https://www.jetbrains.com/student/ On Jun 28, 2017 11:18 AM, "Sotola, Radim" wrote: > Py

Re: [Spark ML] LogisticRegressionWithSGD

2017-06-29 Thread Md. Rezaul Karim
+1 On Jun 29, 2017 10:46 PM, "Kevin Quinn" wrote: > Hello, > > I'd like to build a system that leverages semi-online updates and I wanted > to use stochastic gradient descent. However, after looking at the > documentation it looks like that method is deprecated. Is there a reason > why it was

Bayesian network with Spark

2017-09-11 Thread Md. Rezaul Karim
Hi All, I am planning to use a Bayesian network to integrate and infer the links between miRNA and proteins based on their expression. Is there any Bayesian network implementation in Spark that I can adapt to my data?

WARN: Truncated the string representation with df.describe()

2017-10-16 Thread Md. Rezaul Karim
Hi, When I try to see the statistics of a DataFrame using the df.describe() method, I get the following WARN and, as a result, nothing is printed: 17/10/16 18:37:54 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted b

StringIndexer on several columns in a DataFrame with Scala

2017-10-27 Thread Md. Rezaul Karim
am experiencing a NullPointerException at for (colName <- featureCol). I am sure I am doing something wrong. Any suggestions?

Re: StringIndexer on several columns in a DataFrame with Scala

2017-10-30 Thread Md. Rezaul Karim
Hi Nick, Both approaches worked and I realized my silly mistake too. Thank you so much. @Xu, thanks for the update.

SpecificColumnarIterator has grown past JVM limit of 0xFFF

2017-11-17 Thread Md. Rezaul Karim
0xFFF* I understand that the current implementation cannot handle so many columns. However, I was still wondering if there's any workaround to handle a dataset like this? Kind regards, _____ *Md. Rezaul Karim*, BSc, MSc Research Scientist, Fraunhofer FIT, German

Reinforcement Learning with Spark

2018-01-05 Thread Md. Rezaul Karim
Hi All, Is there any Reinforcement Learning algorithm implemented with Spark, e.g. a link to a GitHub/open-source project?

Writing a DataFrame is taking too long and huge space

2018-03-09 Thread Md. Rezaul Karim
ce() myDF.coalesce(1).write.format("com.databricks.spark.csv").save("data/file.csv") Any better suggestions? Md. Rezaul Karim, BSc, MSc Research Scientist, Fraunhofer FIT, Germany Ph.D. Researcher, Information Systems, RWTH Aachen University, Germany eMail: rezaul.ka...@fit.fraunhofer.de Tel: +49 241 80-21527

Re: Writing a DataFrame is taking too long and huge space

2018-03-09 Thread Md. Rezaul Karim
k for only pre-processing. By the way, I tried using Spark's built-in CSV library too.

How to read multiple libsvm files in Spark?

2018-09-20 Thread Md. Rezaul Karim
I'm experiencing "Exception in thread "main" java.io.IOException: Multiple input paths are not supported for libsvm data" exception while trying to read multiple libsvm files using Spark 2.3.0: val URLs = spark.read.format("libsvm").load("url_svmlight.tar/url_svmlight/*.svm") Any other alternativ