Re: Belief propagation algorithm is open sourced

2016-12-15 Thread Ulanov, Alexander
…using BP. We were using both LibDAI and our own implementation of BP for GraphLab as a reference. Best regards, Manish Marwah & Alexander …

Belief propagation algorithm is open sourced

2016-12-13 Thread Ulanov, Alexander
Dear Spark developers and users, HPE has open sourced its implementation of the belief propagation (BP) algorithm for Apache Spark. BP is a popular message-passing algorithm for performing inference in probabilistic graphical models, and it provides exact inference for graphical models without loops. …

scalable-deeplearning 1.0.0 released

2016-09-09 Thread Ulanov, Alexander
Dear Spark users and developers, I have released version 1.0.0 of the scalable-deeplearning package. This package is based on the implementation of artificial neural networks in Spark ML. It is intended for new Spark deep learning features that have not yet been merged into Spark ML or that are too specific …

accessing spark packages through proxy

2016-09-09 Thread Ulanov, Alexander
Dear Spark users, I am trying to use Spark packages; however, I get the Ivy error listed below. I checked JIRA and Stack Overflow, and it might be a proxy error. However, none of the proposed solutions worked for me. Could you suggest how to solve this issue? https://issues.apache.org/jira/b …

RE: Spark 2.0 error: Wrong FS: file://spark-warehouse, expected: file:///

2016-08-03 Thread Ulanov, Alexander
Hi Sean, I updated the issue; could you check the changes? Best regards, Alexander …

RE: Spark MLlib: MultilayerPerceptronClassifier error?

2016-07-05 Thread Ulanov, Alexander
Hi Mikhail, I have followed the MLP user guide and used the dataset and network configuration you mentioned. The MLP was trained without any issues with default parameters, that is, a block size of 128 and 100 iterations. Source code: scala> import org.apache.spark.ml.classification.MultilayerPerceptronClassifier …
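
For reference, a minimal sketch of such a training run, following the MLP user guide (the dataset path and layer sizes come from the guide and are assumptions, not the original message; Spark 1.6+ shell syntax):

    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

    // libsvm-formatted multiclass sample that ships with Spark
    val data = sqlContext.read.format("libsvm")
      .load("data/mllib/sample_multiclass_classification_data.txt")
    // 4 inputs, hidden layers of 5 and 4 units, 3 output classes
    val layers = Array[Int](4, 5, 4, 3)
    val trainer = new MultilayerPerceptronClassifier()
      .setLayers(layers)
      .setBlockSize(128) // default block size, as mentioned above
      .setMaxIter(100)   // default number of iterations, as mentioned above
      .setSeed(1234L)
    val model = trainer.fit(data)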

RE: Non-classification neural networks

2016-03-28 Thread Ulanov, Alexander
Hi Jim, It is possible to use raw artificial neural networks by means of FeedForwardTrainer. It is [ml] package private, so your code should be in that package too. Basically, you need to do the same as is done in MultilayerPerceptronClassifier, but without encoding the output as one-hot: …
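
A rough sketch of that approach; the API below is private[ml] and unstable, so every signature here is an assumption to verify against your Spark version's sources:

    // NOTE: private[ml] API, so this file must live under org.apache.spark.ml
    package org.apache.spark.ml.ann

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    object RawAnnSketch {
      // 2 inputs, 4 hidden units, 1 raw output (no one-hot encoding);
      // the second argument disables the softmax top layer (its parameter
      // name differs between versions, hence the positional form)
      def train(data: RDD[(Vector, Vector)]) = {
        val topology = FeedForwardTopology.multiLayerPerceptron(Array(2, 4, 1), false)
        val trainer = new FeedForwardTrainer(topology, 2, 1)
        trainer.train(data)
      }
    }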

RE: best way to do deep learning on spark ?

2016-03-18 Thread Ulanov, Alexander
Hi Charles, There is an implementation of the multilayer perceptron in Spark (since 1.5): https://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier Other features, such as autoencoders and convolutional layers, are currently under development. Please refer …

RE: Learning Fails with 4 Number of Layes at ANN Training with SGDOptimizer

2016-02-16 Thread Ulanov, Alexander
Hi Hayri, The default MLP optimizer is LBFGS. SGD is available only through the private interface, and its use is discouraged for multiple reasons. With regard to SGD in general, the parameters are very specific to the dataset and network configuration; one needs to find them empirically. The …

RE: Spark LBFGS Error with ANN

2016-02-16 Thread Ulanov, Alexander
Hi Hayri, The MLP classifier is multi-class (one class per instance) but not multi-label (multiple classes per instance). The top layer of the network is softmax (http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier), which requires the outputs to sum to one. …
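
For intuition, a tiny standalone sketch of the softmax constraint (plain Scala, not Spark API):

    // softmax: exponentiate and normalize, so the outputs always sum to one;
    // this is why exactly one class can "win" per instance
    def softmax(z: Array[Double]): Array[Double] = {
      val m = z.max // shift by the max for numerical stability
      val e = z.map(v => math.exp(v - m))
      val s = e.sum
      e.map(_ / s)
    }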

RE: SparkML algos limitations question.

2016-01-04 Thread Ulanov, Alexander
Hi Yanbo, As long as two models fit into the memory of a single machine, there should be no problems, so even 16GB machines can handle large models. (The master should have more memory because it runs LBFGS.) In my experiments, I have trained models with 12M and 32M parameters without issues. Best regards …

RE: How to save Multilayer Perceptron Classifier model.

2015-12-14 Thread Ulanov, Alexander
Hi Vadim, As Yanbo pointed out, that feature is not yet merged into the main branch. However, there is a hacky workaround (expanded below): // save model sc.parallelize(Seq(model), 1).saveAsObjectFile("path") // load model val sameModel = sc.objectFile[YourCLASS]("path").first() Best regards, Alexander …
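
Spelled out, the workaround reads as follows (the path and the concrete model class are placeholders for your own):

    import org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel

    // save: wrap the model in a one-partition RDD and serialize it
    sc.parallelize(Seq(model), 1).saveAsObjectFile("/tmp/mlpModel")

    // load: read the object file back and take its single element
    val sameModel = sc
      .objectFile[MultilayerPerceptronClassificationModel]("/tmp/mlpModel")
      .first()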

RE: Any way to get raw score from MultilayerPerceptronClassificationModel ?

2015-11-17 Thread Ulanov, Alexander
Hi Robert, Raw scores are not available through the public API. It would be great to add this feature; it seems that we overlooked it. The simplest way to access the raw predictions currently would be to create a wrapper for mlpModel. This wrapper should be defined in the [ml] package. One needs to …
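
One possible shape for such a wrapper, assuming Spark 1.5-era internals (layers and weights are public on the model, while FeedForwardTopology and getInstance are private[ml] and may differ in other versions):

    package org.apache.spark.ml.classification

    import org.apache.spark.ml.ann.FeedForwardTopology
    import org.apache.spark.mllib.linalg.Vector

    // hypothetical helper: rebuilds the underlying network from the model's
    // public layers/weights and exposes the softmax outputs directly
    object MlpRawScores {
      def rawPredict(model: MultilayerPerceptronClassificationModel,
                     features: Vector): Vector =
        FeedForwardTopology
          .multiLayerPerceptron(model.layers, true) // softmax on top
          .getInstance(model.weights)
          .predict(features)
    }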

RE: Spark ANN

2015-09-09 Thread Ulanov, Alexander
…considering matrix-matrix multiplication for convolution optimization, at least as a first version. It can also take advantage of data batches. …

RE: Spark ANN

2015-09-08 Thread Ulanov, Alexander
That is an option too. Implementing convolutions with FFTs should be considered as well: http://arxiv.org/pdf/1312.5851.pdf …

RE: Sort (order by) of the big dataset

2015-05-07 Thread Ulanov, Alexander
…avulanov.blogspot.com, though it does not have more on this particular issue than I have already posted. …

RE: Sort (order by) of the big dataset

2015-05-07 Thread Ulanov, Alexander
The answer for Spark SQL “order by” is to set spark.sql.shuffle.partitions to a bigger number. For RDD.sortBy, it works out of the box if the RDD has a sufficient number of partitions. …
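
Concretely, assuming a SQLContext named sqlContext and an RDD of (id, value) pairs already defined in the shell:

    // Spark SQL: give "order by" more reducers than the default 200
    sqlContext.setConf("spark.sql.shuffle.partitions", "2000")

    // RDD API: sortBy lets you pick the partition count explicitly
    // (rdd: RDD[(Long, Double)], assumed to exist in the shell)
    val sorted = rdd.sortBy(_._1, ascending = true, numPartitions = 2000)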

How to specify Worker and Master LOG folders?

2015-05-06 Thread Ulanov, Alexander
Hi, How can I specify the Worker and Master log folders? If I set "SPARK_WORKER_DIR" in spark-env, it only affects executor logs and the shuffle folder, but Worker and Master logs still go to the default location: starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/sbin/../logs …

RE: Reading large files

2015-05-06 Thread Ulanov, Alexander
Thanks. In both cases, does the driver need to have enough memory to contain the entire file? How do both of these functions work when, for example, the binary file is 4G and the available …

RE: DataFrame DSL documentation

2015-05-06 Thread Ulanov, Alexander
+1. I had to browse the spark-catalyst sources to find out what is supported: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala Alexander …

RE: Reading large files

2015-05-06 Thread Ulanov, Alexander
SparkContext has two methods for reading binary files: binaryFiles (reads multiple binary files into an RDD) and binaryRecords (reads fixed-length records from a single binary file into an RDD). For example, I have a big binary file split into logical parts, so I can use “binaryFiles”. The possible problem is …
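
For illustration, the two calls side by side (the paths and the record length are made up):

    // many binary files under a directory -> RDD[(path, PortableDataStream)]
    val files = sc.binaryFiles("hdfs:///data/parts/*")

    // one flat binary file of fixed-length records -> RDD[Array[Byte]]
    val records = sc.binaryRecords("hdfs:///data/big.bin", recordLength = 512)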

RE: Multilabel Classification in spark

2015-05-05 Thread Ulanov, Alexander
If you are interested in multilabel (not multiclass) classification, you might want to take a look at SPARK-7015 (https://github.com/apache/spark/pull/5830/files). It is supposed to perform a one-versus-all transformation on classes, which is usually how multilabel classifiers are built. Alexander …

RE: Sort (order by) of the big dataset

2015-04-29 Thread Ulanov, Alexander
…due to some strange exceptions. It is really hard to trace back which executor was the first to be lost; the others follow it like a house of cards. What's the problem? The number of reducers. For the first task it is equal to the number of partitions, i.e. 2000, but for the second it switched to 200 …

Sort (order by) of the big dataset

2015-04-29 Thread Ulanov, Alexander
Hi, I have a dataset of 2 billion records with the schema <id, time, value>. It is stored in Parquet format in HDFS; its size is 23GB. Specs: Spark 1.3, Hadoop 1.2.1, 8 nodes with Xeon CPUs, 16GB RAM and 1TB disk space each; each node has 3 workers with 3GB of memory. I keep failing to sort this dataset in Spark. I do the following: …

RE: Scalability of group by

2015-04-28 Thread Ulanov, Alexander
Richard, The same problem occurs with sort. I have enough disk space and a tmp folder. The errors in the logs indicate out of memory. I wonder what it holds in memory? Alexander …

RE: Scalability of group by

2015-04-27 Thread Ulanov, Alexander
It works on a smaller dataset of 100 rows. I could probably find the size at which it fails using binary search. However, that would not help me because I need to work with 2B rows. …

Scalability of group by

2015-04-27 Thread Ulanov, Alexander
Hi, I am running a group by on a dataset of 2B rows of RDD[Row[id, time, value]] in Spark 1.3 as follows: "select id, time, first(value) from data group by id, time". My cluster has 8 nodes with 16GB RAM and one worker per node. Each executor is allocated 5GB of memory. However, all executors are …

RE: Group by order by

2015-04-27 Thread Ulanov, Alexander
Richard Marscher wrote: It's not related to Spark, but to the concept of what you are trying to do with the data. Grouping by ID means consolidating …

RE: Group by order by

2015-04-27 Thread Ulanov, Alexander
Hi Richard, There are several values of time per id. Is there a way to perform group by id and sort by time in Spark? Best regards, Alexander …

Group by order by

2015-04-27 Thread Ulanov, Alexander
Hi, Could you suggest the best way to do "group by x order by y" in Spark? When I try to perform it with Spark SQL, I get the following error (Spark 1.3): val results = sqlContext.sql("select * from sample group by id order by time") org.apache.spark.sql.AnalysisException: expression 'time' …
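
The fix that emerges later in the thread is to group or aggregate every selected column; a sketch of a corrected query, assuming the table also has a value column as in the related "Scalability of group by" thread:

    // every selected column is grouped or aggregated, so "order by" is legal
    val results = sqlContext.sql(
      "select id, time, first(value) from sample group by id, time order by id, time")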

Logging from the Spark shell

2014-11-05 Thread Ulanov, Alexander
Dear Spark users, I would like to run a long experiment using spark-shell. How can I log my intermediate results (numbers, strings) to a file on the master node? What are the best practices? It is NOT Spark's performance metrics that I want to log every X seconds; instead, I would like to log …
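
One minimal driver-side approach, sketched under the assumption that the shell runs on the master node and /tmp is writable:

    import java.io.{FileWriter, PrintWriter}

    // append-mode log file on the driver; println results as the experiment runs
    val log = new PrintWriter(new FileWriter("/tmp/experiment.log", true))
    log.println("step=1 rmse=0.42") // hypothetical intermediate result
    log.flush() // flush each time so results survive a crashed shell
    // log.close() once the experiment is done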

Dense to sparse vector converter

2014-07-07 Thread Ulanov, Alexander
Hi, Is there a method in Spark/MLlib to convert DenseVector to SparseVector? Best regards, Alexander
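
There was no one-call conversion at the time; a manual sketch that keeps only the non-zero entries (later releases add a toSparse method on Vector):

    import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vectors}

    // build a SparseVector from the non-zero entries of a DenseVector
    def toSparse(dv: DenseVector): SparseVector = {
      val (values, indices) = dv.values.zipWithIndex.filter(_._1 != 0.0).unzip
      Vectors.sparse(dv.size, indices.toArray, values.toArray)
        .asInstanceOf[SparseVector]
    }

    val sv = toSparse(new DenseVector(Array(0.0, 3.5, 0.0, 1.0))) // (4,[1,3],[3.5,1.0])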

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-26 Thread Ulanov, Alexander
…be easy to standardize a libsvm converter for data that can be in HDFS, HBase, Cassandra or Solr, but of course libsvm, Netflix format and CSV are standard for algorithms, and MLlib supports all 3... On Jun 25, 2014 6:00 AM, "Ulanov, Alexander" wrote: …

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-25 Thread Ulanov, Alexander
Hi lmk, I am not aware of any classifier in MLlib that accepts nominal data. They accept an RDD of LabeledPoints, which are a label plus a vector of Doubles, so you'll need to convert nominal values to double. Best regards, Alexander …
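
A sketch of that conversion with a made-up nominal attribute; the category is encoded one-hot so that no artificial ordering is implied:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // hypothetical nominal attribute with three categories
    val categories = Map("red" -> 0, "green" -> 1, "blue" -> 2)

    // encode the category as a one-hot vector of doubles
    def encode(label: Double, color: String): LabeledPoint = {
      val features = Array.fill(categories.size)(0.0)
      features(categories(color)) = 1.0
      LabeledPoint(label, Vectors.dense(features))
    }

    val lp = encode(1.0, "green") // LabeledPoint(1.0, [0.0,1.0,0.0])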

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-24 Thread Ulanov, Alexander
Hi lmk, There are a number of libraries and scripts for converting text to libsvm format; just type "libsvm format converter" into a search engine. Unfortunately, I cannot recommend a specific one, except the one that is built into Weka. I use it for test purposes, and for big experiments it is …

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-24 Thread Ulanov, Alexander
Hi, You need to convert your text to the vector space model (http://en.wikipedia.org/wiki/Vector_space_model) and then pass it to SVM. As far as I know, previous versions of MLlib had a special class for doing this: https://github.com/amplab/MLI/blob/master/src/main/scala/feat/NGrams.scala.
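
Shortly after this thread, MLlib gained a built-in TF-IDF pipeline (Spark 1.1); a sketch of producing vector-space features with it, where the file name and whitespace tokenizer are placeholders:

    import org.apache.spark.mllib.feature.{HashingTF, IDF}
    import org.apache.spark.rdd.RDD

    // tokenized documents (naive whitespace split as a placeholder)
    val docs: RDD[Seq[String]] = sc.textFile("docs.txt").map(_.split(" ").toSeq)

    // term frequencies via feature hashing, then inverse document frequencies
    val tf = new HashingTF().transform(docs)
    tf.cache() // IDF makes two passes over the data
    val tfidf = new IDF().fit(tf).transform(tf)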

Multiclass classification evaluation measures

2014-06-23 Thread Ulanov, Alexander
Hi, I've implemented a class with measures for evaluating multiclass classification (as well as unit tests): per-class and averaged Precision, Recall and F1-measure. As far as I know, Spark has a binary classification evaluator only, given that Spark's Bayesian classifier supports …
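
For concreteness, measures of this kind reduce to counting over (prediction, label) pairs; a small standalone sketch:

    import org.apache.spark.rdd.RDD

    // precision, recall and F1 for a single class c
    def perClass(pl: RDD[(Double, Double)], c: Double): (Double, Double, Double) = {
      val tp = pl.filter { case (p, l) => p == c && l == c }.count().toDouble
      val fp = pl.filter { case (p, l) => p == c && l != c }.count().toDouble
      val fn = pl.filter { case (p, l) => p != c && l == c }.count().toDouble
      val precision = tp / (tp + fp)
      val recall = tp / (tp + fn)
      (precision, recall, 2 * precision * recall / (precision + recall))
    }
    // averaged measures: compute per class, then take the mean across classes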

RE: Adding external jar to spark-shell classpath in spark 1.0

2014-06-11 Thread Ulanov, Alexander
…https://issues.apache.org/jira/browse/SPARK-1919. We haven't found a fix yet, but for now you can work around this by including your simple class in your application jar. 2014-06-11 10:25 GMT-07:00 Ulanov, Alexander: Hi, I am currently using Spark 1.0 …

RE: Adding external jar to spark-shell classpath in spark 1.0

2014-06-11 Thread Ulanov, Alexander
> On Wed, Jun 11, 2014 at 10:25 AM, Ulanov, Alexander wrote: >> Hi, >> I am currently using Spark 1.0 locally on Windows 7. I would like to >> use classes from an external jar in the spark-shell. I followed the instructions >> in: …

Adding external jar to spark-shell classpath in spark 1.0

2014-06-11 Thread Ulanov, Alexander
Hi, I am currently using Spark 1.0 locally on Windows 7. I would like to use classes from an external jar in the spark-shell. I followed the instructions in: http://mail-archives.apache.org/mod_mbox/spark-user/201402.mbox/%3CCALrNVjWWF6k=c_jrhoe9w_qaacjld4+kbduhfv0pitr8h1f...@mail.gmail.com%3E I have …