Spark random forest - string data

2015-01-16 Thread Asaf Lahav
Hi, I have been playing around with the new version of the Spark MLlib Random Forest implementation and, in the process, tried it on a file with string features. Training fails with: java.lang.NumberFormatException: For input string. Is MLlib Random Forest adapted to run on top of

Using a Database to persist and load data from

2014-10-30 Thread Asaf Lahav
Hi Ladies and Gents, I would like to know what options I have for leveraging Spark code I have already written so that it uses a DB (Vertica) as its store/data source. The data is tabular in nature, so essentially any relational DB can be used. Do I need to develop a context? If yes, ho

Spark clustered client

2014-07-22 Thread Asaf Lahav
Hi Folks, I have been trying to dig up some information on the possibilities for deploying more than one client process that consumes Spark. Let's say I have a Spark cluster of 10 servers and would like to set up 2 additional servers which are sending requests to it t

Re: Executing spark jobs with predefined Hadoop user

2014-04-12 Thread Asaf Lahav
text.SPARK_UNKNOWN_USER > > } > > > > Thanks > > Jerry > > > > From: Asaf Lahav [mailto:asaf.la...@gmail.com] > Sent: Thursday, April 10, 2014 8:15 PM > To: user@spark.apache.org > Subject: Executing spark jobs with predefined Hadoop u

Executing spark jobs with predefined Hadoop user

2014-04-10 Thread Asaf Lahav
Hi, We are using Spark with data files on HDFS. The files are stored as files for a predefined Hadoop user ("hdfs"). The folder is permitted with · read, write, and execute permission for the hdfs user · execute and read permission for users in the group · just