I'm following the tutorial about Apache Spark on EC2. The output is the
following:
$ ./spark-ec2 -i ../spark.pem -k spark --copy launch spark-training
Setting up security groups...
Searching for existing cluster spark-training...
Latest Spark AMI: ami-19474270
Launching ins
Is there an example of how to load data from a public S3 bucket in Python? I
haven't found any.
Thank you,
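[A minimal sketch of one way to do this, assuming `sc` is an existing SparkContext (e.g. the pyspark shell) and that the cluster's Hadoop build provides the s3n:// filesystem, as it does on EMR and the EC2 scripts. The bucket name and path below are placeholders, not a real dataset.]
# Minimal sketch: read a text file from a public S3 bucket with PySpark.
# Assumes `sc` is an existing SparkContext; bucket and path are placeholders.

# Only needed if the bucket is not readable anonymously:
# sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
# sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

lines = sc.textFile("s3n://some-public-bucket/path/to/data.txt")
print(lines.take(5))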
Hi Sujit,
I just wanted to access public datasets on Amazon. Do I still need to provide
the keys?
Thank you,
From: Sujit Pal [mailto:sujitatgt...@gmail.com]
Sent: Tuesday, July 14, 2015 3:14 PM
To: Pagliari, Roberto
Cc: user@spark.apache.org
Subject: Re: Spark on EMR with S3 example (Python
Is there a way to set the cost value C when using linear SVM?
Can anybody point me to an example, if available, of grid search with Python?
Thank you,
I know grid search with cross validation is not supported. However, I was
wondering if there is something available for the time being.
Thanks,
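[A minimal sketch of a hand-rolled grid search, assuming MLlib's SVMWithSGD, where regParam is the regularization strength and plays roughly the inverse role of the cost value C in other SVM libraries. `training` is the RDD of LabeledPoint from the later snippet; the grid values are arbitrary placeholders.]
# Minimal sketch: hold out a validation split once, then sweep regParam by hand.
from pyspark.mllib.classification import SVMWithSGD

train_cv, test_cv = training.randomSplit(weights=[0.75, 0.25])

best_reg, best_err = None, float("inf")
for reg in [0.001, 0.01, 0.1, 1.0]:   # hypothetical grid
    model = SVMWithSGD.train(train_cv, iterations=100, regParam=reg)
    errors = test_cv.map(lambda p: (p.label, model.predict(p.features))) \
                    .filter(lambda lp: lp[0] != lp[1]).count()
    err = errors / float(test_cv.count())
    if err < best_err:
        best_reg, best_err = reg, err
print(best_reg, best_err)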
From: Punyashloka Biswal [mailto:punya.bis...@gmail.com]
Sent: Thursday, April 23, 2015 9:06 PM
To: Pagliari, Roberto; user@spark.apache.org
Subject
I have an RDD of LabeledPoints.
Is it possible to select a subset of it based on a list of indices?
For example with idx=[0,4,5,6,8], I'd like to be able to create a new RDD with
elements 0,4,5,6 and 8.
values and preserve the original ones.
Thank you,
From: Sven Krasser [mailto:kras...@gmail.com]
Sent: Friday, April 24, 2015 5:56 PM
To: Pagliari, Roberto
Cc: user@spark.apache.org
Subject: Re: indexing an RDD [Python]
The solution depends largely on your use case. I assume the index is in the
key
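[A minimal sketch along those lines, keeping the index in the key as suggested above. `labeled_points` is a placeholder name for the existing RDD of LabeledPoint.]
# Minimal sketch: keep only the elements whose position is in a given list of indices.
idx = [0, 4, 5, 6, 8]
wanted = set(idx)

# attach a position to each element, keeping the index in the key
indexed_points = labeled_points.zipWithIndex().map(lambda x: (x[1], x[0]))
subset = indexed_points.filter(lambda kv: kv[0] in wanted).values()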
Suppose I have something like the code below
for idx in xrange(0, 10):
    train_test_split = training.randomSplit(weights=[0.75, 0.25])
    train_cv = train_test_split[0]
    test_cv = train_test_split[1]
    # scale train_cv and test_cv by scaling train
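[If the intent is to scale both splits using statistics computed on the training split only, a minimal sketch with MLlib's StandardScaler (an assumption on my part about the desired scaling) could look like this; it reuses the train_cv/test_cv names from the snippet above.]
# Minimal sketch: fit the scaler on the training split only, then apply the same
# transformation to both splits. Assumes RDDs of LabeledPoint; withMean=True
# requires dense feature vectors.
from pyspark.mllib.feature import StandardScaler
from pyspark.mllib.regression import LabeledPoint

scaler = StandardScaler(withMean=True, withStd=True).fit(train_cv.map(lambda p: p.features))

def rescale(rdd):
    # keep the labels and pair them back with the scaled feature vectors
    scaled = scaler.transform(rdd.map(lambda p: p.features))
    return rdd.map(lambda p: p.label).zip(scaled).map(lambda lv: LabeledPoint(lv[0], lv[1]))

train_scaled = rescale(train_cv)
test_scaled = rescale(test_cv)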
With the Python APIs, the available arguments I got (using inspect module) are
the following:
['cls', 'data', 'iterations', 'step', 'miniBatchFraction', 'initialWeights',
'regParam', 'regType', 'intercept']
numClasses is not available. Can someone comment on this?
Thanks,
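[For reference, a minimal sketch of the kind of inspection described above. LogisticRegressionWithSGD is an assumption on my part; its train() arguments match the quoted list.]
# Minimal sketch: list the arguments of an MLlib train() method with the inspect module.
# LogisticRegressionWithSGD is assumed here; note it has no numClasses argument.
import inspect
from pyspark.mllib.classification import LogisticRegressionWithSGD

print(inspect.getargspec(LogisticRegressionWithSGD.train).args)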
I'm a newbie with Spark. After installing it on all the machines I want to use,
do I need to tell it about the Hadoop configuration, or will it be able to find it
by itself?
Thank you,
:08 PM
To: Pagliari, Roberto
Cc: u...@spark.incubator.apache.org
Subject: Re: Spark SQL configuration
You can add `HADOOP_CONF_DIR=your_hadoop_conf_path` to `conf/spark-env.sh` to enable it to:
1. connect to your YARN cluster
2. set `hdfs` as the default FileSystem, otherwise you have to write “hdfs
If I already have hive running on Hadoop, do I need to build Hive using
sbt/sbt -Phive assembly/assembly
command?
If the answer is no, how do I tell spark where hive home is?
Thanks,
Is there a repo or some kind of instruction about how to install sbt for centos?
Thanks,
I ran sbin/start-master.sh followed by sbin/start-slaves.sh (I built with the -Phive
option to be able to interface with Hive)
I'm getting this
ip_address: org.apache.spark.deploy.worker.Worker running as process . Stop
it first.
Am I doing something wrong? In my specific case, shark+hive is ru
2014 at 4:32 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote:
I ran sbin/start-master.sh followed by sbin/start-slaves.sh (I built with the -Phive
option to be able to interface with Hive)
I’m getting this
ip_address: org.apache.spark.deploy.worker.Worker running as process .
I also didn't realize I was trying to bring up the secondary NameNode as a slave..
that might be an issue as well..
Thanks,
From: Yana Kadiyska [mailto:yana.kadiy...@gmail.com]
Sent: Thursday, October 30, 2014 11:27 AM
To: Pagliari, Roberto
Cc: user@spark.apache.org
Subject: Re: problem with start
I'm using this system:
Hadoop 1.0.4
Scala 2.9.3
Hive 0.9.0
with Spark 1.1.0. When importing pyspark, I'm getting this error:
>>> from pyspark.sql import *
Traceback (most recent call last):
File "", line 1, in ?
File "//spark-1.1.0/python/pyspark/__init__.py", line 63, in ?
from pyspark.
I'm not on the cluster now so I cannot check. What is the minimum requirement
for Python?
Thanks,
From: Davies Liu [dav...@databricks.com]
Sent: Wednesday, November 05, 2014 7:41 PM
To: Pagliari, Roberto
Cc: user@spark.apache.org
Subjec
I'm getting this error when importing hive context
>>> from pyspark.sql import HiveContext
Traceback (most recent call last):
File "", line 1, in
File "/path/spark-1.1.0/python/pyspark/__init__.py", line 63, in
from pyspark.context import SparkContext
File "/path/spark-1.1.0/python/pys
I'm running the latest version of Spark with Hadoop 1.x, Scala 2.9.3 and
Hive 0.9.0.
When using Python 2.7:
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
I'm getting 'sc not defined'
On the other hand, I can see 'sc' from pyspark CLI.
Is there a way to fix it?
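[One common cause: `sc` is created automatically only in the pyspark shell, not when the code runs as a plain script. A minimal sketch of creating the context yourself; the app name is a placeholder.]
# Minimal sketch: outside the pyspark shell no SparkContext exists yet,
# so build one before constructing the HiveContext.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-context-example")
sqlContext = HiveContext(sc)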
I'm executing this example from the documentation (in single node mode)
# sc is an existing SparkContext.
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
# Queries can be expressed in HiveQL.
results = sqlContext.sql("FROM src SELECT key, value").collect()