Hello,
I'm trying to run pyspark using the following setup:
- spark 1.6.1 standalone cluster on ec2
- virtualenv installed on master
- app is run using the following commands:
export PYSPARK_DRIVER_PYTHON=/path_to_virtualenv/bin/python
export PYSPARK_PYTHON=/usr/bin/python
/root/spark/bin/spark-
Hi,
I'm trying to run spark applications on a standalone cluster, running on
top of AWS. Since my slaves are spot instances, in some cases they are
being killed and lost due to bid prices. When apps are running during this
event, sometimes the spark application dies - and the driver process just
hangs.
Hello spark-users,
I would like to use the spark standalone cluster as a multi-tenant cluster, running
multiple apps at the same time. The issue is that when submitting an app to the
spark standalone cluster, you cannot pass "--num-executors" like on yarn,
but only "--total-executor-cores". *This may cause sta
Hi all,
I'm running spark 1.2.0 on a 20-node Yarn emr cluster. I've noticed that
whenever I'm running a heavy computation job in parallel with other running
jobs, I get exceptions like this:
* [task-result-getter-2] INFO org.apache.spark.scheduler.TaskSetManager-
Lost task 820.0 in stage
On YARN, spark does not manage the cluster, but YARN does. Usually the
cluster manager UI is under http://<resource-manager-host>:9026/cluster. I believe
that it chooses the port for the spark driver UI randomly, but an easy way
of accessing it is by clicking on the "Application Master" link under the
"Tracking UI" colum
name there. I believe passing it via the --name property to spark-submit
> should work.
>
> -Sandy
>
> On Thu, Dec 11, 2014 at 10:28 AM, Tomer Benyamini
> wrote:
>>
>>
>>
>> On Thu, Dec 11, 2014 at 8:27 PM, Tomer Benyamini
>> wrote:
>>
>>
On Thu, Dec 11, 2014 at 8:27 PM, Tomer Benyamini
wrote:
> Hi,
>
> I'm trying to set a custom spark app name when running a java spark app in
> yarn-cluster mode.
>
> SparkConf sparkConf = new SparkConf();
>
> sparkConf.setMaster(System.getProperty("spark.master"));
Hi,
I'm trying to set a custom spark app name when running a java spark app in
yarn-cluster mode.
SparkConf sparkConf = new SparkConf();
sparkConf.setMaster(System.getProperty("spark.master"));
sparkConf.setAppName("myCustomName");
sparkConf.set("spark.logConf", "true");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
wever, you can change the FS being used like so (prior to the first
>> usage):
>> sc.hadoopConfiguration.set("fs.s3n.impl",
>> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>
>> On Wed, Nov 26, 2014 at 1:47 AM, Tomer Benyamini
>> wrote:
>
Thanks Lalit; Setting the access + secret keys in the configuration works
even when calling sc.textFile. Is there a way to select which hadoop s3
native filesystem implementation would be used at runtime using the hadoop
configuration?
Thanks,
Tomer
On Wed, Nov 26, 2014 at 11:08 AM, lalit1303
wrote:
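A Java rendering of the suggestion quoted above (a sketch; it assumes sparkConf is
already built, that the class providing NativeS3FileSystem is on the classpath, and
the bucket/file names are placeholders):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

JavaSparkContext sc = new JavaSparkContext(sparkConf);
// Select the s3n filesystem implementation before the first s3n:// path is read.
sc.hadoopConfiguration().set("fs.s3n.impl",
        "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
JavaRDD<String> lines = sc.textFile("s3n://mybucket/file1");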
Hello,
I'm building a spark app that needs to read a large number of log files from
s3. I'm doing so by constructing the file list in code and passing it to the
context as follows:
val myRDD = sc.textFile("s3n://mybucket/file1, s3n://mybucket/file2, ... ,
s3n://mybucket/fileN")
When running
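A Java sketch of building that comma-separated list programmatically (it assumes a
JavaSparkContext named sc; bucket and file names are placeholders). textFile accepts
a single comma-separated string of paths:

import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;

List<String> paths = Arrays.asList(
        "s3n://mybucket/file1",
        "s3n://mybucket/file2");
// Join the paths into one comma-separated string (String.join needs Java 8;
// otherwise concatenate manually) and hand it to textFile.
JavaRDD<String> logs = sc.textFile(String.join(",", paths));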
Hello,
I would like to parallelize my work across multiple RDDs. I wanted to know
whether spark supports a "foreach" on an RDD of RDDs. Here's a java example:
public static void main(String[] args) {
SparkConf sparkConf = new SparkConf().setAppName("testapp");
sparkConf.setM
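Worth noting (my addition, not from the original message): Spark does not support using
one RDD inside another RDD's operations, since RDDs are only usable on the driver. A
sketch of the usual alternative is to keep the RDDs in a plain driver-side collection
and loop over them, so each action still runs as a distributed job (paths are placeholders):

import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

JavaSparkContext sc = new JavaSparkContext(sparkConf);
// Keep the RDDs in an ordinary list instead of an RDD of RDDs.
List<JavaRDD<String>> rdds = Arrays.asList(
        sc.textFile("s3n://mybucket/file1"),
        sc.textFile("s3n://mybucket/file2"));
for (JavaRDD<String> rdd : rdds) {
    System.out.println(rdd.count());   // each count() is scheduled as its own Spark job
}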
Hi,
I'm working on the problem of remotely submitting apps to the spark
master. I'm trying to use the spark-jobserver project
(https://github.com/ooyala/spark-jobserver) for that purpose.
For scala apps it looks like things are working smoothly, but for java
apps, I have an issue with implementing t
Hello,
I'm trying to read from s3 using a simple spark java app:
-
SparkConf sparkConf = new SparkConf().setAppName("TestApp");
sparkConf.setMaster("local");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XX");
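A sketch of how such a snippet is usually finished (my own illustration, not the
original author's code; the secret-key property is fs.s3.awsSecretAccessKey, and the
keys, bucket and path below are placeholders):

sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "XX");   // placeholder secret
// Read from the bucket once both credentials are set.
JavaRDD<String> lines = sc.textFile("s3://mybucket/somefile");
System.out.println(lines.count());
sc.stop();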
Yes, exactly. So I guess this is still an open request. Any workaround?
On Wed, Oct 1, 2014 at 6:04 PM, Nicholas Chammas
wrote:
> Are you trying to do something along the lines of what's described here?
> https://issues.apache.org/jira/browse/SPARK-3533
>
> On Wed, Oct 1, 2014 a
Hi,
I'm trying to write my JavaPairRDD using saveAsNewAPIHadoopFile with
MultipleTextOutputFormat:
outRdd.saveAsNewAPIHadoopFile("/tmp", String.class, String.class,
MultipleTextOutputFormat.class);
but I'm getting this compilation error:
Bound mismatch: The generic method saveAsNewAPIHadoopFil
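A likely explanation and workaround (my note, not from the thread): MultipleTextOutputFormat
lives in the old org.apache.hadoop.mapred.lib API, while saveAsNewAPIHadoopFile expects an
org.apache.hadoop.mapreduce.OutputFormat, hence the bound mismatch. A sketch using the
old-API save method instead:

import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// saveAsHadoopFile accepts org.apache.hadoop.mapred output formats.
outRdd.saveAsHadoopFile("/tmp", String.class, String.class,
        MultipleTextOutputFormat.class);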
Hi,
I would like to upgrade a standalone cluster to 1.1.0. What's the best
way to do it? Should I just replace the existing /root/spark folder
with the uncompressed folder from
http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-cdh4.tgz ? What
about hdfs and other installations?
I have spark 1.
ning with
> datanode process
>
> --
> Ye Xianjin
> Sent with Sparrow
>
> On Monday, September 8, 2014 at 11:13 PM, Tomer Benyamini wrote:
>
> Still no luck, even when running stop-all.sh followed by start-all.sh.
>
> On Mon, Sep 8, 2014 at 5:57 PM, Nicholas Chammas
On Mon, Sep 8, 2014 at 3:28 AM, Tomer Benyamini wrote:
>>
>> ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;
>>
>> I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
>> ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same err
org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
Any idea?
Thanks!
Tomer
On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen wrote:
> If I recall, you should be able to start Hadoop MapReduce using
> ~/ephemeral-hdfs/sbin/start-mapred.sh.
>
> On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini wrote:
>
Do you have a mapreduce
> cluster on your hdfs?
> And from the error message, it seems that you didn't specify your jobtracker
> address.
>
> --
> Ye Xianjin
> Sent with Sparrow
>
> On Sunday, September 7, 2014 at 9:42 PM, Tomer Benyamini wrote:
>
> Hi,
>
>
Hi,
I would like to copy log files from s3 to the cluster's
ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
running on the cluster - I'm getting the exception below.
Is there a way to activate it, or is there a spark alternative to distcp?
Thanks,
Tomer
mapreduce.Cluster (Clust
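One Spark-side alternative to distcp (a sketch of my own, not from the thread; the bucket
and target path are placeholders, and the data is rewritten as part-* files rather than
copied byte-for-byte):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("s3-to-hdfs"));
JavaRDD<String> logs = sc.textFile("s3n://mybucket/logs/*");
logs.saveAsTextFile("hdfs:///copied-logs");   // lands in the cluster's ephemeral-hdfs
sc.stop();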
Thanks! I found the hdfs ui via this port - http://[master-ip]:50070/.
It shows a 1-node hdfs, although I have 4 slaves in my cluster.
Any idea why?
On Sun, Sep 7, 2014 at 4:29 PM, Ognen Duzlevski
wrote:
>
> On 9/7/2014 7:27 AM, Tomer Benyamini wrote:
>>
>> 2. What shoul
Hi,
I would like to make sure I'm not exceeding the quota on the local
cluster's hdfs. I have a couple of questions:
1. How do I know the quota? Here's the output of hadoop fs -count -q,
which doesn't tell me much:
[root@ip-172-31-7-49 ~]$ hadoop fs -count -q /
2147483647 21474