Does spark-shell return results while the job is executing?

2014-09-01 Thread Hao Wang
Hi all, I am wondering: if I use spark-shell to scan a large file for lines containing "error", does the shell return results while the job is executing, or only after the job has completely finished? Regards, Wang Hao (王灏), CloudTeam | School of Software Engineering, Shanghai Jiao Tong University
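For illustration, a minimal spark-shell version of the scan being described (the file path is hypothetical):

    // inside spark-shell, where `sc` is the pre-built SparkContext
    val lines = sc.textFile("hdfs:///logs/app.log")   // hypothetical path
    val errors = lines.filter(_.contains("error"))

    // collect() is an action: it blocks until the whole job finishes,
    // so the shell prints results only after the scan completes
    errors.collect().foreach(println)

    // take(n) also blocks, but can finish sooner because Spark may
    // stop once enough partitions have produced n matches
    errors.take(10).foreach(println)

In short, the results of an action are returned only when that action's job has finished; they are not streamed to the shell while the job runs.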

Re: Re: Can value in spark-defaults.conf support system variables?

2014-09-01 Thread Zhanfeng Huo
Thank you.

Zhanfeng Huo

From: Andrew Or
Date: 2014-09-02 08:21
To: Zhanfeng Huo
CC: user
Subject: Re: Can value in spark-defaults.conf support system variables?

No, not currently.

2014-09-01 2:53 GMT-07:00 Zhanfeng Huo:
> Hi all: Can value in spark-defaults.conf support system variables?

zip equal-length but unequally-partition

2014-09-01 Thread Kevin Jung
Please check this URL: http://www.adamcrume.com/blog/archive/2014/02/19/fixing-sparks-rdd-zip. I hit the same problem in v1.0.1: in some cases an RDD loses several elements after zip, so the total count of the ZippedRDD is less than the count of the inputs.
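A hedged workaround sketch: key both RDDs by a stable index and join, which costs a shuffle but does not depend on zip's assumption that both RDDs have identical partition layouts (the sample data here is synthetic):

    import org.apache.spark.SparkContext._   // pair-RDD implicits in Spark 1.x

    val a = sc.parallelize(1 to 1000)
    val b = sc.parallelize(1001 to 2000)

    // index each element, swap to (index, value), then join on the index
    val zipped = a.zipWithIndex.map(_.swap)
      .join(b.zipWithIndex.map(_.swap))
      .sortByKey()
      .values                                // RDD[(Int, Int)], like a.zip(b)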

Re: [Streaming] Triggering an action in absence of data

2014-09-01 Thread Tobias Pfeiffer
Hi, On Mon, Sep 1, 2014 at 9:25 PM, Aniket Bhatnagar wrote: > > No state/foreach methods get called when no data has arrived. > Have you double-checked this? I am pretty sure that, for example, foreachRDD gets called (with an empty RDD) even when there was no data received. Tobias
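A short sketch of what Tobias describes, assuming a StreamingContext built from the shell's sc and a socket source (both hypothetical); on Spark versions of this era RDD.isEmpty is not available, so take(1) is used instead:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))
    val stream = ssc.socketTextStream("localhost", 9999)   // hypothetical source

    // foreachRDD fires every batch interval even when the batch is
    // empty, so an "absence of data" action can hang off this check:
    stream.foreachRDD { rdd =>
      if (rdd.take(1).isEmpty) {
        println("no data received in this batch")   // timeout action goes here
      }
    }

    ssc.start()
    ssc.awaitTermination()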

Re: Spark driver application can not connect to Spark-Master

2014-09-01 Thread Andrew Or
Your Master is dead, and your application can't connect to it. Can you verify whether it was your application that killed the Master (by checking the Master logs before and after you submit your application)? Try restarting your master (and workers) through `sbin/stop-all.sh` and `sbin/start-all.sh`.

Re: Can value in spark-defaults.conf support system variables?

2014-09-01 Thread Andrew Or
No, not currently.

2014-09-01 2:53 GMT-07:00 Zhanfeng Huo:
> Hi all:
>
> Can value in spark-defaults.conf support system variables?
>
> Such as "mess = ${user.home}/${user.name}".
>
> Best Regards
>
> --
> Zhanfeng Huo
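Since the answer is no, a hedged workaround sketch: resolve the system properties in application code via SparkConf rather than in spark-defaults.conf (the property key spark.local.dir and the path layout are illustrative assumptions):

    import org.apache.spark.SparkConf

    // expand ${user.home}/${user.name} by hand, since
    // spark-defaults.conf does not interpolate system variables
    val conf = new SparkConf()
      .set("spark.local.dir",
        s"${sys.props("user.home")}/${sys.props("user.name")}/spark-tmp")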

Spark 1.0.2 Can GroupByTest example be run in Eclipse without change

2014-09-01 Thread Shing Hing Man
Hi, I have noticed that the GroupByTest example in https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala has been changed to run via spark-submit. Previously, I set "local" as the first command-line parameter, and this enabled me to run the example directly in Eclipse.
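A sketch of the usual IDE workaround, setting the master programmatically so the example runs without spark-submit:

    import org.apache.spark.{SparkConf, SparkContext}

    val sparkConf = new SparkConf()
      .setAppName("GroupBy Test")
      .setMaster("local")   // the value spark-submit would otherwise inject
    val sc = new SparkContext(sparkConf)

Alternatively, passing -Dspark.master=local in the Eclipse launch configuration achieves the same without a code change, since SparkConf picks up spark.* system properties.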

Re: Time series forecasting

2014-09-01 Thread filipus
I guess it is not a question about Spark but about your dataset and how you set it up. Think about what you want to model and how you can shape the data in such a way that Spark can use it. Akima is a technique I know: a_{t+1} = C1 * a_{t} + C2 * a_{t-1} + ... + C6 * a_{t-5}. Spark can find the coefficients.
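To make the suggestion concrete, a hedged sketch that estimates the coefficients C1..C6 with MLlib's linear regression (the series is synthetic, and the SGD step size typically needs tuning):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

    // synthetic series purely for illustration
    val series = (1 to 200).map(t => math.sin(t / 10.0)).toArray
    val lags = 6

    // one example per window: label a_{t+1}, features (a_t, ..., a_{t-5})
    val points = series.sliding(lags + 1).map { w =>
      LabeledPoint(w.last, Vectors.dense(w.init.reverse))
    }.toSeq

    val model = LinearRegressionWithSGD.train(sc.parallelize(points), 100)
    // model.weights approximates C1..C6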

RE: Spark and Shark

2014-09-01 Thread Paolo Platter
We tried connecting the old Simba Shark ODBC driver to the Thrift JDBC server with Spark 1.1 RC2 and it works fine. Best, Paolo

Paolo Platter
Agile Lab CTO

From: Michael Armbrust
Sent: Monday, 1 September 2014, 19:43
To: arthur.hk.c...@gmail.com
Cc: user@spark.apache.org

Re: Spark and Shark

2014-09-01 Thread Michael Armbrust
I don't believe that Shark works with Spark > 1.0. Have you considered trying Spark SQL? On Mon, Sep 1, 2014 at 8:21 AM, arthur.hk.c...@gmail.com < arthur.hk.c...@gmail.com> wrote: > Hi, > > I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling > from source). > > spark: 1.0.2
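A hedged sketch of the Spark SQL route being suggested: HiveContext reads the existing Hive metastore, so Shark-style queries can run without Shark. The entry point in Spark 1.0.x is hql (renamed to sql in later releases), and the table queried here is the standard Hive example table, assumed to exist:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    val results = hiveContext.hql("SELECT key, value FROM src LIMIT 10")
    results.collect().foreach(println)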

Spark and Shark

2014-09-01 Thread arthur.hk.c...@gmail.com
Hi, I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from source).

spark: 1.0.2
shark: 0.9.2
hadoop: 2.4.1
java: java version “1.7.0_67”
protobuf: 2.5.0

I have tried the smoke test in Shark but got a “java.util.NoSuchElementException” error. Can you please advise how to fix it?

Re: transforming a Map object to RDD

2014-09-01 Thread Matthew Farrellee
And in Python:

>>> map = {'a': 1, 'b': 2, 'c': 3}
>>> rdd = sc.parallelize(map.items())
>>> rdd.collect()
[('a', 1), ('c', 3), ('b', 2)]

best, matt

On 08/28/2014 07:01 PM, Sean Owen wrote:
val map = Map("foo" -> 1, "bar" -> 2, "baz" -> 3)
val rdd = sc.parallelize(map.toSeq)
rdd is an RDD[(String, Int)]

Re: how to filter value in spark

2014-09-01 Thread Matthew Farrellee
You could join; it will give you the intersection and a list of the labels where the value was found.

> a.join(b).collect
Array[(String, (String, String))] = Array((4,(a,b)), (3,(a,b)))

best, matt

On 08/31/2014 09:23 PM, Liu, Raymond wrote:
> You could use cogroup to combine RDDs in one RDD for

Re: Problem Accessing Hive Table from hiveContext

2014-09-01 Thread Yin Huai
Hello Igor, although Decimal is supported, Hive 0.12 does not support user-definable precision and scale (these were introduced in Hive 0.13). Thanks, Yin On Sat, Aug 30, 2014 at 1:50 AM, Zitser, Igor wrote: > Hi All, > New to Spark, using Spark 1.0.2 and Hive 0.12. > > If hive table created
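A small sketch of the version difference Yin describes (the table names are hypothetical, and the statements assume a HiveContext built from the shell's sc):

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)

    // Hive 0.12 only parses the bare DECIMAL type:
    hc.hql("CREATE TABLE prices (amount DECIMAL)")

    // precision/scale arguments are a Hive 0.13 feature and fail on 0.12:
    // hc.hql("CREATE TABLE prices2 (amount DECIMAL(19,4))")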

[Streaming] Triggering an action in absence of data

2014-09-01 Thread Aniket Bhatnagar
Hi all, I am struggling to implement a use case wherein I need to trigger an action when no data has been received for X amount of time. I haven't been able to figure out an easy way to do this: no state/foreach methods get called when no data has arrived. I thought of generating a 'tick' DStream

Value of SHUFFLE_PARTITIONS

2014-09-01 Thread Chirag Aggarwal
Hi, currently the number of shuffle partitions is a config-driven parameter (SHUFFLE_PARTITIONS). This means that anyone running a Spark SQL query must first work out which value of SHUFFLE_PARTITIONS gives the best performance for that query. Shouldn't there be logic in Spark SQL to choose a suitable value automatically?
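Until such logic exists, the value can at least be tuned per session rather than cluster-wide; a hedged sketch (the property is spark.sql.shuffle.partitions, with a default of 200 in the Spark 1.1 era, and 64 is an arbitrary example value):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    sqlContext.sql("SET spark.sql.shuffle.partitions=64")   // per-session override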

Has anybody faced SPARK-2604 issue regarding Application hang state

2014-09-01 Thread twinkle sachdeva
Hi, has anyone else experienced https://issues.apache.org/jira/browse/SPARK-2604? It is an edge-case misconfiguration scenario in which the requested executor memory equals the maximum memory allowed by YARN. In this situation the application stays in a hung state, and the reason is not logged.
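For context, a hedged illustration of why the request can never be satisfied in that edge case (all three values are assumptions for the sake of the arithmetic):

    val yarnMaxAllocationMb = 8192  // yarn.scheduler.maximum-allocation-mb (assumed)
    val executorMemoryMb    = 8192  // requested executor memory, equal to the max
    val overheadMb          = 384   // spark.yarn.executor.memoryOverhead default (assumed)

    // YARN must grant executorMemoryMb + overheadMb per container, which
    // exceeds the maximum allocation, so no container is ever granted:
    println(executorMemoryMb + overheadMb > yarnMaxAllocationMb)   // true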

Can value in spark-defaults.conf support system variables?

2014-09-01 Thread Zhanfeng Huo
Hi all: Can value in spark-defaults.conf support system variables? Such as "mess = ${user.home}/${user.name}". Best Regards, Zhanfeng Huo

Spark driver application can not connect to Spark-Master

2014-09-01 Thread moon soo Lee
Hi, I'm developing an application with Spark. My Java application tries to create a Spark context like this:

public SparkContext createSparkContext() {
    String execUri = System.getenv("SPARK_EXECUTOR_URI");
    String[] jars = SparkILoop.getAddedJars();
    SparkCo

operations on replicated RDD

2014-09-01 Thread rapelly kartheek
Hi, an RDD replicated by an application is owned only by that application; no other application can share it. So what is the motive behind providing the RDD replication feature, and what operations can be performed on a replicated RDD? Thank you! -karthik
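As a concrete anchor for the question, a minimal sketch of the feature itself (the input path is hypothetical): a replicated storage level keeps each cached partition on two nodes, so the owning application survives the loss of one executor without recomputation; it is a fault-tolerance mechanism, not a cross-application sharing mechanism.

    import org.apache.spark.storage.StorageLevel

    val data = sc.textFile("hdfs:///data/input")   // hypothetical path
    data.persist(StorageLevel.MEMORY_AND_DISK_2)   // "_2" = replicate each partition twice
    data.count()                                   // materializes the replicated cache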

RDD.pipe error on context cleaning

2014-09-01 Thread Jaonary Rabarisoa
Dear all, when calling an external process with RDD.pipe I got the following error:

Not interrupting system thread Thread[process reaper,10,system]
Not interrupting system thread Thread[process reaper,10,system]
Not interrupting system thread Thread[process reaper,10,system]
14/09/01 10
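For context, a minimal RDD.pipe sketch (the path and external command are illustrative): each partition's elements are written to the child process's stdin, one per line, and the process's stdout lines become the resulting RDD.

    val lines = sc.textFile("hdfs:///logs/app.log")   // hypothetical path
    val piped = lines.pipe("grep -i error")           // illustrative command
    piped.take(5).foreach(println)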