Re: How to union RDD and remove duplicated keys

2015-02-13 Thread Boromir Widas
> …one entry for the same key. A code snippet is appreciated because I am new to Spark. — Ningjun
> *From:* Boromir Widas [mailto:vcsub...@gmail.com] *Sent:* Friday, February 13, 2015 1:28 PM *To:* Wang, Ningjun (LNG-NPV) *Cc:* user@spark.apache.org …

Re: How to union RDD and remove duplicated keys

2015-02-13 Thread Boromir Widas
reduceByKey should work, but you need to define the ordering by using some sort of index. On Fri, Feb 13, 2015 at 12:38 PM, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote: > I have multiple RDD[(String, String)] that store (docId, docText) pairs, e.g. rdd1: (“id1”, “…
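A minimal plain-Scala sketch of the suggested approach, with hypothetical doc IDs and texts. It tags each value with a version index so "keep the latest" is well defined, then keeps the highest index per key; on Spark RDDs the same shape would be `rdd1.union(rdd2).reduceByKey((a, b) => if (a._1 >= b._1) a else b)`.

```scala
// Hypothetical (docId, (versionIndex, docText)) pairs; the index defines the ordering.
val rdd1 = Seq("id1" -> (1, "text v1"), "id2" -> (1, "other v1"))
val rdd2 = Seq("id1" -> (2, "text v2"))

val merged = (rdd1 ++ rdd2)
  .groupBy(_._1)                                              // union, then group by key
  .map { case (k, vs) => k -> vs.map(_._2).maxBy(_._1)._2 }   // keep highest-index text
```

`merged` then holds one entry per key, with the later version winning for `id1`.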

Re: How to design a long live spark application

2015-02-05 Thread Boromir Widas
You can check out https://github.com/spark-jobserver/spark-jobserver - this allows several users to upload their jars and run jobs with a REST interface. However, if all users are using the same functionality, you can write a simple Spray server which acts as the driver and hosts the Spark con…

Re: Building Spark behind a proxy

2015-01-29 Thread Boromir Widas
At least part of it is due to a connection refused; can you check whether curl can reach the URL through the proxy? - [FATAL] Non-resolvable parent POM: Could not transfer artifact org.apache:apache:pom:14 from/to central (http://repo.maven.apache.org/maven2): Error transferring file: Connection refused from…

Re: Apache Spark standalone mode: number of cores

2015-01-23 Thread Boromir Widas
Local mode still parallelizes calculations, and it is useful for debugging because it goes through the same serialization/deserialization steps as a cluster would. On Fri, Jan 23, 2015 at 5:44 PM, olegshirokikh wrote: > I'm trying to understand the basics of Spark internals and Spark documentation…
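A minimal sketch of the local-mode setup being described, assuming Spark 1.x on the classpath (the app name and the slice count are illustrative). `local[4]` runs the driver and four worker threads in one JVM, but tasks still go through closure serialization, so serialization bugs surface just as they would on a cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// "local[4]" = driver plus 4 executor threads in a single JVM.
val conf = new SparkConf().setMaster("local[4]").setAppName("debug-run")
val sc = new SparkContext(conf)

// Work is still split into partitions and tasks, exactly as on a cluster.
val sum = sc.parallelize(1 to 100, numSlices = 4).map(_ * 2).reduce(_ + _)
sc.stop()
```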

GroupBy multiple attributes

2015-01-23 Thread Boromir Widas
Hello, I am trying to do a groupBy on 5 attributes to get results in a form like a pivot table in Microsoft Excel. The keys are the attribute tuples and the values are double arrays (maybe very large). Based on the code below, I am getting back correct results, but would like to optimize it further (I p…
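A plain-Scala sketch of the pivot shape being described, using three hypothetical attributes instead of five for brevity (the field names are invented). Rows are grouped by a key tuple and their double arrays are summed element-wise; on Spark the usual optimization of this shape is `rdd.map(r => (key, r.values)).reduceByKey(elementwiseSum)`, which avoids materializing the groups.

```scala
// Hypothetical row shape: grouping attributes plus a double-array payload.
case class Row(region: String, product: String, year: Int, values: Array[Double])

val rows = Seq(
  Row("US", "A", 2014, Array(1.0, 2.0)),
  Row("US", "A", 2014, Array(3.0, 4.0)),
  Row("EU", "B", 2014, Array(5.0, 6.0)))

// Key = attribute tuple, value = element-wise sum of the arrays in the group.
val pivot: Map[(String, String, Int), Array[Double]] =
  rows.groupBy(r => (r.region, r.product, r.year))
      .map { case (k, rs) =>
        k -> rs.map(_.values).reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
      }
```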

Re: Re: I think I am almost lost in the internals of Spark

2015-01-06 Thread Boromir Widas
I do not understand Chinese but the diagrams on that page are very helpful. On Tue, Jan 6, 2015 at 9:46 PM, eric wong wrote: > A good beginning if you are chinese. > > https://github.com/JerryLead/SparkInternals/tree/master/markdown > > 2015-01-07 10:13 GMT+08:00 bit1...@163.com : > >> Thank you

Re: Launching Spark app in client mode for standalone cluster

2015-01-06 Thread Boromir Widas
> …system; the Spark context also creates an Akka actor system; is it possible there is some conflict? Sent from my iPad. On Jan 4, 2015, at 7:42 PM, Boromir Widas wrote: Hello, I am trying to laun…

Launching Spark app in client mode for standalone cluster

2015-01-04 Thread Boromir Widas
Hello, I am trying to launch a Spark app (client mode for a standalone cluster) from a Spray server, using the following code. When I run it as $> java -cp SprayServer, the SimpleApp.getA() call from SprayService returns -1 (which means it sees the logData RDD as null for HTTP requests), but the s…
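A hedged sketch of one common fix for this symptom, with invented names (`SparkHolder`, the master URL, the HDFS path): hold the SparkContext and the cached RDD in a singleton that is initialized once before the HTTP server starts, so every request handler sees the same initialized context. An RDD that is null only on the request path usually means each request ran through an initialization that never completed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical singleton: force initialization once at server startup,
// then have request handlers read SparkHolder.logData.
object SparkHolder {
  lazy val sc: SparkContext = new SparkContext(
    new SparkConf().setMaster("spark://master:7077").setAppName("spray-driver"))
  lazy val logData: RDD[String] = sc.textFile("hdfs:///logs/app.log").cache()
}
```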

Re: building spark1.2 meet error

2015-01-03 Thread Boromir Widas
It should be under assembly/target/scala-2.10/*; check with ls assembly/target/scala-2.10/*. On Sat, Jan 3, 2015 at 10:11 PM, j_soft wrote: > Thanks, the build succeeded, but where is the built zip file? I cannot find a finished .zip or .tar.gz package. > 2014-12-31 19:22 GMT+08:00 xhudik [via Apache Spark User List]…

Re: Spark profiler

2014-12-29 Thread Boromir Widas
It would be very helpful if there were such a tool, but the distributed nature may be difficult to capture. I had been trying to run a task where merging the accumulators was taking an inordinately long time, and this was not reflected in the standalone cluster's web UI. What I think will be useful is…

Re: Using more cores on machines

2014-12-22 Thread Boromir Widas
If you are looking to reduce network traffic then setting spark.deploy.spreadOut to false may help. On Mon, Dec 22, 2014 at 11:44 AM, Ashic Mahtab wrote: > Hi Josh, I'm not looking to change the 1:1 ratio. What I'm trying to do is get both cores on two machines working, rather than one…
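A sketch of where that setting goes, assuming a Spark 1.x standalone master: spark.deploy.spreadOut is read by the master itself, not by individual applications, so it is set in the master's environment and takes effect after a master restart.

```shell
# conf/spark-env.sh on the standalone master host (restart the master afterwards).
# false = pack executors onto as few workers as possible instead of
# spreading an app's cores across the cluster.
SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false"
```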

Re: How to emit multiple keys for the same value?

2014-10-20 Thread Boromir Widas
flatMap should help; it returns a Seq for every input. On Mon, Oct 20, 2014 at 12:31 PM, HARIPRIYA AYYALASOMAYAJULA <aharipriy...@gmail.com> wrote: > Hello, I am facing a problem with implementing this: my mapper should emit multiple keys for the same value; for every input (k, v) it sh…
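A small plain-Scala sketch of the suggestion, with an invented key encoding (comma-separated keys in one field): flatMap emits zero or more output pairs per input, which is exactly the "multiple keys, same value" shape. On a Spark RDD the call looks identical.

```scala
// Each input carries several keys packed into one field (hypothetical encoding).
val input = Seq(("a,b", 1), ("c", 2))

// flatMap emits one (key, value) pair per derived key, duplicating the value.
val emitted = input.flatMap { case (ks, v) =>
  ks.split(",").map(k => (k, v))
}
// emitted: Seq(("a", 1), ("b", 1), ("c", 2))
```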

Re: object in an rdd: serializable?

2014-10-16 Thread Boromir Widas
Making it a case class should work. On Thu, Oct 16, 2014 at 8:30 PM, ll wrote: > I got an exception complaining about serializability. The sample code is below... class HelloWorld(val count: Int) { ... } object Test extends App { ... val data = sc.parallelize(List(new…
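A self-contained sketch of why the suggestion works: a plain class is not Serializable by default, so shipping its instances inside Spark tasks fails, while a case class extends Serializable automatically. The round-trip below uses plain Java serialization, the same mechanism Spark's default serializer relies on.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Case classes are Serializable out of the box; `class HelloWorld(val count: Int)` is not.
case class HelloWorld(count: Int)

def roundTrip(o: HelloWorld): HelloWorld = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(o)                      // would throw NotSerializableException for a plain class
  oos.close()
  val ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
  ois.readObject().asInstanceOf[HelloWorld]
}

val restored = roundTrip(HelloWorld(42))
```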

Re: Executor and BlockManager memory size

2014-10-10 Thread Boromir Widas
Hey Larry, I have been trying to figure this out for standalone clusters as well. http://apache-spark-user-list.1001560.n3.nabble.com/What-is-a-Block-Manager-td12833.html has an answer as to what the block manager is for. From the documentation, what I understood was that if you assign X GB to each execu…
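A worked sketch of the sizing arithmetic, assuming the legacy (Spark ≤ 1.5) defaults of that era: of each executor's heap, the block manager gets spark.storage.memoryFraction (default 0.6) times spark.storage.safetyFraction (default 0.9) for cached blocks. The 32 GB figure is illustrative.

```scala
// Hypothetical executor heap and the legacy default fractions.
val executorMemGb  = 32.0
val memoryFraction = 0.6   // spark.storage.memoryFraction default
val safetyFraction = 0.9   // spark.storage.safetyFraction default

// Memory the block manager can use for cached blocks.
val blockManagerGb = executorMemGb * memoryFraction * safetyFraction  // ~17.3 GB
```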

Re: Handling tree reduction algorithm with Spark in parallel

2014-10-03 Thread Boromir Widas
> …algorithms do tree reduction in 1.1: http://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html. You can check out how they implemented it -- it is a series of reduce operations. Matei. On Oct 1, 2014, at 11:0…

Re: Relation between worker memory and executor memory in standalone mode

2014-10-01 Thread Boromir Widas
> at 11:33 AM, Akshat Aranya wrote: On Wed, Oct 1, 2014 at 11:00 AM, Boromir Widas wrote: >> 1. Worker memory caps the executor. 2. With the default config, every job gets one executor per worker. This executor runs with all co…

Re: Handling tree reduction algorithm with Spark in parallel

2014-10-01 Thread Boromir Widas
> …1 (assuming T is connected). If T cannot fit in memory, or is very deep, then there are more exotic techniques, but hopefully this suffices. Andy -- http://www.cs.ox.ac.uk/people/andy.twigg/ On 30 Septembe…

Re: Relation between worker memory and executor memory in standalone mode

2014-10-01 Thread Boromir Widas
1. Worker memory caps the executor. 2. With the default config, every job gets one executor per worker; this executor runs with all cores available to the worker. On Wed, Oct 1, 2014 at 11:04 AM, Akshat Aranya wrote: > Hi, what's the relationship between the Spark worker and executor memory settings…
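A sketch of how the two knobs relate in standalone mode, with illustrative sizes: SPARK_WORKER_MEMORY (set in conf/spark-env.sh on each worker) is the ceiling a worker can hand out across all executors it hosts, while spark.executor.memory is what each application requests per executor and must fit under that ceiling.

```scala
import org.apache.spark.SparkConf

// Per-application settings; SPARK_WORKER_MEMORY on the worker side caps what
// can actually be granted.
val conf = new SparkConf()
  .set("spark.executor.memory", "4g")  // heap per executor, must be <= worker memory
  .set("spark.cores.max", "8")         // total cores the app may take across workers
```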

Handling tree reduction algorithm with Spark in parallel

2014-09-30 Thread Boromir Widas
Hello Folks, I have been trying to implement a tree reduction algorithm in Spark recently but could not find suitable parallel operations. Assuming I have a general tree like the following, I have to do the following: 1) do some computation at each leaf node to get an array of doubles (this c…
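A plain-Scala sketch of the reduction being asked about, under assumed details (the tree shape, the per-leaf computation, and element-wise summation as the combine step are all invented for illustration): compute an array at each leaf, then combine children's arrays up toward the root. One way to parallelize this in Spark is to process the tree level by level, with each level a `reduceByKey` over (parentId, array) pairs.

```scala
// Minimal tree with double-array payloads at the leaves.
sealed trait Tree
case class Leaf(data: Array[Double]) extends Tree
case class Node(children: List[Tree]) extends Tree

def sumArrays(a: Array[Double], b: Array[Double]): Array[Double] =
  a.zip(b).map { case (x, y) => x + y }

def reduceTree(t: Tree): Array[Double] = t match {
  case Leaf(d)        => d.map(_ * 2.0)                      // hypothetical leaf computation
  case Node(children) => children.map(reduceTree).reduce(sumArrays)  // combine children
}

val tree   = Node(List(Leaf(Array(1.0, 2.0)), Node(List(Leaf(Array(3.0, 4.0))))))
val result = reduceTree(tree)   // Array(8.0, 12.0)
```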

Re: Memory under-utilization

2014-09-16 Thread Boromir Widas
I see; what does http://localhost:4040/executors/ show for memory usage? I personally find it easier to work with a standalone cluster with a single worker, by using sbin/start-master.sh and then connecting to the master. On Tue, Sep 16, 2014 at 6:04 PM, francisco wrote: > Thanks for the rep…

Re: Memory under-utilization

2014-09-16 Thread Boromir Widas
Perhaps your job does not use more than 9g. Even though the dashboard shows 64g, the process only uses what's needed and grows to the 64g max. On Tue, Sep 16, 2014 at 5:40 PM, francisco wrote: > Hi, I'm a Spark newbie. We had installed spark-1.0.2-bin-cdh4 on a 'super machine' with 256gb memory…
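A self-contained illustration of the point: the JVM reserves up to -Xmx lazily, so the committed heap (totalMemory) starts well below the ceiling (maxMemory) and only grows as the job allocates. A 64g executor showing ~9g resident is consistent with that.

```scala
val rt = Runtime.getRuntime

val maxHeap       = rt.maxMemory()    // the -Xmx ceiling (e.g. 64g for the executor)
val committedHeap = rt.totalMemory()  // what the process currently holds; grows on demand
```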

Compiler issues for multiple map on RDD

2014-09-15 Thread Boromir Widas
Hello Folks, I am trying to chain a couple of map operations, and it seems the second map fails with a mismatch in arguments (even though the compiler prints them to be the same). I checked the function and variable types using :t and they look OK to me. Have you seen this before? I am posting th…
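Since the original code is truncated, here is a plain-Scala sketch of the chained-map shape in question (the functions are invented): annotating each function with an explicit type, as below, often makes the compiler report the real mismatch at the definition site instead of at the second `map`. The same chain composes identically on a Spark RDD.

```scala
// Explicit function types make any argument-type mismatch surface here,
// not at the call site of the second map.
val parse: String => Int = _.trim.toInt
val square: Int => Int   = x => x * x

val out = Seq(" 1", "2 ", "3").map(parse).map(square)
// out: Seq(1, 4, 9)
```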