one entry for the same key
>
> A code snippet would be appreciated because I am new to Spark.
>
> Ningjun
>
>
>
> *From:* Boromir Widas [mailto:vcsub...@gmail.com]
> *Sent:* Friday, February 13, 2015 1:28 PM
> *To:* Wang, Ningjun (LNG-NPV)
> *Cc:* user@spark.apache.org
reduceByKey should work, but you need to define the ordering by using some
sort of index.
On Fri, Feb 13, 2015 at 12:38 PM, Wang, Ningjun (LNG-NPV) <
ningjun.w...@lexisnexis.com> wrote:
>
>
> I have multiple RDD[(String, String)] that store (docId, docText) pairs,
> e.g.
>
>
>
> rdd1: (“id1”, “
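A minimal sketch of that index idea, assuming the RDDs should be merged so that the entry from the later RDD wins for a duplicate docId (the helper name and the "later wins" rule are assumptions, not the poster's code):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._   // pair-RDD implicits for older Spark versions
import org.apache.spark.rdd.RDD

// Tag each (docId, docText) pair with the index of the RDD it came from,
// then keep the pair with the highest index per docId.
def mergeKeepLatest(sc: SparkContext, rdds: Seq[RDD[(String, String)]]): RDD[(String, String)] = {
  val tagged = rdds.zipWithIndex.map { case (rdd, i) =>
    rdd.map { case (id, text) => (id, (i, text)) }
  }
  sc.union(tagged)
    .reduceByKey((a, b) => if (a._1 >= b._1) a else b) // later RDD wins
    .mapValues(_._2)
}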
You can check out https://github.com/spark-jobserver/spark-jobserver - this
allows several users to upload their jars and run jobs with a REST
interface.
However, if all users need the same functionality, you can write a
simple Spray server that acts as the driver and hosts the Spark
context.
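For illustration only, a rough sketch of the "server acts as the driver" idea; the HTTP layer (Spray routes or the jobserver's REST API) is left out, and the master URL and job are made up:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object JobRunner {
  // One long-lived context owned by the server process; Spark allows only one per JVM.
  lazy val sc: SparkContext = new SparkContext(
    new SparkConf().setAppName("shared-driver").setMaster("spark://master:7077"))

  // Example job an HTTP handler could call; the path comes from the request.
  def wordCount(path: String): Map[String, Long] =
    sc.textFile(path)
      .flatMap(_.split("\\s+"))
      .map((_, 1L))
      .reduceByKey(_ + _)
      .collect()
      .toMap
}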
At least part of it is due to the connection being refused. Can you check
whether curl can reach the URL through your proxies? The error is:
[FATAL] Non-resolvable parent POM: Could not transfer artifact
org.apache:apache:pom:14 from/to central (
http://repo.maven.apache.org/maven2): Error transferring file: Connection
refused from
Local mode still parallelizes calculations, and it is useful for debugging
because it goes through the same serialization/deserialization steps that a
cluster would.
On Fri, Jan 23, 2015 at 5:44 PM, olegshirokikh wrote:
> I'm trying to understand the basics of Spark internals and Spark
> documentation
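As a reference point, a tiny sketch of a local-mode run for debugging (app name and thread count are arbitrary):

import org.apache.spark.{SparkConf, SparkContext}

// local[4] runs the driver and four executor threads in one JVM, but records
// still go through the same serialization path a real cluster would use.
val conf = new SparkConf().setAppName("local-debug").setMaster("local[4]")
val sc = new SparkContext(conf)
println(sc.parallelize(1 to 100).map(_ * 2).reduce(_ + _))
sc.stop()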
Hello,
I am trying to do a groupBy on 5 attributes to get results in a form like a
pivot table in Microsoft Excel. The keys are the attribute tuples and the
values are double arrays (which may be very large). Based on the code below, I
am getting back correct results but would like to optimize it further (I
p
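Since the original code was cut off, here is only a hedged sketch of the general pattern with made-up attributes and data; reduceByKey combines the arrays map-side, which usually shuffles less than groupBy followed by a merge (this assumes an existing SparkContext named sc):

// Rows of (attr1..attr5, values); in the real job the arrays may be large.
val rows = sc.parallelize(Seq(
  ("us", "2014", "q4", "web", "a", Array(1.0, 2.0)),
  ("us", "2014", "q4", "web", "a", Array(3.0, 4.0)),
  ("eu", "2014", "q4", "web", "b", Array(5.0, 6.0))
))

val pivot = rows
  .map { case (a1, a2, a3, a4, a5, v) => ((a1, a2, a3, a4, a5), v) }
  .reduceByKey((x, y) => x.zip(y).map { case (l, r) => l + r }) // element-wise sum
pivot.collect().foreach(println)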
I do not understand Chinese but the diagrams on that page are very helpful.
On Tue, Jan 6, 2015 at 9:46 PM, eric wong wrote:
> A good beginning if you are Chinese.
>
> https://github.com/JerryLead/SparkInternals/tree/master/markdown
>
> 2015-01-07 10:13 GMT+08:00 bit1...@163.com :
>
>> Thank you
>> system; spark context also creates an
>> akka actor system, is it possible there is some conflict?
>>
>>
>>
>> Sent from my iPad
>>
>> On Jan 4, 2015, at 7:42 PM, Boromir Widas wrote:
>>
>> Hello,
>>
>> I am trying to launch
Hello,
I am trying to launch a Spark app (client mode for a standalone cluster) from
a Spray server, using the following code.
When I run it as
$> java -cp SprayServer
the SimpleApp.getA() call from SprayService returns -1 (which means it sees
the logData RDD as null for HTTP requests), but the s
it should be under
> ls assembly/target/scala-2.10/*
On Sat, Jan 3, 2015 at 10:11 PM, j_soft wrote:
>
> - Thanks, the build succeeded.
> - But where is the built package? I cannot find a finished .zip or .tar.gz
> package.
>
>
> 2014-12-31 19:22 GMT+08:00 xhudik [via Apache Spark User List]
Such a tool would be very helpful, but the distributed nature may be difficult
to capture.
I was recently trying to run a task where merging the accumulators was taking
an inordinately long time, and this was not reflected in the standalone
cluster's web UI.
What I think will be useful is
If you are looking to reduce network traffic then setting
spark.deploy.spreadOut
to false may help.
On Mon, Dec 22, 2014 at 11:44 AM, Ashic Mahtab wrote:
>
> Hi Josh,
> I'm not looking to change the 1:1 ratio.
>
> What I'm trying to do is get both cores on two machines working, rather
> than one
flatMap should help; it lets you return a Seq of output records for every input.
On Mon, Oct 20, 2014 at 12:31 PM, HARIPRIYA AYYALASOMAYAJULA <
aharipriy...@gmail.com> wrote:
> Hello,
>
> I am facing a problem with implementing this - My mapper should emit
> multiple keys for the same value -> for every input (k, v) it sh
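A small sketch of that shape, with invented keys and an invented splitting rule (assumes an existing SparkContext named sc):

// Each input (k, v) produces several output pairs; flatMap flattens the Seqs.
val input = sc.parallelize(Seq(("k1", 10), ("k2", 21)))
val expanded = input.flatMap { case (k, v) =>
  Seq((k, v), (k + "-parity", v % 2), (k + "-tens", v / 10))
}
expanded.collect().foreach(println)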
Making it a case class should work.
On Thu, Oct 16, 2014 at 8:30 PM, ll wrote:
> i got an exception complaining about serializable. the sample code is
> below...
>
> class HelloWorld(val count: Int) {
> ...
> ...
> }
>
> object Test extends App {
> ...
> val data = sc.parallelize(List(new
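A hedged sketch of the case-class version; the field and the job are guesses at what the original intended:

import org.apache.spark.{SparkConf, SparkContext}

// Case classes are Serializable by default, so instances can be shipped to
// executors without the NotSerializableException a plain class can trigger.
case class HelloWorld(count: Int)

object Test extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("case-class-demo").setMaster("local[*]"))
  val data = sc.parallelize(List(HelloWorld(1), HelloWorld(2), HelloWorld(3)))
  println(data.map(_.count).reduce(_ + _)) // prints 6
  sc.stop()
}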
Hey Larry,
I have been trying to figure this out for standalone clusters as well.
http://apache-spark-user-list.1001560.n3.nabble.com/What-is-a-Block-Manager-td12833.html
has an answer as to what block manager is for.
From the documentation, what I understood was that if you assign X GB to each
execu
>> algorithms do tree reduction in 1.1:
>> http://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html.
>> You can check out how they implemented it -- it is a series of reduce
>> operations.
>>
>> Matei
>>
>> On Oct 1, 2014, at 11:0
at 11:33 AM, Akshat Aranya wrote:
>
>>
>>
>> On Wed, Oct 1, 2014 at 11:00 AM, Boromir Widas
>> wrote:
>>
>>> 1. worker memory caps executor.
>>> 2. With default config, every job gets one executor per worker. This
>>> executor runs with all co
1 (assuming T is connected)
>>
>> If T cannot fit in memory, or is very deep, then there are more exotic
>> techniques, but hopefully this suffices.
>>
>> Andy
>>
>>
>> --
>> http://www.cs.ox.ac.uk/people/andy.twigg/
>>
>> On 30 Septembe
1. The worker's memory setting caps the memory available to its executor.
2. With the default config, every job gets one executor per worker, and that
executor runs with all cores available to the worker.
On Wed, Oct 1, 2014 at 11:04 AM, Akshat Aranya wrote:
> Hi,
>
> What's the relationship between Spark worker and executor memory settings
>
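Roughly, the knobs involved look like the sketch below (values are arbitrary examples, not recommendations): SPARK_WORKER_MEMORY in spark-env.sh caps what all executors on a worker may use, while the application asks for memory per executor.

import org.apache.spark.SparkConf

// Per-application request; it must fit inside the worker's SPARK_WORKER_MEMORY cap.
val conf = new SparkConf()
  .setAppName("memory-demo")
  .set("spark.executor.memory", "4g") // memory per executor
  .set("spark.cores.max", "8")        // total cores the app may take across the cluster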
Hello Folks,
I have been trying to implement a tree reduction algorithm in Spark recently
but could not find suitable parallel operations. Assuming I have a general
tree like the following -
I have to do the following -
1) Do some computation at each leaf node to get an array of doubles. (This
c
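For the merge-the-leaf-arrays part, a hedged sketch using element-wise reduction; the leaf computation is invented, and treeReduce is only available directly on RDDs in later Spark releases (around 1.3; in 1.1 a similar helper lives in MLlib's RDDFunctions). Assumes an existing SparkContext named sc:

// Each "leaf" produces an Array[Double]; arrays are then merged pairwise,
// element-wise, up a reduction tree of the given depth.
val leaves = sc.parallelize(1 to 1000)
val perLeaf = leaves.map(i => Array.fill(8)(i.toDouble)) // stand-in leaf computation
val total = perLeaf.treeReduce(
  (a, b) => a.zip(b).map { case (x, y) => x + y },
  depth = 2)
println(total.mkString(", "))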
I see; what does http://localhost:4040/executors/ show for memory usage?
I personally find it easier to work with a standalone cluster with a single
worker, by running sbin/start-master.sh and then connecting a worker to the master.
On Tue, Sep 16, 2014 at 6:04 PM, francisco wrote:
> Thanks for the rep
Perhaps your job does not use more than 9g. Even though the dashboard shows
64g, the process only uses what's needed and grows to 64g at most.
On Tue, Sep 16, 2014 at 5:40 PM, francisco wrote:
> Hi, I'm a Spark newbie.
>
> We had installed spark-1.0.2-bin-cdh4 on a 'super machine' with 256gb
> memory
Hello Folks,
I am trying to chain a couple of map operations, and it seems the second map
fails with a mismatch in arguments (even though the compiler prints them as
the same). I checked the function and variable types using :t and they
look OK to me.
Have you seen this earlier? I am posting th