Saurav,
We have the same issue. Our application runs fine on 32 nodes with 4 cores
each and 256 partitions, but gives an OOM on the driver when run on 64 nodes
with 512 partitions. Did you find out the reason behind this behavior, or
the relation between the number of partitions and driver RAM usage?
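In case it helps anyone else hitting this, the quickest test (as suggested further
down this thread) is simply to give the driver more memory. In client mode the
driver JVM is already running by the time SparkConf is read, so we would pass the
setting to spark-submit rather than set it programmatically; the size and class
name below are only placeholders for our setup:

spark-submit --driver-memory 8g --class com.example.OurApp our-app.jar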
lly on OutOfMemory error. Analyze it and
> share your finding :)
>
>
>
> On Wed, Jun 22, 2016 at 4:33 PM, Raghava Mutharaju <
> m.vijayaragh...@gmail.com> wrote:
>
>> Ok. Would you be able to shed more light on what exact metadata it manages
>> and what is the relati
age scheduling across all
> your executors. I assume with 64 nodes you have more executors as well.
> Simple way to test is to increase driver memory.
>
> On Wed, Jun 22, 2016 at 10:10 AM, Raghava Mutharaju <
> m.vijayaragh...@gmail.com> wrote:
>
>> It is an i
>
>
>
> On Wed, Jun 22, 2016 at 9:57 PM, Raghava Mutharaju <
> m.vijayaragh...@gmail.com> wrote:
>
Hello All,
We have a Spark cluster where the driver and master run on the same
node. We are using the Spark Standalone cluster manager. If the number of nodes
(and partitions) is increased, the same dataset that used to run to
completion on a smaller number of nodes now gives an out of memory error. Part of
the stack trace:
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
On Fri, May 13, 2016 at 6:33 AM, Raghava Mutharaju <
m.vijayaragh...@gmail.com> wrote:
> Thank you for the response.
>
> I use
some modules/profiles for your build. What command did
> you use to build?
>
> On Thu, May 12, 2016 at 9:01 PM, Raghava Mutharaju <
> m.vijayaragh...@gmail.com> wrote:
>
Hello All,
I built Spark from the source code available at
https://github.com/apache/spark/. Although I haven't specified the
"-Dscala-2.11" option (to build with Scala 2.11), from the build messages I
see that it ended up using Scala 2.11. Now, for my application's sbt build, what
should the Spark version be?
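In case it clarifies the question, here is a minimal build.sbt along the lines of
what we are trying; the version strings are placeholders and would need to match
whatever the Spark build actually produced:

// The Scala version must match the one Spark was built with (2.11.x here)
scalaVersion := "2.11.8"

// "%%" appends the Scala binary version, so these resolve to the _2.11 artifacts
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0-SNAPSHOT" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.0-SNAPSHOT" % "provided"
)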
(leftItr, rightItr) =>
> leftItr.filter(p => !rightItr.contains(p))
> }
> sum.foreach(println)
>
>
>
> Regards,
> Rishitesh Mishra,
> SnappyData . (http://www.snappydata.io/)
>
> https://in.linkedin.com/in/rishiteshmishra
>
> On Mon, May 9, 2016 at 7:35
at 6:27 AM, ayan guha wrote:
> How about outer join?
> On 9 May 2016 13:18, "Raghava Mutharaju"
> wrote:
>
Hello All,
We have two PairRDDs (rdd1, rdd2) which are hash partitioned on key (the
number of partitions is the same for both RDDs). We would like to subtract rdd2
from rdd1.
The subtract code at
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala
seems to
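For reference, here is a fleshed-out version of the zipPartitions idea quoted
further up the thread, assuming rdd1 and rdd2 are already co-partitioned as
described above (untested sketch):

// Matching keys sit in matching partitions, so no shuffle is needed.
val diff = rdd1.zipPartitions(rdd2, preservesPartitioning = true) {
  (leftItr, rightItr) =>
    // An Iterator can only be traversed once, so materialize the right side
    // into a Set before filtering the left side against it.
    val rightSet = rightItr.toSet
    leftItr.filter(pair => !rightSet.contains(pair))
}
diff.foreach(println)

If only the keys matter, rdd1.subtractByKey(rdd2) is the built-in alternative.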
ection, which would, I think, only
> complicate your trace debugging.
>
> I've attached a python script that dumps relevant info from the Spark
> JSON logs into a CSV for easier analysis in your language of choice;
> hopefully it can aid in finer grained debugging (the headers of the
this other than creating a dummy task to synchronize the executors, but
> hopefully someone from there can suggest other possibilities.
>
> Mike
> On Apr 23, 2016 5:53 AM, "Raghava Mutharaju"
> wrote:
>
>> Mike,
>>
>> It turns out the executor delay, as you mentioned, is the cau
> 3. Make the tasks longer, i.e. with some silly computational work.
>
> Mike
>
>
> On 4/17/16, Raghava Mutharaju wrote:
> > Yes, it's the same data.
> >
> > 1) The number of partitions is the same (8, which is an argument to the
> > HashPartitioner). In
>
>
>
> On Mon, Apr 18, 2016 at 6:26 PM, Raghava Mutharaju <
> m.vijayaragh...@gmail.com> wrote:
>
>> Mike,
>>
>> We tried that. This map task is actually part of a larger set of
>> operations. I pointed out this map task since i
onfirm:
> >> 1. Examine the starting times of the tasks alongside their
> >> executor
> >> 2. Make a "dummy" stage execute before your real stages to
> >> synchronize the executors by creating and materializing any random RDD
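A minimal sketch of the "dummy" warm-up stage suggested above, i.e. creating and
materializing a throwaway RDD so that all executors are up and registered before
the real stages start (the numbers are arbitrary):

// Cheap job that touches every default partition before the real work,
// so that late-registering executors do not skew the first real stage.
val warmUp = sc.parallelize(1 to 100000, sc.defaultParallelism)
warmUp.map(_ + 1).count()   // count() forces the stage to actually run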
max will help. Also, for
> 52MB of data you need not have 12GB allocated to executors. Better to
> assign 512MB or so and increase the number of executors per worker node.
> Try reducing that executor memory to 512MB or so for this case.
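For what it's worth, the sizing suggested above would look roughly like this in
the application config; the values are only illustrative, and spark.executor.cores
assumes a Spark/standalone version that allows several executors per worker:

val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.memory", "512m")  // small per-executor heap, as suggested above
  .set("spark.executor.cores", "1")      // lets multiple small executors share a worker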
>
> On Mon, Apr 18, 2016 at 9:07 AM,
un via Scala program?
>
> On Mon, Apr 18, 2016 at 4:41 AM, Raghava Mutharaju <
> m.vijayaragh...@gmail.com> wrote:
>
Hello All,
We are using HashPartitioner in the following way on a 3 node cluster (1
master and 2 worker nodes).
val u = sc.textFile("hdfs://x.x.x.x:8020/user/azureuser/s.txt")
  .map[(Int, Int)](line => line.split("\\|") match {
    case Array(x, y) => (y.toInt, x.toInt)
  })
  .partitionBy(new HashPartitioner(8))
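To see how that partitioning actually comes out, here are a couple of quick checks
we find useful (purely illustrative):

// Confirm the partitioner survived the transformation
println(u.partitioner)   // expect Some(org.apache.spark.HashPartitioner@...)

// Rough view of skew: number of records in each partition
u.mapPartitionsWithIndex { (idx, it) => Iterator((idx, it.size)) }
  .collect()
  .foreach(println)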
Hello All,
If Kryo serialization is enabled, doesn't Spark take care of registering
its built-in classes, i.e., are we supposed to register only our custom
classes?
When using DataFrames, this does not seem to be the case. I had to register
the following classes:
conf.registerKryoClasses(Array(
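For context, this is roughly how we enable Kryo and register our own classes; the
class name below is only a stand-in, not the (truncated) list above:

case class MyRecord(id: Long, label: String)   // hypothetical custom class

val conf = new org.apache.spark.SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // With this flag on, Kryo fails fast and names any class that gets serialized
  // without being registered, which helps find what still needs adding.
  .set("spark.kryo.registrationRequired", "true")
  .registerKryoClasses(Array(
    classOf[MyRecord],
    classOf[Array[MyRecord]]
  ))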
Hello All,
I implemented an algorithm using both the RDD and the Dataset APIs (in
Spark 1.6). The Dataset version takes a lot more memory than the RDD version. Is this
normal? Even for very small input data, it runs out of memory and I
get a Java heap exception.
I tried the Kryo serializer by registering
"...")
> e.g. if your DataSet has two columns, you can write:
> ds.select(expr("_2 / _1").as[Int])
>
> where _1 refers to the first column and _2 to the second.
>
> On Tue, Feb 9, 2016 at 3:31 PM, Raghava Mutharaju <
> m.vijayaragh...@gmail.com> wrote:
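Putting the expr() suggestion above into a self-contained snippet (spark-shell
style, with a made-up two-column Dataset; the division yields a double, hence
as[Double]):

import sqlContext.implicits._
import org.apache.spark.sql.functions.expr

val ds = sqlContext.createDataset(Seq((1, 10), (2, 20), (3, 30)))

// Tuple fields surface as columns _1 and _2, so the expression can refer to
// them by name without converting the Dataset to a DataFrame first.
val ratios = ds.select(expr("_2 / _1").as[Double])
ratios.show()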
> ds1.joinWith(ds2, $"a.value" === $"b.value", "inner"),
>
> On Tue, Feb 9, 2016 at 7:07 AM, Raghava Mutharaju <
> m.vijayaragh...@gmail.com> wrote:
>
Hello All,
The joinWith() method in Dataset takes a condition of type Column. Without
converting a Dataset to a DataFrame, how can we get a specific column?
For example: case class Pair(x: Long, y: Long)
A and B are Datasets of type Pair, and I want to join A.x with B.y:
A.joinWith(B, A.toDF().col("x") == B.toDF().col("y"))
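For completeness, here is how the aliasing style from the reply above would apply
to this Pair example (untested sketch; === is Spark's column-equality operator,
and .as("a") / .as("b") just give each side a name the condition can refer to):

import sqlContext.implicits._   // for the $"..." column syntax

case class Pair(x: Long, y: Long)

// A and B are the Dataset[Pair] values from the question above.
val joined = A.as("a").joinWith(B.as("b"), $"a.x" === $"b.y", "inner")
// joined has type Dataset[(Pair, Pair)]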