I have this code:
rdd.foreach(p => {
print(p)
})
Where can I see this output? Currently I'm running my Spark program on a
cluster. When I run the jar using sbt run, I see only INFO logs on the
console. Where should I check to see the application's stdout output?
I have an RDD of (K, Array[V]) pairs.
For example: ((key1, (1,2,3)), (key2, (3,2,4)), (key1, (4,3,2)))
How can I do a groupByKey such that I get back an RDD of (K,
Array[V]) pairs?
Ex: ((key1, (1,2,3,4,3,2)), (key2, (3,2,4)))
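One way to sketch this, assuming the values are `Array[Int]`: either `reduceByKey` with array concatenation (which gets map-side combining before the shuffle), or `groupByKey` followed by a flatten:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("merge-demo"))
val pairs = sc.parallelize(Seq(
  ("key1", Array(1, 2, 3)),
  ("key2", Array(3, 2, 4)),
  ("key1", Array(4, 3, 2))))

// Option 1: concatenate the arrays per key; reduceByKey combines
// map-side before the shuffle, so it is usually cheaper.
val merged = pairs.reduceByKey(_ ++ _)

// Option 2: groupByKey yields (K, Iterable[Array[Int]]); flatten it back.
val merged2 = pairs.groupByKey().mapValues(_.flatten.toArray)

val asMap = merged.collect().toMap
```

Note that the order in which the per-key arrays are concatenated depends on partitioning, so don't rely on element order in the result.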
Is there any guide available on creating a custom RDD?
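I'm not aware of a dedicated guide; the usual route is to read the existing implementations in org.apache.spark.rdd and override getPartitions and compute. A hypothetical toy example (class names are mine):

```scala
import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// A Partition just carries an index; it must be serializable.
case class IndexPartition(index: Int) extends Partition

// Hypothetical toy RDD: n partitions, partition i yields the single value i.
class IndexRDD(sc: SparkContext, n: Int) extends RDD[Int](sc, Nil) {
  // Called once on the driver to enumerate the partitions.
  override def getPartitions: Array[Partition] =
    (0 until n).map(i => IndexPartition(i): Partition).toArray

  // Called on an executor to produce one partition's data.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator(split.index)
}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("custom-rdd-demo"))
val out = new IndexRDD(sc, 3).collect()
```

Real custom RDDs usually also override getPreferredLocations to expose data locality to the scheduler.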
What is the concept of Block and BlockManager in Spark? How is a Block
related to a Partition of an RDD?
For example, is the distinct() transformation lazy?
When I look at the Spark source code, distinct applies a map -> reduceByKey ->
map pipeline to the RDD elements. Why is this lazy? Won't the functions be
applied immediately to the elements of the RDD when I call someRDD.distinct?
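It is lazy: distinct only composes more transformations into the DAG, and nothing touches the data until an action runs. A sketch that makes this observable, using an accumulator (the modern accumulator API; in local mode for demonstration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("lazy-demo"))
val base = sc.parallelize(Seq(1, 2, 2, 3))
val calls = sc.longAccumulator("map-calls")

// distinct() itself only adds map -> reduceByKey -> map to the DAG.
val d = base.map { x => calls.add(1); x }.distinct()
val before = calls.value   // still 0: no task has executed yet

val n = d.count()          // the action triggers the whole pipeline
val after = calls.value    // the map has now run once per element
```

The same holds for the map and reduceByKey that distinct is built from: they describe work; the action schedules it.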
> /**
>  * Return a new RDD [...]
> [...]ld be lazy, but
> apparently uses an RDD.count call in its implementation:
> https://spark-project.atlassian.net/browse/SPARK-1021
>
> David Thomas
> March 11, 2014 at 9:49 PM
> [...] the Spark runtime/scheduler traverses the DAG starting from
> that RDD and triggers evaluation of any parent RDDs it needs that
> aren't computed and cached yet.
>
> Any future operations build on the same DAG as long as you use the same
> RDD objects and, if you used cache [...]
Is it possible to partition the RDD elements in a round-robin fashion? Say I
have 5 nodes in the cluster and 5 elements in the RDD. I need to ensure
each element gets mapped to a different node in the cluster.
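There's no built-in round-robin mode that I know of, but a sketch with a custom Partitioner over zipWithIndex keys gives one element per partition (class name is mine; local mode for demonstration). Note the caveat: Spark places partitions on executors via its own scheduling and locality heuristics, so "exactly one partition per node" is not strictly guaranteed.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Hypothetical partitioner: key i (the element's index) goes to partition i % n.
class RoundRobinPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int =
    (key.asInstanceOf[Long] % numPartitions).toInt
}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("rr-demo"))
val elems = sc.parallelize(Seq("a", "b", "c", "d", "e"))

val spread = elems.zipWithIndex()                 // (elem, 0-based index)
  .map(_.swap)                                    // (index, elem)
  .partitionBy(new RoundRobinPartitioner(5))
  .values

// glom() exposes each partition as an array, so we can check the spread.
val sizes = spread.glom().map(_.length).collect()
```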
How can we replicate RDD elements? Say I have 1 element and 100 nodes in
the cluster. I need to replicate this one item on all the nodes i.e.
effectively create an RDD of 100 elements.
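If the goal is literally an RDD of 100 copies, parallelizing 100 copies into 100 partitions is a simple sketch; but if the underlying goal is "make this value available on every node", a broadcast variable is usually the intended tool (values here are illustrative, local mode for demonstration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("repl-demo"))
val item = "shared-value"   // illustrative element

// 100 copies, one per partition; Spark spreads partitions over executors.
val replicated = sc.parallelize(Seq.fill(100)(item), 100)

// Often the better fit: ship the value to every node once, as a broadcast.
val bc = sc.broadcast(item)
val used = sc.parallelize(1 to 10).map(_ => bc.value).distinct().collect()
```

The broadcast route avoids materializing 100 redundant elements and lets every task read the same value locally.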
> On Fri, Mar 28, 2014 at 9:24 AM, David Thomas wrote:
Is there a way to see the 'Application Detail UI' page (at master:4040) for
completed applications? Currently I can see that page only for running
applications; I would like to see the various metrics for an application after
it has completed.
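In Spark versions from around 1.0 onward, the usual answer is event logging plus the history server; a sketch, with an illustrative log directory:

```shell
# In conf/spark-defaults.conf (the log directory path is illustrative):
#   spark.eventLog.enabled  true
#   spark.eventLog.dir      hdfs:///spark-events
#
# Then start the history server, which replays completed applications' logs:
./sbin/start-history-server.sh
# Browse http://<history-server-host>:18080
```

The per-application UI on port 4040 only exists while the driver is alive; the history server rebuilds the same pages from the event logs afterwards.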
Can someone explain how an RDD is resilient? If one of the partitions is lost,
who is responsible for recreating that partition - is it the driver program?
> [...] but the
> re-computation will occur on an executor. So if several partitions are
> lost, e.g. due to a few machines failing, the re-computation can be striped
> across the cluster, making it fast.
> On Wed, Apr 2, 2014 at 11:27 AM, David Thomas wrote:
What is the difference between checkpointing and caching an RDD?
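In short: cache keeps computed partitions in memory but retains the lineage, so a lost partition is recomputed from its parents; checkpoint writes the data to reliable storage and truncates the lineage, so recovery reads the files back. A sketch of the mechanics (local mode, illustrative checkpoint path):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ckpt-demo"))
sc.setCheckpointDir("/tmp/spark-ckpt")   // must be set before checkpointing

val rdd = sc.parallelize(1 to 10).map(_ * 2)

rdd.cache()        // keep partitions in memory; lineage is retained
rdd.checkpoint()   // on the next action: write the data to the checkpoint
                   // dir, then replace the lineage with the saved files

val n = rdd.count()   // materializes both the cache and the checkpoint
val ck = rdd.isCheckpointed
```

Caching alone survives neither executor loss of uncached partitions nor driver failure; checkpointing trades extra I/O for a recovery point that doesn't depend on the original lineage.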
During a Spark stage, how are tasks split among the workers? Specifically
for a HadoopRDD, who determines which worker gets which task?