Just as a best practice, DataFrames and Datasets are the preferred way, so try
not to resort to RDDs unless you absolutely have to...
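For instance, here is a rough sketch of the same word count written both ways (not from this thread; it assumes a SparkSession named spark and an illustrative input path):

import org.apache.spark.sql.functions._

// Dataset/DataFrame version: the aggregation goes through the Catalyst optimizer.
val lines = spark.read.textFile("data/words.txt")
val wordCounts = lines
  .select(explode(split(col("value"), "\\s+")).as("word"))
  .groupBy("word")
  .count()

// Equivalent RDD version, for comparison.
val rddCounts = lines.rdd
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)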
On Sun, 5 Mar 2017 at 7:10 pm, khwunchai jaengsawang
wrote:
> Hi Old-Scool,
>
>
> For the first question, you can specify the number of partition in any
> DataFrame by us
Hi Old-School,
For the first question, you can specify the number of partitions of any
DataFrame by using repartition(numPartitions: Int, partitionExprs: Column*).
Example:
val partitioned = data.repartition(numPartitions=10).cache()
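To double-check afterwards, the partition count can be read back (an illustrative check, not from the original mail):
partitioned.rdd.getNumPartitions  // should report 10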
For your second question, you can transform your RDD with an RDD operation,
for example a word count:
rdd.map(word => (word, 1)).reduceByKey(_ + _)
On Sat, Mar 4, 2017 at 8:59 AM -0500, "Old-School"
wrote:
Hi,
I want to perform some simple transformations and check the execution time,
under various configurations (e.g. number of
Hi,
Yes, I believe people do that. I also believe that SparkML is able to
figure out when to cache some internal RDDs as well. That's definitely true for
the random forest algorithm. It doesn't harm to cache the same RDD twice, either.
But it's not clear what you'd like to know...
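As a rough illustration of the "caching twice" point (the path and the actions are made up): a second cache() call on an already-cached RDD is a no-op, so independent code paths can each cache it defensively.

val data = sc.textFile("hdfs:///some/path").cache()
data.cache()                     // no-op: the storage level is already set
println(data.getStorageLevel)    // should show the MEMORY_ONLY level
data.count()                     // first action materialises the cache
data.map(_.length).sum()         // second job reuses the cached partitions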
--
Be well!
Jean Morozov
On S
What is the size of each RDD? What is the size of your cluster, and which Spark
configurations have you tried out?
On Tue, Jul 28, 2015 at 9:54 PM, ponkin wrote:
> Hi, Alice
>
> Did you find a solution?
> I have exactly the same problem.
Hi, Alice
Did you find a solution?
I have exactly the same problem.
---
The above is a great example using threads.
Does anyone have an example using a Scala/Akka Future to do the same?
I am looking for an example like that which uses an Akka Future and does
something if the Future times out.
On Tue, Mar 3, 2015 at 9:16 AM, Manas Kar
wrote:
> The above is a great examp
The above is a great example using threads.
Does anyone have an example using a Scala/Akka Future to do the same?
I am looking for an example like that which uses an Akka Future and does
something if the Future times out.
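Something along these lines might be what is being asked for; this is only a sketch using plain scala.concurrent rather than Akka-specific APIs, where rdd, the count() body and the 30-second timeout are all placeholders:

import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Launch the Spark job asynchronously on the driver.
val jobA: Future[Long] = Future { rdd.count() }

try {
  val n = Await.result(jobA, 30.seconds)          // wait at most 30 seconds
  println(s"job finished with $n rows")
} catch {
  case _: TimeoutException =>
    println("job timed out, doing fallback work")  // react to the timeout here
}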
On Tue, Mar 3, 2015 at 7:00 AM, Kartheek.R wrote:
> Hi TD,
> "You can always
Hi TD,
"You can always run two jobs on the same cached RDD, and they can run in
parallel (assuming you launch the 2 jobs from two different threads)"
Is this a correct way to launch jobs from two different threads?
val threadA = new Thread(new Runnable {
def run() {
for(i<- 0 until e
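The snippet above is cut off in the archive; a fuller sketch of the same pattern, where rdd is assumed to be the cached RDD and the two actions are placeholders:

val threadA = new Thread(new Runnable {
  def run(): Unit = println("count = " + rdd.count())
})
val threadB = new Thread(new Runnable {
  def run(): Unit = println("distinct = " + rdd.distinct().count())
})
threadA.start(); threadB.start()   // both jobs are submitted to the scheduler concurrently
threadA.join(); threadB.join()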
RDD.persist() can be useful here.
On 11 December 2014 at 14:34, ankits [via Apache Spark User List] <
ml-node+s1001560n20613...@n3.nabble.com> wrote:
>
> I'm using spark 1.1.0 and am seeing persisted RDDs being cleaned up too
> fast. How can I inspect the size of RDD in memory and get more informa
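One way to inspect those cached sizes from the driver is sketched below (getRDDStorageInfo is a developer API on SparkContext; the same numbers appear on the web UI's Storage tab):

sc.getRDDStorageInfo.foreach { info =>
  println(s"RDD ${info.id} (${info.name}): " +
    s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
    s"${info.memSize} bytes in memory, ${info.diskSize} bytes on disk")
}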
I was having similar issues with my persistent RDDs. After some digging
around, I noticed that the partitions were not balanced evenly across the
available nodes. After a "repartition", the RDD was spread evenly across
all available memory. Not sure if that is something that would help your
use-case.
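A sketch of that kind of check and rebalance (rdd and the target parallelism are placeholders):

// Count elements per partition to spot skew, then repartition and re-persist.
val perPartition = rdd.mapPartitionsWithIndex { (i, it) => Iterator((i, it.size)) }.collect()
perPartition.foreach { case (i, n) => println(s"partition $i -> $n elements") }

val balanced = rdd.repartition(sc.defaultParallelism).persist()
balanced.count()   // materialise so the rebalanced copy is what stays cached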
The ContextCleaner uncaches RDDs that have gone out of scope on the driver.
So it's possible that the given RDD is no longer reachable in your
program's control flow, or else it'd be a bug in the ContextCleaner.
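A sketch of the difference (paths and data are made up): a cached RDD held in a long-lived reference stays reachable, while one that only lives inside a method can be cleaned once the driver garbage-collects it.

// Reachable for the lifetime of the program: the cleaner leaves it alone.
val kept = sc.textFile("hdfs:///some/path").cache()
kept.count()

def leaky(): Long = {
  // Only reachable inside this method: after it returns and the driver GCs `tmp`,
  // the ContextCleaner may unpersist its blocks on the executors.
  val tmp = sc.parallelize(1 to 1000000).cache()
  tmp.count()
}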
On Wed, Dec 10, 2014 at 5:34 PM, ankits wrote:
> I'm using spark 1.1.0 and am seeing
What do you mean by incorrect? Could you please share some examples from
both the input RDDs and the resultant RDD? Also, if you get any exception, paste that
too; it helps to debug where the issue is.
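For instance, something along these lines makes the comparison concrete (left and right stand for the two key/value RDDs being joined):

val joined = left.join(right)
left.take(5).foreach(println)      // sample of the first input
right.take(5).foreach(println)     // sample of the second input
joined.take(5).foreach(println)    // sample of the result
println(s"left=${left.count()} right=${right.count()} joined=${joined.count()}")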
On 27 November 2014 at 17:07, liuboya [via Apache Spark User List] <
ml-node+s1001560n19928...@n3.nabble.com> w
Have you tried using RDD.map() to transform some of the RDD elements from 0
to 1? Why doesn’t that work? That’s how you change data in Spark, by
defining a new RDD that’s a transformation of an old one.
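A tiny sketch of that (the input values are made up):

val original = sc.parallelize(Seq(0, 1, 0, 2, 0))
val updated = original.map(x => if (x == 0) 1 else x)  // a new RDD; `original` is unchanged
updated.collect()                                      // Array(1, 1, 1, 2, 1)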
On Sat, Sep 13, 2014 at 5:39 AM, Deep Pradhan
wrote:
> Hi,
> We all know that RDDs are immu
Thank you yuanbosoft.
> From: Kartheek.R [mailto:kartheek.m...@gmail.com]
> Sent: Thursday, September 04, 2014 1:24 PM
> To: u...@spark.incubator.apache.org
> Subject: RE: RDDs
>
> Thank you Raymond and Tobias.
> Yeah, I am very clear about what I was asking. I was talking about
> "replicated" rdd only. Now
Thank you Raymond and Tobias.
Yeah, I am very clear about what I was asking. I was talking about
"replicated" rdd only. Now that I've got my understanding about job and
application validated, I wanted to know if we can replicate an rdd and run
two jobs (that need the same rdd) of an application in parallel.
Not sure what you were referring to when you said replicated rdd; if you actually mean
RDD, then yes, read the API doc and paper as Tobias mentioned.
If you actually focus on the word "replicated", then that is for fault
tolerance, and is probably mostly used in the streaming case for receiver-created
RDDs.
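On the batch side, that kind of replication is requested through a replicated storage level, roughly like this (the path is illustrative; streaming receivers typically use a replicated level such as MEMORY_AND_DISK_SER_2 by default):

import org.apache.spark.storage.StorageLevel

// Keep two in-memory copies of each partition on different executors.
val replicated = sc.textFile("hdfs:///some/path").persist(StorageLevel.MEMORY_ONLY_2)
replicated.count()   // ordinary RDD operations work as usual; replication only affects recovery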
Hello,
On Wed, Sep 3, 2014 at 6:02 PM, rapelly kartheek
wrote:
>
> Can someone tell me what kind of operations can be performed on a
> replicated rdd?? What are the use-cases of a replicated rdd.
>
I suggest you read
https://spark.apache.org/docs/latest/programming-guide.html#resilient-distrib