Just as a best practice, DataFrames and Datasets are the preferred way, so try
not to resort to RDDs unless you absolutely have to...
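For instance, here is a rough sketch of the same word count written both ways (not from this thread; it assumes a SparkSession named spark and an illustrative input path):

import org.apache.spark.sql.functions._

// Dataset/DataFrame version: the aggregation goes through the Catalyst optimizer.
val lines = spark.read.textFile("data/words.txt")
val wordCounts = lines
  .select(explode(split(col("value"), "\\s+")).as("word"))
  .groupBy("word")
  .count()

// Equivalent RDD version, for comparison.
val rddCounts = lines.rdd
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)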
On Sun, 5 Mar 2017 at 7:10 pm, khwunchai jaengsawang
wrote:
> Hi Old-Scool,
>
>
> For the first question, you can specify the number of partition in any
> DataFrame by us
Hi Old-School,
For the first question, you can specify the number of partitions of any
DataFrame by using repartition(numPartitions: Int, partitionExprs: Column*).
Example:
val partitioned = data.repartition(numPartitions=10).cache()
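To double-check afterwards, the partition count can be read back (an illustrative check, not from the original mail):
partitioned.rdd.getNumPartitions  // should report 10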
For your second question, you can transform your RDD with an RDD operation,
for example a word count:
rdd.map(word => (word, 1)).reduceByKey(_ + _)
On Sat, Mar 4, 2017 at 8:59 AM -0500, "Old-School"
wrote:
Hi,
I want to perform some simple transformations and check the execution time,
under various configurations (e.g. number of
Hi,
Yes, I believe people do that. I also believe that SparkML is able to
figure out when to cache some internal RDDs as well. That's definitely true for
the random forest algorithm. It doesn't harm to cache the same RDD twice, either.
But it's not clear what you'd like to know...
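As a rough illustration of the "caching twice" point (the path and the actions are made up): a second cache() call on an already-cached RDD is a no-op, so independent code paths can each cache it defensively.

val data = sc.textFile("hdfs:///some/path").cache()
data.cache()                     // no-op: the storage level is already set
println(data.getStorageLevel)    // should show the MEMORY_ONLY level
data.count()                     // first action materialises the cache
data.map(_.length).sum()         // second job reuses the cached partitions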
--
Be well!
Jean Morozov
On S
What is the size of each RDD? What is the size of your cluster, and which Spark
configurations have you tried out?
On Tue, Jul 28, 2015 at 9:54 PM, ponkin wrote:
> Hi, Alice
>
> Did you find a solution?
> I have exactly the same problem.
Hi, Alice
Did you find a solution?
I have exactly the same problem.
---
The above is a great example using threads.
Does anyone have an example using a Scala/Akka Future to do the same?
I am looking for an example like that which uses an Akka Future and does
something if the Future times out.
On Tue, Mar 3, 2015 at 9:16 AM, Manas Kar
wrote:
> The above is a great examp
The above is a great example using threads.
Does anyone have an example using a Scala/Akka Future to do the same?
I am looking for an example like that which uses an Akka Future and does
something if the Future times out.
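Something along these lines might be what is being asked for; this is only a sketch using plain scala.concurrent rather than Akka-specific APIs, where rdd, the count() body and the 30-second timeout are all placeholders:

import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Launch the Spark job asynchronously on the driver.
val jobA: Future[Long] = Future { rdd.count() }

try {
  val n = Await.result(jobA, 30.seconds)          // wait at most 30 seconds
  println(s"job finished with $n rows")
} catch {
  case _: TimeoutException =>
    println("job timed out, doing fallback work")  // react to the timeout here
}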
On Tue, Mar 3, 2015 at 7:00 AM, Kartheek.R wrote:
> Hi TD,
> "You can always
Hi TD,
"You can always run two jobs on the same cached RDD, and they can run in
parallel (assuming you launch the 2 jobs from two different threads)"
Is this a correct way to launch jobs from two different threads?
val threadA = new Thread(new Runnable {
def run() {
for(i<- 0 until e
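The snippet above is cut off in the archive; a fuller sketch of the same pattern, where rdd is assumed to be the cached RDD and the two actions are placeholders:

val threadA = new Thread(new Runnable {
  def run(): Unit = println("count = " + rdd.count())
})
val threadB = new Thread(new Runnable {
  def run(): Unit = println("distinct = " + rdd.distinct().count())
})
threadA.start(); threadB.start()   // both jobs are submitted to the scheduler concurrently
threadA.join(); threadB.join()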
RDD.persist() can be useful here.
On 11 December 2014 at 14:34, ankits [via Apache Spark User List] <
ml-node+s1001560n20613...@n3.nabble.com> wrote:
>
> I'm using spark 1.1.0 and am seeing persisted RDDs being cleaned up too
> fast. How can I inspect the size of RDD in memory and get more informa
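One way to inspect those cached sizes from the driver is sketched below (getRDDStorageInfo is a developer API on SparkContext; the same numbers appear on the web UI's Storage tab):

sc.getRDDStorageInfo.foreach { info =>
  println(s"RDD ${info.id} (${info.name}): " +
    s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
    s"${info.memSize} bytes in memory, ${info.diskSize} bytes on disk")
}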
I was having similar issues with my persistent RDDs. After some digging
around, I noticed that the partitions were not balanced evenly across the
available nodes. After a "repartition", the RDD was spread evenly across
all available memory. Not sure if that is something that would help your
use-case.
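A sketch of that kind of check and rebalance (rdd and the target parallelism are placeholders):

// Count elements per partition to spot skew, then repartition and re-persist.
val perPartition = rdd.mapPartitionsWithIndex { (i, it) => Iterator((i, it.size)) }.collect()
perPartition.foreach { case (i, n) => println(s"partition $i -> $n elements") }

val balanced = rdd.repartition(sc.defaultParallelism).persist()
balanced.count()   // materialise so the rebalanced copy is what stays cached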
The ContextCleaner uncaches RDDs that have gone out of scope on the driver.
So it's possible that the given RDD is no longer reachable in your
program's control flow, or else it'd be a bug in the ContextCleaner.
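A sketch of the difference (paths and data are made up): a cached RDD held in a long-lived reference stays reachable, while one that only lives inside a method can be cleaned once the driver garbage-collects it.

// Reachable for the lifetime of the program: the cleaner leaves it alone.
val kept = sc.textFile("hdfs:///some/path").cache()
kept.count()

def leaky(): Long = {
  // Only reachable inside this method: after it returns and the driver GCs `tmp`,
  // the ContextCleaner may unpersist its blocks on the executors.
  val tmp = sc.parallelize(1 to 1000000).cache()
  tmp.count()
}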
On Wed, Dec 10, 2014 at 5:34 PM, ankits wrote:
> I'm using spark 1.1.0 and am seeing
What do you mean by incorrect? Could you please share some examples from
both the input RDDs and the resultant RDD? Also, if you get any exception, paste that
too; it helps to debug where the issue is.
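For instance, something along these lines makes the comparison concrete (left and right stand for the two key/value RDDs being joined):

val joined = left.join(right)
left.take(5).foreach(println)      // sample of the first input
right.take(5).foreach(println)     // sample of the second input
joined.take(5).foreach(println)    // sample of the result
println(s"left=${left.count()} right=${right.count()} joined=${joined.count()}")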
On 27 November 2014 at 17:07, liuboya [via Apache Spark User List] <
ml-node+s1001560n19928...@n3.nabble.com> w
Have you tried using RDD.map() to transform some of the RDD elements from 0
to 1? Why doesn’t that work? That’s how you change data in Spark, by
defining a new RDD that’s a transformation of an old one.
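A tiny sketch of that (the input values are made up):

val original = sc.parallelize(Seq(0, 1, 0, 2, 0))
val updated = original.map(x => if (x == 0) 1 else x)  // a new RDD; `original` is unchanged
updated.collect()                                      // Array(1, 1, 1, 2, 1)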
On Sat, Sep 13, 2014 at 5:39 AM, Deep Pradhan
wrote:
> Hi,
> We all know that RDDs are immu
Thank you yuanbosoft.
> From: Kartheek.R [mailto:kartheek.m...@gmail.com]
> Sent: Thursday, September 04, 2014 1:24 PM
> To: u...@spark.incubator.apache.org
> Subject: RE: RDDs
>
> Thank you Raymond and Tobias.
> Yeah, I am very clear about what I was asking. I was talking about
> "replicated" rdd only. Now
Thank you Raymond and Tobias.
Yeah, I am very clear about what I was asking. I was talking about
"replicated" rdd only. Now that I've got my understanding about job and
application validated, I wanted to know if we can replicate an rdd and run
two jobs (that need the same rdd) of an application in parallel.
Not sure what you were referring to when you said replicated rdd; if you actually mean
RDD, then yes, read the API doc and paper as Tobias mentioned.
If you actually focus on the word "replicated", then that is for fault
tolerance, and is probably mostly used in the streaming case for receiver-created
RDDs.
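On the batch side, that kind of replication is requested through a replicated storage level, roughly like this (the path is illustrative; streaming receivers typically use a replicated level such as MEMORY_AND_DISK_SER_2 by default):

import org.apache.spark.storage.StorageLevel

// Keep two in-memory copies of each partition on different executors.
val replicated = sc.textFile("hdfs:///some/path").persist(StorageLevel.MEMORY_ONLY_2)
replicated.count()   // ordinary RDD operations work as usual; replication only affects recovery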
Hello,
On Wed, Sep 3, 2014 at 6:02 PM, rapelly kartheek
wrote:
>
> Can someone tell me what kind of operations can be performed on a
> replicated rdd?? What are the use-cases of a replicated rdd.
>
I suggest you read
https://spark.apache.org/docs/latest/programming-guide.html#resilient-distrib