Re: dataframe average error: Float does not take parameters

2015-10-21 Thread Carol McDonald
DataFrame = [min(count): bigint, avg(count): double]

scala> res.show
+----------+----------+
|min(count)|avg(count)|
+----------+----------+
|         1|       1.0|
+----------+----------+

scala> res.printSchema
root
 |-- min(coun

dataframe average error: Float does not take parameters

2015-10-21 Thread Carol McDonald
This used to work:

// What's the min number of bids per item? What's the average? What's the max?
auction.groupBy("item", "auctionid").count.agg(min("count"), avg("count"), max("count")).show
// MIN(count)  AVG(count)          MAX(count)
// 1           16.992025518341308  75

but this now gives an error: val
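The aggregation the Spark snippet performs can be sketched without Spark at all. This is a plain-Scala analogue (not the DataFrame API), and the bid data is invented for illustration:

```scala
// Plain-Scala analogue of groupBy("item", "auctionid").count followed by
// agg(min, avg, max) over the per-group counts. Sample data is made up.
case class Bid(auctionId: String, item: String)

val bids = Seq(
  Bid("a1", "cartier"), Bid("a1", "cartier"), Bid("a2", "palm"),
  Bid("a3", "xbox"), Bid("a3", "xbox"), Bid("a3", "xbox")
)

// count bids per (item, auctionid) group
val counts: Seq[Int] =
  bids.groupBy(b => (b.item, b.auctionId)).values.map(_.size).toSeq

// then aggregate the counts
val minCount = counts.min                        // 1
val avgCount = counts.sum.toDouble / counts.size // 2.0
val maxCount = counts.max                        // 3
```

The logic is identical; only the execution engine differs.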

Re: Top 10 count

2015-10-20 Thread Carol McDonald
// sort by the 2nd element
Sorting.quickSort(pairs)(Ordering.by[(String, Int, Int), Int](_._2))
// sort by the 3rd element, then the 1st
Sorting.quickSort(pairs)(Ordering[(Int, String)].on(x => (x._3, x._1)))

On Tue, Oct 20, 2015 at 11:33 AM, Carol McDonald wrote:
> this works
> …
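The two `Sorting.quickSort` calls above can be checked in isolation. This is a runnable sketch with invented sample triples:

```scala
import scala.util.Sorting

// sample (String, Int, Int) triples, made up for illustration
val pairs = Array(("b", 3, 1), ("a", 1, 3), ("c", 2, 2))

// sort in place by the 2nd element
Sorting.quickSort(pairs)(Ordering.by[(String, Int, Int), Int](_._2))
val by2nd = pairs.toList
// List(("a",1,3), ("c",2,2), ("b",3,1))

// sort in place by the 3rd element, then the 1st
Sorting.quickSort(pairs)(Ordering[(Int, String)].on(x => (x._3, x._1)))
val by3rdThen1st = pairs.toList
// List(("b",3,1), ("c",2,2), ("a",1,3))
```

`Ordering.by` takes the sort key directly, while `Ordering[...].on` reuses an existing tuple ordering on a projected key, which is handy for multi-field sorts.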

Re: Top 10 count

2015-10-20 Thread Carol McDonald
…can write "Ordering.by(_._2)" to be more concise (not 100% sure about the syntax off the top of my head).

On Tue, Oct 20, 2015 at 3:56 PM, Carol McDonald wrote:
>> To find the top 10 counts, which is better: using top(10) with an Ordering
>> on the…

Top 10 count

2015-10-20 Thread Carol McDonald
To find the top 10 counts, which is better: using top(10) with an Ordering on the value, or swapping the key and value and ordering on the key? For example, which is better below, or does it matter?

val top10 = logs.filter(log => log.responseCode != 200).map(log => (log.endpoint, 1)).reduceByKey(_ + _)
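On a local collection the two approaches are easy to compare. This is a plain-Scala sketch (not the RDD API; in Spark these would roughly correspond to `top(n)(Ordering.by(_._2))` versus `map(_.swap)` plus a sort on the key), with invented (endpoint, count) pairs:

```scala
// invented (endpoint, count) pairs for illustration
val counts = Seq(("/a", 5), ("/b", 12), ("/c", 3), ("/d", 9))

// 1) order on the value directly
val topByValue = counts.sortBy(-_._2).take(2)

// 2) swap key and value, order on the (now numeric) key, swap back
val topBySwap = counts.map(_.swap).sortBy(-_._1).take(2).map(_.swap)

// both yield List(("/b",12), ("/d",9))
```

Both produce the same result; with RDDs the practical difference is that `top(n)` avoids a full sort and shuffle, whereas swap-and-sort materializes a globally sorted dataset.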

Re: correct use of DStream foreachRDD

2015-08-28 Thread Carol McDonald
…nvertToPut)" should be sufficient. In slightly older versions of Spark you have to import SparkContext._ to get these implicits.)

On Fri, Aug 28, 2015 at 3:29 PM, Carol McDonald wrote:
> I would like to make sure that I am using the DStream foreachRDD
> opera…

correct use of DStream foreachRDD

2015-08-28 Thread Carol McDonald
I would like to make sure that I am using the DStream foreachRDD operation correctly. I would like to read from a DStream, transform the input, and write to HBase. The code below works, but I became confused when I read "Note that the function *func* is executed in the driver process". val
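The point behind that note is that the closure passed to foreachRDD runs on the driver, but RDD actions inside it (such as foreachPartition) run on the executors, so per-partition setup like opening a connection belongs inside foreachPartition. The pattern can be simulated without Spark; here partitions are plain Seqs and the "connection" is just a log entry, all invented for illustration:

```scala
import scala.collection.mutable.ArrayBuffer

// simulate an RDD as a list of partitions (data invented)
val partitions = Seq(Seq(1, 2, 3), Seq(4, 5), Seq(6))

val opened = ArrayBuffer.empty[Int] // log of "connection" openings
var written = 0

// the foreachPartition pattern: one connection per partition,
// reused for every record in that partition, then closed
partitions.zipWithIndex.foreach { case (partition, id) =>
  opened += id                                 // "open" one connection
  partition.foreach { record => written += 1 } // write each record with it
}                                              // "close" the connection

// three partitions -> three connections opened, six records written
```

Opening the connection per record instead would cost one connection per element; opening it outside the partition loop (on the driver) would require the connection object to be serializable, which database handles usually are not.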

Re: Spark - Eclipse IDE - Maven

2015-07-28 Thread Carol McDonald
I agree, I found this book very useful for getting started with Spark and Eclipse.

On Tue, Jul 28, 2015 at 11:10 AM, Petar Zecevic wrote:
> Sorry about self-promotion, but there's a really nice tutorial for setting
> up Eclipse for Spark in the "Spark in Action" book:
> http://www.manning.com/bona

Re: dataframes sql order by not total ordering

2015-07-21 Thread Carol McDonald
…otherwise subsequent operations (such as the join) could reorder the tuples.

On Mon, Jul 20, 2015 at 9:25 AM, Carol McDonald wrote:
>> the following query on the MovieLens dataset is sorting by the count of
>> ratings for a movie. It looks like the results are or…

dataframes sql order by not total ordering

2015-07-20 Thread Carol McDonald
The following query on the MovieLens dataset sorts by the count of ratings for a movie. It looks like the results are ordered by partition?

scala> val results = sqlContext.sql("select movies.title, movierates.maxr, movierates.minr, movierates.cntu from (SELECT ratings.product, max(ratings.
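Why per-partition ordering is not a total ordering is easy to show on plain collections. This sketch simulates two partitions (contents invented): sorting each partition and concatenating is not the same as sorting the whole dataset.

```scala
// simulate a dataset split across two partitions (values invented)
val partitions = Seq(Seq(9, 1, 5), Seq(4, 8, 2))

// sort within each partition, then concatenate: looks sorted per chunk only
val perPartition = partitions.map(_.sorted).flatten // List(1,5,9,2,4,8)

// a true total ordering sorts across all partitions
val global = partitions.flatten.sorted              // List(1,2,4,5,8,9)

val totallyOrdered = perPartition == global         // false
```

This is the symptom described in the thread: a sort applied before a shuffle-producing operation (or applied partition-locally) can leave results that are only ordered within each partition.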

Re: ALS run method versus ALS train versus ALS fit and transform

2015-07-17 Thread Carol McDonald
…API. Similar ideas, but a different API.

On Wed, Jul 15, 2015 at 9:55 PM, Carol McDonald wrote:
> In the Spark mllib examples MovieLensALS.scala ALS run is used, however in
> the movie recommendation with mllib tutorial ALS train is used, What is th…

ALS run method versus ALS train versus ALS fit and transform

2015-07-15 Thread Carol McDonald
In the Spark MLlib examples, MovieLensALS.scala uses ALS run, but the movie recommendation with MLlib tutorial uses ALS train. What is the difference, and when should you use one versus the other?

val model = new ALS() .setRank(params.rank) .setIterations(params.numIterati