DataFrame = [min(count): bigint, avg(count): double]
>
> scala> res.show
> +----------+----------+
> |min(count)|avg(count)|
> +----------+----------+
> |         1|       1.0|
> +----------+----------+
>
> scala> res.printSchema
> root
>  |-- min(coun
This used to work:

// What's the min number of bids per item? What's the average? What's the max?
auction.groupBy("item", "auctionid").count
  .agg(min("count"), avg("count"), max("count")).show

// MIN(count)   AVG(count)           MAX(count)
// 1            16.992025518341308   75
but this now gives an error
val
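For reference, a minimal sketch of the same aggregation that works on recent
Spark versions, assuming auction is a DataFrame with item and auctionid
columns; this is my guess at a fix, not the thread's actual resolution:

import org.apache.spark.sql.functions.{min, avg, max}

// Count bids per (item, auctionid) pair, then aggregate over the counts.
val stats = auction.groupBy("item", "auctionid").count()
  .agg(min("count"), avg("count"), max("count"))
stats.show()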
import scala.util.Sorting

// sort by 2nd element
Sorting.quickSort(pairs)(Ordering.by[(String, Int, Int), Int](_._2))

// sort by the 3rd element, then 1st
Sorting.quickSort(pairs)(Ordering[(Int, String)].on(x => (x._3, x._1)))
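For a self-contained test, a sample array (values mine, echoing the
scala.util.Sorting docs) also shows that the concise Ordering.by(_._2) form
suggested in the reply below compiles once the tuple type is pinned down by
pairs:

val pairs = Array(("a", 5, 2), ("c", 3, 1), ("b", 1, 3))

// same sort by 2nd element, concise form: the tuple type is inferred from pairs
Sorting.quickSort(pairs)(Ordering.by(_._2))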
On Tue, Oct 20, 2015 at 11:33 AM, Carol McDonald wrote:
> this works
>
You can write "Ordering.by(_._2)" to be more concise
(not 100% sure about the syntax off the top of my head).

On Tue, Oct 20, 2015 at 3:56 PM, Carol McDonald wrote:
>
>> To find the top 10 counts, which is better: using top(10) with an Ordering
>> on the
To find the top 10 counts, which is better: using top(10) with an Ordering on
the value, or swapping the key and value and sorting on the key? For example,
which is better below? Or does it matter?

val top10 = logs.filter(log => log.responseCode != 200)
  .map(log => (log.endpoint, 1))
  .reduceByKey(_ + _)
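A sketch of the two options, continuing from the top10 RDD built above (the
extra names are mine). top(10) with an Ordering keeps a bounded priority queue
per partition and merges only a few elements on the driver, while swap-and-sort
shuffles the whole RDD, so for a small fixed n the first is usually cheaper:

// Option 1: Ordering on the value; no shuffle.
val byTop = top10.top(10)(Ordering.by(_._2))

// Option 2: swap to (count, endpoint) and sort on the key; full shuffle.
val bySort = top10.map(_.swap).sortByKey(ascending = false).take(10)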
nvertToPut)" should be sufficient. In slightly
> older versions of Spark you have to import SparkContext._ to get these
> implicits.)
>
On Fri, Aug 28, 2015 at 3:29 PM, Carol McDonald wrote:
> > I would like to make sure that I am using the DStream foreachRDD
> > opera
I would like to make sure that I am using the DStream foreachRDD operation
correctly. I would like to read from a DStream, transform the input, and
write to HBase. The code below works, but I became confused when I read
"Note that the function *func* is executed in the driver process"?
val
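A minimal sketch of the usual shape of this (convertToPut and jobConf are
assumed names echoing the reply above, not the thread's actual code). The
closure given to foreachRDD does run on the driver, once per batch, but the
RDD operations called inside it still execute on the executors:

import org.apache.hadoop.mapred.JobConf
import org.apache.spark.streaming.dstream.DStream

// convertToPut (assumed): String => (ImmutableBytesWritable, Put); returning a
// pair makes the pair-RDD implicits apply, which is what the SparkContext._
// note in the reply refers to.
def writeToHBase(lines: DStream[String], jobConf: JobConf): Unit = {
  lines.foreachRDD { rdd =>
    // driver side: this block runs once per batch interval
    rdd.map(record => convertToPut(record)) // executor side: convert each record
       .saveAsHadoopDataset(jobConf)        // executor side: partitions write in parallel
  }
}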
I agree, I found this book very useful for getting started with Spark and
Eclipse.

On Tue, Jul 28, 2015 at 11:10 AM, Petar Zecevic wrote:
>
> Sorry about self-promotion, but there's a really nice tutorial for setting
> up Eclipse for Spark in the "Spark in Action" book:
> http://www.manning.com/bona
Otherwise subsequent
> operations (such as the join) could reorder the tuples.
>
> On Mon, Jul 20, 2015 at 9:25 AM, Carol McDonald wrote:
>
>> The following query on the MovieLens dataset is sorting by the count of
>> ratings for a movie. It looks like the results are or
The following query on the MovieLens dataset is sorting by the count of
ratings for a movie. It looks like the results are ordered by partition?

scala> val results = sqlContext.sql("select movies.title, movierates.maxr,
movierates.minr, movierates.cntu from (SELECT ratings.product,
max(ratings.
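Without an ORDER BY at the outermost level, Spark SQL gives no ordering
guarantee, so rows come back in whatever order the partitions are scanned. A
sketch of one way to force a global sort on the results above (cntu, the
ratings-count column, is taken from the truncated query; the rest is my
assumption):

import org.apache.spark.sql.functions.desc

// Hypothetical follow-up: globally sort by the count-of-ratings column.
val ordered = results.orderBy(desc("cntu"))
ordered.show()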
API. Similar ideas,
> but a different API.
>
> On Wed, Jul 15, 2015 at 9:55 PM, Carol McDonald wrote:
> > In the Spark MLlib example MovieLensALS.scala ALS.run is used, however in
> > the movie recommendation with MLlib tutorial ALS.train is used. What is
> > th
In the Spark MLlib example MovieLensALS.scala, ALS.run is used; however, in
the movie recommendation with MLlib tutorial, ALS.train is used. What is the
difference, and when should you use one versus the other?

val model = new ALS()
  .setRank(params.rank)
  .setIterations(params.numIterations)
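For what it's worth, the two are equivalent under the hood: the ALS.train
methods on the companion object construct an ALS instance with the given
parameters and call run on it. A sketch of both styles (ratings, rank, and
numIterations are placeholders):

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD

def fitBothWays(ratings: RDD[Rating], rank: Int, numIterations: Int) = {
  // Builder style: handy when you also want optional knobs
  // such as setLambda or setBlocks before running.
  val viaRun = new ALS()
    .setRank(rank)
    .setIterations(numIterations)
    .run(ratings)

  // Companion-object convenience: same computation with defaults
  // for the remaining parameters.
  val viaTrain = ALS.train(ratings, rank, numIterations)

  (viaRun, viaTrain)
}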