Hi All,
I have an RDD with data in the following form:
tempRDD: RDD[(String, (String, String))]
(brand, (product, key))
("amazon",("book1","tech"))
("eBay",("book1","tech"))
("barns&noble",("book","tech"))
("amazon",("book2","tech"))
I would like to group the data by brand and wou[...]
further processing.
I am kind of stuck.
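A minimal sketch of grouping `tempRDD` by brand. It assumes Spark 1.3 or later, where pair-RDD methods such as `groupByKey` are available on any `RDD[(K, V)]` without an extra import, and it uses a local master so it can run straight from an IDE. The object name `GroupByBrand` and the helper `groupByBrand` are illustrative, not part of any existing code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object GroupByBrand {
  // Gathers all (product, key) pairs that share a brand into one Iterable per brand.
  // groupByKey comes from PairRDDFunctions; since Spark 1.3 the implicit conversion
  // is found automatically for RDD[(K, V)].
  def groupByBrand(tempRDD: RDD[(String, (String, String))]): RDD[(String, Iterable[(String, String)])] =
    tempRDD.groupByKey()

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process, which is enough for a quick test.
    val sc = new SparkContext(new SparkConf().setAppName("GroupByBrand").setMaster("local[*]"))
    try {
      // Same shape as tempRDD: (brand, (product, key))
      val tempRDD = sc.parallelize(Seq(
        ("amazon", ("book1", "tech")),
        ("eBay", ("book1", "tech")),
        ("barns&noble", ("book", "tech")),
        ("amazon", ("book2", "tech"))))

      // Each brand now appears once, with its (product, key) pairs grouped together.
      groupByBrand(tempRDD).collect().foreach { case (brand, items) =>
        println(brand + ": " + items.mkString(", "))
      }
    } finally {
      sc.stop()
    }
  }
}
```

`groupByKey` shuffles every value across the cluster. If the goal is an aggregate per brand rather than the full list of products, `reduceByKey` or `aggregateByKey` combine values before the shuffle; for example, `tempRDD.mapValues(_ => 1).reduceByKey(_ + _)` counts products per brand.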
On Tue, Mar 15, 2016 at 10:50 AM, Suniti Singh wrote:
> Is it always the case that one title is a substring of another? Not
> always. A title can have values like D.O.C, doctor_{areacode}, or
> doc_{dep,areacode}.
>
> On Mon, Mar 14, 2[...]
> [...]at one title is a substring of another?
>
> On Tue, Mar 15, 2016 at 6:46 AM, Suniti Singh wrote:
>
>> Hi All,
>>
>> I have two tables with the same schema but different data. I have to join the
>> tables on one column and then group by that same column
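Since one title is not always a substring of another, one possible approach is to map every raw title to a canonical key before joining. The sketch below is hypothetical: the rules (drop everything from the first `_`, keep only letters, lowercase, then apply an alias table) are inferred from the three variants quoted above (D.O.C, doctor_{areacode}, doc_{dep,areacode}), and the alias map and sample values such as `doctor_415` are made up for illustration. Real data would likely need more rules.

```scala
object TitleKey {
  // Hypothetical alias table: abbreviations mapped to one canonical title.
  // Extend it with whatever variants actually occur in the data.
  private val aliases: Map[String, String] = Map("doc" -> "doctor")

  // Builds a join key from a raw title:
  //  1. drop everything from the first '_' on (the area code / department suffix),
  //  2. keep only letters and lowercase them ("D.O.C" -> "doc"),
  //  3. replace known abbreviations with their canonical form.
  def canonicalTitle(raw: String): String = {
    val base = raw.takeWhile(_ != '_')
    val letters = base.filter(_.isLetter).toLowerCase
    aliases.getOrElse(letters, letters)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample values standing in for the placeholders in the thread.
    Seq("D.O.C", "doctor_415", "doc_cardiology,415", "Doctor").foreach { t =>
      println(t + " -> " + canonicalTitle(t))
    }
  }
}
```

With these rules, all four sample titles map to `doctor`, so they would land on the same key in a join. Because the function is a plain `String => String`, it can be applied inside an RDD `map` or wrapped in a UDF without change.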
Hi All,
I have two tables with the same schema but different data. I have to join the
tables on one column and then group by that same column.
Now, the data in that column may or may not match exactly between the two tables. (E.g.,
the column name is "title"; Table1.title = "doctor" and Table2.ti
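A minimal RDD-based sketch of "join on one column, then group by that column", assuming Spark 1.3 or later and that each table is available as `(title, payload)` pairs. `titleKey` is a hypothetical placeholder for whatever normalization makes near-matching titles equal; here it only trims and lowercases. The object name and the `String` payload type are also assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinThenGroup {
  // Hypothetical stand-in for real title normalization: trim and lowercase only.
  def titleKey(title: String): String = title.trim.toLowerCase

  // Keys both tables by the normalized title, joins them, then groups the joined rows by that key.
  def joinAndGroup(t1: RDD[(String, String)],
                   t2: RDD[(String, String)]): RDD[(String, Iterable[(String, String)])] = {
    val k1 = t1.map { case (title, v) => (titleKey(title), v) }
    val k2 = t2.map { case (title, v) => (titleKey(title), v) }
    // join emits one (key, (v1, v2)) record per matching pair; groupByKey gathers them per key.
    k1.join(k2).groupByKey()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JoinThenGroup").setMaster("local[*]"))
    try {
      // Made-up rows: (title, payload)
      val table1 = sc.parallelize(Seq(("Doctor", "t1-row1"), ("nurse", "t1-row2")))
      val table2 = sc.parallelize(Seq(("doctor ", "t2-row1"), ("doctor", "t2-row2")))
      joinAndGroup(table1, table2).collect().foreach { case (title, rows) =>
        println(title + ": " + rows.mkString(", "))
      }
    } finally {
      sc.stop()
    }
  }
}
```

If you want each key's rows from both tables side by side without forming every pair, `k1.cogroup(k2)` groups both sides by key in a single step.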
Hi Suniti,
>
> Why are you mixing spark-sql version 1.2.0 with spark-core and spark-hive
> version 1.6.0?
>
> I’d suggest you try to keep all the libs at the same version.
>
> On Mar 7, 2016, at 6:15 PM, Suniti Singh wrote:
>
>
>
> org.apache.spa
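One way to follow the advice to keep all Spark libraries at the same version is to drive every Spark module from a single version value. The fragment below is a sketch assuming an sbt build (the thread uses Maven); the Scala version `2.10.6` is an illustrative choice, since Spark 1.6.0 artifacts are published for Scala 2.10 and 2.11.

```scala
// build.sbt (sketch): one value controls every Spark module so they cannot drift apart.
scalaVersion := "2.10.6"

val sparkVersion = "1.6.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion
)
```

In a Maven pom the same idea is a single `spark.version` property referenced from each Spark dependency's `version`, with the `_2.10` or `_2.11` artifact suffix kept identical across all of them.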
Hi All,
I am trying to create a Hive context in a Scala program in Eclipse, as follows:
Note: I have added the Maven dependencies for spark-core, spark-hive, and spark-sql.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions
object D
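For reference, a minimal self-contained sketch of creating a `HiveContext` with the Spark 1.x API. It assumes spark-core, spark-sql and spark-hive are all on the classpath at the same 1.x version (e.g. 1.6.0); the object name, the helper `createHiveContext`, and the `SELECT 1` smoke test are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextExample {
  // In Spark 1.x, HiveContext wraps an existing SparkContext.
  def createHiveContext(sc: SparkContext): HiveContext = new HiveContext(sc)

  def main(args: Array[String]): Unit = {
    // local[*] lets the program run directly from Eclipse without a cluster.
    val sc = new SparkContext(new SparkConf().setAppName("HiveContextExample").setMaster("local[*]"))
    try {
      // Without a hive-site.xml on the classpath, Spark typically falls back to a
      // local embedded metastore created in the working directory.
      val hiveContext = createHiveContext(sc)

      // Smoke test: run a trivial query and list the tables visible through the metastore.
      hiveContext.sql("SELECT 1").show()
      hiveContext.sql("SHOW TABLES").show()
    } finally {
      sc.stop()
    }
  }
}
```

If this fails with missing-class or method errors at startup, mismatched Spark module versions on the classpath are a common cause.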