Hi Selvam, is the table called sel?
And are these assumptions correct?

site -> ColA
requests -> ColB

I don't think you are using ColC here?

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 16 August 2016 at 12:06, Selvam Raman <sel...@gmail.com> wrote:

> Hi All,
>
> Please suggest the best approach to achieve this result. [Please comment
> on whether the existing logic is fine or not.]
>
> Input Record:
>
> ColA  ColB  ColC
> 1     2     56
> 1     2     46
> 1     3     45
> 1     5     34
> 1     5     90
> 2     1     89
> 2     5     45
>
> Expected Result:
>
> ResA  ResB
> 1     2:2|3:3|5:5
> 2     1:1|5:5
>
> I followed the Spark steps below.
>
> (Spark version - 1.5.0)
>
> def valsplit(elem: scala.collection.mutable.WrappedArray[String]): String =
> {
>   elem.map(e => e + ":" + e).mkString("|")
> }
>
> sqlContext.udf.register("valudf",
>   valsplit(_: scala.collection.mutable.WrappedArray[String]))
>
> val x = sqlContext.sql("select site, valudf(collect_set(requests)) as test
> from sel group by site").first
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
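
If those assumptions hold (site is ColA, requests is ColB, ColC is unused), the grouping logic can be checked without Spark. Below is a minimal sketch in plain Scala collections, not the Spark job itself. The names SelSketch, Row and aggregate are hypothetical and only there for illustration. valsplit mirrors the UDF in the quoted message. The values are sorted so the output is deterministic, because collect_set does not guarantee any ordering.

```scala
// Plain-Scala sketch (no Spark needed) of the grouping in the quoted question.
// Assumption: site maps to ColA and requests maps to ColB; ColC is unused.
object SelSketch {
  // Hypothetical row type, for illustration only.
  final case class Row(colA: Int, colB: Int, colC: Int)

  // The sample input from the quoted message.
  val input: Seq[Row] = Seq(
    Row(1, 2, 56), Row(1, 2, 46), Row(1, 3, 45), Row(1, 5, 34),
    Row(1, 5, 90), Row(2, 1, 89), Row(2, 5, 45)
  )

  // Same formatting as the valsplit UDF: each value as "v:v", joined by "|".
  def valsplit(elems: Seq[String]): String =
    elems.map(e => e + ":" + e).mkString("|")

  // Group by ColA and keep the distinct ColB values, which is what
  // collect_set does per group. Sorting is added here only to make the
  // result order predictable.
  def aggregate(rows: Seq[Row]): Seq[(Int, String)] =
    rows.groupBy(_.colA).toSeq.sortBy(_._1).map { case (site, group) =>
      val requests = group.map(_.colB).distinct.sorted.map(_.toString)
      (site, valsplit(requests))
    }

  def main(args: Array[String]): Unit =
    aggregate(input).foreach { case (resA, resB) => println(s"$resA $resB") }
}
```

On the sample input this yields the two rows in the expected result, so the UDF logic itself looks fine under these assumptions. If the order of values inside ResB matters in the Spark version, it would need an explicit sort inside the UDF.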