Re: Spark Pair RDD write to Hive

2021-09-06 Thread Anil Dasari
2nd try. From: Anil Dasari. Date: Sunday, September 5, 2021 at 10:42 AM. To: user@spark.apache.org. Subject: Spark Pair RDD write to Hive. Hello, I have a use case where users of a group id are persisted to a Hive table. // pseudo code looks like below: usersRDD = sc.parallelize(..) usersPairRDD = usersRDD.map(u => (u.groupId, u)) …

Spark Pair RDD write to Hive

2021-09-05 Thread Anil Dasari
Hello, I have a use case where users of a group id are persisted to a Hive table. // pseudo code looks like below: usersRDD = sc.parallelize(..) usersPairRDD = usersRDD.map(u => (u.groupId, u)) groupedUsers = usersPairRDD.groupByKey() Can I save the groupedUsers RDD into Hive tables where the table name is the key (group id)? …
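
A minimal sketch of one way to do this, assuming a SparkSession with Hive support and an illustrative User case class (names are not from the thread). There is no RDD method that writes grouped data straight to per-key Hive tables, so this goes through a DataFrame per group:

    import org.apache.spark.sql.SparkSession

    case class User(groupId: String, name: String)

    val spark = SparkSession.builder()
      .appName("pair-rdd-to-hive")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val usersRDD = spark.sparkContext.parallelize(Seq(
      User("g1", "alice"), User("g2", "bob"), User("g1", "carol")))

    // One Hive table per group id: collect the distinct keys on the driver,
    // then filter and write each group.
    usersRDD.map(_.groupId).distinct().collect().foreach { gid =>
      usersRDD.filter(_.groupId == gid).toDF()
        .write.mode("overwrite").saveAsTable(s"users_$gid")
    }

This is workable for a handful of groups; with many group ids, a single table partitioned by groupId usually scales better than one table per key.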

Re: Efficient look up in Key Pair RDD

2017-01-08 Thread ayan guha
Thank you. Anil Langote, +1-425-633-9747. From: ayan guha. Date: Sunday, January 8, 2017 at 10:32 PM. To: Anil Langote. Subject: Re: Efficient look up in Key Pair RDD. …

Re: Efficient look up in Key Pair RDD

2017-01-08 Thread Anil Langote
Thank you. Anil Langote, +1-425-633-9747. From: ayan guha. Date: Sunday, January 8, 2017 at 10:26 PM. To: Anil Langote. Cc: Holden Karau, user. Subject: Re: Efficient look up in Key Pair RDD. Have you tried something like GROUPING SETS? That seems to be the exact thing you are looking for. …

Re: Efficient look up in Key Pair RDD

2017-01-08 Thread ayan guha
Have you tried something like GROUPING SETS? That seems to be the exact thing you are looking for. On Mon, Jan 9, 2017 at 12:37 PM, Anil Langote wrote: Sure. Let me explain my requirement. I have an input file which has 25 attributes and the last column is an array of doubles (14,500 elements …
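
GROUPING SETS is supported in Spark SQL; a hedged sketch with illustrative table and column names (not from the thread):

    // Assumes a registered view, e.g. df.createOrReplaceTempView("input_table").
    spark.sql("""
      SELECT attribute_0, attribute_1, SUM(value) AS total
      FROM input_table
      GROUP BY attribute_0, attribute_1
      GROUPING SETS ((attribute_0, attribute_1), (attribute_0), (attribute_1))
    """).show()

One statement computes the aggregate for every listed column combination, which would otherwise take a separate group-by per combination.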

Re: Efficient look up in Key Pair RDD

2017-01-08 Thread Anil Langote
Sure. Let me explain my requirement. I have an input file which has 25 attributes, and the last column is an array of doubles (14,500 elements in the original file):

Attribute_0  Attribute_1  Attribute_2  Attribute_3  DoubleArray
5            3            5            3            0.2938933463658645 0.0437040427073041 0.23002681025029648 0.18003221…

Re: Efficient look up in Key Pair RDD

2017-01-08 Thread Holden Karau
To start with, caching and having a known partitioner will help a bit; then there is also the IndexedRDD project, but in general Spark might not be the best tool for the job. Have you considered having Spark output to something like memcache? What's the goal you are trying to accomplish? …

Efficient look up in Key Pair RDD

2017-01-08 Thread Anil Langote
Hi All, I have a requirement where I want to build a distributed HashMap which holds 10M key-value pairs and provides very efficient lookups for each key. I tried loading the file into a JavaPairRDD and calling the lookup method, but it's very slow. How can I achieve a much faster lookup by a given key? …
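
A commonly suggested starting point (a sketch under assumptions, not the thread's final answer): give the pair RDD a known partitioner and cache it, so that lookup(key) scans only the single partition that can hold the key instead of the whole RDD.

    import org.apache.spark.HashPartitioner

    // Toy data standing in for the 10M-pair file.
    val pairs = sc.parallelize(1 to 10000000).map(i => (s"key$i", i))

    // With a known partitioner, lookup() routes to one partition.
    val indexed = pairs.partitionBy(new HashPartitioner(200)).cache()
    indexed.count()                       // materialize the cache

    val hits: Seq[Int] = indexed.lookup("key42")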

Re: Maintaining order of pair rdd

2016-07-26 Thread Kuchekar
x ++ y). The idea is to create an ArrayBuffer (this maintains insertion order). A more elegant solution would be using zipWithIndex on the pair RDD and then sorting by the index within each group. RDD.zipWithIndex gives you something like ((x, y), index); now map it like (x …
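
A hedged sketch of that zipWithIndex approach with illustrative data: tag each element with its input position, group by key, then sort each group's values by that position.

    val rdd = sc.parallelize(Seq(("id1", 5), ("id1", 2), ("id2", 9), ("id1", 3)))

    val ordered = rdd.zipWithIndex()                  // ((key, value), index)
      .map { case ((k, v), idx) => (k, (idx, v)) }
      .groupByKey()
      .mapValues(_.toSeq.sortBy(_._1).map(_._2))      // values back in input order

    // ordered: RDD[(String, Seq[Int])], e.g. ("id1", Seq(5, 2, 3))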

Re: Maintaining order of pair rdd

2016-07-26 Thread janardhan shetty
"janardhan shetty" wrote: Array( (ID1, Array(18159, 308703, 72636, 64544, 39244, 107937, 54477, 145272, 100079, 36318, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076, …

Re: Maintaining order of pair rdd

2016-07-26 Thread Marco Mistroni
…Array(18159, 308703, 72636, 64544, 39244, 107937, 54477, 145272, 100079, 36318, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076, 45431, 100136)), (ID3, Array(100079, 19622, 18159, 212064, 107937, 44683, 150022, 39244, …

Re: Maintaining order of pair rdd

2016-07-25 Thread janardhan shetty
…44683, 19622, 160992, 107937, 100079, 100136, 145272, 64544, 18159, 45431, 36318, 162076)) ) I need to compare the first 5 elements of ID1 with the first five elements of ID3, then the first 5 elements of ID1 to ID2. Similarly …
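
For that pairwise comparison, one hedged sketch, assuming the number of IDs is small enough to collect the arrays to the driver (data abbreviated from the thread):

    val rdd = sc.parallelize(Seq(
      ("ID1", Array(18159, 308703, 72636, 64544, 39244)),
      ("ID3", Array(100079, 19622, 18159, 212064, 107937))))

    // Bring the per-ID arrays to the driver, then compare 5-element windows.
    val byId = rdd.collectAsMap()
    val overlap = byId("ID1").take(5).intersect(byId("ID3").take(5))  // Array(18159)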

Re: Maintaining order of pair rdd

2016-07-25 Thread Marco Mistroni
On Sun, Jul 24, 2016 at 7:45 AM, Marco Mistroni wrote: Apologies, I misinterpreted; could you post two use cases? Kr. On 24 Jul 2016 3:41 pm, "janardhan shetty" …

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty
…"janardhan shetty" wrote: Marco, thanks for the response. It is indexed order and not ascending or descending order. On Jul 24, 2016 7:37 AM, "Marco Mistroni" wrote: …

Re: Maintaining order of pair rdd

2016-07-24 Thread Marco Mistroni
…"Marco Mistroni" wrote: Use mapValues to transform to an RDD where values are sorted? Hth. On 24 Jul 2016 6:23 am, "janardhan shetty" wrote: …

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty
…Hth. On 24 Jul 2016 6:23 am, "janardhan shetty" wrote: I have a key-value pair RDD where the value is an array of Ints. I need to maintain the order of the values in order to execute downstream …

Re: Maintaining order of pair rdd

2016-07-24 Thread Marco Mistroni
…wrote: Use mapValues to transform to an RDD where values are sorted? Hth. On 24 Jul 2016 6:23 am, "janardhan shetty" wrote: I have a key-value pair RDD where the value is an array of Ints. I need to maintain th…

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty
Marco, thanks for the response. It is indexed order and not ascending or descending order. On Jul 24, 2016 7:37 AM, "Marco Mistroni" wrote: Use mapValues to transform to an RDD where values are sorted? Hth. On 24 Jul 2016 6:23 am, "janardhan shetty" wrote: …

Maintaining order of pair rdd

2016-07-23 Thread janardhan shetty
I have a key-value pair RDD where the value is an array of Ints. I need to maintain the order of the values in order to execute downstream modifications. How do we maintain the order of values? Ex: rdd = (id1, [5,2,3,15]), (id2, [9,4,2,5]). Follow-up question: how do we compare between one element in the RDD …

Re: Writing output of key-value Pair RDD

2016-05-05 Thread Afshartous, Nick
…To: Nicholas Chammas; user@spark.apache.org. Subject: Re: Writing output of key-value Pair RDD. Thanks, I got the example below working, though it writes both the keys and values to the output file. Is there any way to write just the values? -- Nick. String[] strings = { "Abcd…

Re: Writing output of key-value Pair RDD

2016-05-05 Thread Afshartous, Nick
…Afshartous, Nick <nafshart...@turbine.com> wrote: Hi, Is there any way to write out to S3 the values of a key-value Pair RDD? I'd like each value of a pair to be written to its own file where the file name corresponds to the key name. Thanks, -- Nick

Re: Writing output of key-value Pair RDD

2016-05-04 Thread Nicholas Chammas
…Is there any way to write out to S3 the values of a key-value Pair RDD? I'd like each value of a pair to be written to its own file where the file name corresponds to the key name. Thanks, -- Nick

Writing output of key-value Pair RDD

2016-05-04 Thread Afshartous, Nick
Hi, Is there any way to write out to S3 the values of a key-value Pair RDD? I'd like each value of a pair to be written to its own file where the file name corresponds to the key name. Thanks, -- Nick
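
The classic trick for one-file-per-key output (a hedged sketch, not necessarily what the thread settled on) is a custom MultipleTextOutputFormat that names each output file after the record's key; having generateActualKey return NullWritable also answers the follow-up earlier in the thread about writing just the values.

    import org.apache.hadoop.io.NullWritable
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

    class KeyAsFileName extends MultipleTextOutputFormat[Any, Any] {
      // Route each record to a file named after its key.
      override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
        key.toString
      // Suppress the key so only the value is written.
      override def generateActualKey(key: Any, value: Any): Any =
        NullWritable.get()
    }

    val pairs = sc.parallelize(Seq(("fileA", "line 1"), ("fileB", "line 2")))
    pairs.saveAsHadoopFile("s3a://bucket/out",        // illustrative bucket/path
      classOf[String], classOf[String], classOf[KeyAsFileName])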

Re: pyspark split pair rdd to multiple

2016-04-20 Thread Gourav Sengupta
…df1.show() On Tue, Apr 19, 2016 at 12:51 PM, pth001 wrote: Hi, How can I split pair rdd [K, V] to map [K, Array(V)] efficiently in Pyspark? Best, Patcharee

Re: pyspark split pair rdd to multiple

2016-04-20 Thread Wei Chen
…", udf1("V").alias("arrayV")) df1.show() On Tue, Apr 19, 2016 at 12:51 PM, pth001 wrote: Hi, How can I split pair rdd [K, V] to map [K, Array(V)] efficiently in Pyspark? Best, Patcharee

Re: pyspark split pair rdd to multiple

2016-04-20 Thread patcharee
…How can I split pair rdd [K, V] to map [K, Array(V)] efficiently in Pyspark? Best, Patcharee

Re: pyspark split pair rdd to multiple

2016-04-20 Thread Gourav Sengupta
Is there any reason why you are not using data frames? Regards, Gourav. On Tue, Apr 19, 2016 at 8:51 PM, pth001 wrote: Hi, How can I split pair rdd [K, V] to map [K, Array(V)] efficiently in Pyspark?

pyspark split pair rdd to multiple

2016-04-19 Thread pth001
Hi, How can I split pair rdd [K, V] to map [K, Array(V)] efficiently in Pyspark? Best, Patcharee
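
In PySpark this is usually rdd.groupByKey().mapValues(list); the same transformation as a hedged Scala sketch with toy data:

    val pairs = sc.parallelize(Seq(("k1", 1), ("k2", 2), ("k1", 3)))

    // Group all values per key: RDD[(String, Array[Int])].
    val grouped = pairs.groupByKey().mapValues(_.toArray)

    // If a single driver-side map is actually wanted and the data fits:
    val asMap: Map[String, Array[Int]] = grouped.collectAsMap().toMap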

Best way to save key-value pair rdd ?

2015-12-07 Thread Anup Sawant
Hello, what would be the best way to save a key-value pair RDD so that I don't have to convert the saved records into tuples while reading the RDD back into Spark? -- Best, Anup
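
One hedged option: sequence files preserve the key-value structure, so the pairs read back as tuples without re-parsing (assumes Writable-convertible key and value types; the path is illustrative):

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2)))
    pairs.saveAsSequenceFile("/tmp/pairs-seq")

    // Comes straight back as an RDD[(String, Int)]: no tuple conversion.
    val restored = sc.sequenceFile[String, Int]("/tmp/pairs-seq")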

Re: How to return a pair RDD from an RDD that has foreachPartition applied?

2015-11-18 Thread Sathish Kumaran Vairavelu
…it. How do I apply foreachPartition and do a save and at the same time return a pair RDD? def saveDataPointsBatchNew(records: RDD[(String, (Long, java.util.LinkedHashMap[java.lang.Long, java.lang.Float], java.util.LinkedHashMap[java.lang.Long, java.lang…

Re: How to return a pair RDD from an RDD that has foreachPartition applied?

2015-11-18 Thread swetha kasireddy
…Looks like an RDD that has foreachPartition applied can only have the return type Unit. How do I apply foreachPartition and do a save and at the same time return a pair RDD? def saveDataPointsBatchNew(records: RDD[(String, (Long, java.util.LinkedHashMap[java.lang.Long, j…

How to return a pair RDD from an RDD that has foreachPartition applied?

2015-11-17 Thread swetha
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-return-a-pair-RDD-from-an-RDD-that-has-foreachPartition-applied-tp25411.html
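
The usual answer is mapPartitions: it returns an RDD, while foreachPartition returns Unit. A hedged sketch; saveBatch stands in for the poster's per-partition save logic:

    // Hypothetical per-partition save, standing in for the real store write.
    def saveBatch(batch: Seq[(String, Long)]): Unit =
      println(s"saving ${batch.size} records")

    val records = sc.parallelize(Seq(("a", 1L), ("b", 2L)))
    val saved = records.mapPartitions { iter =>
      val batch = iter.toSeq
      saveBatch(batch)      // save the partition's records...
      batch.iterator        // ...and still pass them along
    }
    // mapPartitions is lazy: an action (e.g. saved.count()) triggers the
    // save; cache saved if it is reused, so the save does not run twice.
    saved.cache()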

Re: Loading json data into Pair RDD in Spark using java

2015-09-09 Thread Ted Yu
…pairs.foreach(println) val pairsastuple = pairs.map(x => if (x.split("=").length > 1) (x.split("=")(0), x.split("=")(1)) else (x.split("=")(0), x))

Loading json data into Pair RDD in Spark using java

2015-09-09 Thread prachicsa
Current code: val pairs = setECrecords.flatMap(x => (x.split(","))) pairs.foreach(println) val pairsastuple = pairs.map(x => if (x.split("=").length > 1) (x.split("=")(0), x.split("=")(1)) else (x.split("=")(0), x))
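
Splitting JSON on commas and equals signs is fragile; a hedged alternative is to let Spark parse the JSON itself. This sketch assumes one JSON object per line with illustrative field names "key" and "value" (on Spark 1.x the entry point would be sqlContext rather than spark):

    // Parse JSON lines into rows, then project two fields into a pair RDD.
    val df = spark.read.json("input.json")
    val pairRDD = df.rdd.map(row =>
      (row.getAs[String]("key"), row.getAs[String]("value")))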

Re: Is pair rdd join more efficient than regular rdd

2015-02-02 Thread Akhil Das
…Spark SQL and running into shuffle issues. We have explored multiple options: using coalesce to reduce the number of partitions, tuning various parameters like disk buffer, reducing data in chunks, etc., which all seem to help, btw. What I would like to know is: is having a pair RDD over a regular RDD one of the solutions? …

Is pair rdd join more efficient than regular rdd

2015-02-01 Thread Sunita Arvind
…is having a pair RDD over a regular RDD one of the solutions? Will it make the joining more efficient, as Spark can shuffle better since it knows the key? Logically speaking I think it should help, but I haven't found any evidence on the internet, including the Spark SQL documentation. It is a l…
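
The key-awareness pays off most when both sides of the join already share the same partitioner; a hedged sketch with toy data:

    import org.apache.spark.HashPartitioner

    val part = new HashPartitioner(64)

    // Co-partitioned and cached pair RDDs: the join becomes a narrow
    // dependency, so neither side is re-shuffled.
    val left  = sc.parallelize(Seq((1, "a"), (2, "b"))).partitionBy(part).cache()
    val right = sc.parallelize(Seq((1, "x"), (2, "y"))).partitionBy(part).cache()

    val joined = left.join(right)   // RDD[(Int, (String, String))]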

Re: Is sorting persisted after pair rdd transformations?

2014-11-19 Thread Aniket Bhatnagar
…8 is the first number that demonstrates this.) On Wed, Nov 19, 2014 at 9:05 AM, Akhil Das wrote: If something is persisted you can easily see it under the Storage tab in the web UI. …

Re: Is sorting persisted after pair rdd transformations?

2014-11-19 Thread Daniel Darabos
…Akhil Das wrote: If something is persisted you can easily see it under the Storage tab in the web UI. Thanks, Best Regards. On Tue, Nov 18, 2014 at 7:26 PM, Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote: …

Re: Is sorting persisted after pair rdd transformations?

2014-11-19 Thread Aniket Bhatnagar
…On Tue, Nov 18, 2014 at 7:26 PM, Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote: I am trying to figure out if sorting is persisted after applying Pair RDD transformations and I am not able to decisively tell after reading the …

Re: Is sorting persisted after pair rdd transformations?

2014-11-19 Thread Daniel Darabos
…<aniket.bhatna...@gmail.com> wrote: I am trying to figure out if sorting is persisted after applying Pair RDD transformations and I am not able to decisively tell after reading the documentation. For example: val numbers = .. // RDD of numbers val pairedNumbers = numbers.map(number => (number % 100, number)) …

Re: Is sorting persisted after pair rdd transformations?

2014-11-19 Thread Akhil Das
If something is persisted you can easily see it under the Storage tab in the web UI. Thanks, Best Regards. On Tue, Nov 18, 2014 at 7:26 PM, Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote: I am trying to figure out if sorting is persisted after applying Pair RDD transform…

Is sorting persisted after pair rdd transformations?

2014-11-18 Thread Aniket Bhatnagar
I am trying to figure out if sorting is persisted after applying Pair RDD transformations, and I am not able to decisively tell after reading the documentation. For example: val numbers = .. // RDD of numbers val pairedNumbers = numbers.map(number => (number % 100, number)) val sortedPairedNumb…
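
A hedged illustration of the general rule: narrow transformations such as map preserve the order of elements within partitions, while any shuffling transformation (reduceByKey, groupByKey, join) makes no ordering guarantee.

    val numbers = sc.parallelize(1 to 1000)

    val sorted = numbers.sortBy(identity)          // total order across partitions
    val paired = sorted.map(n => (n % 100, n))     // map keeps that order

    // reduceByKey shuffles, so the result's ordering is not guaranteed.
    val reduced = paired.reduceByKey(_ + _)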

Re: How to make operation like cogrop() , groupbykey() on pair RDD = [ [ ], [ ] , [ ] ]

2014-10-16 Thread Gen
…0x4b1b4d0. I want to show data for pyspark.resultiterable.ResultIterable at 0x4b1bd50. Could you please tell me the way to show data for those objects? I'm using Python. Thanks.
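
In PySpark the usual fix is to materialize the ResultIterable, e.g. rdd.groupByKey().mapValues(list).collect(); the same idea as a hedged Scala sketch:

    // groupByKey yields (key, Iterable[V]); convert the iterables to lists
    // before printing so the contents show rather than an object reference.
    val grouped = sc.parallelize(Seq((1, "a"), (1, "b"), (2, "c"))).groupByKey()
    grouped.mapValues(_.toList).collect().foreach(println)
    // (1,List(a, b))
    // (2,List(c))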

Re: How to make operation like cogrop() , groupbykey() on pair RDD = [ [ ], [ ] , [ ] ]

2014-10-15 Thread Gen
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-make-operation-like-cogrop-groupbykey-on-pair-RDD-tp16487p16489.html

Re: Pair RDD

2014-08-26 Thread Yanbo Liang
val node = textFile.map(line => { val fields = line.split("\\s+"); (fields(1), fields(2)) }). Then you can manipulate the node RDD with the PairRDD functions. 2014-08-26 12:55 GMT+08:00 Deep Pradhan: Hi, I have an input file of a graph in the format … When I use sc.textFile, it will c…

Pair RDD

2014-08-25 Thread Deep Pradhan
Hi, I have an input file of a graph in the format … When I use sc.textFile, it will change the entire text file into an RDD. How can I transform the file into key-value pairs and then eventually into pair RDDs? Thank You