2nd try
From: Anil Dasari
Date: Sunday, September 5, 2021 at 10:42 AM
To: "user@spark.apache.org"
Subject: Spark Pair RDD write to Hive
Hello,
I have a use case where users are grouped by group id and persisted to a Hive table.
// pseudo code looks like below
val usersRDD = sc.parallelize(..)                     // RDD[User]
val usersPairRDD = usersRDD.map(u => (u.groupId, u))  // key by group id
val groupedUsers = usersPairRDD.groupByKey()          // RDD[(groupId, Iterable[User])]
Can I save the groupedUsers RDD into Hive tables where the table name is the key (group id)?
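For what it's worth, one way this is often handled is to write a single Hive table partitioned by group id instead of one table per key; a minimal sketch, assuming a SparkSession named spark with Hive support and a User case class with a groupId field (both assumptions, not from the original mail):

import spark.implicits._

// Sketch only: one partitioned table usually beats one table per key,
// and the groupByKey step is not needed for this write path.
val usersDF = usersRDD.toDF()
usersDF.write
  .mode("overwrite")
  .partitionBy("groupId")              // one partition directory per group id
  .saveAsTable("users_by_group")       // hypothetical table name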
From: ayan guha
Date: Sunday, January 8, 2017 at 10:32 PM
To: Anil Langote
Subject: Re: Efficient look up in Key Pair RDD
Thank you
Anil Langote
+1-425-633-9747
From: ayan guha
Date: Sunday, January 8, 2017 at 10:26 PM
To: Anil Langote
Cc: Holden Karau , user
Subject: Re: Efficient look up in Key Pair RDD
Have you tried something like GROUPING SETS? That seems to be the exact thing you are looking for.
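For later readers, a sketch of what the GROUPING SETS suggestion looks like in Spark SQL, assuming the sample attributes below are registered as a temp view named events (the view name and the COUNT aggregate are placeholders; the real thread aggregates an array of doubles, which would need its own aggregation on top):

val rolled = spark.sql("""
  SELECT Attribute_0, Attribute_1, COUNT(*) AS n
  FROM events
  GROUP BY Attribute_0, Attribute_1
  GROUPING SETS ((Attribute_0, Attribute_1), (Attribute_0), ())
""")
rolled.show()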
On Mon, Jan 9, 2017 at 12:37 PM, Anil Langote wrote:
Sure. Let me explain my requirement: I have an input file which has 25 attributes, and the last column is an array of doubles (14,500 elements in the original file).
Attribute_0  Attribute_1  Attribute_2  Attribute_3  DoubleArray
5            3            5            3            0.2938933463658645 0.0437040427073041 0.23002681025029648 0.18003221...
To start with, caching and having a known partitioner will help a bit; then there is also the IndexedRDD project, but in general Spark might not be the best tool for the job. Have you considered having Spark output to something like memcache?
What's the goal you are trying to accomplish?
Hi All,
I have a requirement where I want to build a distributed HashMap which holds 10M key-value pairs and provides very efficient lookups for each key.
I tried loading the file into a JavaPairRDD and calling the lookup method, but it is very slow.
How can I achieve a very fast lookup by a given key?
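Picking up the caching/known-partitioner suggestion above, a minimal sketch, assuming the pairs are loaded as a hypothetical RDD[(String, String)] named kv:

import org.apache.spark.HashPartitioner

// Give the RDD a known partitioner and cache it, so lookup(key) only scans
// the single partition that owns the key instead of the whole data set.
val partitioned = kv.partitionBy(new HashPartitioner(200)).cache()
partitioned.count()                              // materialize the cache once
val matches = partitioned.lookup("someKey")      // Seq of values for that key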
The idea is to create an ArrayBuffer (this maintains insertion order). A more elegant solution would be to use zipWithIndex on the pair RDD and then sort by the index within each groupByKey group.
rdd.zipWithIndex ==> this will give you something like ((x, y), index); now map it to (x, (y, index)) ...
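A sketch of that idea, assuming a hypothetical pair RDD pairs: RDD[(String, Int)] whose original input order should be preserved within each key:

// Tag each record with its global position, group by key, then sort each
// group's values back into input order.
val indexed = pairs.zipWithIndex()                         // RDD[((String, Int), Long)]
val ordered = indexed
  .map { case ((k, v), idx) => (k, (idx, v)) }
  .groupByKey()
  .mapValues(_.toSeq.sortBy(_._1).map(_._2))               // values in input order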
Array(
(ID1, Array(18159, 308703, 72636, 64544, 39244, 107937, 54477, 145272, 100079, 36318, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076, 45431, 100136)),
(ID3, Array(100079, 19622, 18159, 212064, 107937, 44683, 150022, 39244, ...
... 44683, 19622, 160992, 107937, 100079, 100136, 145272, 64544, 18159, 45431, 36318, 162076))
)
I need to compare the first 5 elements of ID1 with the first five elements of ID3, then the first 5 elements of ID1 with ID2. Similarly ...
On Sun, Jul 24, 2016 at 7:45 AM, Marco Mistroni wrote:
Apologies, I misinterpreted; could you post the two use cases?
Kr
Marco,
Thanks for the response. It is indexed order and not ascending or
descending order.
On Jul 24, 2016 7:37 AM, "Marco Mistroni" wrote:
> Use mapValues to transform to an RDD where values are sorted?
> Hth
>
> On 24 Jul 2016 6:23 am, "janardhan shetty" wrote:
I have a key-value pair RDD where the value is an array of Ints. I need to maintain the order of the values in order to execute downstream modifications. How do we maintain the order of values?
Ex:
rdd = (id1, [5,2,3,15],
       id2, [9,4,2,5])
Follow-up question: how do we compare between one element in the RDD ...
Cc: Nicholas Chammas; user@spark.apache.org
Subject: Re: Writing output of key-value Pair RDD
Thanks, I got the example below working, though it writes both the keys and the values to the output file.
Is there any way to write just the values?
--
Nick
String[] strings = { "Abcd...
Afshartous, Nick <nafshart...@turbine.com> wrote:
Hi,
Is there any way to write out to S3 the values of a key-value Pair RDD?
I'd like each value of a pair to be written to its own file, where the file name corresponds to the key name.
Thanks,
--
Nick
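For the archive, a sketch of the usual pattern for both asks (one output file per key, values only), using Hadoop's MultipleTextOutputFormat; pairs is a hypothetical RDD[(String, String)] and the bucket path is made up:

import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

// Route each record to a file named after its key and drop the key from the line.
class KeyAsFileNameFormat extends MultipleTextOutputFormat[Any, Any] {
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String]                     // file name = key
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()                           // write only the value
}

pairs.saveAsHadoopFile(
  "s3a://some-bucket/output",                    // hypothetical destination
  classOf[String], classOf[String],
  classOf[KeyAsFileNameFormat])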
..., udf1("V").alias("arrayV"))
df1.show()
On Tue, Apr 19, 2016 at 12:51 PM, pth001 wrote:
> How can I split a pair RDD [K, V] to a map [K, Array(V)] efficiently in PySpark?
Is there any reason why you are not using data frames?
Regards,
Gourav
On Tue, Apr 19, 2016 at 8:51 PM, pth001 wrote:
> How can I split a pair RDD [K, V] to a map [K, Array(V)] efficiently in PySpark?
Hi,
How can I split a pair RDD [K, V] to a map [K, Array(V)] efficiently in PySpark?
Best,
Patcharee
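For the archive, a sketch of the two usual routes, written in Scala but with the same method names available in PySpark; kv is a made-up pair RDD:

val kv = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))   // hypothetical pair RDD

// Plain RDD route: group and materialize the values per key.
val grouped = kv.groupByKey().mapValues(_.toArray)           // RDD[(String, Array[Int])]

// DataFrame route (what the data-frames suggestion above points at):
import org.apache.spark.sql.functions.collect_list
import spark.implicits._
val df = kv.toDF("k", "v").groupBy("k").agg(collect_list("v").as("arrayV"))
df.show()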
Hello,
what would be the best way to save a key-value pair RDD so that I don't have
to convert the saved records back into tuples while reading the RDD back into
Spark?
--
Best,
Anup
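One commonly used option, for the record: saveAsObjectFile round-trips the tuples, so objectFile hands back a pair RDD directly; a sketch with made-up data and path:

val kv = sc.parallelize(Seq(("a", 1), ("b", 2)))             // hypothetical pair RDD
kv.saveAsObjectFile("/tmp/kv-out")                           // hypothetical path
val reloaded = sc.objectFile[(String, Int)]("/tmp/kv-out")   // RDD[(String, Int)] again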
Looks like an RDD that has foreachPartition applied can only have the return type Unit. How do I apply foreachPartition and do a save, and at the same time return a pair RDD?
def saveDataPointsBatchNew(records: RDD[(String, (Long,
  java.util.LinkedHashMap[java.lang.Long, java.lang.Float],
  java.util.LinkedHashMap[java.lang.Long, java.lang...
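A sketch of the usual workaround: use mapPartitions, which returns an iterator, instead of foreachPartition. saveBatch below is a placeholder for whatever writes one partition to the store, and the write only happens once the returned RDD is evaluated.

import org.apache.spark.rdd.RDD

def saveBatch(batch: Seq[(String, Long)]): Unit = ()         // placeholder writer

val records: RDD[(String, Long)] = sc.parallelize(Seq(("a", 1L), ("b", 2L)))
val saved: RDD[(String, Long)] = records.mapPartitions { iter =>
  val batch = iter.toSeq
  saveBatch(batch)        // side effect: persist this partition
  batch.iterator          // hand the same records back downstream
}
saved.count()             // forces the write; mapPartitions is lazy until an action runs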
current code:
val pairs = setECrecords.flatMap(x => x.split(","))
pairs.foreach(println)
val pairsAsTuple = pairs.map { x =>
  val parts = x.split("=")
  if (parts.length > 1) (parts(0), parts(1)) else (parts(0), x)
}
... Spark SQL and running into shuffle issues. We have explored multiple options - using coalesce to reduce the number of partitions, tuning various parameters like the disk buffer, reducing data in chunks, etc. - which all seem to help, by the way. What I would like to know is: is having a pair RDD over a regular RDD one of the solutions? Will it make the joining more efficient, since Spark can shuffle better because it knows the key? Logically speaking I think it should help, but I haven't found any evidence on the internet, including the Spark SQL documentation.
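For reference, the concrete benefit a keyed (pair) RDD can bring to a join is co-partitioning: if both sides already share a partitioner, the join reuses that layout instead of reshuffling both sides. A sketch, with leftKV and rightKV as hypothetical pair RDDs keyed by the same id:

import org.apache.spark.HashPartitioner

// Pre-partition both sides the same way and cache them, so repeated joins
// do not shuffle the full data sets again.
val part = new HashPartitioner(200)              // partition count is a guess; tune it
val left = leftKV.partitionBy(part).persist()
val right = rightKV.partitionBy(part).persist()
val joined = left.join(right)                    // co-partitioned, so no extra shuffle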
If something is persisted you can easily see it under the Storage tab in the web UI.
Thanks
Best Regards
On Tue, Nov 18, 2014 at 7:26 PM, Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:
> I am trying to figure out if sorting is persisted after applying Pair RDD transformations ...
I am trying to figure out if sorting is persisted after applying Pair RDD transformations, and I am not able to decisively tell after reading the documentation.
For example:
val numbers = ..                 // RDD of numbers
val pairedNumbers = numbers.map(number => (number % 100, number))
val sortedPairedNumb...
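General note for the archive: ordering is not guaranteed to survive a transformation that shuffles the data. If a later step needs each partition sorted by key after such a shuffle, one option is to ask for exactly that; a sketch reusing pairedNumbers from the example above:

import org.apache.spark.HashPartitioner

// Shuffle and sort by key in a single pass; the within-partition ordering of
// the result is guaranteed, unlike a sort done before a later shuffle.
val sortedWithinPartitions =
  pairedNumbers.repartitionAndSortWithinPartitions(new HashPartitioner(8))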
> ... pyspark.resultiterable.ResultIterable at 0x4b1b4d0.
> I want to show the data for pyspark.resultiterable.ResultIterable at 0x4b1bd50.
> Could you please tell me the way to show data for those objects? I am using Python.
> Thanks,
val node = textFile.map(line => {
  val fields = line.split("\\s+")      // split each line on whitespace
  (fields(1), fields(2))               // keep two columns as a (key, value) pair
})
Then you can manipulate the node RDD with the PairRDD functions.
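For instance (a hypothetical follow-on, not from the original reply), building adjacency lists from those pairs:

val adjacency = node.groupByKey().mapValues(_.toSeq)   // one entry per source id with its neighbours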
2014-08-26 12:55 GMT+08:00 Deep Pradhan:
Hi,
I have an input file of a graph in the format ...
When I use sc.textFile, it will change the entire text file into an RDD.
How can I transform the file into key-value pairs and then eventually into pair RDDs?
Thank You