Re: creating a distributed index

2015-11-06 Thread swetha kasireddy
Hi Ankur, I have the following questions on IndexedRDD. 1. Does IndexedRDD support String keys? As per the current documentation, it looks like it supports only Long. 2. Is IndexedRDD efficient when joined with another RDD? So, basically my use case is that I need to create an I

Re: creating a distributed index

2015-07-15 Thread Jem Tucker
This is very interesting, do you know if this version will be backwards compatible with older versions of Spark (1.2.0)? Thanks, Jem On Wed, Jul 15, 2015 at 10:04 AM Ankur Dave wrote: > The latest version of IndexedRDD supports any key type with a defined > serializer >

Re: creating a distributed index

2015-07-15 Thread Ankur Dave
The latest version of IndexedRDD supports any key type with a defined serializer, including Strings. It's not released yet, but you can use it from the master branch i

Re: creating a distributed index

2015-07-15 Thread Jem Tucker
With regard to indexed structures in Spark, are there any alternatives to IndexedRDD for more generic keys, including Strings? Thanks, Jem On Wed, Jul 15, 2015 at 7:41 AM Burak Yavuz wrote: > Hi Swetha, > > IndexedRDD is available as a package on Spark Packages >

Re: creating a distributed index

2015-07-14 Thread Burak Yavuz
Hi Swetha, IndexedRDD is available as a package on Spark Packages. Best, Burak On Tue, Jul 14, 2015 at 5:23 PM, swetha wrote: > Hi Ankur, > > Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark > Streaming to do

Re: creating a distributed index

2015-07-14 Thread swetha
Hi Ankur, Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark Streaming to do lookups/updates/deletes in RDDs using keys by storing them as key/value pairs. Thanks, Swetha

RE: creating a distributed index

2014-08-04 Thread Daniel, Ronald (ELS-SDG)
-- > From: Philip Ogren [mailto:philip.og...@oracle.com] > Sent: Monday, August 04, 2014 11:08 AM > To: user@spark.apache.org > Subject: Re: creating a distributed index > > After playing around with mapPartition I think this does exactly what I want. > I can pass in a f

Re: creating a distributed index

2014-08-04 Thread Philip Ogren
After playing around with mapPartitions I think this does exactly what I want. I can pass in a function to mapPartitions that looks like this: def f1(iter: Iterator[String]): Iterator[MyIndex] = { val idx: MyIndex = new MyIndex() while (iter.hasNext) {
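
The per-partition approach described in this message can be sketched as follows. This is only an illustration under stated assumptions: the body of `MyIndex` is not shown in the thread, so the class below is a hypothetical in-memory inverted index with made-up `add` and `query` methods, and plain Scala collections stand in for an RDD's partitions so the sketch runs without a cluster. With Spark, the same `f1` would presumably be passed as `rdd.mapPartitions(f1)`, and a query could be run against every partition with something like `indexRdd.flatMap(_.query(term)).collect()`.

```scala
object PartitionIndexSketch {
  // Hypothetical stand-in for the thread's MyIndex (its real contents are not shown):
  // a tiny in-memory inverted index from lowercase term to the documents containing it.
  class MyIndex extends Serializable {
    private val postings =
      scala.collection.mutable.Map.empty[String, List[String]]

    // Split on whitespace and record the document once under each distinct term.
    def add(doc: String): Unit =
      doc.toLowerCase.split("\\s+").filter(_.nonEmpty).distinct.foreach { term =>
        postings(term) = doc :: postings.getOrElse(term, Nil)
      }

    // Documents containing the term, in the order they were added.
    def query(term: String): List[String] =
      postings.getOrElse(term.toLowerCase, Nil).reverse
  }

  // Same signature as the function in the message: consume one partition's
  // elements and emit a single index for that partition.
  def f1(iter: Iterator[String]): Iterator[MyIndex] = {
    val idx = new MyIndex()
    while (iter.hasNext) idx.add(iter.next())
    Iterator(idx)
  }

  // Assumption: two local sequences play the role of two RDD partitions.
  val partitions: Seq[Seq[String]] = Seq(
    Seq("spark makes indexes", "hello world"),
    Seq("distributed index on spark")
  )

  // Local equivalent of rdd.mapPartitions(f1): one MyIndex per partition.
  val indexes: Seq[MyIndex] = partitions.flatMap(p => f1(p.iterator))

  // Query every partition's index and concatenate the matches.
  def queryAll(term: String): Seq[String] = indexes.flatMap(_.query(term))

  def main(args: Array[String]): Unit =
    println(queryAll("spark"))
}
```

Emitting exactly one index per partition keeps index construction local to each partition; the trade-off is that every query has to visit all partitions, since a term may occur in any of them.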

Re: creating a distributed index

2014-08-04 Thread Philip Ogren
This looks like a really cool feature and it seems likely that this will be extremely useful for things we are doing. However, I'm not sure it is quite what I need here. With an inverted index you don't actually look items up by their keys but instead try to match against some input string.

Re: creating a distributed index

2014-08-01 Thread Ankur Dave
At 2014-08-01 14:50:22 -0600, Philip Ogren wrote: > It seems that I could do this with mapPartitions so that each element in a > partition gets added to an index for that partition. > [...] > Would it then be possible to take a string and query each partition's index > with it? Or better yet, take

Re: creating a distributed index

2014-08-01 Thread andy petrella
Hey, There is some work that has started on IndexedRDD (on master, I think). Meanwhile, checking what has been done in GraphX regarding the vertex index in partitions could be worthwhile, I guess. Hth, Andy On 1 Aug 2014 at 22:50, "Philip Ogren" wrote: > > Suppose I want to take my large text data input and