I’m not sure where to post this, since it’s a bit of a philosophical question 
about the design and vision for Spark. 

If we look at Spark SQL and performance, where does secondary indexing fit in? 

The reason this is a bit awkward is that if you view Spark as querying RDDs, 
which are temporary, indexing doesn’t make sense until you consider your use 
case and how long ‘temporary’ really is.
But if your RDD result set is based on querying tables, you could end up with 
an inverted table serving as an index, and then indexing could make sense. 
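
To make the inverted-table idea concrete, here is a minimal sketch in plain Python rather than any Spark API. The table contents and the names build_inverted_index and lookup are hypothetical, chosen only for illustration; the point is that the index is itself just another table, keyed by column value and holding primary keys.

```python
from collections import defaultdict

# Hypothetical base table: primary key -> row.
rows = {
    1: {"name": "alice", "city": "Chicago"},
    2: {"name": "bob", "city": "Boston"},
    3: {"name": "carol", "city": "Chicago"},
}


def build_inverted_index(table, column):
    """Map each value of `column` to the list of primary keys holding it."""
    index = defaultdict(list)
    for key, row in table.items():
        index[row[column]].append(key)
    return dict(index)


def lookup(table, index, value):
    """Fetch matching rows through the index instead of scanning the table."""
    return [table[key] for key in index.get(value, [])]


# Build the secondary index once, then reuse it for repeated lookups.
city_index = build_inverted_index(rows, "city")
chicago_rows = lookup(rows, city_index, "Chicago")
```

Building the index costs one full pass over the table, so it would only pay off if the data lives long enough to be queried more than once, which is the "how long is temporary" question above.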

Does it make sense to discuss this on the user or dev mailing list? Has anyone 
given this any thought in the past? 

Thx

-Mike

