It's not completely transparent, but you can do something like the following today:
CACHE TABLE hotData AS SELECT columns, I, care, about FROM fullTable

On Sun, Feb 1, 2015 at 3:03 AM, Mick Davies <michael.belldav...@gmail.com> wrote:
> I have been working a lot recently with denormalised tables with lots of
> columns, nearly 600. We are using this form to avoid joins.
>
> I have tried to use CACHE TABLE with this data, but it proves too expensive,
> as it seems to try to cache all of the data in the table.
>
> For data sets such as the one I am using, you find that certain columns are
> hot, referenced frequently in queries, while others are used very
> infrequently.
>
> It would therefore be great if caches could be column-based. I realise that
> this may not be optimal for all use cases, but I think it could be quite a
> common need. Has something like this been considered?
>
> Thanks
> Mick
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Caching-tables-at-column-level-tp10377.html
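To spell the workaround out a little: a sketch in Spark SQL of caching only a projection of the wide table (the table and column names here are illustrative, not from Mick's actual schema):

```sql
-- Materialize just the hot columns in the in-memory columnar cache,
-- instead of caching all ~600 columns of the wide table.
CACHE TABLE hotData AS
SELECT user_id, event_time, status   -- hypothetical hot columns
FROM fullTable;

-- Frequent queries then run against the narrow cached table:
SELECT status, COUNT(*) FROM hotData GROUP BY status;

-- Release the memory when the cache is no longer needed:
UNCACHE TABLE hotData;
```

The caveat is that `hotData` is a separate cached relation, not the original table, so queries have to reference it explicitly; queries against `fullTable` itself won't pick up the cached columns.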