Thanks for the quick reply Bin. Phoenix is something I'm going to try for sure, but it seems somewhat redundant if I can use Spark. Probably, as you said, Phoenix has more effective memory usage since it uses a dedicated data structure within each HBase table, but if I need to deserialize a complex object stored in an HBase cell I still have to read that whole object into memory, and for that I need Spark. From what I understood, Phoenix is good if I have to query a simple column of HBase, but things get really complicated if I have to add an index for each column in my table and I store complex objects within the cells. Is that correct?
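To make the deserialization point concrete, here is a minimal Java sketch (the `Measurement` class and its fields are invented for illustration, not from any real schema): once a complex object is serialized into the opaque byte[] stored in an HBase cell, a SQL layer like Phoenix cannot push a predicate or an index inside those bytes; the whole object has to be read back into memory before any field can be inspected, which is exactly the work a Spark job would do.

```java
import java.io.*;

// Hypothetical "complex object" of the kind stored in a single HBase cell.
class Measurement implements Serializable {
    final String sensorId;
    final double value;
    Measurement(String sensorId, double value) {
        this.sensorId = sensorId;
        this.value = value;
    }
}

public class CellCodec {
    // Serialize an object into the opaque byte[] that would sit in an HBase cell.
    static byte[] toCellBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    // To filter on any inner field, the entire object must first be
    // deserialized back into memory -- there is no way to index into the bytes.
    static Measurement fromCellBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Measurement) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] cell = toCellBytes(new Measurement("sensor-42", 19.5));
        Measurement m = fromCellBytes(cell);
        System.out.println(m.sensorId + " = " + m.value);
    }
}
```

Phoenix's secondary indexes work on column values it can interpret; a blob like the one above is invisible to it, so per-field queries would indeed require either exploding the object into plain columns (one index each) or full-scan deserialization in something like Spark.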
Best,
Flavio

On Tue, Apr 8, 2014 at 6:05 PM, Bin Wang <binwang...@gmail.com> wrote:

> Hi Flavio,
>
> I happened to be attending the 2014 Apache Conf, where I heard about a
> project called "Apache Phoenix", which fully leverages HBase and is
> supposed to be 1000x faster than Hive. It is also not memory-bound,
> which is a limit for Spark. It is still in the Incubator, and the
> "stats" functions Spark has already implemented are still on its
> roadmap. I am not sure whether it will be good, but it might be
> something interesting to check out.
>
> /usr/bin
>
>
> On Tue, Apr 8, 2014 at 9:57 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
>> Hi to everybody,
>>
>> In these days I have looked a bit at the recent evolution of the big
>> data stacks, and it seems that HBase is somehow fading away in favour
>> of Spark+HDFS. Am I correct?
>> Do you think that Spark and HBase should work together or not?
>>
>> Best regards,
>> Flavio