For the XML case, probably the best resources I have found are http://stevenskelton.ca/real-time-data-mining-spark/ and http://blog.cloudera.com/blog/2014/03/why-apache-spark-is-a-crossover-hit-for-data-scientists/ . And what about JSON? If I have to work with JSON and want to use the FasterXML Jackson implementation, are there any suggestions on how to start?

On Wed, Apr 9, 2014 at 11:37 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Any help about this...?
> On Apr 9, 2014 9:19 AM, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>
>> Hi to everybody,
>>
>> In my current scenario I have complex objects stored as xml in an HBase
>> Table.
>> What's the best strategy to work with them? My final goal would be to
>> define operators on those objects (like filter, equals, append, join,
>> merge, etc) and then work with multiple RDDs to perform some kind of
>> comparison between those objects. What do you suggest me? Is it possible?
>>
>> Best,
>> Flavio