For the XML case, the best resources I have found so far are probably these:
http://stevenskelton.ca/real-time-data-mining-spark/ and
http://blog.cloudera.com/blog/2014/03/why-apache-spark-is-a-crossover-hit-for-data-scientists/
And what about JSON? If I have to work with JSON and want to use the FasterXML
implementation, is there any suggestion on how to start?
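
A possible starting point for the JSON case, sketched under assumptions: the record fields (`id`, `name`, `tags`) and the helpers `parse_record`/`to_json` are hypothetical, and Python's standard `json` module stands in for FasterXML Jackson, so this shows the shape of the approach rather than Jackson's API. The idea is to decode each JSON string once into an immutable value type with structural equality, which is what operations like `distinct` or key-based joins rely on. In PySpark the same function would be passed to `rdd.map(parse_record)`; here a plain list stands in for the RDD.

```python
import json
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class Record:
    """Hypothetical record shape: an id, a name and a set of tags."""
    id: str
    name: str
    tags: FrozenSet[str] = field(default_factory=frozenset)


def parse_record(line: str) -> Record:
    """Turn one JSON string (e.g. one HBase cell value) into a Record."""
    obj = json.loads(line)
    return Record(id=obj["id"], name=obj["name"], tags=frozenset(obj.get("tags", [])))


def to_json(rec: Record) -> str:
    """Serialize back to JSON with sorted keys and tags so the output is deterministic."""
    return json.dumps({"id": rec.id, "name": rec.name, "tags": sorted(rec.tags)}, sort_keys=True)


if __name__ == "__main__":
    # Stand-in for the values of an RDD; with PySpark this would be rdd.map(parse_record).
    raw = [
        '{"id": "a1", "name": "Alice", "tags": ["x", "y"]}',
        '{"id": "b2", "name": "Bob"}',
    ]
    records = [parse_record(r) for r in raw]
    for rec in records:
        print(rec, to_json(rec))
```

Making the type frozen gives value-based `==` and hashing for free. With Jackson the analogous step would be binding each string to a small immutable class, but that mapping is not shown here.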

On Wed, Apr 9, 2014 at 11:37 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Any help with this?
> On Apr 9, 2014 9:19 AM, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>
>> Hi everybody,
>>
>> In my current scenario I have complex objects stored as XML in an HBase
>> table.
>> What's the best strategy for working with them? My final goal would be to
>> define operators on those objects (such as filter, equals, append, join,
>> merge, etc.) and then work with multiple RDDs to perform some kind of
>> comparison between those objects. What would you suggest? Is it possible?
>>
>> Best,
>> Flavio
>>
>
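
For the original XML question, a minimal sketch of what such operators could look like, assuming a hypothetical schema where each HBase cell holds something like `<entity id="1"><attr key="name">Alice</attr></entity>`. `Entity`, `parse_entity`, `merge`, `diff` and `join_by_id` are illustrative names, not an existing library, and reading the cells out of HBase is not shown. Attributes are stored as sorted pairs so that equality ignores element order in the XML. `join_by_id` mimics the output shape of a PySpark pair-RDD `join`, `(key, (left, right))`; on real RDDs one would key each side with `rdd.map(lambda e: (e.id, e))` and call `join` instead.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical object decoded from one XML cell: an id plus key/value attributes."""
    id: str
    attrs: Tuple[Tuple[str, str], ...]  # sorted pairs, so equality ignores XML element order

    def get(self, key: str) -> str:
        """Attribute value, or "" if the key is absent."""
        return dict(self.attrs).get(key, "")

    def merge(self, other: "Entity") -> "Entity":
        """Union of attributes; on conflicting keys the value from `other` wins."""
        combined = dict(self.attrs)
        combined.update(other.attrs)
        return Entity(self.id, tuple(sorted(combined.items())))

    def diff(self, other: "Entity") -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        """Keys whose values differ, mapped to (self value, other value); None means missing."""
        mine, theirs = dict(self.attrs), dict(other.attrs)
        return {k: (mine.get(k), theirs.get(k))
                for k in sorted(mine.keys() | theirs.keys())
                if mine.get(k) != theirs.get(k)}


def parse_entity(xml_text: str) -> Entity:
    """Decode one XML string; <attr> elements without a key attribute are skipped."""
    root = ET.fromstring(xml_text)
    pairs = [(a.get("key"), a.text or "") for a in root.findall("attr")
             if a.get("key") is not None]
    return Entity(root.get("id", ""), tuple(sorted(pairs)))


def join_by_id(left: Iterable[Entity],
               right: Iterable[Entity]) -> List[Tuple[str, Tuple[Entity, Entity]]]:
    """Inner join on id, same output shape as a pair-RDD join: (key, (left, right))."""
    index: Dict[str, List[Entity]] = {}
    for right_e in right:
        index.setdefault(right_e.id, []).append(right_e)
    return [(left_e.id, (left_e, right_e))
            for left_e in left
            for right_e in index.get(left_e.id, [])]


if __name__ == "__main__":
    old = [parse_entity('<entity id="1"><attr key="name">Alice</attr>'
                        '<attr key="city">Rome</attr></entity>')]
    new = [parse_entity('<entity id="1"><attr key="city">Trento</attr>'
                        '<attr key="name">Alice</attr></entity>')]
    named = [e for e in new if e.get("name")]  # "filter": keep entities that have a name
    for key, (a, b) in join_by_id(old, named):
        print(key, a == b, a.diff(b), a.merge(b))
```

Parsing once into plain immutable objects keeps the operators (filter, equals, merge, comparison) independent of the XML format, so the same functions could be reused if the cells were later stored as JSON.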
