Thanks for the update. 
Aaron


On 17 Jan, 2011,at 02:51 PM, Brandon Williams <[email protected]> wrote:

2011/1/16 Jun Young Kim <[email protected]>

Hi Aaron.

I think that if Pig is able to support mapping it, the same job could be expressed in Java code itself.

I believe we can invoke a map function that reads from a file and from Cassandra at the same time.

PS) I don't need to join them; I just want to compare the keys read from each source.


We went over this on IRC, but I will repeat the summary for posterity.

This is a case where using the Thrift API, rather than an o.a.c.hadoop construct, is probably better (right now), because ColumnFamilyInputFormat expects to scan the entire CF, and a join in the reducer is costly.  Instead, what you really want is per-row access after reading an entry from the file in the map task, so using Hector inside the Hadoop job makes the most sense.
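A minimal sketch of the per-row access pattern described above, assuming the goal is just to compare keys from a file against rows in Cassandra. A plain `Map` stands in for the Hector client so the sketch is self-contained; in a real Hadoop job the lookup would be a Hector single-row read issued from inside the `map()` method, and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: instead of scanning the whole column family with
// ColumnFamilyInputFormat and joining in a reducer, each record read from
// the input file triggers a single-row lookup against Cassandra.
public class PerRowCompare {

    // Stand-in for a Hector single-row read (hypothetical interface).
    interface RowStore {
        String get(String key);
    }

    // The "map task" body: for each key read from the file, fetch the
    // matching Cassandra row directly; no reduce-side join is needed.
    static List<String> compareKeys(List<String> fileKeys, RowStore store) {
        List<String> present = new ArrayList<>();
        for (String key : fileKeys) {
            String row = store.get(key);   // per-row access, one lookup per key
            if (row != null) {
                present.add(key);          // key exists in both sources
            }
        }
        return present;
    }

    public static void main(String[] args) {
        // HashMap standing in for the Cassandra column family.
        Map<String, String> cassandraRows = new HashMap<>();
        cassandraRows.put("k1", "row1");
        cassandraRows.put("k3", "row3");

        List<String> fileKeys = Arrays.asList("k1", "k2", "k3");
        System.out.println(compareKeys(fileKeys, cassandraRows::get));
    }
}
```

Running this prints `[k1, k3]`, the keys present in both the file and the store; the cost is one row lookup per file entry rather than a full CF scan.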

-Brandon 
