That makes sense. The problem is that I jumped straight to Pig, which
abstracts some of the data flow away from me. I guess I'll have to figure
out what it's doing under the covers to know how to optimize or fix
bottlenecks.
But for now, I'm taking this information to mean "I should run datanodes
On Wed, Jul 6, 2011 at 2:48 PM, William Oberman wrote:
> I have a few cassandra/hadoop/pig questions. I currently have things set
> up in a test environment, and for the most part everything works. But,
> before I start to roll things out to production, I wanted to check
> on/confirm some things