Okay, I also saw your previous response, which analyzed queries against two tables built around two files in the same directory. I guess I was simply wrong in my understanding that a Hive table is fundamentally associated with a directory instead of a file. Turns out, it can be either one. A directory table uses all files in the directory, while a file table uses one specific file and properly ignores sibling files. My bad.

Thanks for the careful analysis and clarification. TIL!

Cheers!

On Mar 27, 2013, at 02:58, Tony Burton wrote:

> A bit more info - do an extended description of the table:
>
> $ desc extended gsrc1;
>
> And the “location” field is “location:s3://mybucket/path/to/data/src1.txt”
>
> Do the same on a table created with a location pointing at the directory and
> the same info gives (not surprisingly) “location:s3://mybucket/path/to/data/”

________________________________________________________________________________
Keith Wiley     [email protected]     keithwiley.com     music.keithwiley.com

"I used to be with it, but then they changed what it was. Now, what I'm with
isn't it, and what's it seems weird and scary to me."
                                           -- Abe (Grandpa) Simpson
________________________________________________________________________________
