Thank you.

On Wed, 11 Jan 2017 at 07:21, Chris Drome <cdr...@yahoo-inc.com> wrote:
> Elliot,
>
> Mithun already created the following ticket to track the issue:
>
> https://issues.apache.org/jira/browse/HIVE-15575
>
> chris
>
> On Tuesday, January 10, 2017 11:05 PM, Elliot West <tea...@gmail.com> wrote:
>
> Thanks Rohini,
>
> This is good to know. Could you perhaps raise an issue in the Hive JIRA?
>
> Thanks,
>
> Elliot.
>
> On Tue, 10 Jan 2017 at 22:55, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote:
>
> The implementation in Hive does look wrong. The concept of VertexGroups was added in Tez specifically for the case of UNION, to support writing to the same directory from different vertices. Sub-directories should not be required as a workaround.
>
> Regards,
> Rohini
>
> On Sun, Dec 25, 2016 at 10:58 AM, Stephen Sprague <sprag...@gmail.com> wrote:
>
> Thanks Elliot. Nice Christmas present. Those settings in that stackoverflow link look to me to be exactly what i need to set for MR jobs to pick up the data that Tez created.
>
> Cheers,
> Stephen.
>
> On Sun, Dec 25, 2016 at 2:45 AM, Elliot West <tea...@gmail.com> wrote:
>
> I believe that Tez will generate subfolders for unioned data. As far as I know, this is the expected behaviour and there is no alternative. Presumably this is to prevent multiple tasks from attempting to write the same file?
>
> We've experienced issues when switching from MR to Tez; downstream jobs weren't expecting subfolders and had trouble reading previously accessible datasets.
>
> Apparently there are workarounds within Hive:
>
> http://stackoverflow.com/questions/39511585/hive-create-table-not-insert-data
>
> Merry Christmas,
>
> Elliot.
>
> On Sun, 25 Dec 2016 at 03:11, Rajesh Balamohan <rbalamo...@apache.org> wrote:
>
> Are there any exceptions in hive.log? Is the tmp_pv_v4* table part of the select query?
>
> Assuming you are creating the table in staging.db, it would have created the table location as staging.db/foo (as you have not specified the location).
>
> Adding user@hive.apache.org as this is Hive-related.
>
> ~Rajesh.B
>
> On Sun, Dec 25, 2016 at 12:08 AM, Stephen Sprague <sprag...@gmail.com> wrote:
>
> all,
>
> i'm running tez with the sql pattern:
>
> * create table foo as select * from (select... UNION select... UNION select...)
>
> in the logs the final step is this:
>
> * Moving data to directory hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4 from hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/.hive-staging_hive_2016-12-24_10-05-40_048_4896412314807355668-899/-ext-10002
>
> when querying the table i got zero rows returned, which made me curious. so i queried the hdfs location and saw this:
>
> $ hdfs dfs -ls hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4
> Found 3 items
> drwxrwxrwx - dwr supergroup 0 2016-12-24 10:05 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/1
> drwxrwxrwx - dwr supergroup 0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/2
> drwxrwxrwx - dwr supergroup 0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/3
>
> and yes, the data files are under these three dirs.
>
> so i ask... i'm not used to seeing sub-directories under the table name unless the table is partitioned. is this legit? might there be some config settings i need to set to see this data via sql?
>
> thanks,
> Stephen.
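The zero-row result in the original question is consistent with a reader that lists only the files directly under the table directory and does not descend into the 1/, 2/ and 3/ sub-directories that the Tez UNION produced. The Python sketch below is a local-filesystem analogy under that assumption, not Hive's or Hadoop's actual input logic; the helper names, the three-branch layout and the file name 000000_0 are all hypothetical. In Hive/Hadoop the switches usually associated with this are hive.mapred.supports.subdirectories and mapreduce.input.fileinputformat.input.dir.recursive, but whether those are the exact settings in the linked Stack Overflow answer is an assumption.

```python
import os
import tempfile


def list_data_files(table_dir, recursive=False):
    """Return sorted paths of regular files under table_dir.

    recursive=False inspects only the top level, like a reader that does
    not descend into sub-directories; recursive=True returns every nested file.
    """
    if not recursive:
        return sorted(
            os.path.join(table_dir, name)
            for name in os.listdir(table_dir)
            if os.path.isfile(os.path.join(table_dir, name))
        )
    found = []
    for root, _dirs, files in os.walk(table_dir):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


def build_union_layout(table_dir, branches=3):
    """Create one sub-directory per UNION branch (1, 2, 3, ...), each holding
    a single data file, mimicking the layout shown by hdfs dfs -ls above."""
    for i in range(1, branches + 1):
        sub = os.path.join(table_dir, str(i))
        os.makedirs(sub)
        # Hypothetical file name; real Tez/Hive output file names may differ.
        with open(os.path.join(sub, "000000_0"), "w") as f:
            f.write(f"row-from-branch-{i}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        build_union_layout(table_dir)
        print(len(list_data_files(table_dir)))                  # 0
        print(len(list_data_files(table_dir, recursive=True)))  # 3
```

Run as a script, it prints 0 for the top-level-only listing and 3 for the recursive one, which mirrors "zero rows returned" versus "the data files are under these three dirs".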