> Eclipse doesn't show that RestoreSnapshotHelper.restoreHdfsRegions() is
> called by initTableSnapshotMapperJob (in master branch)
>
> Looking at TableMapReduceUtil.java in 0.98, I don't see a direct relation
> between the two.
>
> Do you have a stack trace or something else showing the relationship?
Right. That’s what I meant by ‘indirectly’. This is a stack trace that was
caused by an ownership conflict:

java.io.IOException: java.util.concurrent.ExecutionException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode="/apps/hbase/data/archive/data/default/Host/c41d632d5eee02e1883215460e5c261d/p":hdfs:hdfs:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5509)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5491)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5465)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3608)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3578)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3552)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
    at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:131)
    at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:475)
    at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:208)
    at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:733)
    at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:397)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:301)
    at net.digitalenvoy.hp.job.ParseHostnamesJob.run(ParseHostnamesJob.java:77)
    at net.digitalenvoy.hp.HostProcessor.run(HostProcessor.java:165)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at net.digitalenvoy.hp.HostProcessor.main(HostProcessor.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

> Cheers
>
> On Sun, Sep 7, 2014 at 5:48 AM, Brian Jeltema <[email protected]> wrote:
>
>> initTableSnapshotMapperJob writes into this directory (indirectly) via
>> RestoreSnapshotHelper.restoreHdfsRegions
>>
>> Is this expected? I would have expected writes to be limited to the temp
>> directory passed in the init call
>>
>> Brian
>>
>> On Sep 7, 2014, at 8:17 AM, Ted Yu <[email protected]> wrote:
>>
>>> The files under the archive directory are referenced by snapshots.
>>> Please don't delete them manually.
>>>
>>> You can delete unused snapshots.
>>>
>>> Cheers
>>>
>>> On Sep 7, 2014, at 4:08 AM, Brian Jeltema <[email protected]> wrote:
>>>
>>>> On Sep 6, 2014, at 9:32 AM, Ted Yu <[email protected]> wrote:
>>>>
>>>>> Can you post your hbase-site.xml?
>>>>>
>>>>> /apps/hbase/data/archive/data/default is where HFiles are archived
>>>>> (e.g. when a column family is deleted, HFiles for this column family
>>>>> are stored here).
>>>>> /apps/hbase/data/data/default seems to be your hbase.rootdir
>>>>
>>>> hbase.rootdir is defined to be hdfs://foo:8020/apps/hbase/data. I think
>>>> that's the default that Ambari creates.
>>>>
>>>> So the HFiles in the archive subdirectory have been discarded and can
>>>> be deleted safely?
>>>>
>>>>> bq. a problem I'm having running map/reduce jobs against snapshots
>>>>>
>>>>> Can you describe the problem in a bit more detail?
>>>>
>>>> I don't understand what I'm seeing well enough to ask an intelligent
>>>> question yet. I appear to be scanning duplicate rows when using
>>>> initTableSnapshotMapperJob, but I'm trying to get a better understanding
>>>> of how this works, since it's probably just something I'm doing wrong.
>>>>
>>>> Brian
>>>>
>>>>> Cheers
>>>>>
>>>>> On Sat, Sep 6, 2014 at 6:09 AM, Brian Jeltema <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I'm trying to track down a problem I'm having running map/reduce jobs
>>>>>> against snapshots.
>>>>>> Can someone explain the difference between files stored in:
>>>>>>
>>>>>> /apps/hbase/data/archive/data/default
>>>>>>
>>>>>> and files stored in
>>>>>>
>>>>>> /apps/hbase/data/data/default
>>>>>>
>>>>>> (Hadoop 2.4, HBase 0.98)
>>>>>>
>>>>>> Thanks
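For reference, a job that exercises this code path is typically set up as in
the minimal sketch below, assuming the HBase 0.98 API; the class, job name,
snapshot name, restore path, and mapper are hypothetical stand-ins, not taken
from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SnapshotScanSketch {

  // Trivial mapper: just counts the rows read from the snapshot.
  static class RowCountMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
      ctx.getCounter("sketch", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "snapshot-scan-sketch");
    job.setJarByClass(SnapshotScanSketch.class);

    Scan scan = new Scan();
    scan.setCacheBlocks(false); // block caching buys nothing in a one-pass scan

    // The restore directory must be on the same filesystem as hbase.rootdir;
    // the snapshot's reference files are materialized here before the job
    // starts. The path is a hypothetical example.
    Path restoreDir = new Path("/tmp/snapshot-restore");

    TableMapReduceUtil.initTableSnapshotMapperJob(
        "host_snapshot",                         // hypothetical snapshot name
        scan,
        RowCountMapper.class,
        NullWritable.class, NullWritable.class,
        job,
        true,                                    // ship dependency jars
        restoreDir);

    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

As the trace above shows, initTableSnapshotMapperJob reaches
restoreHdfsRegions through TableSnapshotInputFormat.setInput() and
copySnapshotForScanner(), so the restore is not confined to the caller's
restore directory in this case: the mkdirs that failed was under
/apps/hbase/data/archive, owned by hdfs:hdfs and not writable by the hbase
user performing the restore.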

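On the earlier point that unused snapshots can be deleted while archived
HFiles should not be removed by hand, a minimal sketch using the 0.98 client
API, with a hypothetical snapshot name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DeleteSnapshotSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      admin.deleteSnapshot("host_snapshot"); // hypothetical snapshot name
    } finally {
      admin.close();
    }
  }
}

Once no snapshot references a file under the archive directory, the master's
cleaner chore becomes responsible for removing it, which is why deleting the
snapshot is the supported route rather than deleting archived HFiles manually.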