Enno Shioji created HADOOP-11444:
------------------------------------

             Summary: Jets3tFileSystemStore fails to remove initial slash from object keys, resulting in objects with double forward slashes being stored
                 Key: HADOOP-11444
                 URL: https://issues.apache.org/jira/browse/HADOOP-11444
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 2.2.0
         Environment: java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
            Reporter: Enno Shioji
            Priority: Minor

While writing to S3 using Spark 1.2.0's ReceiverInputDStream#saveAsTextFiles with an S3 URL ("s3://fake-test/1234"), I noticed that files are written with double forward slashes (e.g. "s3://fake-test//1234/-1419334280000/").

After debugging, it seems this is caused by Jets3tFileSystemStore#pathToKey(path), which returns the URI path with its leading slash intact ("/1234/..." for the input "s3://fake-test/1234/...") when it should strip the first forward slash. Because the key is stored under the bucket "fake-test" with that leading slash, the object shows up as "s3://fake-test//1234/...".

When I used an s3n URL, and hence Jets3tNativeFileSystemStore, the double slashes went away.

Here is a comparison of the two pathToKey implementations.

Jets3tNativeFileSystemStore's implementation of pathToKey is:
======
  private static String pathToKey(Path path) {
    if (path.toUri().getScheme() != null && path.toUri().getPath().isEmpty()) {
      // allow uris without trailing slash after bucket to refer to root,
      // like s3n://mybucket
      return "";
    }
    if (!path.isAbsolute()) {
      throw new IllegalArgumentException("Path must be absolute: " + path);
    }
    String ret = path.toUri().getPath().substring(1); // remove initial slash
    if (ret.endsWith("/") && (ret.indexOf("/") != ret.length() - 1)) {
      ret = ret.substring(0, ret.length() -1);
    }
    return ret;
  }
======

whereas Jets3tFileSystemStore uses:
======
  private String pathToKey(Path path) {
    if (!path.isAbsolute()) {
      throw new IllegalArgumentException("Path must be absolute: " + path);
    }
    return path.toUri().getPath();
  }
======
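
To make the difference between the two implementations concrete, here is a minimal standalone sketch. It is not a patch for Hadoop. It assumes plain java.net.URI in place of org.apache.hadoop.fs.Path so that it runs without Hadoop on the classpath, and the class name PathToKeySketch, the method names, and the file name "part-00000" are hypothetical, chosen only for illustration.

```java
import java.net.URI;

// Standalone illustration only; java.net.URI stands in for Hadoop's Path.
public class PathToKeySketch {

    // Mirrors the behaviour reported for Jets3tFileSystemStore: the URI path
    // is returned unchanged, so the key keeps its leading slash.
    static String keyKeepingSlash(URI uri) {
        return uri.getPath();
    }

    // Mirrors the slash handling quoted from Jets3tNativeFileSystemStore.
    static String keyWithoutSlash(URI uri) {
        String path = uri.getPath();
        if (uri.getScheme() != null && path.isEmpty()) {
            return ""; // bucket root, e.g. s3://mybucket
        }
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + uri);
        }
        String ret = path.substring(1); // remove initial slash
        if (ret.endsWith("/") && ret.indexOf("/") != ret.length() - 1) {
            ret = ret.substring(0, ret.length() - 1);
        }
        return ret;
    }

    public static void main(String[] args) {
        URI uri = URI.create("s3://fake-test/1234/part-00000");
        System.out.println(keyKeepingSlash(uri)); // prints "/1234/part-00000"
        System.out.println(keyWithoutSlash(uri)); // prints "1234/part-00000"
    }
}
```

With the first method, the key stored in the bucket begins with "/", which would match the "s3://fake-test//1234/..." layout described above. The second method yields a key with no leading slash.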