[
https://issues.apache.org/jira/browse/SOLR-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424704#comment-16424704
]
Boris Pasko commented on SOLR-6305:
-----------------------------------
Here is the patch
[^0001-OIQ-23224-SOLR-6305-Fixed-SOLR-6305-by-reading-the-r.patch] for Solr
6.6.3.
It is very simple: instead of relying on the server-provided default, it
rereads the replication factor from the DFS client config.
{code:java}
private static final OutputStream getOutputStream(FileSystem fileSystem, Path path) throws IOException {
  Configuration conf = fileSystem.getConf();
  FsServerDefaults fsDefaults = fileSystem.getServerDefaults(path);
+ short replication = fileSystem.getDefaultReplication(path);
  EnumSet<CreateFlag> flags = EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE);
  if (Boolean.getBoolean(HDFS_SYNC_BLOCK)) {
    flags.add(CreateFlag.SYNC_BLOCK);
  }
  return fileSystem.create(path, FsPermission.getDefault()
      .applyUMask(FsPermission.getUMask(conf)), flags, fsDefaults
+     .getFileBufferSize(), replication, fsDefaults
      .getBlockSize(), null);
}
{code}
I have tested this on a real hardware cluster, and it generates files with
the replication factor set in /etc/hbase/conf/hdfs-site.xml (provided in
solrconfig.xml).
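As a rough illustration of what "reading the replication factor from the client config" means, the sketch below pulls {{dfs.replication}} out of hdfs-site.xml-style XML using only the JDK and falls back to a caller-supplied default when the property is absent. {{ReplicationLookup}} and {{readReplication}} are hypothetical names made up for this example; this is not Hadoop's Configuration code and it is not part of the patch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ReplicationLookup {

  // Hypothetical helper (not a Hadoop or Solr API): returns the value of
  // dfs.replication found in the given XML, or `fallback` if it is not set.
  public static short readReplication(String hdfsSiteXml, short fallback) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(hdfsSiteXml.getBytes(StandardCharsets.UTF_8)));
      NodeList props = doc.getElementsByTagName("property");
      short result = fallback;
      for (int i = 0; i < props.getLength(); i++) {
        Element prop = (Element) props.item(i);
        String value = childText(prop, "value");
        if ("dfs.replication".equals(childText(prop, "name")) && value != null) {
          // Design choice of this sketch: the last matching entry wins.
          result = Short.parseShort(value);
        }
      }
      return result;
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not read dfs.replication", e);
    }
  }

  // Returns the trimmed text of the first <tag> element under parent, or null.
  private static String childText(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }

  public static void main(String[] args) {
    String xml = "<configuration><property><name>dfs.replication</name>"
        + "<value>1</value></property></configuration>";
    System.out.println(readReplication(xml, (short) 3));                // client value
    System.out.println(readReplication("<configuration/>", (short) 3)); // fallback
  }
}
```

The point of the sketch is only the lookup order: a value set on the client side takes precedence, and the default is used when nothing is configured.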
I haven't found any HdfsFileWriter unit tests, so I haven't modified any.
I'm running 'ant test' with the patch.
> Ability to set the replication factor for index files created by
> HDFSDirectoryFactory
> -------------------------------------------------------------------------------------
>
> Key: SOLR-6305
> URL: https://issues.apache.org/jira/browse/SOLR-6305
> Project: Solr
> Issue Type: Improvement
> Components: hdfs
> Environment: hadoop-2.2.0
> Reporter: Timothy Potter
> Priority: Major
> Attachments:
> 0001-OIQ-23224-SOLR-6305-Fixed-SOLR-6305-by-reading-the-r.patch
>
>
> HdfsFileWriter doesn't allow us to create files in HDFS with a different
> replication factor than the configured DFS default because it uses:
> {{FsServerDefaults fsDefaults = fileSystem.getServerDefaults(path);}}
> Since we have two forms of replication going on when using
> HDFSDirectoryFactory, it would be nice to be able to set the HDFS replication
> factor for the Solr directories to a lower value than the default. I realize
> this might reduce the chance of data locality but since Solr cores each have
> their own path in HDFS, we should give operators the option to reduce it.
> My original thinking was to just use Hadoop setrep to customize the
> replication factor, but that's a one-time shot and doesn't affect new files
> created. For instance, I did:
> {{hadoop fs -setrep -R 1 solr49/coll1}}
> My default dfs replication is set to 3 ^^ I'm setting it to 1 just as an
> example
> Then added some more docs to the coll1 and did:
> {{hadoop fs -stat %r solr49/hdfs1/core_node1/data/index/segments_3}}
> 3 <-- should be 1
> So it looks like new files don't inherit the replication factor from their
> parent directory.
> Not sure if we need to go as far as allowing different replication factor per
> collection but that should be considered if possible.
> I looked at the Hadoop 2.2.0 code to see if there was a way to work through
> this using the Configuration object but nothing jumped out at me ... and the
> implementation for getServerDefaults(path) is just:
> public FsServerDefaults getServerDefaults(Path p) throws IOException {
> return getServerDefaults();
> }
> Path is ignored ;-)