[ https://issues.apache.org/jira/browse/SOLR-16697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705640#comment-17705640 ]
Jason Gerlowski commented on SOLR-16697:
----------------------------------------

I've merged this to main and plan on backporting soon. But I noticed that over the weekend there were a few test failures for S3InstallShardTest, a new test added in this commit. AFAICT those failures are from an OOM in the test JVM.

{code}
  2> Caused by: java.lang.OutOfMemoryError: Java heap space
  2>         at org.apache.solr.s3.S3BackupRepository.copyIndexFileTo(S3BackupRepository.java:348) ~[main/:?]
  2>         at org.apache.solr.core.TrackingBackupRepository.copyIndexFileTo(TrackingBackupRepository.java:149) ~[solr-test-framework-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT 81fe0045aaa63b115808a8c3d87ce96dc7921e8b [snapshot build, details omitted]]
  2>         at org.apache.solr.core.backup.repository.BackupRepository.copyFileTo(BackupRepository.java:191) ~[solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT 81fe0045aaa63b115808a8c3d87ce96dc7921e8b [snapshot build, details omitted]]
  2>         at org.apache.solr.handler.RestoreCore$BasicRestoreRepository.repoCopy(RestoreCore.java:242) ~[solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT 81fe0045aaa63b115808a8c3d87ce96dc7921e8b [snapshot build, details omitted]]
  2>         at org.apache.solr.handler.RestoreCore.doRestore(RestoreCore.java:132) ~[solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT 81fe0045aaa63b115808a8c3d87ce96dc7921e8b [snapshot build, details omitted]]
{code}

It might just be a fluke, but it's also possible that S3InstallShardTest is a bad citizen memory-wise. Anyway, I plan to give this a few more days to see if the failure recurs before backporting.

> New API support to import index files generated by Embedded SOLR into SOLR Cloud
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-16697
>                 URL: https://issues.apache.org/jira/browse/SOLR-16697
>             Project: Solr
>          Issue Type: New Feature
>          Components: Backup/Restore
>            Reporter: Indumathy Rajagopalan
>            Assignee: Jason Gerlowski
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Offline indexing is a popular option when really large data sets need to be
> indexed into SOLR.
> Data is loaded from a data source (e.g. c*), and index creation pipelines
> produce index files per shard using embedded SOLR.
>
> With older versions of SOLR, we would copy these index files into SOLR Cloud
> data directories using custom tools and reload the collection to be able to
> search/update on the newly uploaded collection.
> Ideally, we should use the Restore API to import the index files from the backup
> repository. However, the file structure expected for the Restore API to work
> is complex enough that massaging the index files in every shard into a
> Restore-compatible format is infeasible.
>
> It would be good for SOLR to support a 'Restore'-like API that would allow us
> to import index files generated by embedded SOLR into SOLR Cloud. This API
> should operate at the shard level and be able to import the index files into a
> single shard (per invocation).
>
> *With the new API, offline indexing could look like this:*
>
> 1. Generate index files per shard using embedded SOLR as a part of Hadoop MR
> /Spark jobs and copy all index files for every shard into the backup repository.
>
> 2. The new API should be able to import the index from the backup repository
> location into each shard on SOLR Cloud. The API would handle things like
> marking the collection as read-only, triggering replication, etc., along the lines
> of what the 'RESTORE' API currently does.
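
For readers following the two steps above, here is a minimal Java sketch of the per-shard invocation pattern in step 2: one request per shard, each pointing at that shard's directory in the backup repository. Everything named here is an assumption for illustration and is not stated in this message: the `INSTALLSHARDDATA` action name and the `collection`/`shard` parameter names, the `<locationRoot>/<shard>` directory layout, and the example host, collection, and repository names. Only `location` and `repository` come from the issue description. The sketch builds request URLs and sends nothing; check the Solr Reference Guide for the actual API before relying on it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds one "install shard" request URL per shard.
// The action name and the per-shard directory layout are assumptions.
class InstallShardUrls {

    // URL-encode a single query parameter value.
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Build the request URL for a single shard (one invocation per shard).
    static String installUrl(String solrBase, String collection, String shard,
                             String repository, String locationRoot) {
        // Assumed layout: each shard's index files live under <locationRoot>/<shard>.
        String location = locationRoot + "/" + shard;
        return solrBase + "/admin/collections?action=INSTALLSHARDDATA"
                + "&collection=" + enc(collection)
                + "&shard=" + enc(shard)
                + "&repository=" + enc(repository)
                + "&location=" + enc(location);
    }

    // Build the URLs for every shard of a collection, in the order given.
    static List<String> installUrls(String solrBase, String collection, List<String> shards,
                                    String repository, String locationRoot) {
        List<String> urls = new ArrayList<>();
        for (String shard : shards) {
            urls.add(installUrl(solrBase, collection, shard, repository, locationRoot));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Example values only; none of these names come from the issue.
        List<String> urls = installUrls("http://localhost:8983/solr", "products",
                List.of("shard1", "shard2"), "s3", "/backups/offline-index");
        urls.forEach(System.out::println);
    }
}
```

Keeping URL construction separate from any HTTP client makes the per-shard fan-out easy to inspect before a request is sent.
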
>
> The new API should be able to support relevant parameters from Restore API
> (location & repository)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)