gerlowskija commented on code in PR #3047:
URL: https://github.com/apache/solr/pull/3047#discussion_r1929150292

########## solr/api/src/java/org/apache/solr/client/api/endpoint/ClusterFileStoreApis.java: ##########
@@ -70,4 +97,23 @@ SolrJerseyResponse deleteFile(
              "Indicates whether the deletion should only be done on the receiving node. For internal use only")
          @QueryParam("localDelete")
          Boolean localDelete);
+
+  @POST
+  @Operation(
+      summary =
+          "Pushes a file to other nodes, or pulls a file from other nodes in the Solr cluster.",

Review Comment:
(This ends up being moot since I take your "split up the endpoint" suggestion on L116)
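For readers following along, the "split up the endpoint" idea amounts to replacing one parameter-multiplexed command with two single-purpose operations. A minimal, dependency-free sketch of that shape (the interface, class, and method names here are illustrative stand-ins, not the actual PR code, and the commented paths are assumptions):

```java
// Illustrative only: a plain-Java stand-in for splitting one multiplexed
// "command" endpoint into two single-purpose operations. The real PR uses
// JAX-RS-annotated interfaces; these names and paths are hypothetical.
public class SplitEndpointSketch {
  interface FileStoreCommands {
    // would map to something like POST /cluster/filestore/commands/sync{path}
    String syncFile(String path);

    // would map to something like POST /cluster/filestore/commands/fetch{path}?getFrom=...
    String fetchFile(String path, String getFrom);
  }

  static class Impl implements FileStoreCommands {
    @Override
    public String syncFile(String path) {
      // single-purpose: no sync/getFrom branching needed
      return "sync:" + path;
    }

    @Override
    public String fetchFile(String path, String getFrom) {
      return "fetch:" + path + ":from:" + getFrom;
    }
  }

  public static void main(String[] args) {
    FileStoreCommands api = new Impl();
    System.out.println(api.syncFile("/mypkg/lib.jar"));
    System.out.println(api.fetchFile("/mypkg/lib.jar", "node1:8983_solr"));
  }
}
```

The design benefit is that each operation's parameters are all meaningful for that operation, instead of one method where `sync` and `getFrom` are mutually exclusive.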
########## solr/core/src/java/org/apache/solr/filestore/ClusterFileStore.java: ##########
@@ -141,6 +151,103 @@ public UploadToFileStoreResponse uploadFile(
    return response;
  }
+
+  @Override
+  @PermissionName(PermissionNameProvider.Name.FILESTORE_READ_PERM)
+  public SolrJerseyResponse getFile(String path) {
+    final var response = instantiateJerseyResponse(SolrJerseyResponse.class);
+    attachFileToResponse(path, fileStore, req, rsp);
+    return response;
+  }
+
+  @Override
+  @PermissionName(PermissionNameProvider.Name.FILESTORE_READ_PERM)
+  public FileStoreDirectoryListingResponse getMetadata(String path) {
+    if (path == null) {
+      path = "";
+    }
+    FileStore.FileType type = fileStore.getType(path, false);
+    return getMetadata(type, path, fileStore);
+  }
+
+  public static void attachFileToResponse(
+      String path, FileStore fileStore, SolrQueryRequest req, SolrQueryResponse rsp) {
+    ModifiableSolrParams solrParams = new ModifiableSolrParams();
+    solrParams.add(CommonParams.WT, FILE_STREAM);
+    req.setParams(SolrParams.wrapDefaults(solrParams, req.getParams()));
+    rsp.add(
+        CONTENT,
+        (SolrCore.RawWriter)
+            os ->
+                fileStore.get(
+                    path,
+                    it -> {
+                      try {
+                        InputStream inputStream = it.getInputStream();
+                        if (inputStream != null) {

Review Comment:
I'm not sure honestly.
I could imagine it being null in some testing scenarios (a mock-based test, or something that runs with less than a full cluster, like the environments SolrCloudTestCase-based tests establish). But these lines aren't new to this PR, they're just moved from [NodeFileStore](https://github.com/apache/solr/pull/3047/files#diff-659576ead9b46b2a7b9719a94417add657c7dc8e01b39cc416a3a6cfb6922ffcL148), so I'll probably avoid trying to tweak them too much here...

########## solr/core/src/java/org/apache/solr/filestore/DistribFileStore.java: ##########
@@ -507,7 +503,8 @@ public void delete(String path) {
    final var solrParams = new ModifiableSolrParams();
    solrParams.add("localDelete", "true");
-    final var solrRequest = new GenericSolrRequest(DELETE, "/cluster/files" + path, solrParams);
+    final var solrRequest =
+        new GenericSolrRequest(DELETE, "/cluster/filestore/files" + path, solrParams);

Review Comment:
Huh - I wonder why I didn't change this at the time? Anyway, fixed.

########## solr/core/src/java/org/apache/solr/filestore/DistribFileStore.java: ##########
@@ -189,26 +187,32 @@ private boolean fetchFileFromNodeAndPersist(String fromNode) {
    var solrClient = coreContainer.getDefaultHttpSolrClient();
    try {
-      GenericSolrRequest request = new GenericSolrRequest(GET, "/node/files" + getMetaPath());
-      request.setResponseParser(new InputStreamResponseParser(null));
-      var response = solrClient.requestWithBaseUrl(baseUrl, request::process).getResponse();
-      is = (InputStream) response.get("stream");
+      final var metadataRequest = new FileStoreApi.GetFile(getMetaPath());
+      final var client = coreContainer.getSolrClientCache().getHttpSolrClient(baseUrl);
+      final var response = metadataRequest.process(client);
      metadata =
-          Utils.newBytesConsumer((int) MAX_PKG_SIZE).accept((InputStream) response.get("stream"));
+          Utils.newBytesConsumer((int) MAX_PKG_SIZE)
+              .accept(response.getResponseStreamIfSuccessful());
      m = (Map<?, ?>) Utils.fromJSON(metadata.array(), metadata.arrayOffset(), metadata.limit());
-    } catch (SolrServerException | IOException e) {
+    } catch (Exception e) {
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Error fetching metadata", e);
    } finally {
      org.apache.solr.common.util.IOUtils.closeQuietly(is);
    }
+    ByteBuffer filedata = null;
+    try {
+      final var fileRequest = new FileStoreApi.GetFile(path);
+      final var client = coreContainer.getSolrClientCache().getHttpSolrClient(baseUrl);

Review Comment:
Reverted to `requestWithBaseUrl`. I'm a little confused about SolrClientCache going away, or only being suitable for streaming expressions, though. (Perhaps I'm forgetting some context I previously had on this?) Anyway, I followed up with some questions on SOLR-17630 and we can continue that aspect of the discussion there...

########## solr/core/src/java/org/apache/solr/filestore/NodeFileStore.java: ##########
@@ -76,135 +66,54 @@ public SolrJerseyResponse getFile(String path, Boolean sync, String getFrom, Boo
    final var response = instantiateJerseyResponse(SolrJerseyResponse.class);
    if (Boolean.TRUE.equals(sync)) {
-      try {
-        fileStore.syncToAllNodes(path);
-        return response;
-      } catch (IOException e) {
-        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Error getting file ", e);
-      }
+      ClusterFileStore.syncToAllNodes(fileStore, path);

Review Comment:
Yeah, the name conflicts are unfortunate. On the bright side - I think we can probably make "NodeFileStore" and "NodeFileStoreApi" go away as part of this PR. (Because with these API changes, all of our filestore endpoints will be under the "/cluster/filestore" path.) That would just leave:
- `ClusterFileStoreApi` - annotated interface defining our filestore APIs
- `ClusterFileStore` - implementations of those APIs
- `DistribFileStore` - internal implementation of various filestore operations (syncing, deleting, creating, etc.)
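The three-class layering described above can be sketched as follows. This is a simplified, dependency-free illustration of the delegation pattern, not the real Solr classes; the class names mirror the list above but everything else is a stand-in:

```java
// Simplified sketch of the proposed layering: an annotated API interface,
// an implementation of it, and an internal store it delegates to.
// All bodies here are hypothetical stand-ins for the real Solr code.
import java.util.HashMap;
import java.util.Map;

public class FileStoreLayering {
  // Stand-in for ClusterFileStoreApi: the annotated interface defining the APIs
  interface ClusterFileStoreApi {
    String getFile(String path);
  }

  // Stand-in for DistribFileStore: internal implementation of filestore operations
  static class DistribFileStore {
    private final Map<String, String> files = new HashMap<>();

    void put(String path, String contents) {
      files.put(path, contents);
    }

    String get(String path) {
      return files.get(path);
    }
  }

  // Stand-in for ClusterFileStore: implements the API by delegating to the store
  static class ClusterFileStore implements ClusterFileStoreApi {
    private final DistribFileStore store;

    ClusterFileStore(DistribFileStore store) {
      this.store = store;
    }

    @Override
    public String getFile(String path) {
      return store.get(path);
    }
  }

  public static void main(String[] args) {
    DistribFileStore store = new DistribFileStore();
    store.put("/mypkg/lib.jar", "bytes");
    ClusterFileStoreApi api = new ClusterFileStore(store);
    System.out.println(api.getFile("/mypkg/lib.jar"));
  }
}
```

The point of the split is that HTTP/API concerns live only in the `ClusterFileStore` layer, while replication and storage mechanics stay in `DistribFileStore`.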
########## solr/core/src/java/org/apache/solr/filestore/NodeFileStore.java: ##########
@@ -76,135 +66,54 @@ public SolrJerseyResponse getFile(String path, Boolean sync, String getFrom, Boo
    final var response = instantiateJerseyResponse(SolrJerseyResponse.class);
    if (Boolean.TRUE.equals(sync)) {
-      try {
-        fileStore.syncToAllNodes(path);
-        return response;
-      } catch (IOException e) {
-        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Error getting file ", e);
-      }
+      ClusterFileStore.syncToAllNodes(fileStore, path);
+      return response;
    }
    if (path == null) {
      path = "";
    }
    final var pathCopy = path;
    if (getFrom != null) {
-      coreContainer
-          .getUpdateShardHandler()
-          .getUpdateExecutor()
-          .submit(
-              () -> {
-                log.debug("Downloading file {}", pathCopy);
-                try {
-                  fileStore.fetch(pathCopy, getFrom);
-                } catch (Exception e) {
-                  log.error("Failed to download file: {}", pathCopy, e);
-                }
-                log.info("downloaded file: {}", pathCopy);
-              });
+      ClusterFileStore.pullFileFromNode(coreContainer, fileStore, pathCopy, getFrom);
      return response;
    }
    FileStore.FileType type = fileStore.getType(path, false);
-    if (type == FileStore.FileType.NOFILE) {
-      final var fileMissingResponse =
-          instantiateJerseyResponse(FileStoreDirectoryListingResponse.class);
-      fileMissingResponse.files = Collections.singletonMap(path, null);
-      return fileMissingResponse;
-    }
-    if (type == FileStore.FileType.DIRECTORY) {
-      final var directoryListingResponse =
-          instantiateJerseyResponse(FileStoreDirectoryListingResponse.class);
-      final var directoryContents =
-          fileStore.list(path, null).stream()
-              .map(details -> convertToResponse(details))
-              .collect(Collectors.toList());
-      directoryListingResponse.files = Collections.singletonMap(path, directoryContents);
-      return directoryListingResponse;
+    if (type == FileStore.FileType.NOFILE

Review Comment:
Agreed. I think the reason this is finicky is that we cram a bunch of different cases into the "metadata" POJO. It's trying to represent too much.
It gets used not just for file metadata, but also for directory listings, representing 404s, etc. A good refactor IMO would be to:
1. Create a `class DirectoryMetadata extends Metadata`, to use in the directory case.
2. Have the 404 case `throw new SolrException(ErrorCode.NOT_FOUND, "...");`

That said - this code all pre-exists this PR, and I'm leery to wade into refactors for scope reasons.

########## solr/core/src/java/org/apache/solr/filestore/ClusterFileStore.java: ##########
@@ -194,6 +301,48 @@ public SolrJerseyResponse deleteFile(String filePath, Boolean localDelete) {
    return response;
  }
+
+  @Override
+  @PermissionName(PermissionNameProvider.Name.FILESTORE_WRITE_PERM)
+  public SolrJerseyResponse executeFileStoreCommand(String path, String getFrom, Boolean sync) {
+    final var response = instantiateJerseyResponse(SolrJerseyResponse.class);
+
+    if (Boolean.TRUE.equals(sync)) {
+      syncToAllNodes(fileStore, path);
+    } else if (getFrom != null) {
+      if (path == null) {
+        path = "";
+      }
+      pullFileFromNode(coreContainer, fileStore, path, getFrom);
+    }
+
+    return response;
+  }
+
+  public static void syncToAllNodes(FileStore fileStore, String path) {
+    try {
+      fileStore.syncToAllNodes(path);
+    } catch (IOException e) {
+      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Error getting file ", e);

Review Comment:
True. But this code isn't new to this PR, it's just refactored from [here](https://github.com/apache/solr/pull/3047/files#diff-659576ead9b46b2a7b9719a94417add657c7dc8e01b39cc416a3a6cfb6922ffcL83). There are a lot of things I'd love to improve in the filestore, but I'm leery of detouring into those because it'd be a big expansion in scope (and isn't the primary aim of this PR).
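For the record, the `DirectoryMetadata` refactor floated above could be sketched like this. It is a hypothetical illustration, not part of this PR: the `Metadata`, `DirectoryMetadata`, `NotFoundException`, and `describe` names are dependency-free stand-ins for the real Solr types (`SolrException`, `ErrorCode.NOT_FOUND`, the metadata POJO):

```java
// Hypothetical sketch of the proposed refactor: a subclass for the directory
// case, and an exception for the 404 case, instead of one POJO that tries to
// represent files, directory listings, and missing paths all at once.
import java.util.List;

public class MetadataRefactorSketch {
  // Stand-in for the existing file-metadata POJO
  static class Metadata {
    final String path;

    Metadata(String path) {
      this.path = path;
    }
  }

  // Directory listings get their own subclass rather than overloading Metadata
  static class DirectoryMetadata extends Metadata {
    final List<String> children;

    DirectoryMetadata(String path, List<String> children) {
      super(path);
      this.children = children;
    }
  }

  // Stand-in for SolrException(ErrorCode.NOT_FOUND, ...)
  static class NotFoundException extends RuntimeException {
    NotFoundException(String msg) {
      super(msg);
    }
  }

  enum FileType { FILE, DIRECTORY, NOFILE }

  static Metadata describe(String path, FileType type) {
    switch (type) {
      case DIRECTORY:
        return new DirectoryMetadata(path, List.of("child1", "child2"));
      case FILE:
        return new Metadata(path);
      default:
        // 404 becomes an exception instead of a null-valued metadata entry
        throw new NotFoundException("No such filestore entry: " + path);
    }
  }
}
```

With this shape, callers branch on the static type (or catch the exception) instead of inspecting null-valued map entries to distinguish the cases.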
########## solr/core/src/java/org/apache/solr/filestore/DistribFileStore.java: ##########
@@ -382,16 +380,14 @@ private void distribute(FileInfo info) {
      // trying to avoid the thundering herd problem when there are a very large no:of nodes

Review Comment:
Done

########## solr/core/src/java/org/apache/solr/filestore/ClusterFileStore.java: ##########
@@ -194,6 +301,48 @@ public SolrJerseyResponse deleteFile(String filePath, Boolean localDelete) {
    return response;
  }
+
+  @Override
+  @PermissionName(PermissionNameProvider.Name.FILESTORE_WRITE_PERM)
+  public SolrJerseyResponse executeFileStoreCommand(String path, String getFrom, Boolean sync) {

Review Comment:
We're definitely trying to avoid it wherever possible. But it's not always possible, unfortunately. We have a [documented convention](https://github.com/apache/solr/blob/main/dev-docs/v2-api-conventions.adoc#exceptional-cases---command-apis) for handling these (hopefully rare) cases, which is pretty close to what this PR does. Open to coming up with a different pattern for these cases if you have a suggestion for what they might look like?

########## solr/api/src/java/org/apache/solr/client/api/endpoint/ClusterFileStoreApis.java: ##########
@@ -70,4 +97,23 @@ SolrJerseyResponse deleteFile(
              "Indicates whether the deletion should only be done on the receiving node. For internal use only")
          @QueryParam("localDelete")
          Boolean localDelete);
+
+  @POST
+  @Operation(
+      summary =
+          "Pushes a file to other nodes, or pulls a file from other nodes in the Solr cluster.",
+      tags = {"file-store"})
+  @Path("/commands{path:.+}")
+  SolrJerseyResponse executeFileStoreCommand(
+      @Parameter(description = "Path to a file or directory within the filestore")
+          @PathParam("path")
+          String path,
+      @Parameter(description = "An optional Solr node name to fetch the file from")
+          @QueryParam("getFrom")
+          String getFrom,
+      @Parameter(
+          description =
+              "If true, triggers syncing for this file across all nodes in the filestore")

Review Comment:
Huh, I guess splitting these into separate endpoints is a little more in keeping with [the currently documented convention](https://github.com/apache/solr/blob/main/dev-docs/v2-api-conventions.adoc#exceptional-cases---command-apis). Done 👍

> do we somehow eliminate the need for this?

In terms of eliminating the need entirely: I'm open to that if you have ideas? As I understand the current filestore impl, these two commands are essentially internal APIs that Solr relies on to ensure filestore entries are (eventually) present on all nodes. So we'd need some other way of doing that?

Relatedly, it'd be really nice if we had some way to flag the "internal-ness" of these APIs. We want the code-generation to cover them, so that solr-core itself has nice classes to use. But ideally something could signal to end-users that they should never be using these APIs themselves. A different package name? Some sort of Javadoc tag? Maybe that's worth its own JIRA ticket to spur some brainstorming...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org