gerlowskija commented on code in PR #3047:
URL: https://github.com/apache/solr/pull/3047#discussion_r1929150292

########## solr/api/src/java/org/apache/solr/client/api/endpoint/ClusterFileStoreApis.java: ##########
@@ -70,4 +97,23 @@ SolrJerseyResponse deleteFile(
              "Indicates whether the deletion should only be done on the receiving node. For internal use only")
          @QueryParam("localDelete")
          Boolean localDelete);
+
+  @POST
+  @Operation(
+      summary =
+          "Pushes a file to other nodes, or pulls a file from other nodes in the Solr cluster.",

Review Comment:
(This ends up being moot since I take your "split up the endpoint" suggestion on L116)
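For readers following along, the "split up the endpoint" idea amounts to replacing one parameter-multiplexed command with two single-purpose operations. A minimal, dependency-free sketch of that shape (the interface, class, and method names here are illustrative stand-ins, not the actual PR code, and the commented paths are assumptions):

```java
// Illustrative only: a plain-Java stand-in for splitting one multiplexed
// "command" endpoint into two single-purpose operations. The real PR uses
// JAX-RS-annotated interfaces; these names and paths are hypothetical.
public class SplitEndpointSketch {
  interface FileStoreCommands {
    // would map to something like POST /cluster/filestore/commands/sync{path}
    String syncFile(String path);

    // would map to something like POST /cluster/filestore/commands/fetch{path}?getFrom=...
    String fetchFile(String path, String getFrom);
  }

  static class Impl implements FileStoreCommands {
    @Override
    public String syncFile(String path) {
      // single-purpose: no sync/getFrom branching needed
      return "sync:" + path;
    }

    @Override
    public String fetchFile(String path, String getFrom) {
      return "fetch:" + path + ":from:" + getFrom;
    }
  }

  public static void main(String[] args) {
    FileStoreCommands api = new Impl();
    System.out.println(api.syncFile("/mypkg/lib.jar"));
    System.out.println(api.fetchFile("/mypkg/lib.jar", "node1:8983_solr"));
  }
}
```

The design benefit is that each operation's parameters are all meaningful for that operation, instead of one method where `sync` and `getFrom` are mutually exclusive.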
########## solr/core/src/java/org/apache/solr/filestore/ClusterFileStore.java: ##########
@@ -141,6 +151,103 @@ public UploadToFileStoreResponse uploadFile(
    return response;
  }
+
+  @Override
+  @PermissionName(PermissionNameProvider.Name.FILESTORE_READ_PERM)
+  public SolrJerseyResponse getFile(String path) {
+    final var response = instantiateJerseyResponse(SolrJerseyResponse.class);
+    attachFileToResponse(path, fileStore, req, rsp);
+    return response;
+  }
+
+  @Override
+  @PermissionName(PermissionNameProvider.Name.FILESTORE_READ_PERM)
+  public FileStoreDirectoryListingResponse getMetadata(String path) {
+    if (path == null) {
+      path = "";
+    }
+    FileStore.FileType type = fileStore.getType(path, false);
+    return getMetadata(type, path, fileStore);
+  }
+
+  public static void attachFileToResponse(
+      String path, FileStore fileStore, SolrQueryRequest req, SolrQueryResponse rsp) {
+    ModifiableSolrParams solrParams = new ModifiableSolrParams();
+    solrParams.add(CommonParams.WT, FILE_STREAM);
+    req.setParams(SolrParams.wrapDefaults(solrParams, req.getParams()));
+    rsp.add(
+        CONTENT,
+        (SolrCore.RawWriter)
+            os ->
+                fileStore.get(
+                    path,
+                    it -> {
+                      try {
+                        InputStream inputStream = it.getInputStream();
+                        if (inputStream != null) {

Review Comment:
I'm not sure honestly.
I could imagine it being null in some testing scenarios (a mock-based test, or something that runs with less than a full cluster, like the environments SolrCloudTestCase-based tests establish). But these lines aren't new to this PR, they're just moved from [NodeFileStore](https://github.com/apache/solr/pull/3047/files#diff-659576ead9b46b2a7b9719a94417add657c7dc8e01b39cc416a3a6cfb6922ffcL148), so I'll probably avoid trying to tweak them too much here...

########## solr/core/src/java/org/apache/solr/filestore/DistribFileStore.java: ##########
@@ -507,7 +503,8 @@ public void delete(String path) {
    final var solrParams = new ModifiableSolrParams();
    solrParams.add("localDelete", "true");
-    final var solrRequest = new GenericSolrRequest(DELETE, "/cluster/files" + path, solrParams);
+    final var solrRequest =
+        new GenericSolrRequest(DELETE, "/cluster/filestore/files" + path, solrParams);

Review Comment:
Huh - I wonder why I didn't change this at the time? Anyway, fixed.

########## solr/core/src/java/org/apache/solr/filestore/DistribFileStore.java: ##########
@@ -189,26 +187,32 @@ private boolean fetchFileFromNodeAndPersist(String fromNode) {
    var solrClient = coreContainer.getDefaultHttpSolrClient();
    try {
-      GenericSolrRequest request = new GenericSolrRequest(GET, "/node/files" + getMetaPath());
-      request.setResponseParser(new InputStreamResponseParser(null));
-      var response = solrClient.requestWithBaseUrl(baseUrl, request::process).getResponse();
-      is = (InputStream) response.get("stream");
+      final var metadataRequest = new FileStoreApi.GetFile(getMetaPath());
+      final var client = coreContainer.getSolrClientCache().getHttpSolrClient(baseUrl);
+      final var response = metadataRequest.process(client);
      metadata =
-          Utils.newBytesConsumer((int) MAX_PKG_SIZE).accept((InputStream) response.get("stream"));
+          Utils.newBytesConsumer((int) MAX_PKG_SIZE)
+              .accept(response.getResponseStreamIfSuccessful());
      m = (Map<?, ?>) Utils.fromJSON(metadata.array(), metadata.arrayOffset(), metadata.limit());
-    } catch (SolrServerException | IOException e) {
+    } catch (Exception e) {
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Error fetching metadata", e);
    } finally {
      org.apache.solr.common.util.IOUtils.closeQuietly(is);
    }
+    ByteBuffer filedata = null;
+    try {
+      final var fileRequest = new FileStoreApi.GetFile(path);
+      final var client = coreContainer.getSolrClientCache().getHttpSolrClient(baseUrl);

Review Comment:
Reverted to `requestWithBaseUrl`. I'm a little confused about SolrClientCache going away, or only being suitable for streaming expressions, though. (Perhaps I'm forgetting some context I previously had on this?) Anyway, I followed up with some questions on SOLR-17630 and we can continue that aspect of the discussion there...

########## solr/core/src/java/org/apache/solr/filestore/NodeFileStore.java: ##########
@@ -76,135 +66,54 @@ public SolrJerseyResponse getFile(String path, Boolean sync, String getFrom, Boo
    final var response = instantiateJerseyResponse(SolrJerseyResponse.class);
    if (Boolean.TRUE.equals(sync)) {
-      try {
-        fileStore.syncToAllNodes(path);
-        return response;
-      } catch (IOException e) {
-        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Error getting file ", e);
-      }
+      ClusterFileStore.syncToAllNodes(fileStore, path);

Review Comment:
Yeah, the name conflicts are unfortunate. On the bright side - I think we can probably make "NodeFileStore" and "NodeFileStoreApi" go away as part of this PR. (Because with these API changes, all of our filestore endpoints will be under the "/cluster/filestore" path.) That would just leave:
- `ClusterFileStoreApi` - annotated interface defining our filestore APIs
- `ClusterFileStore` - implementations of those APIs
- `DistribFileStore` - internal implementation of various filestore operations (syncing, deleting, creating, etc.)
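The three-class layering described above can be sketched as follows. This is a simplified, dependency-free illustration of the delegation pattern, not the real Solr classes; the class names mirror the list above but everything else is a stand-in:

```java
// Simplified sketch of the proposed layering: an annotated API interface,
// an implementation of it, and an internal store it delegates to.
// All bodies here are hypothetical stand-ins for the real Solr code.
import java.util.HashMap;
import java.util.Map;

public class FileStoreLayering {
  // Stand-in for ClusterFileStoreApi: the annotated interface defining the APIs
  interface ClusterFileStoreApi {
    String getFile(String path);
  }

  // Stand-in for DistribFileStore: internal implementation of filestore operations
  static class DistribFileStore {
    private final Map<String, String> files = new HashMap<>();

    void put(String path, String contents) {
      files.put(path, contents);
    }

    String get(String path) {
      return files.get(path);
    }
  }

  // Stand-in for ClusterFileStore: implements the API by delegating to the store
  static class ClusterFileStore implements ClusterFileStoreApi {
    private final DistribFileStore store;

    ClusterFileStore(DistribFileStore store) {
      this.store = store;
    }

    @Override
    public String getFile(String path) {
      return store.get(path);
    }
  }

  public static void main(String[] args) {
    DistribFileStore store = new DistribFileStore();
    store.put("/mypkg/lib.jar", "bytes");
    ClusterFileStoreApi api = new ClusterFileStore(store);
    System.out.println(api.getFile("/mypkg/lib.jar"));
  }
}
```

The point of the split is that HTTP/API concerns live only in the `ClusterFileStore` layer, while replication and storage mechanics stay in `DistribFileStore`.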
########## solr/core/src/java/org/apache/solr/filestore/NodeFileStore.java: ##########
@@ -76,135 +66,54 @@ public SolrJerseyResponse getFile(String path, Boolean sync, String getFrom, Boo
    final var response = instantiateJerseyResponse(SolrJerseyResponse.class);
    if (Boolean.TRUE.equals(sync)) {
-      try {
-        fileStore.syncToAllNodes(path);
-        return response;
-      } catch (IOException e) {
-        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Error getting file ", e);
-      }
+      ClusterFileStore.syncToAllNodes(fileStore, path);
+      return response;
    }
    if (path == null) {
      path = "";
    }
    final var pathCopy = path;
    if (getFrom != null) {
-      coreContainer
-          .getUpdateShardHandler()
-          .getUpdateExecutor()
-          .submit(
-              () -> {
-                log.debug("Downloading file {}", pathCopy);
-                try {
-                  fileStore.fetch(pathCopy, getFrom);
-                } catch (Exception e) {
-                  log.error("Failed to download file: {}", pathCopy, e);
-                }
-                log.info("downloaded file: {}", pathCopy);
-              });
+      ClusterFileStore.pullFileFromNode(coreContainer, fileStore, pathCopy, getFrom);
      return response;
    }
    FileStore.FileType type = fileStore.getType(path, false);
-    if (type == FileStore.FileType.NOFILE) {
-      final var fileMissingResponse =
-          instantiateJerseyResponse(FileStoreDirectoryListingResponse.class);
-      fileMissingResponse.files = Collections.singletonMap(path, null);
-      return fileMissingResponse;
-    }
-    if (type == FileStore.FileType.DIRECTORY) {
-      final var directoryListingResponse =
-          instantiateJerseyResponse(FileStoreDirectoryListingResponse.class);
-      final var directoryContents =
-          fileStore.list(path, null).stream()
-              .map(details -> convertToResponse(details))
-              .collect(Collectors.toList());
-      directoryListingResponse.files = Collections.singletonMap(path, directoryContents);
-      return directoryListingResponse;
+    if (type == FileStore.FileType.NOFILE

Review Comment:
Agreed. I think the reason this is finicky is that we cram a bunch of different cases into the "metadata" POJO. It's trying to represent too much.
It gets used not just for file metadata, but also for directory listings, representing 404s, etc. A good refactor IMO would be to:
1. Create a `class DirectoryMetadata extends Metadata`, to use in the directory case.
2. Have the 404 case `throw new SolrException(ErrorCode.NOT_FOUND, "...");`

That said - this code all pre-exists this PR, and I'm leery to wade into refactors for scope reasons.

########## solr/core/src/java/org/apache/solr/filestore/ClusterFileStore.java: ##########
@@ -194,6 +301,48 @@ public SolrJerseyResponse deleteFile(String filePath, Boolean localDelete) {
    return response;
  }
+
+  @Override
+  @PermissionName(PermissionNameProvider.Name.FILESTORE_WRITE_PERM)
+  public SolrJerseyResponse executeFileStoreCommand(String path, String getFrom, Boolean sync) {
+    final var response = instantiateJerseyResponse(SolrJerseyResponse.class);
+
+    if (Boolean.TRUE.equals(sync)) {
+      syncToAllNodes(fileStore, path);
+    } else if (getFrom != null) {
+      if (path == null) {
+        path = "";
+      }
+      pullFileFromNode(coreContainer, fileStore, path, getFrom);
+    }
+
+    return response;
+  }
+
+  public static void syncToAllNodes(FileStore fileStore, String path) {
+    try {
+      fileStore.syncToAllNodes(path);
+    } catch (IOException e) {
+      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Error getting file ", e);

Review Comment:
True. But this code isn't new to this PR, it's just refactored from [here](https://github.com/apache/solr/pull/3047/files#diff-659576ead9b46b2a7b9719a94417add657c7dc8e01b39cc416a3a6cfb6922ffcL83). There are a lot of things I'd love to improve in the filestore, but I'm leery of detouring into those because it'd be a big expansion in scope (and isn't the primary aim of this PR).
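For the record, the `DirectoryMetadata` refactor floated above could be sketched like this. It is a hypothetical illustration, not part of this PR: the `Metadata`, `DirectoryMetadata`, `NotFoundException`, and `describe` names are dependency-free stand-ins for the real Solr types (`SolrException`, `ErrorCode.NOT_FOUND`, the metadata POJO):

```java
// Hypothetical sketch of the proposed refactor: a subclass for the directory
// case, and an exception for the 404 case, instead of one POJO that tries to
// represent files, directory listings, and missing paths all at once.
import java.util.List;

public class MetadataRefactorSketch {
  // Stand-in for the existing file-metadata POJO
  static class Metadata {
    final String path;

    Metadata(String path) {
      this.path = path;
    }
  }

  // Directory listings get their own subclass rather than overloading Metadata
  static class DirectoryMetadata extends Metadata {
    final List<String> children;

    DirectoryMetadata(String path, List<String> children) {
      super(path);
      this.children = children;
    }
  }

  // Stand-in for SolrException(ErrorCode.NOT_FOUND, ...)
  static class NotFoundException extends RuntimeException {
    NotFoundException(String msg) {
      super(msg);
    }
  }

  enum FileType { FILE, DIRECTORY, NOFILE }

  static Metadata describe(String path, FileType type) {
    switch (type) {
      case DIRECTORY:
        return new DirectoryMetadata(path, List.of("child1", "child2"));
      case FILE:
        return new Metadata(path);
      default:
        // 404 becomes an exception instead of a null-valued metadata entry
        throw new NotFoundException("No such filestore entry: " + path);
    }
  }
}
```

With this shape, callers branch on the static type (or catch the exception) instead of inspecting null-valued map entries to distinguish the cases.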
########## solr/core/src/java/org/apache/solr/filestore/DistribFileStore.java: ##########
@@ -382,16 +380,14 @@ private void distribute(FileInfo info) {
      // trying to avoid the thundering herd problem when there are a very large no:of nodes

Review Comment:
Done

########## solr/core/src/java/org/apache/solr/filestore/ClusterFileStore.java: ##########
@@ -194,6 +301,48 @@ public SolrJerseyResponse deleteFile(String filePath, Boolean localDelete) {
    return response;
  }
+
+  @Override
+  @PermissionName(PermissionNameProvider.Name.FILESTORE_WRITE_PERM)
+  public SolrJerseyResponse executeFileStoreCommand(String path, String getFrom, Boolean sync) {

Review Comment:
We're definitely trying to avoid it wherever possible. But it's not always possible, unfortunately. We have a [documented convention](https://github.com/apache/solr/blob/main/dev-docs/v2-api-conventions.adoc#exceptional-cases---command-apis) for handling these (hopefully rare) cases, which is pretty close to what this PR does. Open to coming up with a different pattern for these cases if you have a suggestion for what they might look like?

########## solr/api/src/java/org/apache/solr/client/api/endpoint/ClusterFileStoreApis.java: ##########
@@ -70,4 +97,23 @@ SolrJerseyResponse deleteFile(
              "Indicates whether the deletion should only be done on the receiving node. For internal use only")
          @QueryParam("localDelete")
          Boolean localDelete);
+
+  @POST
+  @Operation(
+      summary =
+          "Pushes a file to other nodes, or pulls a file from other nodes in the Solr cluster.",
+      tags = {"file-store"})
+  @Path("/commands{path:.+}")
+  SolrJerseyResponse executeFileStoreCommand(
+      @Parameter(description = "Path to a file or directory within the filestore")
+          @PathParam("path")
+          String path,
+      @Parameter(description = "An optional Solr node name to fetch the file from")
+          @QueryParam("getFrom")
+          String getFrom,
+      @Parameter(
+          description =
+              "If true, triggers syncing for this file across all nodes in the filestore")

Review Comment:
Huh, I guess splitting these into separate endpoints is a little more in keeping with [the currently documented convention](https://github.com/apache/solr/blob/main/dev-docs/v2-api-conventions.adoc#exceptional-cases---command-apis). Done 👍

> do we somehow eliminate the need for this?

In terms of eliminating the need entirely: I'm open to that if you have ideas? As I understand the current filestore impl, these two commands are essentially internal APIs that Solr relies on to ensure filestore entries are (eventually) present on all nodes. So we'd need some other way of doing that?

Relatedly, it'd be really nice if we had some way to flag the "internal-ness" of these APIs. We want the code-generation to cover them, so that solr-core itself has nice classes to use. But ideally something could signal to end-users that they should never be using these APIs themselves. A different package name? Some sort of Javadoc tag? Maybe that's worth its own JIRA ticket to spur some brainstorming...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org