Re: [PR] Migrate schema designer to filestoreapi part deux [solr]

via GitHub Tue, 14 Jan 2025 21:42:07 -0800


dsmiley commented on code in PR #3031:
URL: https://github.com/apache/solr/pull/3031#discussion_r1915959864



##########
solr/core/src/java/org/apache/solr/filestore/FileStore.java:
##########
@@ -38,8 +42,13 @@ public interface FileStore {
    */
   void put(FileEntry fileEntry) throws IOException;
 
-  /** read file content from a given path */
-  void get(String path, Consumer<FileEntry> filecontent, boolean getMissing) 
throws IOException;
+  /**
+   * Read file content from a given path.
+   *
+   * <p>TODO: Is fetchMissing actually used? I don't see it being used, but 
the IDE doesn't flag it
+   * not being used!

Review Comment:
   Lucene/Solr devs use a "nocommit" comment for something like this.  
Precommit will fail so we remember to address it.  I also configure IntelliJ's 
"TODO" feature to consider "nocommit" and thus it shows in blood red (gets my 
attention)



##########
solr/core/src/java/org/apache/solr/handler/designer/SchemaDesignerConfigSetHelper.java:
##########
@@ -494,62 +498,59 @@ protected void validateTypeChange(String configSet, 
SchemaField field, FieldType
           SolrException.ErrorCode.BAD_REQUEST,
           "Cannot change type of the _version_ field; it must be a plong.");
     }
-    List<SolrInputDocument> docs = getStoredSampleDocs(configSet);
+    List<SolrInputDocument> docs = retrieveSampleDocs(configSet);
     if (!docs.isEmpty()) {
       schemaSuggester.validateTypeChange(field, toType, docs);
     }
   }
 
   void deleteStoredSampleDocs(String configSet) {
-    try {
-      cloudClient().deleteByQuery(BLOB_STORE_ID, "id:" + configSet + 
"_sample/*", 10);
-    } catch (IOException | SolrServerException | SolrException exc) {
-      final String excStr = exc.toString();
-      log.warn("Failed to delete sample docs from blob store for {} due to: 
{}", configSet, excStr);
-    }
+    String path =
+        "blob" + "/" + configSet
+            + "_sample"; //  needs to be made unique to support multiple 
uploads.  Maybe hash the
+    // docs?
+    // why do I have to do this in two stages?
+    DistribFileStore.deleteZKFileEntry(cc.getZkController().getZkClient(), 
path);
+    cc.getFileStore().delete(path);
   }
 
+  // I don't like this guy just hanging out here to support retrieveSampleDocs.
+  List<SolrInputDocument> docs = Collections.emptyList();
+
   @SuppressWarnings("unchecked")
-  List<SolrInputDocument> getStoredSampleDocs(final String configSet) throws 
IOException {
-    List<SolrInputDocument> docs = null;
+  List<SolrInputDocument> retrieveSampleDocs(final String configSet) throws 
IOException {
 
-    final URI uri;
-    try {
-      uri =
-          collectionApiEndpoint(BLOB_STORE_ID, "blob", configSet + "_sample")
-              .setParameter(CommonParams.WT, "filestream")
-              .build();
-    } catch (URISyntaxException e) {
-      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e);
-    }
+    String path =
+        "blob" + "/" + configSet
+            + "_sample"; //  needs to be made unique to support multiple 
uploads.  Maybe hash the
+    // docs?
 
-    HttpGet httpGet = new HttpGet(uri);
     try {
-      HttpResponse entity =
-          ((CloudLegacySolrClient) 
cloudClient()).getHttpClient().execute(httpGet);
-      int statusCode = entity.getStatusLine().getStatusCode();
-      if (statusCode == HttpStatus.SC_OK) {
-        byte[] bytes = readAllBytes(() -> entity.getEntity().getContent());
-        if (bytes.length > 0) {
-          docs = (List<SolrInputDocument>) Utils.fromJavabin(bytes);
-        }
-      } else if (statusCode != HttpStatus.SC_NOT_FOUND) {
-        byte[] bytes = readAllBytes(() -> entity.getEntity().getContent());
-        throw new IOException(
-            "Failed to lookup stored docs for "
-                + configSet
-                + " due to: "
-                + new String(bytes, StandardCharsets.UTF_8));
-      } // else not found is ok
-    } finally {
-      httpGet.releaseConnection();
+      cc.getFileStore()
+          .get(
+              path,
+              entry -> {
+                try (InputStream is = entry.getInputStream()) {
+                  byte[] bytes = is.readAllBytes();
+                  if (bytes.length > 0) {
+                    docs = (List<SolrInputDocument>) Utils.fromJavabin(bytes);
+                  }
+                  // Do something with content...
+                } catch (IOException e) {
+                  log.error("Error reading file content", e);
+                }
+              },
+              true);
+    } catch (java.io.FileNotFoundException e) {

Review Comment:
   not sure why fully qualified



##########
solr/core/src/java/org/apache/solr/handler/designer/SchemaDesignerConfigSetHelper.java:
##########
@@ -494,62 +498,59 @@ protected void validateTypeChange(String configSet, 
SchemaField field, FieldType
           SolrException.ErrorCode.BAD_REQUEST,
           "Cannot change type of the _version_ field; it must be a plong.");
     }
-    List<SolrInputDocument> docs = getStoredSampleDocs(configSet);
+    List<SolrInputDocument> docs = retrieveSampleDocs(configSet);
     if (!docs.isEmpty()) {
       schemaSuggester.validateTypeChange(field, toType, docs);
     }
   }
 
   void deleteStoredSampleDocs(String configSet) {
-    try {
-      cloudClient().deleteByQuery(BLOB_STORE_ID, "id:" + configSet + 
"_sample/*", 10);
-    } catch (IOException | SolrServerException | SolrException exc) {
-      final String excStr = exc.toString();
-      log.warn("Failed to delete sample docs from blob store for {} due to: 
{}", configSet, excStr);
-    }
+    String path =
+        "blob" + "/" + configSet
+            + "_sample"; //  needs to be made unique to support multiple 
uploads.  Maybe hash the
+    // docs?
+    // why do I have to do this in two stages?
+    DistribFileStore.deleteZKFileEntry(cc.getZkController().getZkClient(), 
path);
+    cc.getFileStore().delete(path);
   }
 
+  // I don't like this guy just hanging out here to support retrieveSampleDocs.
+  List<SolrInputDocument> docs = Collections.emptyList();
+
   @SuppressWarnings("unchecked")
-  List<SolrInputDocument> getStoredSampleDocs(final String configSet) throws 
IOException {
-    List<SolrInputDocument> docs = null;
+  List<SolrInputDocument> retrieveSampleDocs(final String configSet) throws 
IOException {
 
-    final URI uri;
-    try {
-      uri =
-          collectionApiEndpoint(BLOB_STORE_ID, "blob", configSet + "_sample")
-              .setParameter(CommonParams.WT, "filestream")
-              .build();
-    } catch (URISyntaxException e) {
-      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e);
-    }
+    String path =
+        "blob" + "/" + configSet
+            + "_sample"; //  needs to be made unique to support multiple 
uploads.  Maybe hash the
+    // docs?
 
-    HttpGet httpGet = new HttpGet(uri);
     try {
-      HttpResponse entity =
-          ((CloudLegacySolrClient) 
cloudClient()).getHttpClient().execute(httpGet);
-      int statusCode = entity.getStatusLine().getStatusCode();
-      if (statusCode == HttpStatus.SC_OK) {
-        byte[] bytes = readAllBytes(() -> entity.getEntity().getContent());
-        if (bytes.length > 0) {
-          docs = (List<SolrInputDocument>) Utils.fromJavabin(bytes);
-        }
-      } else if (statusCode != HttpStatus.SC_NOT_FOUND) {
-        byte[] bytes = readAllBytes(() -> entity.getEntity().getContent());
-        throw new IOException(
-            "Failed to lookup stored docs for "
-                + configSet
-                + " due to: "
-                + new String(bytes, StandardCharsets.UTF_8));
-      } // else not found is ok
-    } finally {
-      httpGet.releaseConnection();
+      cc.getFileStore()
+          .get(
+              path,
+              entry -> {
+                try (InputStream is = entry.getInputStream()) {
+                  byte[] bytes = is.readAllBytes();
+                  if (bytes.length > 0) {
+                    docs = (List<SolrInputDocument>) Utils.fromJavabin(bytes);
+                  }

Review Comment:
   I suggest a little comment `// TODO stream it; no byte array`



##########
solr/core/src/java/org/apache/solr/handler/designer/SchemaDesignerConfigSetHelper.java:
##########
@@ -559,29 +560,12 @@ static byte[] readAllBytes(IOSupplier<InputStream> 
hasStream) throws IOException
     }
   }
 
-  protected void postDataToBlobStore(CloudSolrClient cloudClient, String 
blobName, byte[] bytes)
-      throws IOException {
-    final URI uri;
-    try {
-      uri = collectionApiEndpoint(BLOB_STORE_ID, "blob", blobName).build();
-    } catch (URISyntaxException e) {
-      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e);
-    }
+  protected void storeSampleDocs(String blobName, byte[] bytes) throws 
IOException {
+    String filePath = "blob" + "/" + blobName;
 
-    HttpPost httpPost = new HttpPost(uri);
-    try {
-      httpPost.setHeader("Content-Type", "application/octet-stream");
-      httpPost.setEntity(new ByteArrayEntity(bytes));
-      HttpResponse resp = ((CloudLegacySolrClient) 
cloudClient).getHttpClient().execute(httpPost);
-      int statusCode = resp.getStatusLine().getStatusCode();
-      if (statusCode != HttpStatus.SC_OK) {
-        throw new SolrException(
-            SolrException.ErrorCode.getErrorCode(statusCode),
-            EntityUtils.toString(resp.getEntity(), StandardCharsets.UTF_8));
-      }
-    } finally {
-      httpPost.releaseConnection();
-    }
+    FileStoreAPI.MetaData meta = ClusterFileStore._createJsonMetaData(bytes, 
null);

Review Comment:
   is the meta necessary in general for the FileStoreAPI -- 2 step?  Sad if so.



##########
solr/core/src/java/org/apache/solr/handler/designer/SchemaDesignerConfigSetHelper.java:
##########
@@ -494,62 +498,59 @@ protected void validateTypeChange(String configSet, 
SchemaField field, FieldType
           SolrException.ErrorCode.BAD_REQUEST,
           "Cannot change type of the _version_ field; it must be a plong.");
     }
-    List<SolrInputDocument> docs = getStoredSampleDocs(configSet);
+    List<SolrInputDocument> docs = retrieveSampleDocs(configSet);
     if (!docs.isEmpty()) {
       schemaSuggester.validateTypeChange(field, toType, docs);
     }
   }
 
   void deleteStoredSampleDocs(String configSet) {
-    try {
-      cloudClient().deleteByQuery(BLOB_STORE_ID, "id:" + configSet + 
"_sample/*", 10);
-    } catch (IOException | SolrServerException | SolrException exc) {
-      final String excStr = exc.toString();
-      log.warn("Failed to delete sample docs from blob store for {} due to: 
{}", configSet, excStr);
-    }
+    String path =
+        "blob" + "/" + configSet
+            + "_sample"; //  needs to be made unique to support multiple 
uploads.  Maybe hash the
+    // docs?
+    // why do I have to do this in two stages?
+    DistribFileStore.deleteZKFileEntry(cc.getZkController().getZkClient(), 
path);
+    cc.getFileStore().delete(path);
   }
 
+  // I don't like this guy just hanging out here to support retrieveSampleDocs.
+  List<SolrInputDocument> docs = Collections.emptyList();
+
   @SuppressWarnings("unchecked")
-  List<SolrInputDocument> getStoredSampleDocs(final String configSet) throws 
IOException {
-    List<SolrInputDocument> docs = null;
+  List<SolrInputDocument> retrieveSampleDocs(final String configSet) throws 
IOException {
 
-    final URI uri;
-    try {
-      uri =
-          collectionApiEndpoint(BLOB_STORE_ID, "blob", configSet + "_sample")
-              .setParameter(CommonParams.WT, "filestream")
-              .build();
-    } catch (URISyntaxException e) {
-      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e);
-    }
+    String path =
+        "blob" + "/" + configSet

Review Comment:
   Definitely.  I don't even think we should use "blob" prefix.  Maybe 
"schema_designer" since this is for this tool.



##########
solr/core/src/java/org/apache/solr/handler/designer/SchemaDesignerConfigSetHelper.java:
##########
@@ -494,62 +498,59 @@ protected void validateTypeChange(String configSet, 
SchemaField field, FieldType
           SolrException.ErrorCode.BAD_REQUEST,
           "Cannot change type of the _version_ field; it must be a plong.");
     }
-    List<SolrInputDocument> docs = getStoredSampleDocs(configSet);
+    List<SolrInputDocument> docs = retrieveSampleDocs(configSet);
     if (!docs.isEmpty()) {
       schemaSuggester.validateTypeChange(field, toType, docs);
     }
   }
 
   void deleteStoredSampleDocs(String configSet) {
-    try {
-      cloudClient().deleteByQuery(BLOB_STORE_ID, "id:" + configSet + 
"_sample/*", 10);
-    } catch (IOException | SolrServerException | SolrException exc) {
-      final String excStr = exc.toString();
-      log.warn("Failed to delete sample docs from blob store for {} due to: 
{}", configSet, excStr);
-    }
+    String path =
+        "blob" + "/" + configSet
+            + "_sample"; //  needs to be made unique to support multiple 
uploads.  Maybe hash the
+    // docs?
+    // why do I have to do this in two stages?
+    DistribFileStore.deleteZKFileEntry(cc.getZkController().getZkClient(), 
path);
+    cc.getFileStore().delete(path);
   }
 
+  // I don't like this guy just hanging out here to support retrieveSampleDocs.
+  List<SolrInputDocument> docs = Collections.emptyList();
+
   @SuppressWarnings("unchecked")
-  List<SolrInputDocument> getStoredSampleDocs(final String configSet) throws 
IOException {
-    List<SolrInputDocument> docs = null;
+  List<SolrInputDocument> retrieveSampleDocs(final String configSet) throws 
IOException {
 
-    final URI uri;
-    try {
-      uri =
-          collectionApiEndpoint(BLOB_STORE_ID, "blob", configSet + "_sample")
-              .setParameter(CommonParams.WT, "filestream")
-              .build();
-    } catch (URISyntaxException e) {
-      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e);
-    }
+    String path =
+        "blob" + "/" + configSet

Review Comment:
   If the format of this data is "javabin", lets name the file with that suffix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Re: [PR] Migrate schema designer to filestoreapi part deux [solr]

Reply via email to