[jira] [Created] (SOLR-16813) PackageLoader not copying manifest.json file to filestore

2023-05-22 Thread Will White (Jira)
Will White created SOLR-16813:
-

 Summary: PackageLoader not copying manifest.json file to filestore 
 Key: SOLR-16813
 URL: https://issues.apache.org/jira/browse/SOLR-16813
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Package Manager
Affects Versions: 9.2.1, 9.2
Reporter: Will White


When a new Solr node enters the cluster and the package management system 
recreates the package from the source node, the filestore location is set up 
and the JAR files are copied across correctly, but the {{manifest.json}} file 
isn't (and the {{.manifest.json.json}} file isn't generated). This means that 
when trying to interact with the package (such as {{bin/solr package 
list-installed}}) the command errors out because of the missing expected file.

This can be reproduced by:
* Setup Zookeeper cluster
* Create two solr nodes ("solr1" and "solr2") in cluster
* Install a package on solr1
** {{bin/solr package add-repo data-import-handler 
"https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo"}}
** {{bin/solr package install data-import-handler}}
** {{find /var/solr/data/filestore}} ({{manifest.json}} and 
{{.manifest.json.json}} should exist)
** {{bin/solr package list-installed}}
* At this point, the package exists as expected on solr2
* Stop solr1 and solr2
* Restart the cluster, now with an additional solr3 node
* solr3 will have the JAR files and their {{.json}} files, but the 
{{manifest.json}} file won't be present
** Running {{bin/solr package list-installed}} will throw an exception




The difference appears to be that the 
{{org.apache.solr.packagemanager.RepositoryManager::installPackage}} method 
contains the following lines:
{code:java}
// org/apache/solr/packagemanager/RepositoryManager.java#182
if (release.manifest == null) {
  String manifestJson = PackageUtils.getFileFromJarsAsString(downloaded, 
"manifest.json");
  if (manifestJson == null) {
throw new SolrException(ErrorCode.NOT_FOUND, "No manifest found for 
package: " + packageName + ", version: " + version);
  }
  release.manifest = getMapper().readValue(manifestJson, 
SolrPackage.Manifest.class);
}
String manifestJson = getMapper().writeValueAsString(release.manifest);
// We go on to write this file later
{code}

But in the equivalent space in 
{{org.apache.solr.pkg.SolrPackageLoader.Version()}}, the equivalent section to 
fetch the JAR files if missing doesn't reference the version.manifest anywhere:
{code:java}
// org/apache/solr/pkg/SolrPackageLoader.java#276
coreContainer.getPackageStoreAPI().validateFiles(version.files, true, s -> 
errs.add(s));
if (!errs.isEmpty()) {
  ...
}
for (String file : version.files) {
  
paths.add(coreContainer.getPackageStoreAPI().getPackageStore().getRealpath(file));
}
{code}

Something like [this 
commit|https://github.com/apache/solr/commit/9fc5253eef573dc5e53edf52d98d4a2e55e00e1e]
 solves the problem by manually copying the file across, but I'm not sure if 
this is the best approach to take.

I'm happy to put a full PR together, although I'd appreciate it if someone 
could confirm whether the {{manifest.json}} file should be transferred from the 
source node like this (and can be treated as mandatory), or if I've 
misunderstood how this should be working.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-16813) PackageLoader not copying manifest.json file to filestore

2023-05-22 Thread Will White (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725017#comment-17725017
 ] 

Will White commented on SOLR-16813:
---

Brief discussion in -IRC- Slack: 
https://apachesolr.slack.com/archives/C01GVPZSSK0/p1684680323785689

Tossed a coin on priority, raised as Major because without manually copying the 
{{manifest.json}} file across to the new node, all commands to the new package 
break (which I've interpreted as a Major issue on the assumption of how the 
package manager might be used in a clustered environment). Happy to downgrade 
to a Minor if that's a less common use case than I've supposed.

> PackageLoader not copying manifest.json file to filestore 
> --
>
> Key: SOLR-16813
> URL: https://issues.apache.org/jira/browse/SOLR-16813
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Package Manager
>Affects Versions: 9.2, 9.2.1
>Reporter: Will White
>Priority: Major
>
> When a new Solr node enters the cluster and the package management system 
> recreates the package from the source node, the filestore location is set up 
> and the JAR files are copied across correctly, but the {{manifest.json}} file 
> isn't (and the {{.manifest.json.json}} file isn't generated). This means that 
> when trying to interact with the package (such as {{bin/solr package 
> list-installed}}) the command errors out because of the missing expected file.
> This can be reproduced by:
> * Setup Zookeeper cluster
> * Create two solr nodes ("solr1" and "solr2") in cluster
> * Install a package on solr1
> ** {{bin/solr package add-repo data-import-handler 
> "https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo"}}
> ** {{bin/solr package install data-import-handler}}
> ** {{find /var/solr/data/filestore}} ({{manifest.json}} and 
> {{.manifest.json.json}} should exist)
> ** {{bin/solr package list-installed}}
> * At this point, the package exists as expected on solr2
> * Stop solr1 and solr2
> * Restart the cluster, now with an additional solr3 node
> * solr3 will have the JAR files and their {{.json}} files, but the 
> {{manifest.json}} file won't be present
> ** Running {{bin/solr package list-installed}} will throw an exception
> 
> The difference appears to be that the 
> {{org.apache.solr.packagemanager.RepositoryManager::installPackage}} method 
> contains the following lines:
> {code:java}
> // org/apache/solr/packagemanager/RepositoryManager.java#182
> if (release.manifest == null) {
>   String manifestJson = PackageUtils.getFileFromJarsAsString(downloaded, 
> "manifest.json");
>   if (manifestJson == null) {
> throw new SolrException(ErrorCode.NOT_FOUND, "No manifest found for 
> package: " + packageName + ", version: " + version);
>   }
>   release.manifest = getMapper().readValue(manifestJson, 
> SolrPackage.Manifest.class);
> }
> String manifestJson = getMapper().writeValueAsString(release.manifest);
> // We go on to write this file later
> {code}
> But in the equivalent space in 
> {{org.apache.solr.pkg.SolrPackageLoader.Version()}}, the equivalent section 
> to fetch the JAR files if missing doesn't reference the version.manifest 
> anywhere:
> {code:java}
> // org/apache/solr/pkg/SolrPackageLoader.java#276
> coreContainer.getPackageStoreAPI().validateFiles(version.files, true, s -> 
> errs.add(s));
> if (!errs.isEmpty()) {
>   ...
> }
> for (String file : version.files) {
>   
> paths.add(coreContainer.getPackageStoreAPI().getPackageStore().getRealpath(file));
> }
> {code}
> Something like [this 
> commit|https://github.com/apache/solr/commit/9fc5253eef573dc5e53edf52d98d4a2e55e00e1e]
>  solves the problem by manually copying the file across, but I'm not sure if 
> this is the best approach to take.
> I'm happy to put a full PR together, although I'd appreciate it if someone 
> could confirm whether the {{manifest.json}} file should be transferred from 
> the source node like this (and can be treated as mandatory), or if I've 
> misunderstood how this should be working.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-16814) Package tool 'add-repo' command fails on mime type handling inside Docker image for text/plain sources

2023-05-22 Thread Will White (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-16814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will White updated SOLR-16814:
--
Description: 
When building a Docker image from {{main}} the mime type handling no longer 
handles application/json presented as text/plain when it previously did so (as 
of 9.2).

Cause of change: https://github.com/apache/solr/pull/1575

Reproduction steps:
* Checkout Solr to {{main}}
* Build local image via {{./gradlew docker}}
* Run this image in cloud mode, including {{-Denable.packages=true}}
* Open shell on container (e.g. {{docker compose exec NAME /bin/bash}})
* Run {{bin/solr package add-repo data-import-handler 
"https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo"}}

This will attempt to pull down the repository.json file (which 
githubusercontent provides as text/plain), and it will throw an error.

Error message:
{code:bash}
/opt/solr-10.0.0-SNAPSHOT$ bin/solr package add-repo data-import-handler 
"https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo";

Found 1 Solr nodes: 

Solr process 14 running on port 8983
com.fasterxml.jackson.databind.JsonMappingException: Error from server at 
https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo/repository.json?wt=json&version=2.2:
 Expected mime type in [application/json] but got text/plain.
{code}

  was:
When building a Docker image from {main} the mime type handling no longer 
handles application/json presented as text/plain when it previously did so (as 
of 9.2).

Cause of change: https://github.com/apache/solr/pull/1575

Reproduction steps:
* Checkout Solr to {{main}}
* Build local image via {{./gradlew docker}}
* Run this image in cloud mode, including {{-Denable.packages=true}}
* Open shell on container (e.g. {{docker compose exec NAME /bin/bash}})
* Run {{bin/solr package add-repo data-import-handler 
"https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo"}}

This will attempt to pull down the repository.json file (which 
githubusercontent provides as text/plain), and it will throw an error.

Error message:
{code:bash}
/opt/solr-10.0.0-SNAPSHOT$ bin/solr package add-repo data-import-handler 
"https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo";

Found 1 Solr nodes: 

Solr process 14 running on port 8983
com.fasterxml.jackson.databind.JsonMappingException: Error from server at 
https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo/repository.json?wt=json&version=2.2:
 Expected mime type in [application/json] but got text/plain.
{code}


> Package tool 'add-repo' command fails on mime type handling inside Docker 
> image for text/plain sources
> --
>
> Key: SOLR-16814
> URL: https://issues.apache.org/jira/browse/SOLR-16814
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Package Manager
>Affects Versions: main (10.0)
>Reporter: Will White
>Priority: Minor
>
> When building a Docker image from {{main}} the mime type handling no longer 
> handles application/json presented as text/plain when it previously did so 
> (as of 9.2).
> Cause of change: https://github.com/apache/solr/pull/1575
> Reproduction steps:
> * Checkout Solr to {{main}}
> * Build local image via {{./gradlew docker}}
> * Run this image in cloud mode, including {{-Denable.packages=true}}
> * Open shell on container (e.g. {{docker compose exec NAME /bin/bash}})
> * Run {{bin/solr package add-repo data-import-handler 
> "https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo"}}
> This will attempt to pull down the repository.json file (which 
> githubusercontent provides as text/plain), and it will throw an error.
> Error message:
> {code:bash}
> /opt/solr-10.0.0-SNAPSHOT$ bin/solr package add-repo data-import-handler 
> "https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo";
> Found 1 Solr nodes: 
> Solr process 14 running on port 8983
> com.fasterxml.jackson.databind.JsonMappingException: Error from server at 
> https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo/repository.json?wt=json&version=2.2:
>  Expected mime type in [application/json] but got text/plain.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-16814) Package tool 'add-repo' command fails on mime type handling inside Docker image for text/plain sources

2023-05-22 Thread Will White (Jira)
Will White created SOLR-16814:
-

 Summary: Package tool 'add-repo' command fails on mime type 
handling inside Docker image for text/plain sources
 Key: SOLR-16814
 URL: https://issues.apache.org/jira/browse/SOLR-16814
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Package Manager
Affects Versions: main (10.0)
Reporter: Will White


When building a Docker image from {main} the mime type handling no longer 
handles application/json presented as text/plain when it previously did so (as 
of 9.2).

Cause of change: https://github.com/apache/solr/pull/1575

Reproduction steps:
* Checkout Solr to {{main}}
* Build local image via {{./gradlew docker}}
* Run this image in cloud mode, including {{-Denable.packages=true}}
* Open shell on container (e.g. {{docker compose exec NAME /bin/bash}})
* Run {{bin/solr package add-repo data-import-handler 
"https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo"}}

This will attempt to pull down the repository.json file (which 
githubusercontent provides as text/plain), and it will throw an error.

Error message:
{code:bash}
/opt/solr-10.0.0-SNAPSHOT$ bin/solr package add-repo data-import-handler 
"https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo";

Found 1 Solr nodes: 

Solr process 14 running on port 8983
com.fasterxml.jackson.databind.JsonMappingException: Error from server at 
https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo/repository.json?wt=json&version=2.2:
 Expected mime type in [application/json] but got text/plain.
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-16814) Package tool 'add-repo' command fails on mime type handling inside Docker image for text/plain sources

2023-05-22 Thread Will White (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-16814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will White updated SOLR-16814:
--
Description: 
When building a Docker image from {{main}} the mime type handling no longer 
handles application/json presented as text/plain when it previously did so (as 
of 9.2).

Cause of change: https://github.com/apache/solr/pull/1575

Reproduction steps:
* Checkout Solr to {{main}}
* Build local image via {{./gradlew docker}}
* Run this image in cloud mode, including {{-Denable.packages=true}}
* Open shell on container (e.g. {{docker compose exec NAME /bin/bash}})
* Run {{bin/solr package add-repo data-import-handler 
"https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo"}}

This will attempt to pull down the repository.json file (which 
githubusercontent provides as text/plain), and it will throw an error.

Error message:
{code:bash}
/opt/solr-10.0.0-SNAPSHOT$ bin/solr package add-repo data-import-handler 
"https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo";

Found 1 Solr nodes: 

Solr process 14 running on port 8983
com.fasterxml.jackson.databind.JsonMappingException: Error from server at 
https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo/repository.json?wt=json&version=2.2:
 Expected mime type in [application/json] but got text/plain.
{code}

The contents of that file are valid JSON, but don't have the correct header 
type sent back.

This does correctly reduce the attack surface of the image (as SOLR-16752 
intends), but I've opened this ticket as a loss of behaviour which might impact 
SolrCloud users.

  was:
When building a Docker image from {{main}} the mime type handling no longer 
handles application/json presented as text/plain when it previously did so (as 
of 9.2).

Cause of change: https://github.com/apache/solr/pull/1575

Reproduction steps:
* Checkout Solr to {{main}}
* Build local image via {{./gradlew docker}}
* Run this image in cloud mode, including {{-Denable.packages=true}}
* Open shell on container (e.g. {{docker compose exec NAME /bin/bash}})
* Run {{bin/solr package add-repo data-import-handler 
"https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo"}}

This will attempt to pull down the repository.json file (which 
githubusercontent provides as text/plain), and it will throw an error.

Error message:
{code:bash}
/opt/solr-10.0.0-SNAPSHOT$ bin/solr package add-repo data-import-handler 
"https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo";

Found 1 Solr nodes: 

Solr process 14 running on port 8983
com.fasterxml.jackson.databind.JsonMappingException: Error from server at 
https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo/repository.json?wt=json&version=2.2:
 Expected mime type in [application/json] but got text/plain.
{code}


> Package tool 'add-repo' command fails on mime type handling inside Docker 
> image for text/plain sources
> --
>
> Key: SOLR-16814
> URL: https://issues.apache.org/jira/browse/SOLR-16814
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Package Manager
>Affects Versions: main (10.0)
>Reporter: Will White
>Priority: Minor
>
> When building a Docker image from {{main}} the mime type handling no longer 
> handles application/json presented as text/plain when it previously did so 
> (as of 9.2).
> Cause of change: https://github.com/apache/solr/pull/1575
> Reproduction steps:
> * Checkout Solr to {{main}}
> * Build local image via {{./gradlew docker}}
> * Run this image in cloud mode, including {{-Denable.packages=true}}
> * Open shell on container (e.g. {{docker compose exec NAME /bin/bash}})
> * Run {{bin/solr package add-repo data-import-handler 
> "https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo"}}
> This will attempt to pull down the repository.json file (which 
> githubusercontent provides as text/plain), and it will throw an error.
> Error message:
> {code:bash}
> /opt/solr-10.0.0-SNAPSHOT$ bin/solr package add-repo data-import-handler 
> "https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo";
> Found 1 Solr nodes: 
> Solr process 14 running on port 8983
> com.fasterxml.jackson.databind.JsonMappingException: Error from server at 
> https://raw.githubusercontent.com/searchscale/dataimporthandler/master/repo/repository.json?wt=json&version=2.2:
>  Expected mime type in [application/json] but got text/plain.
> {code}
> The contents of that file are valid JSON, but don't have the correct header 
> type sent back.
> This does correctly reduce the attack surface of the image (as SOLR-16752 
> intends), but I've o

[jira] [Created] (SOLR-16820) PackageUtils collection validation is more restrictive than CreateCollectionAPI allows

2023-05-25 Thread Will White (Jira)
Will White created SOLR-16820:
-

 Summary: PackageUtils collection validation is more restrictive 
than CreateCollectionAPI allows
 Key: SOLR-16820
 URL: https://issues.apache.org/jira/browse/SOLR-16820
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Package Manager
Reporter: Will White


It's possible to create a collection via the CreateCollectionAPI which [passes 
validation from the 
SolrIdentifierValidation|https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/util/SolrIdentifierValidator.java#L50-L52]
 (a regex which includes the '.' character), but that same collection name 
won't then pass validation when deployed/undeployed via the PackageTool because 
of the [packagemanager.PackageUtils validateCollection() 
method|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271].

A change [like this, using the existing 
SolrIdentifierValidator|https://github.com/apache/solr/commit/638fd768ebd7ed7908029ced08e56bed05a4a2a5]
 would bring the two validation steps back in line, although there's presumably 
a better approach.

*Potential risks*

As highlighted by Gus Heck [in this 
thread|https://lists.apache.org/thread/h7hnksgqwxxl7nkwkhn01r6jn8xjkjjs] 
changing the validation of collection names could be a risky change to make. 
The source of the PackageUtils regex appears to be 
[https://github.com/apache/lucene-solr/pull/994] from before Solr split from 
the Lucene project, and it seems that the regex wasn't crafted for a specific 
subset of use cases that specifically excluded the '.' character - it just 
appears to be the regex implemented at the time.

Using the {{SolrIdentifierValidator}} approach mentioned above as an example, 
other than disallowing a collection name that begins with a '-' character, the 
{{SolrIdentifierValidator.identifierPattern}} would be a strict expansion of 
the allowed collection names for the {{PackageUtils.validateCollections}}. Any 
other solution (such as [this more naive 
example|https://github.com/apache/solr/blame/998fffdccf51a0560589e2cb413e9da127a5f26e/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271])
 could similarly mitigate a lot of the potential risk by only expanding the 
allowed collection names.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-16820) PackageUtils collection validation is more restrictive than CreateCollectionAPI allows

2023-05-25 Thread Will White (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will White updated SOLR-16820:
--
Description: 
It's possible to create a collection via the CreateCollectionAPI which [passes 
validation from the 
SolrIdentifierValidation|https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/util/SolrIdentifierValidator.java#L50-L52]
 (a regex which among other elements includes the '.' character), but that same 
collection name won't then pass validation when deployed/undeployed via the 
PackageTool because of the [packagemanager.PackageUtils validateCollection() 
method|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271].

A change [like this, using the existing 
SolrIdentifierValidator|https://github.com/apache/solr/commit/638fd768ebd7ed7908029ced08e56bed05a4a2a5]
 would bring the two validation steps back in line, although there's presumably 
a better approach.

*Potential risks*

As highlighted by Gus Heck [in this 
thread|https://lists.apache.org/thread/h7hnksgqwxxl7nkwkhn01r6jn8xjkjjs] 
changing the validation of collection names could be a risky change to make. 
The source of the PackageUtils regex appears to be 
[https://github.com/apache/lucene-solr/pull/994] from before Solr split from 
the Lucene project, and it seems that the regex wasn't crafted for a specific 
subset of use cases that specifically excluded the '.' character - it just 
appears to be the regex implemented at the time.

Using the {{SolrIdentifierValidator}} approach mentioned above as an example, 
other than disallowing a collection name that begins with a '-' character, the 
{{SolrIdentifierValidator.identifierPattern}} would be a strict expansion of 
the allowed collection names for the {{{}PackageUtils.validateCollections{}}}. 
Any other solution (such as [this more naive 
example|https://github.com/apache/solr/blame/998fffdccf51a0560589e2cb413e9da127a5f26e/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271])
 could similarly mitigate a lot of the potential risk by only expanding the 
allowed collection names.

  was:
It's possible to create a collection via the CreateCollectionAPI which [passes 
validation from the 
SolrIdentifierValidation|https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/util/SolrIdentifierValidator.java#L50-L52]
 (a regex which includes the '.' character), but that same collection name 
won't then pass validation when deployed/undeployed via the PackageTool because 
of the [packagemanager.PackageUtils validateCollection() 
method|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271].

A change [like this, using the existing 
SolrIdentifierValidator|https://github.com/apache/solr/commit/638fd768ebd7ed7908029ced08e56bed05a4a2a5]
 would bring the two validation steps back in line, although there's presumably 
a better approach.

*Potential risks*

As highlighted by Gus Heck [in this 
thread|https://lists.apache.org/thread/h7hnksgqwxxl7nkwkhn01r6jn8xjkjjs] 
changing the validation of collection names could be a risky change to make. 
The source of the PackageUtils regex appears to be 
[https://github.com/apache/lucene-solr/pull/994] from before Solr split from 
the Lucene project, and it seems that the regex wasn't crafted for a specific 
subset of use cases that specifically excluded the '.' character - it just 
appears to be the regex implemented at the time.

Using the {{SolrIdentifierValidator}} approach mentioned above as an example, 
other than disallowing a collection name that begins with a '-' character, the 
{{SolrIdentifierValidator.identifierPattern}} would be a strict expansion of 
the allowed collection names for the {{PackageUtils.validateCollections}}. Any 
other solution (such as [this more naive 
example|https://github.com/apache/solr/blame/998fffdccf51a0560589e2cb413e9da127a5f26e/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271])
 could similarly mitigate a lot of the potential risk by only expanding the 
allowed collection names.


> PackageUtils collection validation is more restrictive than 
> CreateCollectionAPI allows
> --
>
> Key: SOLR-16820
> URL: https://issues.apache.org/jira/browse/SOLR-16820
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Package Manager
>Reporter: Will White
>Priority: Minor
>  Labels: packagemanager
>
> It's possible to create a collection via the CreateCollectionAPI which 
> [passes validation from the 
> SolrIdentifierValidation|https://github.com/apache/solr/blob/main/so

[jira] [Commented] (SOLR-16820) PackageUtils collection validation is more restrictive than CreateCollectionAPI allows

2023-05-30 Thread Will White (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727656#comment-17727656
 ] 

Will White commented on SOLR-16820:
---

Sure thing [~gerlowskija], I'd be happy to (y)

It'll likely be Sunday at the earliest (busy week ahead!), hope that's ok 
time-wise.

> PackageUtils collection validation is more restrictive than 
> CreateCollectionAPI allows
> --
>
> Key: SOLR-16820
> URL: https://issues.apache.org/jira/browse/SOLR-16820
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Package Manager
>Reporter: Will White
>Priority: Minor
>  Labels: packagemanager
>
> It's possible to create a collection via the CreateCollectionAPI which 
> [passes validation from the 
> SolrIdentifierValidation|https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/util/SolrIdentifierValidator.java#L50-L52]
>  (a regex which among other elements includes the '.' character), but that 
> same collection name won't then pass validation when deployed/undeployed via 
> the PackageTool because of the [packagemanager.PackageUtils 
> validateCollection() 
> method|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271].
> A change [like this, using the existing 
> SolrIdentifierValidator|https://github.com/apache/solr/commit/638fd768ebd7ed7908029ced08e56bed05a4a2a5]
>  would bring the two validation steps back in line, although there's 
> presumably a better approach.
> *Potential risks*
> As highlighted by Gus Heck [in this 
> thread|https://lists.apache.org/thread/h7hnksgqwxxl7nkwkhn01r6jn8xjkjjs] 
> changing the validation of collection names could be a risky change to make. 
> The source of the PackageUtils regex appears to be 
> [https://github.com/apache/lucene-solr/pull/994] from before Solr split from 
> the Lucene project, and it seems that the regex wasn't crafted for a specific 
> subset of use cases that specifically excluded the '.' character - it just 
> appears to be the regex implemented at the time.
> Using the {{SolrIdentifierValidator}} approach mentioned above as an example, 
> other than disallowing a collection name that begins with a '-' character, 
> the {{SolrIdentifierValidator.identifierPattern}} would be a strict expansion 
> of the allowed collection names for the 
> {{{}PackageUtils.validateCollections{}}}. Any other solution (such as [this 
> more naive 
> example|https://github.com/apache/solr/blame/998fffdccf51a0560589e2cb413e9da127a5f26e/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271])
>  could similarly mitigate a lot of the potential risk by only expanding the 
> allowed collection names.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org