[
https://issues.apache.org/jira/browse/SOLR-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Timothy Potter updated SOLR-5653:
---------------------------------
Attachment: SOLR-5653.patch
Here is the first attempt at a solution for the RestManager and implementations
for managing stop words and synonyms via a REST API.
A few things to notice about this implementation:
1) The RestManager needs to be able to read/write data from/to ZooKeeper if in
cloud mode or the local FS if in standalone mode. This is the purpose of the
ManagedResourceStorage.StorageIO interface. The idea here is that the
RestManager receives the StorageIO in its constructor from the SolrCore during
initialization. Currently, this is done in the SolrCore object, which has to do
an instanceof check on the SolrResourceLoader to determine if it is ZK aware. This is
a bit hacky but I didn't see a better way to determine if a core is running in
ZK mode from within the SolrCore object. Currently, I provide 3 implementations
of StorageIO: ZooKeeperStorageIO, FileStorageIO, and InMemoryStorageIO.
2) A ManagedResource should be able to choose its own storage format, with the
storageIO being determined by the container.
This gives the ManagedResource developer flexibility in how they store data
without having to know how to load/store bytes from/to ZK or the local FS.
Currently, the only provided storage format is JSON, see:
ManagedResourceStorage.JsonStorage.
3) I'm using a "registry" object that is available from the SolrResourceLoader
to capture Solr components that declare themselves as being "managed". This is
needed because parsing the solrconfig.xml may encounter managed components
before it parses and initializes the RestManager. Basically, I wanted to
separate the registration of managed components from the initialization of the
RestManager and those components as I didn't want to force the position of the
<restManager/> element in the solrconfig.xml to be before all other components.
4) The design is based around the concept that there may be many different Solr
components that share a single ManagedResource. For instance, there may be many
ManagedStopFilterFactory instances declared in schema.xml that share a common
set of managed English stop words. Thus, I'm using the "observer" pattern which
allows Solr components to register as an observer of a shared ManagedResource.
This way we don't end up with 10 different managers of the same stop word list.
5) ManagedResourceObserver instances are notified once during core
initialization (load or reload) when the managed data is available. This is
their signal to internalize the managed data, such as the
ManagedStopFilterFactory converting the managed set of terms into a
CharArraySet used for creating StopFilters. This is a critical part of the
design in that updates to the managed data are not applied until a core is
reloaded. This is to avoid having analysis components with different views of
managed data, i.e. we don't want some of the replicas for a shard to have a
different set of stop words than the other replicas.
6) I've provided one concrete ManagedResource implementation for managing a
word set, which is useful for stop words and protected words
(KeywordMarkerFilter). This implementation shows how to handle initArgs and a
managedList of words.
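
To make points 1, 4 and 5 more concrete, here is a rough, self-contained
sketch of how the pieces could fit together. It is not the code in the patch:
every type and method name below (SketchStorageIO, SketchObserver,
SketchWordSetResource, updateWords, notifyObserversOnLoad) is hypothetical, and
the words are stored as newline-separated text instead of JSON only to keep the
example free of dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for ManagedResourceStorage.StorageIO:
// the container chooses the backend, the resource only deals with bytes.
interface SketchStorageIO {
    byte[] load(String resourceId);              // returns null if nothing is stored yet
    void store(String resourceId, byte[] data);
}

// In-memory backend; a ZooKeeper or file-system backend would implement the same interface.
final class InMemorySketchStorageIO implements SketchStorageIO {
    private final Map<String, byte[]> blobs = new HashMap<>();

    public byte[] load(String resourceId) {
        byte[] data = blobs.get(resourceId);
        return data == null ? null : data.clone();
    }

    public void store(String resourceId, byte[] data) {
        blobs.put(resourceId, data.clone());
    }
}

// Hypothetical callback implemented by components that share a managed resource.
interface SketchObserver {
    void onManagedDataLoaded(Set<String> words);
}

// One shared word-set resource with many observers (e.g. several stop filter factories).
final class SketchWordSetResource {
    private final String resourceId;
    private final SketchStorageIO storage;
    private final List<SketchObserver> observers = new ArrayList<>();

    SketchWordSetResource(String resourceId, SketchStorageIO storage) {
        this.resourceId = resourceId;
        this.storage = storage;
    }

    void addObserver(SketchObserver observer) {
        observers.add(observer);
    }

    // Called for an update request: persists the new words but does NOT notify observers.
    void updateWords(Set<String> words) {
        String joined = String.join("\n", new TreeSet<>(words));
        storage.store(resourceId, joined.getBytes(StandardCharsets.UTF_8));
    }

    // Called once per core load or reload: reads the stored words and hands
    // every observer the same read-only snapshot.
    void notifyObserversOnLoad() {
        Set<String> words = new TreeSet<>();
        byte[] raw = storage.load(resourceId);
        if (raw != null) {
            for (String word : new String(raw, StandardCharsets.UTF_8).split("\n")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        Set<String> snapshot = Collections.unmodifiableSet(words);
        for (SketchObserver observer : observers) {
            observer.onManagedDataLoaded(snapshot);
        }
    }
}
```

In this sketch, a call to updateWords only changes what is in storage;
observers keep whatever they internalized at the last load until
notifyObserversOnLoad runs again, which mirrors the "no changes until reload"
behavior described in point 5.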
Known Issues:
a. The current RestManager attaches its registered endpoints using SolrRestApi,
which is configured to process requests to /collection/schema. While this path
works for stop words and synonyms, it doesn't work in the general case of any
type of ManagedResource. We need to figure out a better path under which to
configure the RestManager, but re-working that should be minor.
b. I had to make a few things public in the BaseSchemaResource class and
extended the RestManager.ManagedEndpoint class from it. We should refactor
BaseSchemaResource into a BaseSolrResource as it has usefulness beyond schema
related resources.
c. Deletes - the ManagedResource framework supports deletes but I wasn't sure
how to enable them in Restlet; again probably a minor issue in the restlet
config / setup.
> Create a RESTManager to provide REST API endpoints for reconfigurable plugins
> -----------------------------------------------------------------------------
>
> Key: SOLR-5653
> URL: https://issues.apache.org/jira/browse/SOLR-5653
> Project: Solr
> Issue Type: Sub-task
> Reporter: Steve Rowe
> Attachments: SOLR-5653.patch
>
>
> It should be possible to reconfigure Solr plugins' resources and init params
> without directly editing the serialized schema or {{solrconfig.xml}} (see
> Hoss's arguments about this in the context of the schema, which also apply to
> {{solrconfig.xml}}, in the description of SOLR-4658).
> The RESTManager should allow plugins declared in either the schema or in
> {{solrconfig.xml}} to register one or more REST endpoints, one endpoint per
> reconfigurable resource, including init params. To allow for multiple plugin
> instances, registering plugins will need to provide a handle of some form to
> distinguish the instances.
> This RESTManager should also be able to create new instances of plugins that
> it has been configured to allow. The RESTManager will need its own
> serialized configuration to remember these plugin declarations.
> Example endpoints:
> * SynonymFilterFactory
> ** init params: {{/solr/collection1/config/syns/myinstance/options}}
> ** synonyms resource:
> {{/solr/collection1/config/syns/myinstance/synonyms-list}}
> * "/select" request handler
> ** init params: {{/solr/collection1/config/requestHandlers/select/options}}
> We should aim for full CRUD over init params and structured resources. The
> plugins will bear responsibility for handling resource modification requests,
> though we should provide utility methods to make this easy.
> However, since we won't be directly modifying the serialized schema and
> {{solrconfig.xml}}, anything configured in those two places can't be
> invalidated by configuration serialized elsewhere. As a result, it won't be
> possible to remove plugins declared in the serialized schema or
> {{solrconfig.xml}}. Similarly, any init params declared in either place
> won't be modifiable. Instead, there should be some form of init param that
> declares that the plugin is reconfigurable, maybe using something like
> "managed" - note that request handlers already provide a "handle" - the
> request handler name - and so don't need that to be separately specified:
> {code:xml}
> <requestHandler name="/select" class="solr.SearchHandler">
> <managed/>
> </requestHandler>
> {code}
> and in the serialized schema - a handle needs to be specified here:
> {code:xml}
> <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100">
> ...
> <analyzer type="query">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.SynonymFilterFactory" managed="english-synonyms"/>
> ...
> {code}
> All of the above examples use the existing plugin factory class names, but
> we'll have to create new RESTManager-aware classes to handle registration
> with RESTManager.
> Core/collection reloading should not be performed automatically when a REST
> API call is made to one of these RESTManager-mediated REST endpoints, since
> for batched config modifications, that could take way too long. But maybe
> reloading could be a query parameter to these REST API calls.