[
https://issues.apache.org/jira/browse/SOLR-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041623#comment-16041623
]
Ishan Chattopadhyaya edited comment on SOLR-10574 at 6/7/17 9:14 PM:
---------------------------------------------------------------------
Apologies for the radio silence; here is an update. I had offline discussions
with [~noblepaul], [~hossman], and [~shalinmangar].
There were various approaches that I was considering:
# An initParams-based mechanism for enabling/disabling the data driven nature.
Discarded this, considering Noble's concern that initParams with
globbing/wildcard support is a risky tool that makes it easy for users to shoot
themselves in the foot (if they get the wildcards wrong), and hence we may want
to remove initParams support going forward.
# Creating the chain programmatically. This was not easy, since the
AddSchemaFieldsUpdateProcessorFactory needs field type names as defined in the
managed-schema/schema.xml. Hence, if the chain were created programmatically,
the user would not be able to change those field types, for example to switch
from trie fields to point fields or vice versa.
# Letting the user enable/disable the data driven nature by adding
"update.chain=add-unknown-fields-to-the-schema" to every paramset in
ImplicitPlugins.json and then letting the user use the config API to update the
"update.chain" parameter's value for enabling/disabling. This approach exposed
too many internals (the concept of an update chain, the name of the chain,
etc.) in the command to enable/disable the data driven nature, and was hence
potentially confusing.
A very important consideration in setting up this enable/disable data driven
feature was that if we were to use the "add-unknown-fields-to-the-schema"
update chain exactly as it is defined in data_driven_schema_configs today,
then it would be impossible for the user to modify the update chain (or parts
of the chain) using the config API, as the config API cannot edit URPs that are
within an update chain, nor does it support creating/editing update chains.
So, the solution (as in the patch) was to break out the individual URPs in the
add-unknown-fields-to-the-schema chain into top level named URPs (so that they
are editable using the config API) and to create a functionally similar chain
from those named URPs. There is a nice, not well documented,
default=true|false attribute for update chains, which is now used (and should
have been all along) to enable/disable the data driven nature, based on a
variable.
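To make the above concrete, here is a rough sketch of how such a layout could
look in solrconfig.xml. It is illustrative only: the URP names (parse-long,
parse-double, add-schema-fields), the field type names (strings, plongs,
pdoubles) and the default of true for the property are assumptions, not copied
from the patch.
{code:xml}
<!-- Sketch only: processor names, field type names and the property default
     below are assumptions for illustration, not copied from the patch. -->

<!-- Top-level named URPs: addressable by name rather than buried in a chain -->
<updateProcessor class="solr.ParseLongFieldUpdateProcessorFactory" name="parse-long"/>
<updateProcessor class="solr.ParseDoubleFieldUpdateProcessorFactory" name="parse-double"/>
<updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
  <!-- Field type names must match types defined in the managed-schema -->
  <str name="defaultFieldType">strings</str>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Long</str>
    <str name="fieldType">plongs</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Number</str>
    <str name="fieldType">pdoubles</str>
  </lst>
</updateProcessor>

<!-- The chain references the named URPs. Its default attribute is driven by a
     property, so setting update.autoCreateFields to false stops this chain
     from being the default one. -->
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
                             default="${update.autoCreateFields:true}"
                             processor="parse-long,parse-double,add-schema-fields">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
{code}
With a layout like this, switching between trie and point fields would only
mean editing the typeMapping entries of the one named URP.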
So, *TL;DR*: check out the new {{_default}} configset in the patch. It has the
data driven nature enabled by default. It can be enabled/disabled using the
following:
{code}
# Disable schemaless/data driven nature:
curl http://host:8983/solr/coll1/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
# Enable schemaless/data driven nature:
curl http://host:8983/solr/coll1/config -d '{"set-user-property": {"update.autoCreateFields":"true"}}'
{code}
Would appreciate a review.
Note: the patch contains only the new default configset. However, we also need
to remove the existing data_driven_schema_configs and basic_configs and update
the script. Also, I haven't consolidated the managed-schema differences between
basic_configs and data_driven_schema_configs into this {{_default}} configset
yet.
> Choose a default configset for Solr 7
> -------------------------------------
>
> Key: SOLR-10574
> URL: https://issues.apache.org/jira/browse/SOLR-10574
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Assignee: Ishan Chattopadhyaya
> Priority: Blocker
> Fix For: master (7.0)
>
> Attachments: SOLR-10574.patch
>
>
> Currently, the data_driven_schema_configs is the default configset when
> collections are created using the bin/solr script and no configset is
> specified.
> However, that may not be the best choice. We need to decide which is the best
> choice, out of the box, considering many users might create collections
> without knowing about the concept of a configset going forward.
> (See also SOLR-10272)
> Proposed changes:
> # Let's deprecate what we know as data_driven_schema_configs
> # Build a "toggleable" data driven functionality into the basic_configs
> configset (and make it the default)