Hoss Man created SOLR-6939:
------------------------------
Summary: UpdateProcessor to buffer & sample documents and then
batch create necessary fields
Key: SOLR-6939
URL: https://issues.apache.org/jira/browse/SOLR-6939
Project: Solr
Issue Type: Improvement
Reporter: Hoss Man
spun off of an idea in SOLR-6016...
{quote}
bq. We could add a SchemaGeneratorHandler which would generate the "best"
schema.
You wouldn't need/want a handler for this – you'd just need an
UpdateProcessorFactory to use in place of RunUpdateProcessorFactory that would
look at the datatypes of the fields in each document w/o doing any indexing and
pick the least common denominator.
So then you'd have a chain with all of your normal update processors including
the TypeMapping processors configured with the precedence orders and locales
and format strings you want – and at the end you'd have your
BestFitSchemaGeneratorUpdateProcessorFactory that would look at all those docs,
study their values, and throw them away – until a commit comes along, at which
point it does all the under-the-hood schema field addition calls.
So to learn, you'd send docs using whatever handler/format you want (json, xml,
extraction, etc...) with an update.chain=my.datatype.learning.processor.chain
request param ... and once you've sent a bunch and given it a lot of variety
to see, then you send a commit so it creates the schema and then you re-index
your docs for real w/o that special chain.
{quote}
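A rough sketch of the "look, learn, throw away, act on commit" flow described in the quote, written in plain Python rather than against the real Solr UpdateProcessor API. Every name here (guess_type, widen, SchemaLearner, process_add, process_commit) is hypothetical, and the widening rules are only one possible reading of "least common denominator":

```python
# Hypothetical sketch: none of these names are Solr APIs.

def guess_type(value):
    """Return the narrowest type name that fits one already-parsed value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def widen(a, b):
    """Pick a type that can hold values of both a and b (assumed rules)."""
    if a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"  # fallback that can represent anything


class SchemaLearner:
    """Stands in for the processor at the end of the learning chain."""

    def __init__(self):
        self.field_types = {}

    def process_add(self, doc):
        # Study the values, remember only the widest type per field,
        # and index nothing: the document itself is discarded.
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                seen = guess_type(v)
                prev = self.field_types.get(name)
                self.field_types[name] = seen if prev is None else widen(prev, seen)

    def process_commit(self):
        # On commit, emit the batch of field definitions that would be
        # handed to the schema field addition calls, then reset.
        fields = [{"name": n, "type": t} for n, t in sorted(self.field_types.items())]
        self.field_types = {}
        return fields


learner = SchemaLearner()
learner.process_add({"id": "1", "price": 10})
learner.process_add({"id": "2", "price": 9.5})
new_fields = learner.process_commit()
```

Here "price" is first seen as a long and then as a double, so the sketch settles on double. The values are assumed to arrive already typed, since the TypeMapping processors earlier in the chain would have done the parsing.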
...not mentioned originally: this factory could also default to assuming fields
should be single valued, unless/until it sees multiple values in a doc that it
samples.
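One way the single-valued default could be tracked while sampling, again as a plain Python sketch with hypothetical names (update_multivalued and the flags dict are not Solr APIs):

```python
# Hypothetical sketch: a field stays single valued until some sampled
# document carries more than one value for it.

def update_multivalued(flags, doc):
    """Update per-field multiValued flags from one sampled document."""
    for name, value in doc.items():
        many = isinstance(value, (list, tuple)) and len(value) > 1
        # Once a field has been seen with multiple values it stays flagged.
        flags[name] = flags.get(name, False) or many
    return flags


flags = {}
update_multivalued(flags, {"id": "1", "tags": ["a"]})
update_multivalued(flags, {"id": "2", "tags": ["a", "b"]})
```

After these two documents, "tags" would be flagged multi valued and "id" would not. A field whose only samples hold a single value would still end up single valued, so the variety of the sample matters.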