Hoss Man created SOLR-6939:
------------------------------

             Summary: UpdateProcessor to buffer & sample documents and then 
batch create necessary fields
                 Key: SOLR-6939
                 URL: https://issues.apache.org/jira/browse/SOLR-6939
             Project: Solr
          Issue Type: Improvement
            Reporter: Hoss Man


spun off of an idea in SOLR-6016...

{quote}
bq. We could add a SchemaGeneratorHandler which would generate the "best" 
schema.

You wouldn't need/want a handler for this – you'd just need an 
UpdateProcessorFactory to use in place of RunUpdateProcessorFactory that would 
look at the datatypes of the fields in each document w/o doing any indexing and 
pick the least common denominator.

So then you'd have a chain with all of your normal update processors including 
the TypeMapping processors configured with the precedence orders and locales 
and format strings you want – and at the end you'd have your 
BestFitSchemaGeneratorUpdateProcessorFactory that would look at all those docs, 
study their values, and throw them away – until a commit comes along, at which 
point it does all the under the hood schema field addition calls.

So to learn, you'd send docs using whatever handler/format you want (json, xml, 
extraction, etc...) with an update.chain=my.datatype.learning.processor.chain 
request param ... and once you've sent a bunch and given it a lot of variety 
to see, then you send a commit so it creates the schema and then you re-index 
your docs for real w/o that special chain.
{quote}
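
To make the "least common denominator" idea concrete, here is a minimal sketch in plain Java. It is not a Solr UpdateProcessorFactory and uses no Solr APIs: the class name, the Guess enum, and the widening rule (LONG plus DOUBLE becomes DOUBLE, any other mix becomes STRING) are all assumptions made for illustration. It only shows the shape of the logic: look at each doc's values, keep a per-field type guess, throw the doc away, and hand back the guesses when a commit arrives.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are Solr APIs.
class BestFitTypeSampler {
    // Candidate field types this sketch can guess (an assumed, simplified set).
    enum Guess { BOOLEAN, LONG, DOUBLE, STRING }

    // Current best guess per field name, in first-seen order.
    private final Map<String, Guess> guesses = new LinkedHashMap<>();

    // Classify a single, already type-mapped value.
    static Guess classify(Object value) {
        if (value instanceof Boolean) return Guess.BOOLEAN;
        if (value instanceof Integer || value instanceof Long) return Guess.LONG;
        if (value instanceof Number) return Guess.DOUBLE;
        return Guess.STRING;
    }

    // Assumed "least common denominator" rule for two guesses.
    static Guess widen(Guess a, Guess b) {
        if (a == b) return a;
        boolean aNumeric = a == Guess.LONG || a == Guess.DOUBLE;
        boolean bNumeric = b == Guess.LONG || b == Guess.DOUBLE;
        return (aNumeric && bNumeric) ? Guess.DOUBLE : Guess.STRING;
    }

    // Study the doc's values, then forget the doc: nothing is indexed.
    void observe(Map<String, ?> doc) {
        for (Map.Entry<String, ?> e : doc.entrySet()) {
            if (e.getValue() == null) continue; // nothing to learn from a missing value
            guesses.merge(e.getKey(), classify(e.getValue()), BestFitTypeSampler::widen);
        }
    }

    // On commit, return the guesses and reset. A real processor would make
    // the schema field addition calls here instead of returning a map.
    Map<String, Guess> commit() {
        Map<String, Guess> result = new LinkedHashMap<>(guesses);
        guesses.clear();
        return result;
    }

    public static void main(String[] args) {
        BestFitTypeSampler sampler = new BestFitTypeSampler();
        sampler.observe(Map.of("price", 3));
        sampler.observe(Map.of("price", 2.5));
        sampler.observe(Map.of("id", 17L));
        sampler.observe(Map.of("id", "abc-17"));
        System.out.println(sampler.commit());
    }
}
```

In this sketch "price" ends up as DOUBLE because it was seen as both an integer and a decimal, while "id" falls back to STRING because a number and a string have nothing narrower in common.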

...not mentioned originally: this factory could also default to assuming fields 
should be single valued, unless/until it sees multiple values in a doc that it 
samples.
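
A rough sketch of that single-valued default, again in plain Java with hypothetical names (MultiValuedSampler, observe, isMultiValued) rather than any existing Solr API: every field starts out single valued and is flipped to multi valued the first time a sampled doc carries more than one value for it.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names are Solr APIs.
class MultiValuedSampler {
    // true once any sampled doc has held more than one value for the field.
    private final Map<String, Boolean> multiValued = new TreeMap<>();

    void observe(Map<String, ?> doc) {
        for (Map.Entry<String, ?> e : doc.entrySet()) {
            Object v = e.getValue();
            boolean many = v instanceof Collection && ((Collection<?>) v).size() > 1;
            // Once a field has been seen with multiple values it stays multi valued.
            multiValued.merge(e.getKey(), many, Boolean::logicalOr);
        }
    }

    // Fields never sampled, or only ever seen with one value, default to single valued.
    boolean isMultiValued(String field) {
        return multiValued.getOrDefault(field, false);
    }

    public static void main(String[] args) {
        MultiValuedSampler sampler = new MultiValuedSampler();
        sampler.observe(Map.of("title", "first doc", "tags", java.util.List.of("a")));
        sampler.observe(Map.of("title", "second doc", "tags", java.util.List.of("a", "b")));
        System.out.println("title multiValued? " + sampler.isMultiValued("title"));
        System.out.println("tags multiValued? " + sampler.isMultiValued("tags"));
    }
}
```

The obvious caveat is that the answer is only as good as the sample: a field that happens to be single valued in every sampled doc would be created single valued.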



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)