[ https://issues.apache.org/jira/browse/SOLR-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris M. Hostetter updated SOLR-17052:
--------------------------------------
    Summary: SchemaCodecFactory/IndexSchema/FieldType relationships are kludgy, buggy, and inefficient  (was: SchemaCodecFactory/IndexSchema/FieldType relationships are kludgy and should be inverted)

> SchemaCodecFactory/IndexSchema/FieldType relationships are kludgy, buggy, and inefficient
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-17052
>                 URL: https://issues.apache.org/jira/browse/SOLR-17052
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> While getting familiar with the {{SolrCore + CodecFactory + SchemaCodecFactory + FieldType}} related code relevant to SOLR-17045, SOLR-17046, & SOLR-17047, it occurred to me that there are a lot of inefficiencies and kludginess in how {{FieldType}} based "codec overrides" are used (and validated) by {{SchemaCodecFactory}} (and {{SolrCore.initCodec}}):
> * {{SolrCore.initCodec}} needs to be aware of all the possible ways a {{FieldType}} instance might support codec overrides
> ** ... so it can fail if any are specified unless the {{CodecFactory instanceof SolrCoreAware}}
> *** ... even though that still doesn't ensure the factory supports those field type overrides
> ** This validation currently just looks at {{getPostingsFormatForField}} & {{getDocValuesFormatForField}}
> *** ... it's ignorant about {{DenseVectorField}}'s assumptions about being able to override aspects of the {{KnnVectorsFormat}}
> *** ... and AFAICT, what validation is done doesn't help if the Schema API is used to add new field types (w/ {{postingsFormat}} or {{docValuesFormat}} overrides)
> * In all of the {{SchemaCodecFactory}} "per-field" methods ({{getPostingsFormatForField}}, {{getDocValuesFormatForField}}, & {{getKnnVectorsFormatForField}}) ...
> ** ... every call to these methods resolves a {{SchemaField}} instance, even though only the (Solr) {{FieldType}} is needed
> *** Asking the {{IndexSchema}} for the {{SchemaField}} of a fieldName has more overhead than just asking for the {{FieldType}}
> *** None of the things these methods care about can be configured on a per-fieldName basis anyway.
> ** For {{PostingsFormat}} and {{DocValuesFormat}}, every call to these methods repeats the SPI lookup on the "format name" configured on the {{FieldType}} instance
> ** For {{KnnVectorsFormat}}, every call to this method constructs a new {{SolrDelegatingKnnVectorsFormat}}, even though the same instance could be re-used for every field of the same {{FieldType}} instance.
> * In {{FieldType}} ...
> ** ... there is no validation anywhere that the {{postingsFormat}} or {{docValuesFormat}} are valid
> *** ... bogus values only cause a problem when the {{SchemaCodecFactory}} tries to resolve them (when indexing)
> * In {{DenseVectorField}} ...
> ** ... {{checkSchemaField}} validates (and logs warnings) based on the {{vectorEncoding}} and {{dimensions}} ...
> *** ... even though these validations aren't "field" specific: they are "type" specific, and could be validated in {{DenseVectorField.init()}}
> ** BUT! ... there is no validation anywhere that the {{knnAlgorithm}} is supported, or that the HNSW options make sense for it
> *** These are only validated by the {{Codec.getKnnVectorsFormatForField(...)}} impl provided by {{SchemaCodecFactory}} ...
> **** ... and they are redundantly validated on every call
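
As an illustration of the per-call overhead and late validation described above, here is a minimal, self-contained Java sketch of one possible direction: resolve a format once per configured name, re-use the resolved instance on later calls, and fail fast on a bogus name. Everything in it is an assumption made for illustration. {{PerTypeFormatCache}} and its methods are hypothetical names, not Solr or Lucene APIs, and the {{lookup}} function merely stands in for whatever SPI lookup or format construction the real code performs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical helper (not a Solr class): caches one resolved format
 * instance per configured format name, so repeated per-field calls do
 * not repeat the underlying lookup.
 */
final class PerTypeFormatCache<F> {
  private final Map<String, F> resolved = new ConcurrentHashMap<>();
  private final Function<String, F> lookup; // stand-in for an SPI lookup
  private final AtomicInteger lookups = new AtomicInteger();

  PerTypeFormatCache(Function<String, F> lookup) {
    this.lookup = lookup;
  }

  /** Returns the cached instance, resolving it on first use only. */
  F forName(String formatName) {
    return resolved.computeIfAbsent(formatName, name -> {
      lookups.incrementAndGet();
      F format = lookup.apply(name);
      if (format == null) {
        // Nothing is cached when the name cannot be resolved.
        throw new IllegalArgumentException("Unknown format name: " + name);
      }
      return format;
    });
  }

  /** Intended to be called at type-init time so bogus names fail early. */
  void validate(String formatName) {
    forName(formatName);
  }

  /** Number of times the underlying lookup actually ran. */
  int lookupCount() {
    return lookups.get();
  }
}

final class PerTypeFormatCacheDemo {
  public static void main(String[] args) {
    // "exampleFormat" is a placeholder name, not a real codec format.
    PerTypeFormatCache<Object> cache = new PerTypeFormatCache<>(
        name -> "exampleFormat".equals(name) ? new Object() : null);

    cache.validate("exampleFormat");            // resolved once, up front
    Object first = cache.forName("exampleFormat");
    Object second = cache.forName("exampleFormat");
    System.out.println("same instance: " + (first == second));
    System.out.println("lookups performed: " + cache.lookupCount());
  }
}
```

In this sketch the cache is keyed by format name; keying by the {{FieldType}} instance instead would be closer to the "same instance for every field of the same {{FieldType}}" idea, but that requires types this standalone example does not have.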