On 3/19/2021 3:36 PM, gnandre wrote:
> While performing atomic indexing, I run into an error which says 'unknown
> field X' where X is not a field specified in the schema. It is a
> discontinued field. After deleting that field from the schema, I have
> restarted Solr but I have not re-indexed the content back, so the deleted
> field data still might be there in Solr index.
> The way I understand how atomic indexing works, it tries to index all
> stored values again, but why is it trying to index stored value of a field
> that does not exist in the schema?
Solr's Atomic Update feature works by grabbing the existing document,
all of it, performing the atomic update instructions on that document,
and then indexing the results as a new document. If the uniqueKey
feature is enabled (which would be required for Atomic Updates to work
properly), the old document is deleted as the new document is added. I
haven't looked at the code, but the existing fields are likely added to
the document that is being built all at once and without consulting the
schema. So if field X is in the document that's already in the index,
it will be in the new document too. If X is deleted from the schema,
you'll get the error you're getting.
It would be a fair amount of work to have Solr take the schema into
account for atomic updates. Not impossible, just slightly
time-consuming. I think we (the Solr developers) would still want
indexing to fail in this situation; the failure would just happen at a
different place in the code than it does now, during atomic document
assembly. Fail earlier and faster.
What you'll need to do for your circumstances is leave X in the schema, but
change it to a type that will be completely ignored on indexing.
Something like this:
<fieldType
name="ignored"
indexed="false"
stored="false"
docValues="false"
multiValued="true"
class="solr.StrField" />
You could then add the following to take care of any and all unknown fields:
<dynamicField name="*" type="ignored" multiValued="true" />
Or you could name individual fields like that, which I think would be a
better option than the wildcard dynamic field.
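A minimal sketch of that per-field approach, assuming the "ignored"
fieldType above is already defined in the schema. X is the placeholder
field name from the question, so substitute the real name of the
discontinued field:

```xml
<!-- Hypothetical declaration for the discontinued field X. It maps X to
     the "ignored" type, so values carried over from existing documents
     during an atomic update are accepted but neither indexed nor stored. -->
<field name="X" type="ignored" />
```

The field should inherit multiValued="true" from the fieldType, so it
does not need to be repeated here. The core will need a reload or a Solr
restart to pick up the schema change.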
My source for the config snippets:
https://stackoverflow.com/questions/46509259/solr-7-managed-schema-how-to-ignore-unnamed-fields
Thanks,
Shawn