Ishan Sri created LUCENE-8830:
---------------------------------
Summary: DefaultIndexingChain.getOrAddField method ignores
omitNorms from FieldType
Key: LUCENE-8830
URL: https://issues.apache.org/jira/browse/LUCENE-8830
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 6.6.1
Reporter: Ishan Sri
Norms are being computed and written even when *omitNorms is set to true* in
the fieldType. I traced the issue and found that the method *getOrAddField*
creates a *FieldInfo* object the first time a field is seen in a segment. By
default this object has omitNorms set to false. The method copies the
*indexOptions* from the fieldType onto the newly created object but does not do
the same for *omitNorms*. The flag from the fieldType is therefore effectively
ignored, which causes problems down the line.
Here's the code snippet for the method with the *fieldInfos.getOrAdd* call:
{code:java}
private PerField getOrAddField(String name, IndexableFieldType fieldType,
    boolean invert) {
  // Make sure we have a PerField allocated
  final int hashPos = name.hashCode() & hashMask;
  PerField fp = fieldHash[hashPos];
  while (fp != null && !fp.fieldInfo.name.equals(name)) {
    fp = fp.next;
  }
  if (fp == null) {
    // First time we are seeing this field in this segment
    FieldInfo fi = fieldInfos.getOrAdd(name);
    // Messy: must set this here because e.g. FreqProxTermsWriterPerField looks at the
    // initial IndexOptions to decide what arrays it must create). Then, we also must
    // set it in PerField.invert to allow for later downgrading of the index options:
    fi.setIndexOptions(fieldType.indexOptions());
    fp = new PerField(fi, invert);
    ...
{code}
The *getOrAdd* method below instantiates a new *FieldInfo* with omitNorms set to
false (the 4th constructor argument).
{code:java}
/** Create a new field, or return existing one. */
public FieldInfo getOrAdd(String name) {
  FieldInfo fi = fieldInfo(name);
  if (fi == null) {
    // This field wasn't yet added to this in-RAM
    // segment's FieldInfo, so now we get a global
    // number for this field. If the field was seen
    // before then we'll get the same name and number,
    // else we'll allocate a new one:
    final int fieldNumber = globalFieldNumbers.addOrGet(name, -1,
        DocValuesType.NONE, 0, 0);
    fi = new FieldInfo(name, fieldNumber, false, false, false, IndexOptions.NONE,
        DocValuesType.NONE, -1, new HashMap<>(), 0, 0);
    assert !byName.containsKey(fi.name);
    globalFieldNumbers.verifyConsistent(Integer.valueOf(fi.number), fi.name,
        DocValuesType.NONE);
    byName.put(fi.name, fi);
  }
  return fi;
}
{code}
This causes norms to always be computed, which not only produces incorrect
scores but also increases disk usage when many documents contain multiple
fields whose omitNorms flag is set to true but ignored.
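To make the expected behavior concrete, here is a minimal sketch on simplified stand-in classes. It is not the Lucene code: SketchFieldInfo, SketchFieldType and both addField methods are hypothetical names invented for illustration, and the index options are plain strings. It only shows the difference between copying indexOptions alone (the behavior described above) and copying omitNorms as well.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration only; these are NOT the Lucene classes.
class OmitNormsSketch {

  /** Hypothetical, stripped-down counterpart of FieldInfo. */
  static final class SketchFieldInfo {
    final String name;
    String indexOptions = "NONE";
    boolean omitNorms = false; // same default as in the getOrAdd snippet above

    SketchFieldInfo(String name) {
      this.name = name;
    }
  }

  /** Hypothetical, stripped-down counterpart of IndexableFieldType. */
  static final class SketchFieldType {
    final String indexOptions;
    final boolean omitNorms;

    SketchFieldType(String indexOptions, boolean omitNorms) {
      this.indexOptions = indexOptions;
      this.omitNorms = omitNorms;
    }
  }

  private final Map<String, SketchFieldInfo> byName = new HashMap<>();

  /** Create a new field with default flags, or return the existing one. */
  SketchFieldInfo getOrAdd(String name) {
    return byName.computeIfAbsent(name, SketchFieldInfo::new);
  }

  /** Mirrors the behavior described in this report: only indexOptions is copied. */
  SketchFieldInfo addFieldAsReported(String name, SketchFieldType type) {
    SketchFieldInfo fi = getOrAdd(name);
    fi.indexOptions = type.indexOptions;
    return fi;
  }

  /** Expected behavior (assumption): omitNorms is copied from the field type too. */
  SketchFieldInfo addFieldWithOmitNorms(String name, SketchFieldType type) {
    SketchFieldInfo fi = getOrAdd(name);
    fi.indexOptions = type.indexOptions;
    fi.omitNorms = type.omitNorms;
    return fi;
  }

  public static void main(String[] args) {
    OmitNormsSketch sketch = new OmitNormsSketch();
    SketchFieldType type = new SketchFieldType("DOCS", true);

    // The flag from the field type is lost: prints "reported: omitNorms=false"
    System.out.println("reported: omitNorms="
        + sketch.addFieldAsReported("title", type).omitNorms);

    // The flag is carried over: prints "expected: omitNorms=true"
    System.out.println("expected: omitNorms="
        + sketch.addFieldWithOmitNorms("body", type).omitNorms);
  }
}
```

Whether the real fix belongs in getOrAddField or elsewhere in the indexing chain is not something this sketch can answer; it only illustrates the flag being dropped.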