bharath-techie commented on issue #13188:
URL: https://github.com/apache/lucene/issues/13188#issuecomment-2075553304
Thanks for the comments, @msfroh.
Supplying `Dims` and `metric` values to `DataCubesWriter` as part of the
`addDocument` flow, and consuming them the way other formats do, is a good
idea.
But there are some cons:
1. Adding an attribute to the field is ambiguous (taking `IntField` as an
example): the same `IntField` can be both a dimension and a metric (in fact,
multiple metrics) of one `DataCubeField`, and the same `IntField` can also be
part of multiple `DataCubeField`s.
2. Even if we solve the above and supply values via `DataCubesWriter` for each
`DataCubeField`, the values will be duplicated, depending on the configuration.
To avoid duplicating values, how about we derive the values of each
`DataCubeField` from the original values held by `DocValuesWriter` during
`flush`?
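
To make the duplication concern concrete, here is a minimal sketch that counts how many times each source field is referenced across a configuration. The `DataCubeField` record and the field names are hypothetical stand-ins for illustration, not the proposed API. If values were supplied per `DataCubeField`, each reference would be a separate copy.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DataCubeDuplication {
  /** Hypothetical config: a data cube field names the source fields it uses. */
  record DataCubeField(String name, List<String> dims, List<String> metrics) {}

  /** Counts how often each source field is referenced across all data cube fields. */
  static Map<String, Integer> referenceCounts(List<DataCubeField> fields) {
    Map<String, Integer> counts = new TreeMap<>();
    for (DataCubeField f : fields) {
      for (String dim : f.dims()) {
        counts.merge(dim, 1, Integer::sum);
      }
      for (String metric : f.metrics()) {
        counts.merge(metric, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    List<DataCubeField> fields = List.of(
        // "status" is both a dimension and a metric of cube1 ...
        new DataCubeField("cube1", List.of("status", "hour"), List.of("status", "latency")),
        // ... and is used again by cube2.
        new DataCubeField("cube2", List.of("status"), List.of("latency")));
    System.out.println(referenceCounts(fields)); // {hour=1, latency=2, status=3}
  }
}
```

Deriving from the doc values at flush means each source field is stored once, however many `DataCubeField`s refer to it.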
### Flush
`IntField` values will already be part of `DocValuesWriter`, so we can
supply a `DataCubesConsumer` and keep track of the resulting values.
1. During flush, in a new method `writeDataCubes`, we pass
`dataCubeDocValuesConsumer` to `docValuesWriter.flush`:
```
// For every field that has doc values
if (perField.docValuesWriter != null) {
  if (dataCubeDocValuesConsumer == null) {
    // Lazy init
    DataCubesFormat fmt = state.segmentInfo.getCodec().dataCubesFormat();
    dataCubeDocValuesConsumer = fmt.fieldsConsumer(state, dataCubesConfig);
  }
  perField.docValuesWriter.flush(state, sortMap, dataCubeDocValuesConsumer);
}
// Once all fields have been visited, this creates the dataCubes indices.
// The consumer stays null if no field had doc values.
if (dataCubeDocValuesConsumer != null) {
  dataCubeDocValuesConsumer.flush(dataCubesConfig);
}
```
`DocValuesWriter.flush` calls the corresponding `addNumericField` or
`addSortedSetField` method on the supplied consumer.
2. Then, in `DataCubeDocValuesConsumer`, we keep track of the fields and
their associated doc values. In `flush` we can use these `DocValues` for
each `DataCubeField`:
```
public class DataCubeDocValuesConsumer extends DocValuesConsumer {
  // Doc values captured per source field name during DocValuesWriter.flush
  Map<String, NumericDocValues> numericDocValuesMap = new ConcurrentHashMap<>();
  Map<String, SortedSetDocValues> sortedSetDocValuesMap = new ConcurrentHashMap<>();

  @Override
  public void addSortedSetField(FieldInfo field, DocValuesProducer valuesProducer)
      throws IOException {
    sortedSetDocValuesMap.put(field.name, valuesProducer.getSortedSet(field));
  }

  @Override
  public void addNumericField(FieldInfo field, DocValuesProducer valuesProducer)
      throws IOException {
    numericDocValuesMap.put(field.name, valuesProducer.getNumeric(field));
  }

  // The remaining abstract DocValuesConsumer methods are omitted here.

  public void flush(DataCubesConfig dataCubesConfig) throws IOException {
    for (DataCubeField field : dataCubesConfig.getFields()) {
      for (String dim : field.getDims()) {
        // Get docValues from the map (we can get a clone / singleton)
        // Custom implementation over docValuesIterator
      }
      for (String metric : field.getMetrics()) {
        // Get docValues from the map
        // Custom implementation over docValuesIterator
      }
    }
  }
}
```
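
As a rough illustration of the "capture once, derive at flush" idea, here is a self-contained sketch that does not use Lucene at all. Everything in it is an assumption made for the example: plain `long[]` arrays indexed by doc id stand in for doc values, every doc is assumed to have a value for every field, and the per-cell aggregation (a sum of each metric per distinct dimension tuple) is only one possible stand-in for the "custom implementation" placeholders above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DataCubeFlushSketch {
  /** Hypothetical config entry, not the proposed API. */
  record DataCubeField(String name, List<String> dims, List<String> metrics) {}

  /** Stand-in for the consumer: keeps one reference per source field. */
  static class Consumer {
    private final Map<String, long[]> numericValues = new LinkedHashMap<>();

    void addNumericField(String fieldName, long[] valuesByDoc) {
      numericValues.put(fieldName, valuesByDoc);
    }

    /** For each data cube field, sums every metric per distinct dimension tuple. */
    Map<String, Map<List<Long>, long[]>> flush(List<DataCubeField> config, int maxDoc) {
      Map<String, Map<List<Long>, long[]>> cubes = new LinkedHashMap<>();
      for (DataCubeField field : config) {
        Map<List<Long>, long[]> cells = new LinkedHashMap<>();
        for (int doc = 0; doc < maxDoc; doc++) {
          // Build the dimension tuple for this doc from the shared values.
          List<Long> key = new ArrayList<>();
          for (String dim : field.dims()) {
            key.add(numericValues.get(dim)[doc]);
          }
          long[] sums = cells.computeIfAbsent(key, k -> new long[field.metrics().size()]);
          // Read each metric from the same shared values; nothing is copied.
          for (int m = 0; m < sums.length; m++) {
            sums[m] += numericValues.get(field.metrics().get(m))[doc];
          }
        }
        cubes.put(field.name(), cells);
      }
      return cubes;
    }
  }

  public static void main(String[] args) {
    Consumer consumer = new Consumer();
    consumer.addNumericField("status", new long[] {200, 200, 500});
    consumer.addNumericField("latency", new long[] {10, 20, 30});
    // "status" serves as both a dimension and a metric of the same cube.
    List<DataCubeField> config =
        List.of(new DataCubeField("cube1", List.of("status"), List.of("latency", "status")));
    Map<List<Long>, long[]> cells = consumer.flush(config, 3).get("cube1");
    cells.forEach((k, v) -> System.out.println(k + " -> " + java.util.Arrays.toString(v)));
  }
}
```

Here `status` is stored once and read both as a dimension and as a metric, which is the property the flush-time derivation is meant to give us.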
### Merge
During merge we will most likely not need `DocValues`; instead, the merge
will operate on the `DataCubeIndices` and associated structures.
POC: [code](https://github.com/bharath-techie/lucene/commit/d4455221becca86c039236f4a730066626207870#diff-24dc83bf177eafec2219cdc119f0eaae8c52da9a949973d8045fdef49a0de16e)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]