JervenBolleman commented on issue #13373:
URL: https://github.com/apache/lucene/issues/13373#issuecomment-2114206903
Hi @easyice, I am the original reporter on the mailing list.
As the code around indexing is a bit abstracted, it might be hard to follow.
What I do have is the index that failed to merge; however, it is 173 GB
xz-compressed. I could use Luke or a similar tool to extract more information
for the Lucene team.
The field type we index into is:
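```java
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;

// Declaration not shown in the original report; assumed to start from an empty FieldType.
FieldType UNSTORED_POSITIONAL = new FieldType();
UNSTORED_POSITIONAL.setOmitNorms(true);     // no length normalization
UNSTORED_POSITIONAL.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS); // index positions
UNSTORED_POSITIONAL.setStored(false);       // value is not stored
UNSTORED_POSITIONAL.setTokenized(false);    // whole value is indexed as one token
UNSTORED_POSITIONAL.freeze();               // make the type immutable
```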
Then we add fields like so:
```java
import java.util.Locale;
import org.apache.lucene.document.Field;

doc.add(new Field("type", value.toLowerCase(Locale.US), UNSTORED_POSITIONAL));
```
There are over 1,177,800,000 documents in this index, all of which contain
the term "positional" at least once.
On average, each document has three fields of this type.
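A quick back-of-the-envelope check of the sizes involved, as a minimal sketch assuming exactly three values per document (the stated average) and one position per value, since the field is not tokenized. `PositionCountEstimate` and `estimatedPositions` are hypothetical names for illustration only. It shows that the document count still fits in a Java `int`, while the total number of positions in the "type" field does not; it does not establish which counter, if any, overflowed during the merge.

```java
public class PositionCountEstimate {
  // Hypothetical helper: an untokenized field value is indexed as a single token,
  // so each value contributes exactly one position.
  static long estimatedPositions(long docCount, long valuesPerDoc) {
    return docCount * valuesPerDoc;
  }

  public static void main(String[] args) {
    long docs = 1_177_800_000L;                   // document count reported above
    long positions = estimatedPositions(docs, 3); // assumes exactly 3 values per document
    System.out.println("documents fit in an int: " + (docs <= Integer.MAX_VALUE));
    System.out.println("estimated positions:     " + positions);
    System.out.println("positions fit in an int: " + (positions <= Integer.MAX_VALUE));
  }
}
```

Running `main` prints `true`, `3533400000` and `false`: roughly 3.5 billion positions against an `int` limit of 2,147,483,647.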
So, to create local sample data, I would just do something like this ;)
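```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Assumes `writer` is an open IndexWriter and UNSTORED_POSITIONAL is the field type above.
for (int i = 0; i < 2_000_000_000; i++) {
  Document doc = new Document();
  // Every document contains the term "number".
  doc.add(new Field("type", "number", UNSTORED_POSITIONAL));
  // Alternate between two terms so each one occurs in half of the documents.
  if (i % 2 == 0) {
    doc.add(new Field("type", "even", UNSTORED_POSITIONAL));
  } else {
    doc.add(new Field("type", "un-even", UNSTORED_POSITIONAL));
  }
  writer.addDocument(doc);
}
```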
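The term statistics this sample loop should produce can be worked out by hand. A minimal sketch, assuming the loop runs to completion and each untokenized value yields exactly one position; `SampleDataStats` and its methods are hypothetical names used only for illustration. "number" occurs in every document, "even" and "un-even" each in half of them, and the "type" field accumulates two positions per document.

```java
public class SampleDataStats {
  // Hypothetical helpers mirroring the sample loop for i in [0, n).
  static long docFreqNumber(long n) { return n; }          // every document gets "number"
  static long docFreqEven(long n) { return (n + 1) / 2; }  // i = 0, 2, 4, ...
  static long docFreqUnEven(long n) { return n / 2; }      // i = 1, 3, 5, ...
  // Two single-token values per document, so two positions per document in "type".
  static long totalPositions(long n) { return 2 * n; }

  public static void main(String[] args) {
    long n = 2_000_000_000L;
    System.out.println("docFreq(number)  = " + docFreqNumber(n));
    System.out.println("docFreq(even)    = " + docFreqEven(n));
    System.out.println("docFreq(un-even) = " + docFreqUnEven(n));
    System.out.println("positions(type)  = " + totalPositions(n)
        + " (Integer.MAX_VALUE = " + Integer.MAX_VALUE + ")");
  }
}
```

For 2,000,000,000 documents this gives 4,000,000,000 positions in "type", which, like the real index, is well above `Integer.MAX_VALUE`.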