Preventing field data from being loaded into page cache

2023-10-20 Thread Justin Borromeo
Is there any way to keep field data files out of the operating system's page cache? We only use the .fdt files for highlighting and don't need to keep them warm in memory. From what I understand, the operating system controls which files get loaded into the page cache. Does Lucene have any mechanisms to

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
Thanks very much for this additional information, Marc! On 20.10.23 at 20:30, Marc D'Mello wrote: Just following up on Mike's comment: It used to be that the "doc values" based faceting did not support arbitrary hierarchy, but I think that was fixed at some point. Yeah it was fixed a yea

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Marc D'Mello
Just following up on Mike's comment: > It used to be that the "doc values" based faceting did not support arbitrary hierarchy, but I think that was fixed at some point. Yeah, it was fixed a year or two ago: SortedSetDocValuesFacetField supports hierarchical faceting. I think you just need to e
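Marc's message is cut off before he says what exactly needs to be enabled, so the following is only a sketch under an assumption: that hierarchy is switched on per dimension with `FacetsConfig.setHierarchical`, after which `SortedSetDocValuesFacetField` accepts a multi-component path and no TaxonomyWriter is needed. The dimension name `category` and the paths are hypothetical, and the code targets a Lucene 9.x release that includes hierarchical doc-values faceting.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.LabelAndValue;
import org.apache.lucene.facet.sortedset.DefaultSortedSetDocValuesReaderState;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetField;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesReaderState;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public final class HierarchicalSsdvFacetsSketch {
  // Hypothetical dimension name used throughout this sketch.
  private static final String DIM = "category";

  /** Indexes one document per path, then returns the child counts under the given parent path. */
  public static Map<String, Integer> childCounts(List<String[]> docPaths, String... parent)
      throws IOException {
    FacetsConfig config = new FacetsConfig();
    // Assumption: hierarchy is opted into per dimension on the FacetsConfig.
    config.setHierarchical(DIM, true);

    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        for (String[] path : docPaths) {
          Document doc = new Document();
          doc.add(new SortedSetDocValuesFacetField(DIM, path));
          // build() turns the facet field into the doc values that are actually indexed.
          writer.addDocument(config.build(doc));
        }
      } // closing the writer commits the documents

      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        SortedSetDocValuesReaderState state =
            new DefaultSortedSetDocValuesReaderState(reader, config);
        FacetsCollector fc = new FacetsCollector();
        FacetsCollector.search(new IndexSearcher(reader), new MatchAllDocsQuery(), 10, fc);
        Facets facets = new SortedSetDocValuesFacetCounts(state, fc);
        // An empty parent path asks for the top-level children of the dimension.
        FacetResult result = facets.getTopChildren(10, DIM, parent);
        Map<String, Integer> counts = new LinkedHashMap<>();
        if (result != null) {
          for (LabelAndValue lv : result.labelValues) {
            counts.put(lv.label, lv.value.intValue());
          }
        }
        return counts;
      }
    }
  }
}
```

With documents under `fiction/fantasy`, `fiction/mystery` and `history`, the top level would count `fiction` twice and `history` once, and drilling into `fiction` would list its two children.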

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
Cool, thank you very much! Michael On 20.10.23 at 15:44, Michael McCandless wrote: You can use either the "doc values" implementation for facets (SortedSetDocValuesFacetField), or the "taxonomy" implementation (FacetField, in which case, yes, you need to create a TaxonomyWriter). It used to

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael McCandless
You can use either the "doc values" implementation for facets (SortedSetDocValuesFacetField), or the "taxonomy" implementation (FacetField, in which case, yes, you need to create a TaxonomyWriter). It used to be that the "doc values" based faceting did not support arbitrary hierarchy, but I think

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
Hi Adrien, thank you very much for your feedback as well! I just replaced the StringField with KeywordField :-) Thanks, Michael On 20.10.23 at 14:13, Adrien Grand wrote: FYI there is also KeywordField, which combines StringField and SortedSetDocValuesField. It supports filtering, sorting, facet

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
Hi Mike, thanks for your feedback! IIUC, in order to get the actual advantages of facets, one has to "connect" them with a TaxonomyWriter: FacetsConfig config = new FacetsConfig(); DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir); indexWriter.addDocument(config.build(taxoW
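The snippet above is cut off mid-call, so here is a separate, self-contained sketch of the taxonomy route Mike describes (FacetField plus a TaxonomyWriter), not a reconstruction of the original code. It uses two in-memory directories (one for the main index, one for the taxonomy) and counts the top-level values of a hypothetical `category` dimension; it assumes a Lucene 9.x facet module.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.FacetField;
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.LabelAndValue;
import org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public final class TaxonomyFacetsSketch {
  /** Indexes one document per category label and returns top-level counts for "category". */
  public static Map<String, Integer> categoryCounts(List<String> categories) throws IOException {
    FacetsConfig config = new FacetsConfig();
    try (Directory indexDir = new ByteBuffersDirectory();
        Directory taxoDir = new ByteBuffersDirectory()) {
      // The taxonomy index maps each category path to an ordinal; the main index stores ordinals.
      try (IndexWriter indexWriter = new IndexWriter(indexDir, new IndexWriterConfig());
          DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir)) {
        for (String category : categories) {
          Document doc = new Document();
          doc.add(new FacetField("category", category));
          // build(taxoWriter, doc) resolves FacetField paths to ordinals via the TaxonomyWriter.
          indexWriter.addDocument(config.build(taxoWriter, doc));
        }
      } // closing both writers commits them (taxoWriter is closed first)

      try (DirectoryReader reader = DirectoryReader.open(indexDir);
          TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir)) {
        FacetsCollector fc = new FacetsCollector();
        FacetsCollector.search(new IndexSearcher(reader), new MatchAllDocsQuery(), 10, fc);
        Facets facets = new FastTaxonomyFacetCounts(taxoReader, config, fc);
        FacetResult result = facets.getTopChildren(10, "category");
        Map<String, Integer> counts = new LinkedHashMap<>();
        if (result != null) {
          for (LabelAndValue lv : result.labelValues) {
            counts.put(lv.label, lv.value.intValue());
          }
        }
        return counts;
      }
    }
  }
}
```

The extra taxonomy directory is the main practical difference from the doc-values route, which keeps everything inside the one index.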

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Adrien Grand
FYI there is also KeywordField, which combines StringField and SortedSetDocValuesField. It supports filtering, sorting, faceting and retrieval. It's my go-to field for string values. On Fri, Oct 20, 2023, 12:20, Michael McCandless wrote: > There are some differences. > > StringField is indexe
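To make Adrien's point concrete, here is a minimal sketch of one KeywordField doing all three jobs: exact-match filtering through the inverted index, sorting through its SORTED_SET doc values, and retrieval through the stored value. It assumes a Lucene 9.x release recent enough to include KeywordField and the `StoredFields` API; the field name `category` and its values are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.KeywordField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortedSetSelector;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public final class KeywordFieldSketch {
  private static final String FIELD = "category"; // hypothetical field name

  /** Indexes one document per category value into an in-memory directory. */
  public static Directory index(List<String> categories) throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      for (String category : categories) {
        Document doc = new Document();
        // One field: a single indexed term (filtering), SORTED_SET doc values (sorting/faceting),
        // and, because of Field.Store.YES, a stored value (retrieval).
        doc.add(new KeywordField(FIELD, category, Field.Store.YES));
        writer.addDocument(doc);
      }
    } // closing the writer commits the documents
    return dir;
  }

  /** Counts documents whose category equals the given value exactly. */
  public static int countExact(Directory dir, String value) throws IOException {
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      return new IndexSearcher(reader).count(KeywordField.newExactQuery(FIELD, value));
    }
  }

  /** Returns the stored category of every document, sorted by the field's doc values. */
  public static List<String> sortedCategories(Directory dir) throws IOException {
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      Sort sort = new Sort(KeywordField.newSortField(FIELD, false, SortedSetSelector.Type.MIN));
      TopDocs hits = searcher.search(new MatchAllDocsQuery(), Math.max(1, reader.maxDoc()), sort);
      StoredFields stored = reader.storedFields();
      List<String> out = new ArrayList<>();
      for (ScoreDoc hit : hits.scoreDocs) {
        out.add(stored.document(hit.doc).get(FIELD));
      }
      return out;
    }
  }
}
```

Compared with the StringField-only setup, the same single field now also backs sorting and doc-values faceting, which is presumably why it works as a drop-in replacement.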

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael McCandless
There are some differences. StringField is indexed into the inverted index (postings) so you can do efficient filtering. You can also store in stored fields to retrieve. FacetField does everything StringField does (filtering, storing (maybe?)), but in addition it stores data for faceting. I.e.

When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
Hi, I have found the following simple Facet Example https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java, whereas for a simple categorization of documents I currently use StringField, e.g. doc1.add(new StringField("category", "bo