Re: what is SOLR syntax to remove duplicated documents

2023-10-23 Thread Thomas Corthals
Probably not very helpful for the original question, but for the sake of completeness: you can use the Lucene documentID with the Luke Request Handler. https://solr.apache.org/guide/solr/latest/indexing-guide/luke-request-handler.html You cannot use it as a reliable identifier for your Solr docu

Re: what is SOLR syntax to remove duplicated documents

2023-10-22 Thread Mikhail Khludnev
You can find id terms repeating in an index via https://solr.apache.org/guide/solr/latest/query-guide/terms-component.html and terms.mincount=2 or do the same via facets q=*:*&facet=true&facet.field=id&facet.limit=-1&facet.mincount=2 (just off the top of my head). Then you can query duplicated ids one by
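
A rough sketch of the facet approach described above, in Python. It only builds the query string and reads the duplicated ids out of an already-parsed JSON response; it does not contact a server. The function names and the sample response are hypothetical. `rows=0` is an addition not in the original message, and the parsing assumes Solr's default flat JSON facet layout (a list alternating value and count).

```python
from urllib.parse import urlencode


def build_duplicate_id_query(field="id"):
    """Return the query string for a facet request listing values seen 2+ times."""
    params = {
        "q": "*:*",
        "rows": 0,              # assumption: only facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,      # no cap on the number of facet values
        "facet.mincount": 2,    # keep only values that occur at least twice
    }
    return urlencode(params)


def duplicate_ids(response, field="id"):
    """Map each duplicated value to its count.

    Assumes the default flat layout: [value1, count1, value2, count2, ...].
    """
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))
```

The query string would be appended to the core's select handler URL, and the parsed JSON body passed to `duplicate_ids`.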

Re: what is SOLR syntax to remove duplicated documents

2023-10-22 Thread Dmitri Maziuk
On 10/22/23 12:25, Gus Heck wrote: Echoing what Thomas says, this problem indicates your indexing system probably has a significant design flaw. For most systems, you should have a notion of document identity that is external to Solr, and that should be used as (or to deterministically generate)

Re: what is SOLR syntax to remove duplicated documents

2023-10-22 Thread Gus Heck
Echoing what Thomas says, this problem indicates your indexing system probably has a significant design flaw. For most systems, you should have a notion of document identity that is external to Solr, and that should be used as (or to deterministically generate) the id in Solr. If you don't do this
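
To illustrate the point about deterministically generating the Solr id from an external identity, here is a minimal Python sketch. The function name, the `source_system` and `external_key` parameters, and the choice of SHA-256 are all assumptions for illustration and are not from the thread.

```python
import hashlib


def solr_id(source_system, external_key):
    """Derive a stable Solr id from an identity that lives outside Solr.

    The same (source_system, external_key) pair always yields the same id.
    """
    # A separator that is unlikely to appear in either part avoids collisions
    # such as ("ab", "c") versus ("a", "bc").
    raw = f"{source_system}\x1f{external_key}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```

If the resulting value is used as the collection's uniqueKey, re-indexing the same source record replaces the existing document instead of adding a second copy.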

Re: what is SOLR syntax to remove duplicated documents

2023-10-22 Thread Thomas Corthals
e, or all fields etc? > > > > Sent from Mail for Windows > > > > From: Vince McMahon > > Sent: Sunday, October 22, 2023 3:22 PM > > To: users@solr.apache.org > > Subject: what is SOLR syntax to remove duplicated documents > > > > I have a SO

Re: what is SOLR syntax to remove duplicated documents

2023-10-22 Thread Vince McMahon
ds etc? > > Sent from Mail for Windows > > From: Vince McMahon > Sent: Sunday, October 22, 2023 3:22 PM > To: users@solr.apache.org > Subject: what is SOLR syntax to remove duplicated documents > > I have a SOLR 8.X. I suspect one of the core has duplicates and wants to

RE: what is SOLR syntax to remove duplicated documents

2023-10-22 Thread ufuk yılmaz
When do you consider two documents to be duplicates? When one field has the same value, when multiple fields have the same value, or when all fields do? Sent from Mail for Windows From: Vince McMahon Sent: Sunday, October 22, 2023 3:22 PM To: users@solr.apache.org Subject: what is SOLR syntax to remove

what is SOLR syntax to remove duplicated documents

2023-10-22 Thread Vince McMahon
I have a SOLR 8.X. I suspect one of the cores has duplicates and want to remove the duplicated documents. Signature, as in the SOLR guide, is not implemented. https://solr.apache.org/guide/6_6/de-duplication.html In SQL, a query without the use of a hash column would look like: ;WITH CTE AS (