Using Lucene for technical documentation

2020-11-22 Thread Trevor Nicholls
Hello, I'd better begin by identifying myself as a newbie. I am investigating using Lucene as a search tool for a library of technical documents, much of which consists of pieces of source code and discussion of the content. The standard analyzer does an adequate job with normal text but st

RE: Using Lucene for technical documentation

2020-12-03 Thread Trevor Nicholls
://hoplahup.net/paul_pubs/AccessRetrievalAM.html). Please ask for the source code, it is old and built on Lucene 3.5 so would need quite some upgrade. On 23 Nov 2020, at 8:42, Trevor Nicholls wrote: > Hello, I'd better begin by identifying myself as a newbie. > > > > I am investiga

Bewildered by my search results, can anyone explain where I might be going wrong?

2021-06-21 Thread Trevor Nicholls
Sorry in advance for writing a small novel. Background: I am indexing and searching technical reference documents, so the standard language analyzers aren't appropriate. For example, the content needs to be indexed so that a search for total matches total value, total[value], and total(value),

RE: Bewildered by my search results, can anyone explain where I might be going wrong?

2021-06-21 Thread Trevor Nicholls
string. cheers T -Original Message- From: Trevor Nicholls Sent: Tuesday, 22 June 2021 08:10 To: java-user@lucene.apache.org Subject: Bewildered by my search results, can anyone explain where I might be going wrong? Sorry in advance for writing a small novel. Background: I am indexing and

find the collection of tokens matching a query in Lucene 8.6.3

2021-07-09 Thread Trevor Nicholls
Problem: I have indexed the filepath and the content of thousands of documents and can successfully query the index on the text to return a collection of filepaths. Now I need to create a collection of the tokens in the index which matched the query. I can see that there are solutions to a rela

RE: lucene 4.10.4 punctuation

2021-08-26 Thread Trevor Nicholls
Hi You want to write your own analyzer which does not lowercase terms and which splits terms at non-alpha or non-alphanumeric characters. You'd use the same analyzer for indexing and for searching. Thus when building the index S.O.S is indexed as the five terms S . O . S and if you search for S
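The splitting rule described in this reply can be sketched in plain Java. This is only an illustration of the token boundaries, not a Lucene Analyzer: the class name `PunctuationSplitter` and the method `tokenize` are hypothetical, and in a real index the same rule would have to be implemented inside a custom analyzer used for both indexing and searching. The sketch assumes whitespace is discarded and each punctuation character becomes its own term.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows where term boundaries fall.
class PunctuationSplitter {

    /**
     * Splits text into runs of letters/digits and single-character
     * punctuation terms. Case is preserved; whitespace is dropped.
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                // Still inside an alphanumeric run: keep accumulating.
                run.append(c);
            } else {
                // A non-alphanumeric character ends the current run.
                if (run.length() > 0) {
                    tokens.add(run.toString());
                    run.setLength(0);
                }
                // Punctuation is kept as its own term; whitespace is not.
                if (!Character.isWhitespace(c)) {
                    tokens.add(String.valueOf(c));
                }
            }
        }
        // Flush the final run, if any.
        if (run.length() > 0) {
            tokens.add(run.toString());
        }
        return tokens;
    }
}
```

With this rule, `PunctuationSplitter.tokenize("S.O.S")` yields the five terms S, ., O, ., S, and because nothing is lowercased, "Total" and "total" remain distinct terms.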

Can an indexreader/indexsearcher survive index edits?

2021-09-22 Thread Trevor Nicholls
Hi Lucene 8.6.3 In a prototype application I build a Lucene index with a single process and query it with another. Every operation is a new process. When the data changes I simply recreate the index and future searches pick up the new index. Of course performance is sub-optimal. So I am

Need some guidance for multi-term synonym matching

2021-12-07 Thread Trevor Nicholls
I am using Lucene 8.6.3 in an application which searches a library of technical documentation. I have implemented synonym matching which works for single word replacements, but does not match when one of the synonyms has two or more words. My attempts to support multi-term synonyms are failing, and

synonym question

2022-03-14 Thread Trevor Nicholls
I have technical data which I am querying with Lucene; one of the features of the content is that a large number of technical terms may be written as multiple words or as a compound word. For example, ISOWEEK or ISO WEEK. Or SynonymFilter or synonym filter. I have a synonym table which includes

RE: synonym question

2022-03-14 Thread Trevor Nicholls
e.org Subject: Re: synonym question Hello, just a guess, have you tried escaping the space in your multi-word terms with backslash? isoweek,iso\ week Regards Bernd Am 14.03.22 um 15:54 schrieb Trevor Nicholls: > I have technical data which I am querying with Lucene; one of the > feat

RE: synonym question

2022-03-14 Thread Trevor Nicholls
Just to confirm, escaping the spaces in synonym table construction, query construction, or both, does not solve the problem. -Original Message- From: Trevor Nicholls Sent: Tuesday, 15 March 2022 05:02 To: java-user@lucene.apache.org Subject: RE: synonym question Hi, thanks for such a

Two issues with synonym phrase matching

2022-07-28 Thread Trevor Nicholls
I am indexing some technical documentation and have been trying to add synonym matching to the searches. Actually I am adding the synonyms at index time so that any synonyms match at search time. a. Simple synonyms (wordA = wordB) are working just fine. b. Multiple synonyms (wordA

RE: Question for SynonymQuery

2023-01-02 Thread Trevor Nicholls
Hi Anh The two links Michael shared relate to questions I asked when I was trying to get synonym matching with our application. I really do have multi-term synonym matching working at this point; there's always scope for improvement of course but with the hints supplied in those threads I was a

RE: Question for SynonymQuery

2023-01-03 Thread Trevor Nicholls
nuary 2023 09:55 To: java-user@lucene.apache.org Subject: Re: Question for SynonymQuery Hello Trevor. Can you help me better understand this approach? If we have a text "wifi router" and inject "internet device" at indexing time, terms reside at the same positions. How to avoid fals

Prioritising certain documents in the search results

2023-02-01 Thread Trevor Nicholls
Hi I'm currently using Lucene 8.6.3, and indexing a few thousand documents. Some of these documents need to be prioritised in the search results, but not by too much; e.g. an exact phrase match in a normal document still needs to top the rankings ahead of a priority document that just matches t

Highlighting query results, my method is too crude, but how to improve it?

2023-02-20 Thread Trevor Nicholls
Sorry I apologize for this being a bit long and for explaining the problem at the very bottom after all the background, rather than starting with it at the top. I thought it was easier to explain like this, please bear with me! So I've indexed a library of technical documentation, and the index

RE: Highlighting query results, my method is too crude, but how to improve it?

2023-02-20 Thread Trevor Nicholls
ssing some point. But, can you highlight another query than one you search for? On Mon, Feb 20, 2023 at 5:07 PM Trevor Nicholls wrote: > Sorry I apologize for this being a bit long and for explaining the > problem at the very bottom after all the background, rather than > starting with it

RE: Highlighting query results, my method is too crude, but how to improve it?

2023-02-21 Thread Trevor Nicholls
/TestMatchHighlighter.java#L241-L269 If things work for you then no rush to switch over - it's yet another option to use. Dawid On Mon, Feb 20, 2023 at 4:21 PM Trevor Nicholls wrote: > Well I don't know; I suppose that's part of my question. > > It's not immediately obvious to me that t

Can I simplify this bit of query boosting?

2023-05-11 Thread Trevor Nicholls
Hi, I've hit a wall here. In brief, users search a library of documents. Every indexed document has a version number field which is always populated for release notes, sometimes for other docs. Every document also has a category field which is how release notes are identified, among other conte

RE: Can I simplify this bit of query boosting?

2023-05-14 Thread Trevor Nicholls
can > refine sort with this numeric value. > I hope it helps - at least to give you an idea which way to go. > BR, > Hrvoje > > On Thu, 11 May 2023, 15:44 Trevor Nicholls, > > wrote: > > > Hi, I've hit a wall here. > > > > > > > > In br

Filter question

2023-11-21 Thread Trevor Nicholls
Hi I'm constructing a BooleanQuery with an optional filter query and a mandatory content query, plus some optional boost queries. In effect, what I am doing is implementing this shorthand: BooleanQuery.Builder qb = new BooleanQuery.Builder(); if (filter) { qb.add(filterquery, Occur.

priority of query results with text alternates

2024-10-05 Thread Trevor Nicholls
(Currently using Lucene 8_6_3, although not averse to moving to a later release if there's a recent feature I need for this) My application searches technical documents, a mix of normal text, source code and expressions involving more than letters and digits. The users want to be able to se

RE: priority of query results with text alternates

2024-10-06 Thread Trevor Nicholls
6.10.2024 um 06:28 schrieb Trevor Nicholls: (Currently using Lucene 8_6_3, although not averse to moving to a later release if there's a recent feature I need for this) My application searches technical documents, a mix of normal text, source code and ex

RE: Synonyms and searching

2025-03-10 Thread Trevor Nicholls
ds via dictionary > https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html > However, it's actually a suggester's duty > https://lucene.apache.org/core//8_0_0/suggest/org/apache/lucene/search/s

Synonyms and searching

2025-03-05 Thread Trevor Nicholls
I don't know if I have completely the wrong idea or not, hopefully somebody can point out where I have got this wrong I am indexing technical documentation; the content contains strings like "http_proxy_server". When building the index my analyzer breaks this into the tokens "http", "proxy" and