Hello, I'd better begin by identifying myself as a newbie.
I am investigating using Lucene as a search tool for a library of technical
documents, much of which consists of pieces of source code and discussion of
the content.
The standard analyzer does an adequate job with normal text but st
://hoplahup.net/paul_pubs/AccessRetrievalAM.html). Please ask for
the source code, it is old and built on Lucene 3.5 so would need quite some
upgrade.
On 23 Nov 2020, at 8:42, Trevor Nicholls wrote:
> Hello, I'd better begin by identifying myself as a newbie.
>
>
>
> I am investiga
Sorry in advance for writing a small novel.
Background: I am indexing and searching technical reference documents, so
the standard language analyzers aren't appropriate. For example, the content
needs to be indexed so that a search for total matches total value,
total[value], and total(value),
string.
cheers
T
-----Original Message-----
From: Trevor Nicholls
Sent: Tuesday, 22 June 2021 08:10
To: java-user@lucene.apache.org
Subject: Bewildered by my search results, can anyone explain where I might
be going wrong?
Sorry in advance for writing a small novel.
Background: I am indexing and
Problem: I have indexed the filepath and the content of thousands of
documents and can successfully query the index on the text to return a
collection of filepaths. Now I need to create a collection of the tokens in
the index which matched the query.
I can see that there are solutions to a rela
Hi
You want to write your own analyzer which does not lowercase terms and which
splits terms at non-alpha or non-alphanumeric characters. You'd use the same
analyzer for indexing and for searching. Thus when building the index S.O.S is
indexed as the five terms S . O . S and if you search for S
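
The splitting rule described in this reply can be sketched in plain Java, without any Lucene classes. This is only an illustration of the rule (no lowercasing, a split at every non-alphanumeric character, each such character kept as its own term); it is not a Lucene Analyzer, and the names `CaseKeepingSplitter` and `tokenize` are hypothetical. Dropping whitespace rather than emitting it as a term is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class CaseKeepingSplitter {
    // Hypothetical helper, not part of Lucene: splits at every
    // non-alphanumeric character, keeps that character as its own term,
    // drops whitespace (an assumption), and never changes case.
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else {
                // A non-alphanumeric character ends the current term.
                if (current.length() > 0) {
                    terms.add(current.toString());
                    current.setLength(0);
                }
                // Punctuation becomes a term of its own; whitespace is skipped.
                if (!Character.isWhitespace(c)) {
                    terms.add(String.valueOf(c));
                }
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("S.O.S")); // prints [S, ., O, ., S]
    }
}
```

In a real index the same rule would have to be applied both when indexing and when searching, as the reply says, so that the query text is split into the same terms as the document text.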
Hi
Lucene 8.6.3
In a prototype application I build a Lucene index with a single process and
query it with another. Every operation is a new process. When the data
changes I simply recreate the index and future searches pick up the new
index. Of course performance is sub-optimal.
So I am
I am using Lucene 8.6.3 in an application which searches a library of
technical documentation. I have implemented synonym matching which works for
single word replacements, but does not match when one of the synonyms has
two or more words. My attempts to support multi-term synonyms are failing,
and
I have technical data which I am querying with Lucene; one of the features
of the content is that a large number of technical terms may be written as
multiple words or as a compound word. For example, ISOWEEK or ISO WEEK. Or
SynonymFilter or synonym filter.
I have a synonym table which includes
e.org
Subject: Re: synonym question
Hello,
just a guess, have you tried escaping the space in your multi-word terms with
backslash?
isoweek,iso\ week
Regards
Bernd
On 14.03.22 at 15:54, Trevor Nicholls wrote:
> I have technical data which I am querying with Lucene; one of the
> feat
Just to confirm, escaping the spaces in synonym table construction, query
construction, or both, does not solve the problem.
-----Original Message-----
From: Trevor Nicholls
Sent: Tuesday, 15 March 2022 05:02
To: java-user@lucene.apache.org
Subject: RE: synonym question
Hi, thanks for such a
I am indexing some technical documentation and have been trying to add
synonym matching to the searches. Actually I am adding the synonyms at index
time so that any synonyms match at search time.
a. Simple synonyms (wordA = wordB) are working just fine.
b. Multiple synonyms (wordA
Hi Anh
The two links Michael shared relate to questions I asked when I was trying to
get synonym matching with our application.
I really do have multi-term synonym matching working at this point; there's
always scope for improvement of course but with the hints supplied in those
threads I was a
nuary 2023 09:55
To: java-user@lucene.apache.org
Subject: Re: Question for SynonymQuery
Hello Trevor.
Can you help me better understand this approach? If we have a text "wifi
router" and inject "internet device" at indexing time, terms reside at the same
positions. How to avoid fals
Hi
I'm currently using Lucene 8.6.3, and indexing a few thousand documents.
Some of these documents need to be prioritised in the search results, but
not by too much; e.g. an exact phrase match in a normal document still needs
to top the rankings ahead of a priority document that just matches t
Sorry I apologize for this being a bit long and for explaining the problem
at the very bottom after all the background, rather than starting with it at
the top. I thought it was easier to explain like this, please bear with me!
So I've indexed a library of technical documentation, and the index
ssing some point. But, can you highlight another query than one you
search for?
On Mon, Feb 20, 2023 at 5:07 PM Trevor Nicholls
wrote:
> Sorry I apologize for this being a bit long and for explaining the
> problem at the very bottom after all the background, rather than
> starting with it
/TestMatchHighlighter.java#L241-L269
If things work for you then no rush to switch over - it's yet another option to
use.
Dawid
On Mon, Feb 20, 2023 at 4:21 PM Trevor Nicholls
wrote:
> Well I don't know; I suppose that's part of my question.
>
> It's not immediately obvious to me that t
Hi, I've hit a wall here.
In brief, users search a library of documents. Every indexed document has a
version number field which is always populated for release notes, sometimes
for other docs. Every document also has a category field which is how
release notes are identified, among other conte
can
> refine sort with this numeric value.
> I hope it helps - at least to give you an idea which way to go.
> BR,
> Hrvoje
>
> On Thu, 11 May 2023, 15:44 Trevor Nicholls,
>
> wrote:
>
> > Hi, I've hit a wall here.
> >
> >
> >
> > In br
Hi
I'm constructing a BooleanQuery with an optional filter query and a
mandatory content query, plus some optional boost queries.
In effect, what I am doing is implementing this shorthand:
BooleanQuery.Builder qb = new BooleanQuery.Builder();
if (filter) {
qb.add(filterquery, Occur.
(Currently using Lucene 8_6_3, although not averse to moving to a later
release if there's a recent feature I need for this)
My application searches technical documents, a mix of normal text, source
code and expressions involving more than letters and digits.
The users want to be able to se
6.10.2024 at 06:28, Trevor Nicholls <tre...@castingthevoid.com> wrote:
(Currently using Lucene 8_6_3, although not averse to moving to a later
release if there's a recent feature I need for this)
My application searches technical documents, a mix of normal text, source
code and ex
ds via dictionary
> https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html
> However, it's actually a suggester's duty
> https://lucene.apache.org/core//8_0_0/suggest/org/apache/lucene/search
> /s
I don't know if I have completely the wrong idea or not, hopefully somebody
can point out where I have got this wrong.
I am indexing technical documentation; the content contains strings like
"http_proxy_server". When building the index my analyzer breaks this into
the tokens "http", "proxy" and
25 matches