Phil Whelan wrote:
> On Thu, Jul 30, 2009 at 7:12 PM, wrote:
> > I was wondering if there is a list of special characters for the standard
> > analyzer?
> >
> > What I mean by "special" is characters that the analyzer considers break
> > characters.
> > For example, if I have something like
On Thu, Jul 30, 2009 at 7:12 PM, wrote:
> I was wondering if there is a list of special characters for the standard
> analyzer?
>
> What I mean by "special" is characters that the analyzer considers break
> characters.
> For example, if I have something like "foo=something", apparently the analyzer
Thanks a lot, it is truly caused by the length normalization there. I followed
your suggestion and changed it to 1.0f. Now it works properly.
Thanks again
Ahmet Arslan wrote:
>
>
>> yah, before this i used default lucene...but i dont know
>> what end up wrong...some results with only single wo
Hi,
I was wondering if there is a list of special characters for the standard
analyzer?
What I mean by "special" is characters that the analyzer considers break
characters. For example, if I have something like "foo=something", apparently
the analyzer considers this as two terms, "foo" and "so
Phil Whelan wrote:
It seems I have to use the same Analyzer for all the fields in the
index?
Nope. Look at PerFieldAnalyzerWrapper, which is effectively a Map of
field names -> analyzers. This might help if different fields will have
very different values and semantics.
Cheers,
Paul
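
To make the "Map of field names -> analyzers" idea concrete, here is a minimal plain-Java sketch. It deliberately does not use Lucene's classes: the class name PerFieldAnalysisSketch, the two stand-in "analyzers" and the field names "id" and "body" are all assumptions made up for illustration. It only shows the lookup-with-default behaviour described above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Illustration only: a per-field analyzer lookup with a default fallback.
// This is NOT Lucene's PerFieldAnalyzerWrapper, just the same idea in plain Java.
public class PerFieldAnalysisSketch {
    // Stand-in analyzer that keeps the whole value as a single term.
    static final Function<String, List<String>> KEYWORD =
            text -> Collections.singletonList(text);
    // Stand-in analyzer that lowercases and splits on whitespace.
    static final Function<String, List<String>> LOWERCASE_WORDS =
            text -> Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldAnalysisSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a specific analyzer for one field name.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Use the field's own analyzer if one is registered, else the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch wrapper = new PerFieldAnalysisSketch(LOWERCASE_WORDS);
        wrapper.addAnalyzer("id", KEYWORD); // hypothetical field name
        System.out.println(wrapper.analyze("id", "AB 12"));
        System.out.println(wrapper.analyze("body", "Hello World"));
    }
}
```

The same wrapper has to be used at index time and at query time, otherwise the terms produced for a field on the two sides will not line up.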
-
Hi Matthew / Paul,
On Thu, Jul 30, 2009 at 4:32 PM, Paul Cowan wrote:
> Matthew Hall wrote:
>>
>> Place a delimiter between the email addresses that doesn't get removed in
>> your analyzer. (preferably something you know will never be searched on)
>
> Or add them separately (rather than:
> doc.a
While trying out a few tuning options using contrib/benchmark as
described in the LIA (2nd edition) book, I had an interesting observation.
If I use a ThreadedIndexWriter (picked the example from lia2e, page
356) instead of IndexWriter, the index size got reduced by 40%
compared to using IndexWr
Matthew Hall wrote:
Place a delimiter between the email addresses that doesn't get removed
in your analyzer. (preferably something you know will never be searched
on)
Or add them separately (rather than:
doc.add(new Field("email", "f...@bar.com b...@foo.com c...@bar.foo" ...);
use
doc.add
Hi,
A Lucene Document is a set of fields. Each field has a name and a textual
value. There is no notion of nested fields (a field inside a field). Do not
focus too much on the XML representation of the index obtained from Luke.
Read Lucene documentation instead.
When indexing a java bean then what in
Thanks Steven,
I guess the index structure that I need in order to perform my query is:
cooking
N
art
Y
Bob
But I'm not sure how to map my domain classes in order to achieve this (or
The domain classes are defined as Groovy classes with compass annotations
(see my original post).
Each class maps directly to a DB table and when the application starts up,
Compass automatically reads the relevant tables and adds the data to the
index.
Lukáš Vlček wrote:
>
> Don,
> To me it se
Hi Donal,
We released SIREn [1], a plugin for Lucene that allows indexing and
querying of semi-structured data, a few days ago. Your use case seems to
match perfectly what SIREn can do.
SIREn enables the indexing of semi-structured data into a Lucene field,
and offers additional query compon
Phew :) Thanks for bringing closure!
Mike
On Thu, Jul 30, 2009 at 3:57 PM, Woolf, Ross wrote:
> This turned out to be my own problem, but using infoStream helped me to
> discover where my problem was.
>
> Thanks
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccand
Don,
maybe you could try this:
@Searchable @SearchableDynamicMetaData(name="noAttending",
converter="groovy", expression=" ... here iterate over all attendances where
attendance = N and output all course names")
@SearchableDynamicMetaData(name="yesAttending", converter="groovy",
expressio
WhitespaceAnalyzer won't fold case. It won't strip any "odd" characters out.
It won't, in fact, do anything except break on white space. You might want
to write your own analyzer that incorporates some of the filters,
especially LowerCaseFilter.
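
As a rough picture of the difference, the following plain-Java sketch contrasts "break on white space only" with "break on white space, then lowercase". It does not call Lucene; the class and method names are made up for illustration and only approximate what a whitespace tokenizer followed by a lowercasing filter would produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustration only: whitespace-only tokenizing versus whitespace + lowercasing.
public class WhitespaceVsLowercase {
    // Split on runs of whitespace and leave every token untouched.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Lowercase each token, the way a lowercasing filter stage would.
    static List<String> lowercased(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = whitespaceTokens("Foo=Bar BAZ");
        System.out.println(raw);             // case and "=" are preserved
        System.out.println(lowercased(raw)); // case folded, "=" still preserved
    }
}
```

Note that neither stage strips punctuation, so a query for "foo" alone would still not match the token "foo=bar".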
On Wed, Jul 29, 2009 at 9:04 AM, cbowditch wrote:
Don,
in order to use such a query you have to keep the relation between mandatory and
courseName in your index. In Compass you could use dynamic metadata (
http://www.compass-project.org/docs/2.2.0/reference/html/core-osem.html#core-osem-dynamic).
This way you can add additional fields into your document. You ca
This turned out to be my own problem, but using infoStream helped me to
discover where my problem was.
Thanks
-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, July 29, 2009 6:11 PM
To: java-user@lucene.apache.org
Subject: Re: Problems with
Hi Donal,
I looked at the XML index dump you provided, and I can see that there is only
one document in the index. This document matches your query. I've pasted it
below, without the "$/*"-named fields I'm assuming Compass adds to manage
Lucene document -> Grails object mapping, and with just
Don,
To me it seems as if there is only one document in your index, and moreover
the only document has multi-valued courseName and mandatory fields (this means
you will get the same result even if you query +courseName:art +mandatory:N).
Do you think you can share how you create your domain objects an
Basically the classes I'm indexing have the following relationships:
Student 1--* Attendance 1--* Course
The only root class is Student, i.e. only instances of this class can be
returned from a search. I have a Student object graph that could be
represented in JSON as follows:
{
name:
prashant ullegaddi wrote:
> How to get the number of times a term occurs in the Lucene index?
>
> Regards,
> Prashant.
Hi,
You didn't mention if you were looking for something programmatic or not, but
there's a tool called "Luke", and when you start that up and point it to your
index
Place a delimiter between the email addresses that doesn't get removed
in your analyzer. (preferably something you know will never be searched on)
That way you can ensure that each email matches independently of each other.
So something like
f...@bar.com DELIM123 b...@foo.com DELIM123 c...@ba
How to get the number of times a term occurs in the Lucene index?
Regards,
Prashant.
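
"Number of times a term occurs" can mean two different things: the number of documents that contain the term, or the total number of occurrences across all documents. The plain-Java sketch below only illustrates that distinction on an in-memory list of already-tokenized documents; the class name, method names and sample data are assumptions for illustration, and it does not read a Lucene index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: document frequency versus total occurrences of a term.
public class TermCountSketch {
    // How many documents contain the term at least once.
    static int docFreq(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // How many times the term occurs in total, summed over all documents.
    static int totalOccurrences(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> doc : docs) {
            count += Collections.frequency(doc, term);
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene"),
                Arrays.asList("search"),
                Arrays.asList("lucene"));
        System.out.println("documents containing term: " + docFreq(docs, "lucene"));
        System.out.println("total occurrences: " + totalOccurrences(docs, "lucene"));
    }
}
```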
On Thu, Jul 30, 2009 at 11:22 AM, Matthew Hall
wrote:
>
> 1. Sure, just have an analyzer that splits on all non-letter characters.
> 2. Phrase queries keep the order intact. (And yes, the positional
> information for the terms is kept, which is what allows span queries to work)
>
> So searching
1. Sure, just have an analyzer that splits on all non-letter characters.
2. Phrase queries keep the order intact. (And yes, the positional
information for the terms is kept, which is what allows span queries to
work)
So searching on the following "foo bar com" will match f...@bar.com but
not
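
The two points above, together with the delimiter suggestion from earlier in the thread, can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Lucene, the addresses alice@example.com and bob@sample.org are made up, and "phrase match" is simplified to "the query tokens appear consecutively and in order". DELIM123 is the delimiter from the earlier message; since digits are not letters, it is reduced to the token "delim" here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: split on non-letters, then match phrases by position.
public class LetterTokenPhraseSketch {
    // Emit one lowercase token per maximal run of letters.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // True if the phrase tokens occur consecutively, in order, in the field tokens.
    static boolean containsPhrase(List<String> fieldTokens, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(fieldTokens, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> plain = letterTokens("alice@example.com bob@sample.org");
        List<String> delimited = letterTokens("alice@example.com DELIM123 bob@sample.org");
        // A phrase inside one address matches either way.
        System.out.println(containsPhrase(plain, letterTokens("alice example com")));
        // A phrase spanning two addresses matches only without the delimiter.
        System.out.println(containsPhrase(plain, letterTokens("com bob")));
        System.out.println(containsPhrase(delimited, letterTokens("com bob")));
    }
}
```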
Hi,
We have a very large lucene index that we're developing that has a
field of email addresses. (Actually multiple fields with multiple
email addresses, but I'll simplify here)
Each document will have one "email" field containing multiple email addresses.
I am indexing email addresses only usi
Hi Donal,
I'm not familiar with Compass annotations, so forgive my ignorance, but it's
not clear to me what your documents look like, or how a Lucene document
corresponds to your objects.
What does the document you get as a hit when you search look like? That is,
what fields are defined on it
Hi,
I tried your suggestion:
"+courseName:cooking +mandatory:Y"
but it still matches the student who attends a non-mandatory cooking
course, and another mandatory course, which is not what I want. The
only reason I was using "AND" in my query, was to be explicit about how
the predicates should b
Hi,
this is interesting, but why do you use "AND" in your query when both
terms are a MUST (they have +)? See
http://lucene.apache.org/java/2_4_1/queryparsersyntax.html for more details
about Lucene query syntax.
Try dropping the AND and try the following query:
+courseName:cooking +mandatory:Y
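
One thing to keep in mind with this kind of query: if all of a student's attendances end up in a single document with multi-valued courseName and mandatory fields, the pairing between a course and its own flag is lost, and each clause can be satisfied by a different attendance. The plain-Java sketch below illustrates this on an in-memory map. It is not Lucene or Compass code, and the combined field name courseMandatory is a hypothetical example of one way to keep the pairing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration only: why flattened multi-valued fields lose the course/flag pairing.
public class MultiValuedFieldSketch {
    // Flatten {courseName, mandatory} pairs into one document of multi-valued fields.
    static Map<String, Set<String>> indexStudent(String[][] attendances) {
        Map<String, Set<String>> doc = new HashMap<>();
        for (String[] a : attendances) {
            doc.computeIfAbsent("courseName", k -> new HashSet<>()).add(a[0]);
            doc.computeIfAbsent("mandatory", k -> new HashSet<>()).add(a[1]);
            // Hypothetical combined field that keeps each pair together.
            doc.computeIfAbsent("courseMandatory", k -> new HashSet<>()).add(a[0] + "_" + a[1]);
        }
        return doc;
    }

    // True if the document has the given term in the given field.
    static boolean hasTerm(Map<String, Set<String>> doc, String field, String term) {
        return doc.getOrDefault(field, Collections.emptySet()).contains(term);
    }

    public static void main(String[] args) {
        // Non-mandatory cooking course plus a mandatory art course.
        Map<String, Set<String>> doc =
                indexStudent(new String[][] {{"cooking", "N"}, {"art", "Y"}});
        // Both required clauses are satisfied, but by different attendances.
        System.out.println(hasTerm(doc, "courseName", "cooking") && hasTerm(doc, "mandatory", "Y"));
        // The combined field only matches pairs that really exist.
        System.out.println(hasTerm(doc, "courseMandatory", "cooking_Y"));
    }
}
```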
Hi Matthew and Narcis,
I think that I found the (original) problem.
It looks like all those other terms I was getting, which looked
to me like the octets, weren't the octets :)...
When I was doing the doc.add(), there were some other numbers (not IP
addresses) in the String tha
Ian,
I'll respond to this msg, re. searching "path".
I made the change you suggested, to "Field.Index.ANALYZED", and that fixed the
problem I was having with searching for components of the "path" field.
Thanks!
Jim
Ian Lea wrote:
> In contrast to your last question and reply, if you u
Hi Phil,
I don't really have any query parsing/generation code to send you, because I'm
not using Lucene directly. I'm using the Grails Searchable Plugin,
which builds on both Lucene and Compass. The only relevant information
I can give you is my Grails domain classes which show how I've mapped
m
I'm a little unclear on how you could be getting both "aa.bb.cc.dd" as a
term, and then also the octets.
Are you adding the "contents" field into the index multiple times,
possibly with separate analyzers?
Could you possibly try a test, very simple case?
Just create an index with a single lu
In contrast to your last question and reply, if you use
doc.add(new Field("path", f.getPath(), Field.Store.YES,
Field.Index.ANALYZED));
the path will get split into tokens which will include "myfile1" and
you will be able to search for it.
The key concept for both questions is analysis. Luce
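
A rough plain-Java picture of the difference between the two settings: with NOT_ANALYZED the whole path is indexed as one term, while with ANALYZED it is broken into tokens first. The sketch below does not use Lucene, the tokenizer is a deliberately simplified "split on anything that is not a letter or digit" rule rather than any real analyzer's rules, and the path /docs/myfile1.txt is a made-up example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: one term for the whole value versus one term per token.
public class AnalyzedVsNotAnalyzed {
    // NOT_ANALYZED idea: the entire value becomes a single searchable term.
    static List<String> notAnalyzedTerms(String value) {
        return Collections.singletonList(value);
    }

    // ANALYZED idea (simplified): split on non-alphanumeric characters.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String piece : value.split("[^A-Za-z0-9]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String path = "/docs/myfile1.txt"; // hypothetical path
        System.out.println(notAnalyzedTerms(path).contains("myfile1"));
        System.out.println(analyzedTerms(path).contains("myfile1"));
    }
}
```

So a search for "myfile1" can only hit when the field was tokenized; an exact search for the full path is the case the untokenized form serves.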
Hi,
Oh. Ok, thanks! I'll give that a try.
Jim
"Armasu wrote:
> Keyword: Field.Index.NOT_ANALYZED
>
> -----Original Message-----
> From: oh...@cox.net [mailto:oh...@cox.net]
> Sent: Thursday, July 30, 2009 4:36 PM
> To: java-user@lucene.apache.org
> Subject: How to index IP addresses?
Keyword: Field.Index.NOT_ANALYZED
-----Original Message-----
From: oh...@cox.net [mailto:oh...@cox.net]
Sent: Thursday, July 30, 2009 4:36 PM
To: java-user@lucene.apache.org
Subject: How to index IP addresses?
Hi,
I am trying to index information in some proprietary-formatted files.
In parti
Hi,
I am working with a modified version of the demo IndexFiles.
In that code, when it builds the index, it has:
doc.add(new Field("path", f.getPath(), Field.Store.YES,
Field.Index.NOT_ANALYZED));
In Luke, I can see all the file paths in the "path" field.
I am also using the demo lucenewe
Hi,
I am trying to index information in some proprietary-formatted files.
In particular, these files contain some IP addresses in dotted notation, e.g.,
aa.bb.cc.dd.
For my initial test, I have a Document implementation, and after I extract what
I need into a String named "Info", I do:
doc.
See the Similarity (and DefaultSimilarity) class in the Lucene source
code. There are, of course, many other ways to customize similarity:
Function queries, write your own queries, etc.
More details on what you are trying to customize would help answer
your question.
On Jul 28, 2009, at
> yah, before this i used default lucene...but i dont know
> what end up wrong...some results with only single word matching went to
> the top of the results.
Hmm. Interesting. It seems that length normalization is causing this. Very short
documents with only a single word matching are getting high scor
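
To see why very short documents can float to the top, here is a small plain-Java sketch of the length-normalization factor. It assumes the 1/sqrt(numTerms) formula used by DefaultSimilarity in Lucene releases of this period, ignores every other part of the score, and ignores the fact that norms are stored in the index with reduced precision. The class and method names are made up; the flat variant corresponds to the "change it to 1.0f" fix reported elsewhere in this thread.

```java
// Illustration only: how a length-normalization factor favours short fields.
public class LengthNormSketch {
    // Assumed default behaviour: the factor shrinks as the field gets longer.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Flat variant: every field length gets the same factor.
    static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int[] lengths = {1, 4, 100};
        for (int numTerms : lengths) {
            System.out.println(numTerms + " terms: default=" + defaultLengthNorm(numTerms)
                    + " flat=" + flatLengthNorm(numTerms));
        }
    }
}
```

Under that assumption a one-term field gets ten times the factor of a hundred-term field, which is consistent with single-word documents outranking longer ones.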
yah, before this i used default lucene...but i dont know what end up
wrong...some results with only single word matching went to the top of the
results.
This i assumed is due to the score of the result being too high. That's why i
am trying to add additional boost
Ahmet Arslan wrote:
>
>
> : I
: I am trying to create a query, that first will return a set
: of results, then
: it will give a boost to the results that have all the
: keyword entered by the user.
If I understand you correctly: the user will enter multiple keywords. Let's say a b
c d. And you want documents - that contains/have
42 matches