Re: Is there a list of "special" characters for standard analyzer?

2009-07-30 Thread ohaya
Phil Whelan wrote: > On Thu, Jul 30, 2009 at 7:12 PM, wrote: > > I was wondering if there is a list of special characters for the standard > > analyzer? > > > > What I mean by "special" is characters that the analyzer considers break > > characters. > > For example, if I have something like

Re: Is there a list of "special" characters for standard analyzer?

2009-07-30 Thread Phil Whelan
On Thu, Jul 30, 2009 at 7:12 PM, wrote: > I was wondering if there is a list of special characters for the standard > analyzer? > > What I mean by "special" is characters that the analyzer considers break > characters. > For example, if I have something like "foo=something", apparently the analyzer

Re: Generating Query for Multiple Clauses in a Single Field

2009-07-30 Thread blazingwolf7
Thanks a lot, it is truly caused by the length normalization there. I followed your suggestion and changed it to 1.0f. Now it works properly. Thanks again Ahmet Arslan wrote: > > >> Yes, before this I used default Lucene, but I don't know >> what ended up wrong: some results with only a single wo
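
A minimal sketch of why flattening the length norm to 1.0f changes the ranking. It assumes the default norm is 1/sqrt(number of terms in the field), which is how DefaultSimilarity behaved in Lucene 2.x as far as I recall; the class and method names below are hypothetical and this is plain Java arithmetic, not the Lucene API.

```java
// Toy comparison of a default-style length norm against a flat norm of 1.0f.
// Assumption: the default norm is 1 / sqrt(numTerms). Names are hypothetical.
class LengthNormSketch {

    // Shorter fields get a larger norm, so a one-word document that matches
    // a single query word can outrank a longer, more complete match.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The fix described in the thread: every field gets the same norm,
    // so field length no longer influences the score.
    static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int[] lengths = {1, 4, 100};
        for (int n : lengths) {
            System.out.println(n + " terms: default=" + defaultLengthNorm(n)
                    + " flat=" + flatLengthNorm(n));
        }
    }
}
```

With the default-style norm, a one-term field is weighted ten times higher than a hundred-term field; with the flat norm both are weighted equally.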

Is there a list of "special" characters for standard analyzer?

2009-07-30 Thread ohaya
Hi, I was wondering if there is a list of special characters for the standard analyzer? What I mean by "special" is characters that the analyzer considers break characters. For example, if I have something like "foo=something", apparently the analyzer considers this as two terms, "foo" and "so

Re: indexing multiple email addresses in one field

2009-07-30 Thread Paul Cowan
Phil Whelan wrote: It seems I have to use the same Analyzer for all the fields in the index? Nope. Look at PerFieldAnalyzerWrapper, which is effectively a Map of field names -> analyzers. This might help if different fields will have very different values and semantics. Cheers, Paul -
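
The "Map of field names -> analyzers" idea can be sketched in plain Java. This is a toy model only, not the PerFieldAnalyzerWrapper API: the class, the two analyzer functions, and the example addresses are all hypothetical, chosen to show a default analyzer with a per-field override.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy stand-in for the idea behind a per-field analyzer wrapper:
// look up the analyzer by field name, fall back to a default.
class PerFieldAnalysisSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldAnalysisSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a field-specific analyzer that overrides the default.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Hypothetical analyzer: split on whitespace only, keep tokens unchanged.
    static List<String> whitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Hypothetical analyzer: lowercase, then split on any run of non-letters.
    static List<String> lettersOnly(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch wrapper =
                new PerFieldAnalysisSketch(PerFieldAnalysisSketch::lettersOnly);
        wrapper.addAnalyzer("email", PerFieldAnalysisSketch::whitespace);
        System.out.println(wrapper.analyze("email", "alice@example.com bob@sample.org"));
        System.out.println(wrapper.analyze("body", "Hello, World"));
    }
}
```

Here the "email" field keeps whole addresses as terms while every other field is split into lowercase words, which is the kind of per-field difference the reply is pointing at.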

Re: indexing multiple email addresses in one field

2009-07-30 Thread Phil Whelan
Hi Matthew / Paul, On Thu, Jul 30, 2009 at 4:32 PM, Paul Cowan wrote: > Matthew Hall wrote: >> >> Place a delimiter between the email addresses that doesn't get removed in >> your analyzer.  (preferably something you know will never be searched on) > > Or add them separately (rather than: >  doc.a

ThreadedIndexWriter vs. IndexWriter

2009-07-30 Thread Jibo John
While trying out a few tuning options using contrib/benchmark as described in the LIA (2nd edition) book, I had an interesting observation. If I use a ThreadedIndexWriter (picked the example from lia2e, page 356) instead of IndexWriter, the index size got reduced by 40% compared to using IndexWr

Re: indexing multiple email addresses in one field

2009-07-30 Thread Paul Cowan
Matthew Hall wrote: Place a delimiter between the email addresses that doesn't get removed in your analyzer. (preferably something you know will never be searched on) Or add them separately (rather than: doc.add(new Field("email", "f...@bar.com b...@foo.com c...@bar.foo" ...); use doc.add

Re: Querying across object relationships

2009-07-30 Thread Lukáš Vlček
Hi, a Lucene Document is a set of fields. Each field has a name and a textual value. There is no notion of nested fields (a field inside a field). Do not focus too much on the XML representation of the index obtained from Luke. Read Lucene documentation instead. When indexing a java bean then what in

RE: Querying across object relationships

2009-07-30 Thread Paolo DiCanio
Thanks Steven, I guess the index structure that I need in order to perform my query is: cooking N art Y Bob But I'm not sure how to map my domain classes in order to achieve this (or

Re: Querying across object relationships

2009-07-30 Thread Paolo DiCanio
The domain classes are defined as Groovy classes with compass annotations (see my original post). Each class maps directly to a DB table and when the application starts up, Compass automatically reads the relevant tables and adds the data to the index. Lukáš Vlček wrote: > > Don, > To me it se

Re: Querying across object relationships

2009-07-30 Thread Renaud Delbru
Hi Donal, We released SIREn [1], a plugin for Lucene that allows indexing and querying of semi-structured data, a few days ago. Your use case seems to match perfectly what SIREn can do. SIREn enables the indexing of semi-structured data into a Lucene field, and offers additional query compon

Re: Problems with IndexWriter.commit()

2009-07-30 Thread Michael McCandless
Phew :) Thanks for bringing closure! Mike On Thu, Jul 30, 2009 at 3:57 PM, Woolf, Ross wrote: > This turned out to be my own problem, but using infoStream helped me to > discover where my problem was. > > Thanks > > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccand

Re: Querying across object relationships

2009-07-30 Thread Lukáš Vlček
Don, maybe you could try this: @Searchable @SearchableDynamicMetaData(name="noAttending", converter="groovy", expression=" ... here iterate over all attendances where attendance = N and output all course names") @SearchableDynamicMetaData(name="yesAttending", converter="groovy", expressio

Re: $ or £ symbols are excluded from Search Query

2009-07-30 Thread Erick Erickson
WhitespaceAnalyzer won't fold case. It won't strip any "odd" characters out. It won't, in fact, do anything except break on white space. You might want to write your own analyzer that incorporates some of the filters, especially LowerCaseFilter. On Wed, Jul 29, 2009 at 9:04 AM, cbowditch wrote:
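
A rough sketch of the behaviour described here, in plain Java rather than Lucene classes: split on whitespace only, then optionally apply a lowercasing step. The class name, method names, and sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model: whitespace-only tokenization followed by an optional
// lowercase step. Symbols such as $ or the pound sign stay in the tokens.
class WhitespaceLowercaseSketch {

    // Break on whitespace and do nothing else (no case folding, no stripping).
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // The extra step a custom analyzer could add: fold every token to lowercase.
    static List<String> lowercased(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = whitespaceTokens("Price $100 OR \u00a350");
        System.out.println(raw);
        System.out.println(lowercased(raw));
    }
}
```

The currency symbols survive both steps, while "Price" and "price" only become the same term once the lowercase step is applied.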

Re: Querying across object relationships

2009-07-30 Thread Lukáš Vlček
Don, in order to use such a query you have to keep the mandatory and courseName relation in your index. In Compass you could use dynamic metadata ( http://www.compass-project.org/docs/2.2.0/reference/html/core-osem.html#core-osem-dynamic). This way you can add additional fields into your document. You ca

RE: Problems with IndexWriter.commit()

2009-07-30 Thread Woolf, Ross
This turned out to be my own problem, but using infoStream helped me to discover where my problem was. Thanks -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, July 29, 2009 6:11 PM To: java-user@lucene.apache.org Subject: Re: Problems with

RE: Querying across object relationships

2009-07-30 Thread Steven A Rowe
Hi Donal, I looked at the XML index dump you provided, and I can see that there is only one document in the index. This document matches your query. I've pasted it below, without the "$/*"-named fields I'm assuming Compass adds to manage Lucene document -> Grails object mapping, and with just

Re: Querying across object relationships

2009-07-30 Thread Lukáš Vlček
Don, To me it seems as if there is only one document in your index, and moreover the only document has multi-valued courseName and mandatory fields (this means you will get the same result even if you query +courseName:art +mandatory:N). Do you think you can share how you create your domain objects an

Re: Querying across object relationships

2009-07-30 Thread Donal Murtagh
Basically the classes I'm indexing have the following relationships: Student 1--* Attendance 1--* Course The only root class is Student, i.e. only instances of this class can be returned from a search. I have a Student object graph that could be represented in JSON as follows: { name:

Re: Term's frequency

2009-07-30 Thread ohaya
prashant ullegaddi wrote: > How to get the number of times a term occurs in the Lucene index? > > Regards, > Prashant. Hi, You didn't mention if you were looking for something programmatic or not, but there's a tool called "Luke", and when you start that up and point it to your index

Re: indexing multiple email addresses in one field

2009-07-30 Thread Matthew Hall
Place a delimiter between the email addresses that doesn't get removed in your analyzer. (preferably something you know will never be searched on) That way you can ensure that each email matches independently of each other. So something like f...@bar.com DELIM123 b...@foo.com DELIM123 c...@ba

Term's frequency

2009-07-30 Thread prashant ullegaddi
How to get the number of times a term occurs in the Lucene index? Regards, Prashant.

Re: indexing multiple email addresses in one field

2009-07-30 Thread Phil Whelan
On Thu, Jul 30, 2009 at 11:22 AM, Matthew Hall wrote: > > 1. Sure, just have an analyzer that splits on all non letter characters. > 2. Phrase queries keep the order intact.  (And yes, the positional > information for the terms is kept, which is what allows span queries to work) > > So searching

Re: indexing multiple email addresses in one field

2009-07-30 Thread Matthew Hall
1. Sure, just have an analyzer that splits on all non letter characters. 2. Phrase queries keep the order intact. (And yes, the positional information for the terms is kept, which is what allows span queries to work) So searching on the following "foo bar com" will match f...@bar.com but not
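
The two points above can be sketched with a toy tokenizer and an ordered, adjacent-token match. This is plain Java, not a Lucene analyzer or PhraseQuery; the class name and the example addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of "split on non-letters" plus a phrase match that requires
// the query tokens to appear adjacent and in the same order.
class PhraseMatchSketch {

    // Lowercase, then split on every run of non-letter characters.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // True if the phrase tokens occur consecutively, in order, in the field.
    static boolean matchesPhrase(String fieldText, String phrase) {
        List<String> doc = tokenize(fieldText);
        List<String> wanted = tokenize(phrase);
        if (wanted.isEmpty()) return false;
        for (int start = 0; start + wanted.size() <= doc.size(); start++) {
            if (doc.subList(start, start + wanted.size()).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String field = "alice@example.com bob@sample.org";
        System.out.println(matchesPhrase(field, "alice example com"));
        System.out.println(matchesPhrase(field, "example alice com"));
        System.out.println(matchesPhrase(field, "com bob"));
    }
}
```

Note that in this toy model "com bob" also matches, because nothing separates the last token of one address from the first token of the next. That is the situation the delimiter suggestion elsewhere in this thread is meant to avoid.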

indexing multiple email addresses in one field

2009-07-30 Thread Phil Whelan
Hi, We have a very large Lucene index that we're developing that has a field of email addresses. (Actually multiple fields with multiple email addresses, but I'll simplify here) Each document will have one "email" field containing multiple email addresses. I am indexing email addresses only usi

RE: Querying across object relationships

2009-07-30 Thread Steven A Rowe
Hi Donal, I'm not familiar with Compass annotations, so forgive my ignorance, but it's not clear to me what your documents look like, or how a Lucene document corresponds to your objects. What does the document you get as a hit when you search look like? That is, what fields are defined on it

Re: Querying across object relationships

2009-07-30 Thread Donal Murtagh
Hi, I tried your suggestion: "+courseName:cooking +mandatory:Y" but it still matches the student who attends a non-mandatory cooking course, and another mandatory course, which is not what I want. The only reason I was using "AND" in my query, was to be explicit about how the predicates should b

Re: Querying across object relationships

2009-07-30 Thread Lukáš Vlček
Hi, this is interesting, but why do you use "AND" in your query when both terms are a MUST (they have +)? See http://lucene.apache.org/java/2_4_1/queryparsersyntax.html for more details about Lucene query syntax. Try dropping the AND and try the following query: +courseName:cooking +mandatory:Y

Re: How to index IP addresses?

2009-07-30 Thread ohaya
Hi Matthew and Narcis, I think that I found the (original) problem. It looks like those other terms I was getting, which looked to me like the octets, weren't the octets :)... When I was doing the doc.add(), there were some other numbers (not IP addresses) in the String tha

Re: How to search "path"?

2009-07-30 Thread ohaya
Ian, I'll respond to this msg, re. searching "path". I made the change you suggested, to "Field.Index.ANALYZED", and that fixed the problem I was having with searching for components of the "path" field. Thanks! Jim Ian Lea wrote: > In contrast to your last question and reply, if you u

Re: Querying across object relationships

2009-07-30 Thread Donal Murtagh
Hi Phil, I don't really have any query parsing/generation code to send you, because I'm not using Lucene directly. I'm using the Grails Searchable Plugin, which builds on both Lucene and Compass. The only relevant information I can give you is my Grails domain classes which show how I've mapped m

Re: How to index IP addresses?

2009-07-30 Thread Matthew Hall
I'm a little unclear on how you could be getting both "aa.bb.cc.dd" as a term, and then also the octets. Are you adding the "contents" field into the index multiple times, possibly with separate analyzers? Could you possibly try a very simple test case? Just create an index with a single lu

Re: How to search "path"?

2009-07-30 Thread Ian Lea
In contrast to your last question and reply, if you use doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.ANALYZED)); the path will get split into tokens which will include "myfile1" and you will be able to search for it. The key concept for both questions is analysis. Luce
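
A toy illustration of the difference between the two index modes discussed in these threads. It is plain Java, not the Lucene Field API; the split rule (lowercase, break on anything that is not a letter or digit) is a simplification, and the class name and example path are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model: an un-analyzed value is indexed as one exact term,
// an analyzed value is broken into several searchable terms.
class AnalyzedVsNotAnalyzedSketch {

    static List<String> indexTerms(String value, boolean analyzed) {
        List<String> terms = new ArrayList<>();
        if (!analyzed) {
            // Whole value becomes a single term; only an exact match finds it.
            terms.add(value);
            return terms;
        }
        // Simplified analysis: lowercase and split on non-alphanumerics.
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    public static void main(String[] args) {
        String path = "/data/docs/myfile1.txt"; // hypothetical path
        System.out.println(indexTerms(path, false));
        System.out.println(indexTerms(path, true));
    }
}
```

In this model a search for "myfile1" only has something to match when the value was analyzed, while a dotted IP address stays intact only when it was not.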

RE: How to index IP addresses?

2009-07-30 Thread ohaya
Hi, Oh. Ok, thanks! I'll give that a try. Jim "Armasu, Narcis" wrote: > Keyword: Field.Index.NOT_ANALYZED > > -Original Message- > From: oh...@cox.net [mailto:oh...@cox.net] > Sent: Thursday, July 30, 2009 4:36 PM > To: java-user@lucene.apache.org > Subject: How to index IP addresses?

RE: How to index IP addresses?

2009-07-30 Thread Armasu, Narcis
Keyword: Field.Index.NOT_ANALYZED -Original Message- From: oh...@cox.net [mailto:oh...@cox.net] Sent: Thursday, July 30, 2009 4:36 PM To: java-user@lucene.apache.org Subject: How to index IP addresses? Hi, I am trying to index information in some proprietary-formatted files. In parti

How to search "path"?

2009-07-30 Thread ohaya
Hi, I am working with a modified version of the demo IndexFiles. In that code, when it builds the index, it has: doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.NOT_ANALYZED)); In Luke, I can see all the file paths in the "path" field. I am also using the demo lucenewe

How to index IP addresses?

2009-07-30 Thread ohaya
Hi, I am trying to index information in some proprietary-formatted files. In particular, these files contain some IP addresses in dotted notation, e.g., aa.bb.cc.dd. For my initial test, I have a Document implementation, and after I extract what I need into a String named "Info", I do: doc.

Re: How to can I to customize the Similarity?

2009-07-30 Thread Grant Ingersoll
See the Similarity (and DefaultSimilarity) class in the Lucene source code. There are, of course, many other ways to customize similarity: Function queries, write your own queries, etc. More details on what you are trying to customize would help answer your question. On Jul 28, 2009, at

Re: Generating Query for Multiple Clauses in a Single Field

2009-07-30 Thread AHMET ARSLAN
> Yes, before this I used default Lucene, but I don't know > what ended up wrong: some results with only a single word matching went to > the top of the results. Hmm. Interesting. It seems that length normalization is causing this. Very short documents with only a single word matching are getting high scor

Re: Generating Query for Multiple Clauses in a Single Field

2009-07-30 Thread blazingwolf7
Yes, before this I used default Lucene, but I don't know what ended up wrong: some results with only a single word matching went to the top of the results. I assumed this is due to the score of the result being too high. That's why I am trying to add an additional boost. Ahmet Arslan wrote: > > > : I

Re: Generating Query for Multiple Clauses in a Single Field

2009-07-30 Thread AHMET ARSLAN
: I am trying to create a query that will first return a set : of results, then : it will give a boost to the results that have all the : keywords entered by the user. If I understand you correctly: User will enter multiple keywords. Let's say a b c d. And you want documents - that contains/have