Re: Google finance-like suggestible search field

2009-01-16 Thread Asbjørn A . Fellinghaug

Hi again.

You can find additional info regarding this Bigram index here:
http://asbjorn.fellinghaug.com/blog/master-thesis/

The source code used to be available from the same site, but it has 
disappeared. However, it can still be downloaded from the computer 
science department at NTNU in Norway:
http://daim.idi.ntnu.no/show.php?type=vedlegg&id=3429

Hope this helps.
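
For readers who just want the general shape of such an analyzer, here is a
minimal sketch built on the contrib ShingleFilter. To be clear about the
assumptions: it emits *every* adjacent word pair (the thesis analyzer
described below is more selective and only pairs stopwords with their
neighbours), it needs the contrib analyzers jar on the classpath, and the
class name PairAnalyzer is made up for illustration.

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;

public class PairAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream stream = new WhitespaceTokenizer(reader);
    stream = new LowerCaseFilter(stream);
    // maxShingleSize = 2: emit single terms plus adjacent "word1 word2" pairs,
    // so "fetch me a beer honey" also yields 'fetch me', 'me a', 'a beer', ...
    return new ShingleFilter(stream, 2);
  }
}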

Hayes, Peter:
> Thanks for your input.  I will try and apply your suggestion.
> 
> Thanks,
> Peter 
> 
> -Original Message-
> From: Asbjørn A. Fellinghaug [mailto:asbj...@fellinghaug.com] 
> Sent: Thursday, January 15, 2009 3:25 AM
> To: java-user@lucene.apache.org
> Subject: Re: Google finance-like suggestible search field
> 
> 
> Hi.
> 
> Such 'autocompletion' features could be provided in Lucene with n-gram
> tokenizers, as Erick states. I made a 'Bigram' analyzer for my master
> thesis, when I was doing some research on how to enhance phrase
> searching. This Analyzer treats pairs of words as single terms.
> 
> Basically, what the Bigram analyzer does is index stopwords combined
> with the "previous" word and with the "next" word. Single stopwords
> would not be indexed, as they demand a lot of resources during searches.
> Only the combinations prev+stopword and stopword+nextword would be
> indexed, which saves a lot of work during searching.
> 
> Consider this sentence: "fetch me a beer honey" (where 'a' and 'me' are
> stopwords). The Bigram analyzer would index these 'Tokens':
> 'fetch', 'fetch me', 'me a', 'a beer', 'honey'.
> 
> Erick Erickson:
> > You could look at the n-gram tokenizers (I confess I haven't used them,
> > so I'm not all *that* familiar with them). Or you could make a rule like
> > "no autocomplete until the user types 3 characters" if that would work.
> > 
> > Instead of forming a query, you might try using TermEnum, WildCardTermEnum
> > or even RegexTermEnum to quickly get the list of terms for your
> > autocomplete. The nice part about this approach is that you could quit
> > after a suitable number of terms were found rather than get them all.
> > As I remember, WildCardTermEnum is faster than RegexTermEnum, but don't
> > hold me to that. So I'd try WildCardTermEnum first; I think you'll find
> > it much more suitable than forming a query.
> > 
> > Best
> > Erick
> 
> -- 
> Asbjørn A. Fellinghaug
> asbj...@fellinghaug.com
> 
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 
> 

-- 
Asbjørn A. Fellinghaug
asbj...@fellinghaug.com

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Google finance-like suggestible search field

2009-01-16 Thread Shalin Shekhar Mangar
Also look at ConstantScorePrefixQuery in Solr source.

In the past I've used Solr with shingles and prefix queries to solve similar
problems.
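
If you want to stay in plain Lucene, one low-tech variant of the prefix idea
(along the lines of Erick's TermEnum suggestion earlier in the thread) is to
walk the term dictionary directly instead of running a query at all. A rough
sketch only; the field name "title" and the cut-off of 10 suggestions are
illustrative, not anything prescribed by the API:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

List<String> suggest(IndexReader reader, String prefix) throws IOException {
  List<String> suggestions = new ArrayList<String>();
  TermEnum terms = reader.terms(new Term("title", prefix)); // positioned at/after the prefix
  try {
    do {
      Term t = terms.term();
      // stop once we run out of terms, leave the field, or the prefix no longer matches
      if (t == null || !"title".equals(t.field()) || !t.text().startsWith(prefix)) {
        break;
      }
      suggestions.add(t.text());
    } while (suggestions.size() < 10 && terms.next());
  } finally {
    terms.close();
  }
  return suggestions;
}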

On Thu, Jan 15, 2009 at 7:29 AM, Hayes, Peter  wrote:

> Hi all,
>
> We are trying to implement a Google finance-like suggest-as-you-type
> search field.  The index is quite large and comprises multiple fields
> to search across, so our initial implementation was to use a BooleanQuery
> with multiple PrefixQuery clauses across the fields.  We quickly ran into the
> TooManyClauses exception and are looking for alternatives.
>
> Is there an implementation pattern for this use case using lucene?  This
> seems like a common feature on various sites and I'm wondering if lucene
> can be used to support this.
>
> Thanks in advance.
>
> Peter Hayes
>
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Lucene index updation and performance

2009-01-16 Thread mitu2009

I am working on a job portal site and have been using Lucene for the job search
functionality.
Users will be posting a number of jobs on our site on a daily basis. We need to
make sure that a newly posted job is searchable on the site as soon as possible.
In this context, how do I update the Lucene index when a new job is posted or
when an existing job is edited?
Can Lucene index updating and searching work in parallel?

Also, are there any tips/best practices with respect to Lucene indexing,
optimizing, performance, etc.?

Appreciate your help!

Thanks!
-- 
View this message in context: 
http://www.nabble.com/Lucene-index-updation-and-performance-tp21504659p21504659.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Maximum boost factor

2009-01-16 Thread mitu2009

Does anyone know the maximum boost factor value for a field in Lucene? 

Thanks!
-- 
View this message in context: 
http://www.nabble.com/Maximum-boost-factor-tp21504717p21504717.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: Lucene index updation and performance

2009-01-16 Thread Angel, Eric
You can simply call IndexWriter.addDocument() for new jobs and
IndexWriter.updateDocument() for edited ones:

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/IndexWriter.html

Also, don't forget to optimize your index.  Depending on your volume,
you might want to optimize during slow-traffic periods.
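
A minimal sketch of both calls (the "id" field and the job values are made
up for illustration; updateDocument() behaves as an atomic delete-then-add
keyed on the given term):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class JobIndexer {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();   // an FSDirectory in real life
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
        IndexWriter.MaxFieldLength.LIMITED);

    Document job = new Document();
    job.add(new Field("id", "job-42", Field.Store.YES, Field.Index.NOT_ANALYZED));
    job.add(new Field("title", "Senior Java Developer", Field.Store.YES, Field.Index.ANALYZED));

    writer.addDocument(job);                              // a brand-new posting
    writer.updateDocument(new Term("id", "job-42"), job); // an edited posting
    writer.commit();   // make it durable; searchers see it once their reader is reopened
    writer.close();
  }
}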

Eric Angel 

-Original Message-
From: mitu2009 [mailto:musicfrea...@gmail.com] 
Sent: Friday, January 16, 2009 9:39 AM
To: java-user@lucene.apache.org
Subject: Lucene index updation and performance


I am working on a job portal site and have been using Lucene for the job search
functionality.
Users will be posting a number of jobs on our site on a daily basis. We need to
make sure that a newly posted job is searchable on the site as soon as possible.
In this context, how do I update the Lucene index when a new job is posted or
when an existing job is edited?
Can Lucene index updating and searching work in parallel?

Also, are there any tips/best practices with respect to Lucene indexing,
optimizing, performance, etc.?

Appreciate your help!

Thanks!
-- 
View this message in context:
http://www.nabble.com/Lucene-index-updation-and-performance-tp21504659p21504659.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




Re: Lucene index updation and performance

2009-01-16 Thread Erick Erickson
You should look over the FAQ, lots of information there.

See: http://wiki.apache.org/lucene-java/LuceneFAQ

You can index and search in parallel, but a searcher doesn't
see additions made by an indexer until the underlying IndexReader is
closed/reopened (see the FAQ section "Does Lucene allow searching and
indexing simultaneously?").
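
A hedged sketch of the reopen step (the method name refresh and the idea of
managing the searcher yourself are illustrative, not a prescribed pattern):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

IndexSearcher refresh(IndexSearcher searcher) throws IOException {
  IndexReader current = searcher.getIndexReader();
  IndexReader reopened = current.reopen();   // cheap no-op if nothing has changed
  if (reopened == current) {
    return searcher;                         // index unchanged, keep the old searcher
  }
  current.close();                           // release the stale reader
  return new IndexSearcher(reopened);        // this searcher sees the newly added docs
}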

Best
Erick

On Fri, Jan 16, 2009 at 12:38 PM, mitu2009  wrote:

>
> I am working on a job portal site and have been using Lucene for job search
> functionality.
> Users will be posting a number jobs on our site on a daily basis.We need to
> make sure that new job posted is searchable on the site as soon as
> possible.
> In this context, how do I update Lucene index when a new job is posted or
> when an existing job is edited?
> Can lucene index updating and search work in parallel?
>
> Also,can I know any tips/best practices with respect to Lucene
> indexing,optimizing,performance etc?
>
> Appreciate ur help!
>
> Thanks!
> --
> View this message in context:
> http://www.nabble.com/Lucene-index-updation-and-performance-tp21504659p21504659.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


RE: clustering with compass & terracotta

2009-01-16 Thread Angel, Eric
Glen,

Thanks for the links.  I'll try these out and see.

-Original Message-
From: Glen Newton [mailto:glen.new...@gmail.com] 
Sent: Thursday, January 15, 2009 12:06 PM
To: java-user@lucene.apache.org
Subject: Re: clustering with compass & terracotta

There is a discussion here:
 http://www.terracotta.org/web/display/orgsite/Lucene+Integration

Also of interest: "Katta - distribute lucene indexes in a grid"
http://katta.wiki.sourceforge.net/

-glen

http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html
http://zzzoot.blogspot.com/2008/11/software-announcement-lusql-database-to.html
http://zzzoot.blogspot.com/2008/09/katta-released-lucene-on-grid.html
http://zzzoot.blogspot.com/2008/06/lucene-concurrent-search-performance.html
http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html


2009/1/15 Angel, Eric :
> I just ran into this
> http://www.compass-project.org/docs/2.0.0/reference/html/needle-terracotta.html
> and was wondering if any of you had tried anything like this and
> if so, what your experience was like.
>
>
>
> Eric
>
>



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



ANNOUNCE: Welcome as Contrib Committer

2009-01-16 Thread Ryan McKinley
The PMC is pleased to announce that Patrick O'Leary has been voted to
be a Lucene-Java Contrib committer.


Patrick has contributed a great foundation for integrating spatial  
search with lucene.  I look forward to future development in this area.


Patrick - traditionally we ask you to send out an introduction to the
community; it's nice for folks to get a sense of who everyone is.
Also check that your new svn karma works by adding yourself to the
list of contrib committers.


Welcome Patrick!

ryan

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: ANNOUNCE: Welcome Patrick O'Leary as Contrib Committer

2009-01-16 Thread Ryan McKinley

dooh, never hit paste in the subject line


On Jan 16, 2009, at 1:54 PM, Ryan McKinley wrote:

The PMC is pleased to announce that Patrick O'Leary has been voted  
to be a a Lucene-Java Contrib committer.


Patrick has contributed a great foundation for integrating spatial  
search with lucene.  I look forward to future development in this  
area.


Patrick - traditionally we ask you to send out an introduction to  
the community; its nice for folks to get a sense for who everyone  
is.  Also check that your new svn karma works by adding yourself to  
the list of contrib committers.


Welcome Patrick!

ryan



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Term Frequency and IndexSearcher

2009-01-16 Thread Chris Hostetter

: References:
: 
: <1998.130.159.185.12.1232021837.squir...@webmail.cis.strath.ac.uk>
: Date: Thu, 15 Jan 2009 04:49:49 -0800 (PST)
: Subject: Term Frequency and IndexSearcher

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message; instead, start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to, and your question is "hidden" in that thread and gets less 
attention.  It also makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking




-Hoss


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: ANNOUNCE: Welcome Patrick O'Leary as Contrib Committer

2009-01-16 Thread patrick o'leary
Thanks Folks

I'm in the business well over a decade now; Started my career in my country
of origin in Ireland, and have since lived & worked in UK and the US. I've
also traveled extensively establishing development groups in remote offices
for my company
in a few countries.

I've worked in several areas, from global publishing services, CRM's /
fulfillment systems, web server development, to technical operations and for
the past number of years have made a home for myself in search and local
search.

My background has been in CS, math and physics.
And despite the rumors my user name "pjaol" is actually an acronym of my
full name, which is only ever used
by my mother when I'm in trouble :-)

It will be a pleasure to continue working with all of you, and thank you
again for this honor.

Thanks
Patrick O'Leary



> On Jan 16, 2009, at 1:54 PM, Ryan McKinley wrote:
>
>  The PMC is pleased to announce that Patrick O'Leary has been voted to be a
>> a Lucene-Java Contrib committer.
>>
>> Patrick has contributed a great foundation for integrating spatial search
>> with lucene.  I look forward to future development in this area.
>>
>> Patrick - traditionally we ask you to send out an introduction to the
>> community; its nice for folks to get a sense for who everyone is.  Also
>> check that your new svn karma works by adding yourself to the list of
>> contrib committers.
>>
>> Welcome Patrick!
>>
>> ryan
>>
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: ANNOUNCE: Welcome Patrick O'Leary as Contrib Committer

2009-01-16 Thread Michael McCandless


Welcome aboard Patrick!

Mike

patrick o'leary wrote:


Thanks Folks

I'm in the business well over a decade now; Started my career in my country
of origin in Ireland, and have since lived & worked in UK and the US. I've
also traveled extensively establishing development groups in remote offices
for my company in a few countries.

I've worked in several areas, from global publishing services, CRM's /
fulfillment systems, web server development, to technical operations and for
the past number of years have made a home for myself in search and local
search.

My background has been in CS, math and physics.
And despite the rumors my user name "pjaol" is actually an acronym of my
full name, which is only ever used by my mother when I'm in trouble :-)

It will be a pleasure to continue working with all of you, and thank you
again for this honor.

Thanks
Patrick O'Leary


On Jan 16, 2009, at 1:54 PM, Ryan McKinley wrote:

The PMC is pleased to announce that Patrick O'Leary has been voted to be a
Lucene-Java Contrib committer.

Patrick has contributed a great foundation for integrating spatial search
with lucene.  I look forward to future development in this area.

Patrick - traditionally we ask you to send out an introduction to the
community; it's nice for folks to get a sense of who everyone is.  Also
check that your new svn karma works by adding yourself to the list of
contrib committers.

Welcome Patrick!

ryan




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org








Re: ANNOUNCE: Welcome Patrick O'Leary as Contrib Committer

2009-01-16 Thread Shalin Shekhar Mangar
Welcome Patrick!

On Sat, Jan 17, 2009 at 1:22 AM, patrick o'leary  wrote:

> Thanks Folks
>
> I'm in the business well over a decade now; Started my career in my country
> of origin in Ireland, and have since lived & worked in UK and the US. I've
> also traveled extensively establishing development groups in remote offices
> for my company
> in a few countries.
>
> I've worked in several areas, from global publishing services, CRM's /
> fulfillment systems, web server development, to technical operations and
> for
> the past number of years have made a home for myself in search and local
> search.
>
> My background has been in CS, math and physics.
> And despite the rumors my user name "pjaol" is actually an acronym of my
> full name, which is only ever used
> by my mother when I'm in trouble :-)
>
> It will be a pleasure to continue working with all of you, and thank you
> again for this honor.
>
> Thanks
> Patrick O'Leary
>
>
>
> > On Jan 16, 2009, at 1:54 PM, Ryan McKinley wrote:
> >
> >  The PMC is pleased to announce that Patrick O'Leary has been voted to be
> a
> >> a Lucene-Java Contrib committer.
> >>
> >> Patrick has contributed a great foundation for integrating spatial
> search
> >> with lucene.  I look forward to future development in this area.
> >>
> >> Patrick - traditionally we ask you to send out an introduction to the
> >> community; its nice for folks to get a sense for who everyone is.  Also
> >> check that your new svn karma works by adding yourself to the list of
> >> contrib committers.
> >>
> >> Welcome Patrick!
> >>
> >> ryan
> >>
> >
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: ANNOUNCE: Welcome Patrick O'Leary as Contrib Committer

2009-01-16 Thread Mark Miller

Welcome Patrick!

+1 for LocalLucene.

patrick o'leary wrote:

Thanks Folks

I'm in the business well over a decade now; Started my career in my country
of origin in Ireland, and have since lived & worked in UK and the US. I've
also traveled extensively establishing development groups in remote offices
for my company
in a few countries.

I've worked in several areas, from global publishing services, CRM's /
fulfillment systems, web server development, to technical operations and for
the past number of years have made a home for myself in search and local
search.

My background has been in CS, math and physics.
And despite the rumors my user name "pjaol" is actually an acronym of my
full name, which is only ever used
by my mother when I'm in trouble :-)

It will be a pleasure to continue working with all of you, and thank you
again for this honor.

Thanks
Patrick O'Leary



  

On Jan 16, 2009, at 1:54 PM, Ryan McKinley wrote:

 The PMC is pleased to announce that Patrick O'Leary has been voted to be a


a Lucene-Java Contrib committer.

Patrick has contributed a great foundation for integrating spatial search
with lucene.  I look forward to future development in this area.

Patrick - traditionally we ask you to send out an introduction to the
community; its nice for folks to get a sense for who everyone is.  Also
check that your new svn karma works by adding yourself to the list of
contrib committers.

Welcome Patrick!

ryan

  

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





  






Nightly source builds of Lucene ..

2009-01-16 Thread Kay Kay
I am trying to access the nightly Lucene builds at
http://people.apache.org/builds/lucene/java/nightly/ .  They do not seem
to have been available for some time.  Just curious whether that is the
right place to access them.



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Nightly source builds of Lucene ..

2009-01-16 Thread Ryan McKinley

maybe try:

http://hudson.zones.apache.org/hudson/view/Solr/job/Solr-trunk/



On Jan 16, 2009, at 4:47 PM, Kay Kay wrote:

I am trying to access the nightly lucene builds here at
http://people.apache.org/builds/lucene/java/nightly/ .  It does not seem
to be available for sometime.  Just curious if that is the right source
to access the same.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org







Search Across All Fields

2009-01-16 Thread Jamie

Hi Everyone

I have two queries:

Query 1
==

(attachments:"beauty supply") AND sentdate:[d2008111701 TO 
d20090117235900]


Query 2
==

(priority:beauty attach:beauty score:beauty size:beauty sentdate:beauty 
archivedate:beauty receiveddate:beauty from:beauty to:beauty 
subject:beauty cc:beauty bcc:beauty deliveredto:beauty flag:beauty 
sensitivity:beauty sender:beauty recipient:beauty body:beauty 
attachments:beauty attachname:beauty AND priority:supply attach:supply 
score:supply size:supply sentdate:supply archivedate:supply 
receiveddate:supply from:supply to:supply subject:supply cc:supply 
bcc:supply deliveredto:supply flag:supply sensitivity:supply 
sender:supply recipient:supply body:supply attachments:supply 
attachname:supply) AND sentdate:[d2008111701 TO d20090117235900]


Query 1 returns 138 results, while Query 2 returns 0 results. Any idea 
why? The second query is meant to offer a search across all fields, 
whereas the first query specifies one field. Is there a better way to 
conduct a search across all fields? Am I missing something?


Thanks in advance for your help!

Regards,

Jamie


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: Search Across All Fields

2009-01-16 Thread Zhang, Lisheng
Hi,

Inside (priority:beauty ...) there is an AND;
is that operator what you want?

Best regards, Lisheng

-Original Message-
From: Jamie [mailto:ja...@stimulussoft.com]
Sent: Friday, January 16, 2009 3:02 PM
To: java-user@lucene.apache.org
Subject: Search Across All Fields


Hi Everyone

I have two queries:

Query 1
==

(attachments:"beauty supply") AND sentdate:[d2008111701 TO 
d20090117235900]

Query 2
==

(priority:beauty attach:beauty score:beauty size:beauty sentdate:beauty 
archivedate:beauty receiveddate:beauty from:beauty to:beauty 
subject:beauty cc:beauty bcc:beauty deliveredto:beauty flag:beauty 
sensitivity:beauty sender:beauty recipient:beauty body:beauty 
attachments:beauty attachname:beauty AND priority:supply attach:supply 
score:supply size:supply sentdate:supply archivedate:supply 
receiveddate:supply from:supply to:supply subject:supply cc:supply 
bcc:supply deliveredto:supply flag:supply sensitivity:supply 
sender:supply recipient:supply body:supply attachments:supply 
attachname:supply) AND sentdate:[d2008111701 TO d20090117235900]

Query 1 returns 138 results, while Query 2 return 0 result. Any idea 
why? The second query is meant to offer the search across all fields, 
whereas the first query specifies one field. Is there a better way to 
conduct a search across all fields? Am I missing something?

Thanks in advance for your help!

Regards,

Jamie


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





Words that need protection from stemming, i.e., protwords.txt

2009-01-16 Thread David Woodward
Hi.

Any good protwords.txt out there?

In a fairly standard solr analyzer chain, we use the English Porter analyzer 
like so:



For most purposes the Porter stemmer does just fine, but occasionally words come along 
that really don't work out too well, e.g.,

"maine" is stemmed to "main" - clearly goofing up precision about "Maine" 
without doing much good for variants of "main".

So - I have an entry for my protwords.txt. What else should go in there?

Thanks for your ideas,

Dave Woodward


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Words that need protection from stemming, i.e., protwords.txt

2009-01-16 Thread patrick o'leary
Porter is a little outdated; I've found KStem to be much better:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

You'll still need a good protected word list, but KStem is just a little
nicer.
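
Whichever stemmer you settle on, the protected-word mechanics boil down to
"skip the stemmer for words in a set". A rough sketch of a filter along those
lines, written against the old 2.x TokenStream API and the Snowball-generated
Porter stemmer from the contrib snowball jar; the class name and the way the
protected set is loaded are made up, and this is only an illustration of the
idea, not what the Solr factory actually does internally:

import java.io.IOException;
import java.util.Set;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.tartarus.snowball.ext.PorterStemmer;

/** Stems every token except those whose surface form is in the protected set. */
public class ProtectedStemFilter extends TokenFilter {
  private final PorterStemmer stemmer = new PorterStemmer();
  private final Set<String> protectedWords;   // e.g. the lower-cased lines of protwords.txt

  public ProtectedStemFilter(TokenStream input, Set<String> protectedWords) {
    super(input);
    this.protectedWords = protectedWords;
  }

  public Token next(Token reusableToken) throws IOException {
    Token token = input.next(reusableToken);
    if (token == null) return null;
    String word = token.term();
    if (!protectedWords.contains(word)) {     // a protected "maine" passes through untouched
      stemmer.setCurrent(word);
      stemmer.stem();
      token.setTermBuffer(stemmer.getCurrent());
    }
    return token;
  }
}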

On Fri, Jan 16, 2009 at 6:20 PM, David Woodward  wrote:

> Hi.
>
> Any good protwords.txt out there?
>
> In a fairly standard solr analyzer chain, we use the English Porter
> analyzer like so:
>
> 
>
> For most purposes the porter does just fine, but occasionally words come
> along that really don't work out to well, e.g.,
>
> "maine" is stemmed to "main" - clearly goofing up precision about "Maine"
> without doing much good for variants of "main".
>
> So - I have an entry for my protwords.txt. What else should go in there?
>
> Thanks for your ideas,
>
> Dave Woodward
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Search Across All Fields

2009-01-16 Thread Erick Erickson
I think you forgot a set of parentheses: a close paren right before
the AND and an open paren right after it.

Depending upon how big your index is, a MUCH easier way to do
this is to index another field, call it all_text say, and add all your
terms to that field as well as to the individual ones, then search your
all_text field instead.
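
A sketch of the catch-all idea at index time (the mailFields map and the field
names are illustrative; the point is that the same text is analyzed into both
its own field and the combined one):

import java.util.Map;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document buildDoc(Map<String, String> mailFields) {    // e.g. "subject" -> "beauty supply"
  Document doc = new Document();
  StringBuilder all = new StringBuilder();
  for (Map.Entry<String, String> e : mailFields.entrySet()) {
    doc.add(new Field(e.getKey(), e.getValue(), Field.Store.YES, Field.Index.ANALYZED));
    all.append(e.getValue()).append(' ');               // same text also feeds the catch-all
  }
  doc.add(new Field("all_text", all.toString(), Field.Store.NO, Field.Index.ANALYZED));
  return doc;
}

The cross-field part of the query then collapses to something like
all_text:beauty AND all_text:supply AND sentdate:[d2008111701 TO d20090117235900].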

Best
Erick

On Fri, Jan 16, 2009 at 6:02 PM, Jamie  wrote:

> Hi Everyone
>
> I have two queries:
>
> Query 1
> ==
>
> (attachments:"beauty supply") AND sentdate:[d2008111701 TO
> d20090117235900]
>
> Query 2
> ==
>
> (priority:beauty attach:beauty score:beauty size:beauty sentdate:beauty
> archivedate:beauty receiveddate:beauty from:beauty to:beauty subject:beauty
> cc:beauty bcc:beauty deliveredto:beauty flag:beauty sensitivity:beauty
> sender:beauty recipient:beauty body:beauty attachments:beauty
> attachname:beauty AND priority:supply attach:supply score:supply size:supply
> sentdate:supply archivedate:supply receiveddate:supply from:supply to:supply
> subject:supply cc:supply bcc:supply deliveredto:supply flag:supply
> sensitivity:supply sender:supply recipient:supply body:supply
> attachments:supply attachname:supply) AND sentdate:[d2008111701 TO
> d20090117235900]
>
> Query 1 returns 138 results, while Query 2 return 0 result. Any idea why?
> The second query is meant to offer the search across all fields, whereas the
> first query specifies one field. Is there a better way to conduct a search
> across all fields? Am I missing something?
>
> Thanks in advance for your help!
>
> Regards,
>
> Jamie
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


term offsets info seems to be wrong...

2009-01-16 Thread Koji Sekiguchi
Hello,

I'm writing a highlighter using term offsets info (yes, I borrowed the idea
of LUCENE-644). In my highlighter, I'm seeing unexpected term offsets info
when working with a multi-valued field.

For example, if I indexed [" "," bbb "] (multi-valued), I got the term info
bbb(7,10). This is the expected result. But if I indexed [" aaa "," bbb "]
(note the use of " aaa " instead of " "), I got the term info bbb(6,9), which
is unexpected. I would like to get the same offset info for bbb because the
field values are the same length.

Please use the following program to see the problem I'm seeing. I'm
using trunk:

public static void main(String[] args) throws Exception {
  // create an index
  Directory dir = new RAMDirectory();
  Analyzer analyzer = new WhitespaceAnalyzer();
  IndexWriter writer = new IndexWriter( dir, analyzer, true, MaxFieldLength.LIMITED );
  Document doc = new Document();
  doc.add( new Field( "f", " aaa ", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
  //doc.add( new Field( "f", " ", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
  doc.add( new Field( "f", " bbb ", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
  writer.addDocument( doc );
  writer.close();

  // print the offsets
  IndexReader reader = IndexReader.open( dir );
  TermPositionVector tpv = (TermPositionVector)reader.getTermFreqVector( 0, "f" );
  for( int i = 0; i < tpv.getTerms().length; i++ ){
    System.out.print( "term = \"" + tpv.getTerms()[i] + "\"" );
    TermVectorOffsetInfo[] tvois = tpv.getOffsets( i );
    for( TermVectorOffsetInfo tvoi : tvois ){
      System.out.println( "(" + tvoi.getStartOffset() + "," + tvoi.getEndOffset() + ")" );
    }
  }
  reader.close();
}

regards,

Koji


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: term offsets info seems to be wrong...

2009-01-16 Thread Mark Miller
Okay, Koji, hopefully I'll have better luck suggesting this this time.

Have you tried http://issues.apache.org/jira/browse/LUCENE-1448 yet? I am
not sure if it's in a state where it can be applied, but I hope it covers your issue.

On Fri, Jan 16, 2009 at 7:15 PM, Koji Sekiguchi  wrote:

> Hello,
>
> I'm writing a highlighter by using term offsets info (yes, I borrowed
> the idea
> of LUCENE-644). In my highlighter, I'm seeing unexpected term offsets info
> when getting multi-valued field.
>
> For example, if I indexed [" "," bbb "] (multi-valued), I got term info
> bbb(7,10). This is expected result. But if I indexed [" aaa "," bbb "]
> (note that using " aaa " instead of " "), I got term info bbb(6,9)
> which
> is unexpected. I would like to get same offset info for bbb because they
> are same length of field values.
>
> Please use the following program to see the problem I'm seeing. I'm
> using trunk:
>
> public static void main(String[] args) throws Exception {
> // create an index
> Directory dir = new RAMDirectory();
> Analyzer analyzer = new WhitespaceAnalyzer();
> IndexWriter writer = new IndexWriter( dir, analyzer, true,
> MaxFieldLength.LIMITED );
> Document doc = new Document();
> doc.add( new Field( "f", " aaa ", Store.YES, Index.ANALYZED,
> TermVector.WITH_OFFSETS ) );
> //doc.add( new Field( "f", " ", Store.YES, Index.ANALYZED,
> TermVector.WITH_OFFSETS ) );
> doc.add( new Field( "f", " bbb ", Store.YES, Index.ANALYZED,
> TermVector.WITH_OFFSETS ) );
> writer.addDocument( doc );
> writer.close();
>
> // print the offsets
> IndexReader reader = IndexReader.open( dir );
> TermPositionVector tpv = (TermPositionVector)reader.getTermFreqVector(
> 0, "f" );
> for( int i = 0; i < tpv.getTerms().length; i++ ){
> System.out.print( "term = \"" + tpv.getTerms()[i] + "\"" );
> TermVectorOffsetInfo[] tvois = tpv.getOffsets( i );
> for( TermVectorOffsetInfo tvoi : tvois ){
> System.out.println( "(" + tvoi.getStartOffset() + "," +
> tvoi.getEndOffset() + ")" );
> }
> }
> reader.close();
> }
>
> regards,
>
> Koji
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: term offsets info seems to be wrong...

2009-01-16 Thread Koji Sekiguchi

Mark,

This is exactly what I want, and it worked perfectly. Thanks!
I'll post my highlighter to JIRA in a few days (hopefully).
It uses term offsets with positions (WITH_POSITIONS_OFFSETS)
to support PhraseQuery.

Thanks again,

Koji


Mark Miller wrote:

Okay, Koji, hopefully I'll be more luckily suggesting this this time.

Have you tried http://issues.apache.org/jira/browse/LUCENE-1448 yet? I am
not sure if its in an applyable state, but I hope that covers your issue.

On Fri, Jan 16, 2009 at 7:15 PM, Koji Sekiguchi  wrote:

  

Hello,

I'm writing a highlighter by using term offsets info (yes, I borrowed
the idea
of LUCENE-644). In my highlighter, I'm seeing unexpected term offsets info
when getting multi-valued field.

For example, if I indexed [" "," bbb "] (multi-valued), I got term info
bbb(7,10). This is expected result. But if I indexed [" aaa "," bbb "]
(note that using " aaa " instead of " "), I got term info bbb(6,9)
which
is unexpected. I would like to get same offset info for bbb because they
are same length of field values.

Please use the following program to see the problem I'm seeing. I'm
using trunk:

public static void main(String[] args) throws Exception {
// create an index
Directory dir = new RAMDirectory();
Analyzer analyzer = new WhitespaceAnalyzer();
IndexWriter writer = new IndexWriter( dir, analyzer, true,
MaxFieldLength.LIMITED );
Document doc = new Document();
doc.add( new Field( "f", " aaa ", Store.YES, Index.ANALYZED,
TermVector.WITH_OFFSETS ) );
//doc.add( new Field( "f", " ", Store.YES, Index.ANALYZED,
TermVector.WITH_OFFSETS ) );
doc.add( new Field( "f", " bbb ", Store.YES, Index.ANALYZED,
TermVector.WITH_OFFSETS ) );
writer.addDocument( doc );
writer.close();

// print the offsets
IndexReader reader = IndexReader.open( dir );
TermPositionVector tpv = (TermPositionVector)reader.getTermFreqVector(
0, "f" );
for( int i = 0; i < tpv.getTerms().length; i++ ){
System.out.print( "term = \"" + tpv.getTerms()[i] + "\"" );
TermVectorOffsetInfo[] tvois = tpv.getOffsets( i );
for( TermVectorOffsetInfo tvoi : tvois ){
System.out.println( "(" + tvoi.getStartOffset() + "," +
tvoi.getEndOffset() + ")" );
}
}
reader.close();
}

regards,

Koji


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





  






Re: Search Across All Fields

2009-01-16 Thread Jamie

Hi Erick

Thanks for the pointer. I don't know how I missed that. Our index sizes 
are absolutely huge, so it's not really practical to add an all_text 
field. It would be great if you could introduce a macro or something that 
one could use to specify all fields.


Thanks anyway!

Jamie


Erick Erickson wrote:

I think you forgot a set of parentheses, a close paren right before
the AND and an open paren right after AND

Depending upon how big your index is, a MUCH easier way to do
this is to index another field, call it all_text say, and add all your
terms to that field as well as to the individual one, then search your
all_text field instead

Best
Erick

  



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org