Re: Google finance-like suggestible search field
Hi again.

You can find additional information about this Bigram index here: http://asbjorn.fellinghaug.com/blog/master-thesis/

The source code used to be available from the same site, but it has disappeared. However, it can be downloaded from the Department of Computer Science at NTNU in Norway: http://daim.idi.ntnu.no/show.php?type=vedlegg&id=3429

Hope this helps.

Hayes, Peter:
> Thanks for your input. I will try and apply your suggestion.
>
> Thanks,
> Peter
>
> -----Original Message-----
> From: Asbjørn A. Fellinghaug [mailto:asbj...@fellinghaug.com]
> Sent: Thursday, January 15, 2009 3:25 AM
> To: java-user@lucene.apache.org
> Subject: Re: Google finance-like suggestible search field
>
> Hi.
>
> Such 'autocompletion' features could be provided in Lucene with n-gram
> tokenizers, as Erick states. I made a 'Bigram' analyzer for my master's
> thesis, when I was researching how to enhance phrase searching. This
> Analyzer treats pairs of words as single terms.
>
> Basically, the Bigram analyzer indexes each stopword combined with the
> "previous" word and with the "next" word. Single stopwords are not
> indexed, as they demand a lot of resources during searches. Only the
> prev+stopword and stopword+nextword combinations are indexed, which
> saves a lot during searching.
>
> Consider this sentence: "fetch me a beer honey" (where 'a' and 'me' are
> stopwords). The Bigram analyzer would index these 'Tokens':
> 'fetch', 'fetch me', 'me a', 'a beer', 'honey'.
>
> Erick Erickson:
> > You could look at the n-gram tokenizers (I confess I haven't used them,
> > so I'm not all *that* familiar with them). Or you could make a rule like
> > "no autocomplete until the user types 3 characters" if that would work.
> >
> > Instead of forming a query, you might try using TermEnum, or
> > WildcardTermEnum or even RegexTermEnum to quickly get the list of terms
> > for your autocomplete. The nice part about this approach is that you
> > could quit after a suitable number of terms were found rather than get
> > them all. As I remember, WildcardTermEnum is faster than RegexTermEnum,
> > but don't hold me to that. So I'd try WildcardTermEnum first, I think
> > you'll find it much more suitable than forming
> >
> > Best
> > Erick
>
> --
> Asbjørn A. Fellinghaug
> asbj...@fellinghaug.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org

--
Asbjørn A. Fellinghaug
asbj...@fellinghaug.com
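To make the Bigram scheme concrete, here is a minimal plain-Java sketch of the token sequence described above. It is not Asbjørn's analyzer: the class and method names are hypothetical, and words are split on whitespace rather than by a Lucene Tokenizer. Read literally, the description only suppresses lone stopwords, so this sketch also emits 'beer' as a single term, which the example list above does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BigramSketch {

    /**
     * Sketch of the described Bigram analysis: ordinary words are emitted as
     * single terms, every stopword is paired with its previous and next word,
     * and stopwords are never emitted on their own.
     */
    static List<String> bigramTerms(String text, Set<String> stopwords) {
        List<String> terms = new ArrayList<>();
        if (text.trim().isEmpty()) {
            return terms; // nothing to tokenize
        }
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            boolean isStop = stopwords.contains(words[i]);
            boolean hasNext = i + 1 < words.length;
            boolean nextIsStop = hasNext && stopwords.contains(words[i + 1]);
            if (!isStop) {
                terms.add(words[i]); // ordinary word: indexed as a single term
            }
            // Emit a pair whenever either side of it is a stopword; this yields
            // both prev+stopword and stopword+next, each pair only once.
            if (hasNext && (isStop || nextIsStop)) {
                terms.add(words[i] + " " + words[i + 1]);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("me", "a"));
        System.out.println(bigramTerms("fetch me a beer honey", stopwords));
        // prints [fetch, fetch me, me a, a beer, beer, honey]
    }
}
```

Because every stopword occurrence is only ever indexed inside a pair, a phrase query containing "me a" can be answered from one term's postings instead of intersecting the huge postings list of a single stopword.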
Re: Google finance-like suggestible search field
Also look at ConstantScorePrefixQuery in the Solr source. In the past I've used Solr with shingles and prefix queries to solve similar problems.

On Thu, Jan 15, 2009 at 7:29 AM, Hayes, Peter wrote:

> Hi all,
>
> We are trying to implement a Google finance-like suggest as you type
> search field. The index is quite large and comprised of multiple fields
> to search across so our initial implementation was to use a BooleanQuery
> with multiple PrefixQuery across each field. We quickly ran into the
> TooManyClauses exception and are looking for alternatives.
>
> Is there an implementation pattern for this use case using lucene? This
> seems like a common feature on various sites and I'm wondering if lucene
> can be used to support this.
>
> Thanks in advance.
>
> Peter Hayes

--
Regards,
Shalin Shekhar Mangar.
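The TooManyClauses error comes from how prefix matching is expanded: as far as I know, in Lucene 2.4 a PrefixQuery is rewritten into a BooleanQuery with one clause per matching term, so a short prefix across several fields can easily exceed the default limit of 1024 clauses. Erick's suggestion earlier in the thread avoids building a query at all: walk the sorted term dictionary from the prefix and stop once enough suggestions have been collected. The sketch below illustrates that loop in plain Java, with a java.util.TreeSet standing in for one field's term dictionary (the class, method and sample terms are hypothetical); in Lucene the walk would start from something like IndexReader.terms(new Term(field, prefix)).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixSuggest {

    /**
     * Returns at most {@code limit} terms starting with {@code prefix}.
     * Because the set is sorted, all matches are contiguous: start at the
     * first term >= prefix, then stop at the first non-match or at the limit.
     */
    static List<String> suggest(NavigableSet<String> terms, String prefix, int limit) {
        List<String> out = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || out.size() >= limit) {
                break; // past the prefix range, or we already have enough
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample terms from a "symbol"-like field.
        NavigableSet<String> terms = new TreeSet<>(List.of(
                "goog", "google", "googol", "gold", "goldman", "goodyear", "apple"));
        System.out.println(suggest(terms, "goo", 2)); // prints [goodyear, goog]
    }
}
```

The cost is proportional to the number of suggestions returned, not to the number of terms sharing the prefix, which is what makes early termination attractive for suggest-as-you-type.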
Lucene index updation and performance
I am working on a job portal site and have been using Lucene for job search functionality. Users will be posting a number of jobs on our site on a daily basis. We need to make sure that a newly posted job is searchable on the site as soon as possible. In this context, how do I update the Lucene index when a new job is posted or when an existing job is edited? Can Lucene index updating and searching work in parallel?

Also, are there any tips/best practices with respect to Lucene indexing, optimizing, performance, etc.?

Appreciate your help! Thanks!

-- View this message in context: http://www.nabble.com/Lucene-index-updation-and-performance-tp21504659p21504659.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Maximum boost factor
Does anyone know the maximum boost factor value for a field in Lucene? Thanks! -- View this message in context: http://www.nabble.com/Maximum-boost-factor-tp21504717p21504717.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.
RE: Lucene index updation and performance
You can simply call IndexWriter.addDocument() for new jobs and IndexWriter.updateDocument() for edited ones; see http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/IndexWriter.html

Also, don't forget to optimize your index. Depending on your volume, you might want to optimize during periods of low traffic.

Eric Angel

-----Original Message-----
From: mitu2009 [mailto:musicfrea...@gmail.com]
Sent: Friday, January 16, 2009 9:39 AM
To: java-user@lucene.apache.org
Subject: Lucene index updation and performance

I am working on a job portal site and have been using Lucene for job search functionality.
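IndexWriter.updateDocument(Term, Document) deletes the documents containing the given term and then adds the new document. The toy model below (plain Java, not Lucene's API; the jobId field and the class and method names are hypothetical) shows why edits should go through a unique key: updating by key replaces the old posting instead of leaving a duplicate behind. In a real index the key field would need to be indexed without analysis so that the Term matches it exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class UpdateByKeySketch {

    // Toy in-memory "index": each document is a field -> value map.
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** New job: simply append the document. */
    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    /** Edited job: delete every document whose key field matches, then add the new version. */
    void updateDocument(String keyField, String keyValue, Map<String, String> doc) {
        docs.removeIf(d -> keyValue.equals(d.get(keyField)));
        docs.add(doc);
    }

    /** Returns the documents whose field equals the given value. */
    List<Map<String, String>> find(String field, String value) {
        return docs.stream()
                .filter(d -> value.equals(d.get(field)))
                .collect(Collectors.toList());
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        UpdateByKeySketch index = new UpdateByKeySketch();
        index.addDocument(Map.of("jobId", "42", "title", "Java developer"));
        // The job is edited: replace it by its unique id instead of adding a duplicate.
        index.updateDocument("jobId", "42", Map.of("jobId", "42", "title", "Senior Java developer"));
        System.out.println(index.size() + " " + index.find("jobId", "42").get(0).get("title"));
        // prints 1 Senior Java developer
    }
}
```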
Re: Lucene index updation and performance
You should look over the FAQ; there's lots of information there. See: http://wiki.apache.org/lucene-java/LuceneFAQ

You can index and search in parallel, but a searcher doesn't see changes made by the IndexWriter until the underlying IndexReader is closed and reopened (see the FAQ section "Does Lucene allow searching and indexing simultaneously?").

Best
Erick

On Fri, Jan 16, 2009 at 12:38 PM, mitu2009 wrote:
> I am working on a job portal site and have been using Lucene for job search
> functionality.
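Erick's point about visibility can be pictured with a small toy model, again plain Java rather than Lucene's API (Writer, Reader and their methods are hypothetical names). A reader is a point-in-time view of what the writer has committed, so documents committed after the reader was opened stay invisible to it until a new reader is opened. In Lucene that step is closing and reopening the IndexReader; IndexReader.reopen() can do it more cheaply than a fresh open because unchanged segments are shared.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotVisibilitySketch {

    /** Toy writer: buffered additions become visible to new readers only after commit(). */
    static class Writer {
        private final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();

        void add(String doc) {
            pending.add(doc);
        }

        void commit() {
            committed.addAll(pending);
            pending.clear();
        }

        Reader openReader() {
            // The reader gets its own frozen copy of what is committed right now.
            return new Reader(this, List.copyOf(committed));
        }
    }

    /** Toy reader: a point-in-time view that never changes after it is opened. */
    static class Reader {
        private final Writer source;
        private final List<String> snapshot;

        Reader(Writer source, List<String> snapshot) {
            this.source = source;
            this.snapshot = snapshot;
        }

        int numDocs() {
            return snapshot.size();
        }

        /** Opens a fresh view; the old reader keeps its original snapshot. */
        Reader reopen() {
            return source.openReader();
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer();
        writer.add("job-1");
        writer.commit();
        Reader reader = writer.openReader();

        writer.add("job-2");
        writer.commit();
        System.out.println(reader.numDocs());          // 1: the old snapshot
        System.out.println(reader.reopen().numDocs()); // 2: after reopening
    }
}
```

For a job portal, this means "searchable as soon as possible" is governed by how often the application commits and reopens, not by how quickly documents are added.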
RE: clustering with compass & terracotta
Glen,

Thanks for the links. I'll try these out and see.

-----Original Message-----
From: Glen Newton [mailto:glen.new...@gmail.com]
Sent: Thursday, January 15, 2009 12:06 PM
To: java-user@lucene.apache.org
Subject: Re: clustering with compass & terracotta

There is a discussion here:
http://www.terracotta.org/web/display/orgsite/Lucene+Integration

Also of interest: "Katta - distribute lucene indexes in a grid"
http://katta.wiki.sourceforge.net/

-glen

http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html
http://zzzoot.blogspot.com/2008/11/software-announcement-lusql-database-to.html
http://zzzoot.blogspot.com/2008/09/katta-released-lucene-on-grid.html
http://zzzoot.blogspot.com/2008/06/lucene-concurrent-search-performance.html
http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html

2009/1/15 Angel, Eric :
> I just ran into this
> http://www.compass-project.org/docs/2.0.0/reference/html/needle-terracotta.html
> and was wondering if any of you had tried anything like this and
> if so, what your experience was like.
>
> Eric
ANNOUNCE: Welcome as Contrib Committer
The PMC is pleased to announce that Patrick O'Leary has been voted to be a Lucene-Java Contrib committer.

Patrick has contributed a great foundation for integrating spatial search with Lucene. I look forward to future development in this area.

Patrick - traditionally we ask you to send out an introduction to the community; it's nice for folks to get a sense for who everyone is. Also check that your new svn karma works by adding yourself to the list of contrib committers.

Welcome Patrick!

ryan
Re: ANNOUNCE: Welcome Patrick O'Leary as Contrib Committer
D'oh, never hit paste in the subject line.

On Jan 16, 2009, at 1:54 PM, Ryan McKinley wrote:
> The PMC is pleased to announce that Patrick O'Leary has been voted to be a
> a Lucene-Java Contrib committer.
Re: Term Frequency and IndexSearcher
: References:
: <1998.130.159.185.12.1232021837.squir...@webmail.cis.strath.ac.uk>
: Date: Thu, 15 Jan 2009 04:49:49 -0800 (PST)
: Subject: Term Frequency and IndexSearcher

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult.

See Also: http://en.wikipedia.org/wiki/Thread_hijacking

-Hoss
Re: ANNOUNCE: Welcome Patrick O'Leary as Contrib Committer
Thanks, folks.

I've been in the business for well over a decade now. I started my career in my country of origin, Ireland, and have since lived and worked in the UK and the US. I've also traveled extensively, establishing development groups in remote offices for my company in a few countries.

I've worked in several areas, from global publishing services, CRMs / fulfillment systems and web server development to technical operations, and for the past number of years I have made a home for myself in search and local search.

My background has been in CS, math and physics. And despite the rumors, my user name "pjaol" is actually an acronym of my full name, which is only ever used by my mother when I'm in trouble :-)

It will be a pleasure to continue working with all of you, and thank you again for this honor.

Thanks
Patrick O'Leary
Re: ANNOUNCE: Welcome Patrick O'Leary as Contrib Committer
Welcome aboard Patrick!

Mike
Re: ANNOUNCE: Welcome Patrick O'Leary as Contrib Committer
Welcome Patrick!

On Sat, Jan 17, 2009 at 1:22 AM, patrick o'leary wrote:
> Thanks Folks

--
Regards,
Shalin Shekhar Mangar.
Re: ANNOUNCE: Welcome Patrick O'Leary as Contrib Committer
Welcome Patrick! +1 for LocalLucene.
Nightly source builds of Lucene ..
I am trying to access the nightly Lucene builds at http://people.apache.org/builds/lucene/java/nightly/ . They do not seem to have been available for some time. Is that still the right place to get them?
Re: Nightly source builds of Lucene ..
maybe try: http://hudson.zones.apache.org/hudson/view/Solr/job/Solr-trunk/

On Jan 16, 2009, at 4:47 PM, Kay Kay wrote:
> I am trying to access the nightly lucene builds here at - http://people.apache.org/builds/lucene/java/nightly/ . It does not seem to be available for sometime.
Search Across All Fields
Hi Everyone

I have two queries:

Query 1
=======

(attachments:"beauty supply") AND sentdate:[d2008111701 TO d20090117235900]

Query 2
=======

(priority:beauty attach:beauty score:beauty size:beauty sentdate:beauty archivedate:beauty receiveddate:beauty from:beauty to:beauty subject:beauty cc:beauty bcc:beauty deliveredto:beauty flag:beauty sensitivity:beauty sender:beauty recipient:beauty body:beauty attachments:beauty attachname:beauty AND priority:supply attach:supply score:supply size:supply sentdate:supply archivedate:supply receiveddate:supply from:supply to:supply subject:supply cc:supply bcc:supply deliveredto:supply flag:supply sensitivity:supply sender:supply recipient:supply body:supply attachments:supply attachname:supply) AND sentdate:[d2008111701 TO d20090117235900]

Query 1 returns 138 results, while Query 2 returns 0 results. Any idea why? The second query is meant to search across all fields, whereas the first query specifies one field. Is there a better way to search across all fields? Am I missing something?

Thanks in advance for your help!

Regards,

Jamie
RE: Search Across All Fields
Hi,

Inside (priority:beauty ..) there is an AND; is that the operator you want?

Best regards,
Lisheng

-----Original Message-----
From: Jamie [mailto:ja...@stimulussoft.com]
Sent: Friday, January 16, 2009 3:02 PM
To: java-user@lucene.apache.org
Subject: Search Across All Fields

Hi Everyone

I have two queries:
Words that need protection from stemming, i.e., protwords.txt
Hi.

Any good protwords.txt out there?

In a fairly standard Solr analyzer chain, we use the English Porter analyzer like so:

For most purposes the Porter stemmer does just fine, but occasionally words come along that really don't work out too well, e.g., "maine" is stemmed to "main", clearly goofing up precision for "Maine" without doing much good for variants of "main".

So I have an entry for my protwords.txt. What else should go in there?

Thanks for your ideas,

Dave Woodward
Re: Words that need protection from stemming, i.e., protwords.txt
Porter is a little outdated; I've found KStem much better:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

You'll still need a good protected word list, but KStem is just a little nicer.

On Fri, Jan 16, 2009 at 6:20 PM, David Woodward wrote:
> Hi.
>
> Any good protwords.txt out there?
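The protected-word mechanism itself is simple: tokens listed in the file bypass the stemmer. Below is a minimal plain-Java sketch of that idea, not Solr's implementation (the class and method names are hypothetical). It assumes the file holds one word per line with '#' starting a comment line, and that tokens were already lowercased earlier in the chain. The "stemmer" in main is a toy that just drops a trailing 'e', standing in for Porter so the example stays self-contained.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

public class ProtectedStemmerSketch {

    /** Parses protwords-style lines: one word per line, blank and '#' lines skipped. */
    static Set<String> parseProtectedWords(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            String word = line.trim();
            if (!word.isEmpty() && !word.startsWith("#")) {
                words.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Stems a (lowercased) token unless it is in the protected set. */
    static String stem(String token, Set<String> protectedWords, UnaryOperator<String> stemmer) {
        return protectedWords.contains(token) ? token : stemmer.apply(token);
    }

    public static void main(String[] args) {
        // Toy stand-in stemmer (NOT Porter): drops a single trailing 'e'.
        UnaryOperator<String> toyStemmer =
                s -> s.endsWith("e") ? s.substring(0, s.length() - 1) : s;
        Set<String> protectedWords = parseProtectedWords(List.of("# place names", "maine"));
        System.out.println(stem("maine", protectedWords, toyStemmer)); // maine
        System.out.println(stem("phone", protectedWords, toyStemmer)); // phon
    }
}
```

Since protection is an exact set lookup, the list only needs the surface forms that actually reach the stemmer, which is why it tends to fill up with proper nouns such as place names.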
Re: Search Across All Fields
I think you forgot a set of parentheses: a close paren right before the AND and an open paren right after the AND.

Depending upon how big your index is, a MUCH easier way to do this is to index another field, call it all_text, say, and add all your terms to that field as well as to the individual ones, then search your all_text field instead.

Best
Erick

On Fri, Jan 16, 2009 at 6:02 PM, Jamie wrote:
> Hi Everyone
>
> I have two queries:
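Why Query 2 matched nothing: with the classic QueryParser and its default OR operator, an AND binds only the clauses immediately on either side of it, so Query 2 effectively required attachname:beauty and priority:supply and left every other field clause optional. Grouping each word's per-field clauses in parentheses, as Erick suggests, fixes that. Below is a minimal sketch that builds such a string from a field list (the class and method names are hypothetical; it assumes the words contain no query syntax characters, which real code would need to escape, for example with QueryParser.escape).

```java
import java.util.ArrayList;
import java.util.List;

public class AllFieldsQuerySketch {

    /**
     * Builds "(f1:w1 f2:w1 ...) AND (f1:w2 f2:w2 ...)": each word may match in
     * any field (space-separated clauses are OR'ed under the default operator),
     * but every word must match somewhere (AND between the groups).
     */
    static String allFieldsQuery(List<String> fields, List<String> words) {
        List<String> groups = new ArrayList<>();
        for (String word : words) {
            List<String> clauses = new ArrayList<>();
            for (String field : fields) {
                clauses.add(field + ":" + word);
            }
            groups.add("(" + String.join(" ", clauses) + ")");
        }
        return String.join(" AND ", groups);
    }

    public static void main(String[] args) {
        List<String> fields = List.of("subject", "body", "attachments");
        String query = allFieldsQuery(fields, List.of("beauty", "supply"))
                + " AND sentdate:[d2008111701 TO d20090117235900]";
        System.out.println(query);
        // (subject:beauty body:beauty attachments:beauty) AND
        // (subject:supply body:supply attachments:supply) AND sentdate:[...]
    }
}
```

Lucene's MultiFieldQueryParser performs a similar per-word expansion over a list of fields, which may be close to the "macro" asked for later in this thread; its handling of AND as the default operator is worth checking before relying on it.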
term offsets info seems to be wrong...
Hello,

I'm writing a highlighter using term offsets info (yes, I borrowed the idea from LUCENE-644). In my highlighter, I'm seeing unexpected term offsets info when getting a multi-valued field.

For example, if I indexed ["     "," bbb "] (multi-valued), I got term info bbb(7,10). This is the expected result. But if I indexed [" aaa "," bbb "] (note that I'm using " aaa " instead of "     "), I got term info bbb(6,9), which is unexpected. I would like to get the same offset info for bbb, because the field values have the same length.

Please use the following program to see the problem I'm seeing. I'm using trunk:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.Field.TermVector;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.index.TermPositionVector;
import org.apache.lucene.index.TermVectorOffsetInfo;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class MultiValuedOffsets {

  public static void main(String[] args) throws Exception {
    // create an index with one document holding two values for field "f"
    Directory dir = new RAMDirectory();
    Analyzer analyzer = new WhitespaceAnalyzer();
    IndexWriter writer = new IndexWriter( dir, analyzer, true, MaxFieldLength.LIMITED );
    Document doc = new Document();
    doc.add( new Field( "f", " aaa ", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
    //doc.add( new Field( "f", "     ", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
    doc.add( new Field( "f", " bbb ", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
    writer.addDocument( doc );
    writer.close();

    // print the offsets stored in the term vector of document 0
    IndexReader reader = IndexReader.open( dir );
    TermPositionVector tpv = (TermPositionVector)reader.getTermFreqVector( 0, "f" );
    for( int i = 0; i < tpv.getTerms().length; i++ ){
      System.out.print( "term = \"" + tpv.getTerms()[i] + "\"" );
      TermVectorOffsetInfo[] tvois = tpv.getOffsets( i );
      for( TermVectorOffsetInfo tvoi : tvois ){
        System.out.println( "(" + tvoi.getStartOffset() + "," + tvoi.getEndOffset() + ")" );
      }
    }
    reader.close();
  }
}

regards,

Koji
Re: term offsets info seems to be wrong...
Okay, Koji, hopefully I'll have better luck with this suggestion this time. Have you tried http://issues.apache.org/jira/browse/LUCENE-1448 yet? I am not sure the patch still applies cleanly, but I hope it covers your issue.

On Fri, Jan 16, 2009 at 7:15 PM, Koji Sekiguchi wrote:
> Hello,
>
> I'm writing a highlighter by using term offsets info (yes, I borrowed
> the idea
> of LUCENE-644). In my highlighter, I'm seeing unexpected term offsets info
> when getting multi-valued field.
Re: term offsets info seems to be wrong...
Mark,

This is exactly what I wanted, and it worked perfectly. Thanks!

I'll post my highlighter to JIRA in a few days (hopefully). It uses term offsets with positions (WITH_POSITIONS_OFFSETS) to support PhraseQuery.

Thanks again,

Koji

Mark Miller wrote:
> Okay, Koji, hopefully I'll be more luckily suggesting this this time.
> Have you tried http://issues.apache.org/jira/browse/LUCENE-1448 yet?
> I am not sure if its in an applyable state, but I hope that covers your issue.
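One way to account for the numbers in Koji's report, as a plain-Java toy rather than Lucene's code (the class and method names are hypothetical). If the offset base for the second value advances past the previous value's full length plus a gap of 1, bbb starts at 5 + 1 + 1 = 7 whatever the first value contains, giving the expected (7,10). If instead the base only advances to the end of the previous value's last token plus 1, then " aaa " (whose token ends at offset 4) gives 4 + 1 + 1 = 6, matching the (6,9) he saw. That the LUCENE-1448 patch moves to the first rule is an assumption based only on Koji's report that it fixed the problem.

```java
public class MultiValueOffsetSketch {

    /**
     * Start offset of a token in value number valueIndex when the base offset
     * advances by each earlier value's full length plus a fixed gap.
     */
    static int lengthBasedStart(String[] values, int valueIndex, int localStart, int gap) {
        int base = 0;
        for (int i = 0; i < valueIndex; i++) {
            base += values[i].length() + gap;
        }
        return base + localStart;
    }

    /**
     * Start offset when the base only advances to the end offset of the
     * previous value's last token, plus one.
     */
    static int lastTokenBasedStart(int prevLastTokenEnd, int localStart) {
        return prevLastTokenEnd + 1 + localStart;
    }

    public static void main(String[] args) {
        int local = " bbb ".indexOf("bbb"); // 1: position of bbb inside its own value
        int len = "bbb".length();
        int a = lengthBasedStart(new String[] {" aaa ", " bbb "}, 1, local, 1);
        int b = lengthBasedStart(new String[] {"     ", " bbb "}, 1, local, 1);
        int aaaEnd = " aaa ".indexOf("aaa") + "aaa".length(); // 4
        int c = lastTokenBasedStart(aaaEnd, local);
        System.out.println("(" + a + "," + (a + len) + ")"); // (7,10)
        System.out.println("(" + b + "," + (b + len) + ")"); // (7,10)
        System.out.println("(" + c + "," + (c + len) + ")"); // (6,9)
    }
}
```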
Re: Search Across All Fields
Hi Erick

Thanks for the pointer. I don't know how I missed that. Our indexes are absolutely huge, so it's not really practical to add an all_text field. It would be great if you could introduce a macro or something that one could use to specify all fields.

Thanks anyway!

Jamie

Erick Erickson wrote:
> I think you forgot a set of parentheses, a close paren right before the AND
> and an open paren right after AND
>
> Depending upon how big your index is, a MUCH easier way to do this is to
> index another field, call it all_text say, and add all your terms to that
> field as well as to the individual one, then search your all_text field
> instead
>
> Best
> Erick