Re: Lucene Error : java.io.FileNotFoundException
It looks like under JBoss you are accidentally using Lucene 1.4, not 2.3.2.

Mike

yugana wrote:
> Hi,
>
> I am indexing and searching content using Lucene. It works fine when I use a simple servlet and JSP setup, and I am able to search the indexed content. When I implemented the same thing using JBoss Portal and ran the search, I got the error below. I am using Lucene 2.3.2. Please help me resolve it.
>
> 09:43:42,671 ERROR [STDERR] java.io.FileNotFoundException: D:\indexDir\segments (The system cannot find the file specified)
> 09:43:42,671 ERROR [STDERR]   at java.io.RandomAccessFile.open(Native Method)
> 09:43:42,671 ERROR [STDERR]   at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> 09:43:42,671 ERROR [STDERR]   at org.apache.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java:376)
> 09:43:42,671 ERROR [STDERR]   at org.apache.lucene.store.FSInputStream.<init>(FSDirectory.java:405)
> 09:43:42,671 ERROR [STDERR]   at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
> 09:43:42,671 ERROR [STDERR]   at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:40)
> 09:43:42,671 ERROR [STDERR]   at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:116)
> 09:43:42,671 ERROR [STDERR]   at org.apache.lucene.store.Lock$With.run(Lock.java:109)
> 09:43:42,671 ERROR [STDERR]   at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
> 09:43:42,671 ERROR [STDERR]   at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
> 09:43:42,671 ERROR [STDERR]   at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:38)
> 09:43:42,671 ERROR [STDERR]   at com.xerox.mywebboard.search.SearchManager.search(SearchManager.java:53)
> 09:43:42,671 ERROR [STDERR]   at com.xerox.mywebboard.xeroxArticleSearchPortlet.search(xeroxArticleSearchPortlet.java:45)
> 09:43:42,671 ERROR [STDERR]   at com.xerox.mywebboard.xeroxArticleSearchPortlet.processAction(xeroxArticleSearchPortlet.java:27)
> 09:43:42,671 ERROR [STDERR]   at org.jboss.portal.portlet.impl.jsr168.PortletContainerImpl.invokeAction(PortletContainerImpl.java
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.portlet.impl.jsr168.PortletContainerImpl.dispatch(PortletContainerImpl.java:401
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.portlet.container.PortletContainerInvoker$1.invoke(PortletContainerInvoker.java
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:131)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.core.aspects.portlet.TransactionInterceptor.org$jboss$portal$core$aspects$portl
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.core.aspects.portlet.TransactionInterceptor$invokeNotSupported_N454727078796479
> 09:43:42,687 ERROR [STDERR]   at org.jboss.aspects.tx.TxPolicy.invokeInNoTx(TxPolicy.java:66)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.aspects.tx.TxInterceptor$NotSupported.invoke(TxInterceptor.java:112)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.core.aspects.portlet.TransactionInterceptor$invokeNotSupported_N454727078796479
> 09:43:42,687 ERROR [STDERR]   at org.jboss.aspects.tx.TxPolicy.invokeInNoTx(TxPolicy.java:66)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.aspects.tx.TxInterceptor$NotSupported.invoke(TxInterceptor.java:102)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.core.aspects.portlet.TransactionInterceptor$invokeNotSupported_N454727078796479
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.core.aspects.portlet.TransactionInterceptor.invokeNotSupported(TransactionInter
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.core.aspects.portlet.TransactionInterceptor.invoke(TransactionInterceptor.java:
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.portlet.invocation.PortletInterceptor.invoke(PortletInterceptor.java:38)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.core.aspects.portlet.HeaderInterceptor.invoke(HeaderInterceptor.java:50)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.portlet.invocation.PortletInterceptor.invoke(PortletInterceptor.java:38)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.portlet.aspects.portlet.ProducerCacheInterceptor.invoke(ProducerCacheIntercepto
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.portlet.invocation.PortletInterceptor.invoke(PortletInterceptor.java:38)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.common.invocation.Invocation.invokeNext(Invocation.java:115)
> 09:43:42,687 ERROR [STDERR]   at org.jboss.portal.core.aspects.portlet.AjaxInterceptor.invoke(AjaxInterceptor.java:51)
> 09:43:42,687 ERROR [STDERR]   at ...
Store/Index Email Address in Lucene
Hi there,

I want to index email addresses in such a way that I can do wildcard, phrase and simple searches on them.

For each document I will have a string of email addresses, just like the CC and TO fields of an email, e.g. "[EMAIL PROTECTED]; [EMAIL PROTECTED]; john hopkings; [EMAIL PROTECTED]"

What is the best way to store them so that I can run various types of search on them? Do I need to split the string into individual addresses first, then further split each single address, and store the pieces in multiple fields? What is the best way to deal with such a case?

Your help would be much appreciated.

Thank you,
miztaken

-- View this message in context: http://www.nabble.com/Store-Index-Email-Address-in-Lucene-tp18257247p18257247.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

- To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Search question (newbie)
Hi,

Can someone point me in the right direction, please? How can I handle this situation correctly?

I receive user queries like this (quotes included):

/from:"fred flintston*"/

which produces a query string of

/+from:fred body:flintston/

(where /body/ is the default field). What I want is:

/+from:fred +from:flintston*/

In other words, I want quoted expressions to be treated as single units.

Thanks for any pointers,
- Chris
Enhancing phrase searching in Lucene
Hi.

I've just finished my master's thesis on how to enhance phrase searching in today's search engines. The thesis experiments with a new approach that focuses on pairs of words (bigrams). It can be freely downloaded here [1].

Specifically, I've experimented with bigrams based on stopwords and their characteristics. In the experiment, an Analyzer creates bigram tokens composed of pairs of words. We start with a predefined list of stopwords and examine each token in the Analyzer. When a stopword token is identified, we create two new bigram tokens:

1) previous token + stopword token
2) stopword token + next token

The identified stopword token itself is discarded, since it would have a very long posting list in the inverted index. The main goal is to drastically reduce posting-list lengths and thereby save I/O and processing in Apache Lucene. Based on the experiments performed, this new phrase searching approach yields some performance gains in Lucene.

The code created in the experiment will be made available shortly; I just need to write some Javadoc and tidy it up a bit. There is nothing revolutionary in the code, and I've noticed on this mailing list that others have also looked into this subject.

I hope someone finds some of the aspects discussed in my thesis useful. I've also, to some extent, tried to describe Apache Lucene and how it works.

[1] http://asbjorn.fellinghaug.com/filer/master/Master_thesis.pdf

-- 
Asbjørn A. Fellinghaug
[EMAIL PROTECTED]
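The stopword-bigram rule described above can be sketched in plain Java over an already-tokenized list. This is only an illustration under stated assumptions, not the thesis code: non-stopword tokens are assumed to be kept as ordinary terms, the two words of a bigram are joined with an underscore (an arbitrary choice), and a stopword at the start or end of the input produces only one bigram. In Lucene the same logic would presumably live in a TokenFilter inside the Analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopwordBigrams {

    /**
     * Replaces each stopword with the bigrams (previous + stopword) and
     * (stopword + next). Non-stopwords pass through unchanged (an assumption;
     * the post only describes what happens to stopwords).
     */
    public static List<String> expand(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < tokens.size(); i++) {
            String tok = tokens.get(i);
            if (!stopwords.contains(tok)) {
                out.add(tok);
                continue;
            }
            // 1) previous token + stopword token (skipped at the start of input)
            if (i > 0) {
                out.add(tokens.get(i - 1) + "_" + tok);
            }
            // 2) stopword token + next token (skipped at the end of input)
            if (i + 1 < tokens.size()) {
                out.add(tok + "_" + tokens.get(i + 1));
            }
            // The stopword itself is not emitted, so it gets no posting list.
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<String>(Arrays.asList("the", "of", "a"));
        List<String> tokens = Arrays.asList("the", "united", "states", "of", "america");
        System.out.println(expand(tokens, stop));
        // prints [the_united, united, states, states_of, of_america, america]
    }
}
```

The same expansion would have to be applied to the query phrase as well, so that a search for "states of america" looks up the comparatively rare bigram terms instead of the very common "of".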
Term Frequency for more complex terms
I have a quick question: could someone point me to where in the API I should look to find the term frequencies of more complex terms? For example, I want to know the tf of "kit ligand" treated as a phrase. I see that Luke has access to this information through its explain function, but the API call is currently eluding me.

Thanks,
Matt

-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
[EMAIL PROTECTED]
(207) 288-6012
Memory Usage
Hello all,

I have something that's not exactly causing me a major problem, but I would appreciate help in understanding the behaviour here.

I have an internet message board, and I soon hope to revamp the code to use Lucene for searching the threads and posts, as it's far better than the database's full-text capability. However, one of the things I want to be able to do is let a user request a list of posts written by user x, ordered newest first (and it's this sorting of the items by date that is the issue here).

To do this, I have a timestamp in the index, along with each post, user, etc. I find that if I use the Java SimpleDateFormat class to encode the timestamp as yyMMdd (let's not worry about the year 2100 problem for now!), then I can measure the index cache (which is fully loaded, since I need to sort the results) as taking somewhere in the region of 30M of memory.

Now, obviously, if I index like the above, I won't get the correct sort order for several posts made on the same day, so I changed the encoding to yyMMddHHmmss to index down to the second rather than just the day. I didn't pay much attention to memory usage until I started getting out-of-heap-space errors. When I looked into the usage I found (there are around 6,000,000 posts in the message board database):

Date encoded as yyMMdd: appears to be using around 30M
Date encoded as yyMMddHHmmss: appears to be using more than 400M!

I would have understood seeing the usage double, or even a little more (I have no idea how you guys encode the indexes, if at all), but it has gone up over tenfold, which I can't explain.

For now, I have just moved it back to a per-day basis, as it's not a huge deal, but can anyone help with this? Is there something I might be doing wrong? That's all I changed between the two runs, and it certainly seems to be repeatable.
I tried upgrading from the previous version of Lucene to the latest one, but no difference.

Many thanks,
Keith.
RE: Search question (newbie)
Chris,

I've had similar requirements in the past. First strip the quotes, then create a BooleanQuery consisting of two separate queries on the from field:

1. a TermQuery for the first term, fred
2. a PrefixQuery for the second term, flintston

When you add each individual query to the BooleanQuery, make sure the BooleanClause.Occur parameter is set to MUST (see the BooleanQuery API docs). Call toString() on the BooleanQuery after it's created to check that you built it correctly.

John G.

-----Original Message-----
From: Chris Bamford [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 03, 2008 7:39 AM
To: java-user@lucene.apache.org
Subject: Search question (newbie)
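John's recipe builds the BooleanQuery programmatically. A related approach, sketched here as an assumption rather than anything proposed in the thread, is to rewrite the raw user string before it reaches QueryParser: each field:"a b*" becomes +field:a +field:b*, so every term is required and stays in the named field, and a trailing * should still produce a PrefixQuery once parsed. It assumes quoted values contain no escaped quotes, and it deliberately drops phrase semantics (word order and adjacency), which is what Chris asked for.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldRewriter {

    // field:"..." where the quoted part contains no further double quotes (assumption)
    private static final Pattern QUOTED_FIELD = Pattern.compile("(\\w+):\"([^\"]*)\"");

    /** Rewrites every field:"t1 t2 ..." into +field:t1 +field:t2 ...; other text is left alone. */
    public static String rewrite(String userQuery) {
        Matcher m = QUOTED_FIELD.matcher(userQuery);
        StringBuffer out = new StringBuffer(); // appendReplacement takes a StringBuffer before Java 9
        while (m.find()) {
            String field = m.group(1);
            StringBuilder clauses = new StringBuilder();
            for (String term : m.group(2).trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue; // only happens for an empty pair of quotes
                }
                if (clauses.length() > 0) {
                    clauses.append(' ');
                }
                clauses.append('+').append(field).append(':').append(term);
            }
            // quoteReplacement stops '$' or '\' in user input being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(clauses.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("from:\"fred flintston*\""));
        // prints +from:fred +from:flintston*
    }
}
```

Passing the rewritten string to QueryParser should yield the same two MUST clauses (a TermQuery and a PrefixQuery) that John describes; printing the parsed query with toString() is an easy way to confirm it.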
RE: Term Frequency for more complex terms
Matthew,

I'm not totally sure what you are asking, but if it's 'where do I call the explain method from?', it looks like you want to call it on the IndexSearcher class. Look at the API docs for Searcher (IndexSearcher's superclass).

John G.

P.S. If that's not it, look for explain in the API docs by clicking on Index at the top of the docs. They're all there.

-----Original Message-----
From: Matthew Hall [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 03, 2008 10:20 AM
To: lucene
Subject: Term Frequency for more complex terms
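For a phrase, the "tf" shown in a PhraseQuery explanation is the phrase frequency: roughly, the number of places in a document where the phrase terms occur at consecutive positions. The sketch below computes that count for an exact phrase (slop 0) from per-term position lists, such as the ones Lucene exposes through IndexReader.termPositions(Term). It is a simplified model for illustration, not Lucene's own scorer, and it ignores sloppy phrase matching.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExactPhraseFreq {

    /**
     * positionsPerTerm[i] holds the positions of the i-th phrase term in one
     * document. Returns how many start positions p have term i at p + i for
     * every i, i.e. how often the exact phrase occurs in that document.
     */
    public static int phraseFreq(int[][] positionsPerTerm) {
        if (positionsPerTerm.length == 0) {
            return 0;
        }
        // Put each term's positions into a set for constant-time lookups.
        List<Set<Integer>> lookup = new ArrayList<Set<Integer>>();
        for (int[] positions : positionsPerTerm) {
            Set<Integer> set = new HashSet<Integer>();
            for (int p : positions) {
                set.add(p);
            }
            lookup.add(set);
        }
        int freq = 0;
        for (int start : positionsPerTerm[0]) {
            boolean matches = true;
            for (int i = 1; i < positionsPerTerm.length && matches; i++) {
                matches = lookup.get(i).contains(start + i);
            }
            if (matches) {
                freq++;
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        // "kit ligand binds the kit ligand receptor kit": kit at 0, 4, 7; ligand at 1, 5
        int[][] positions = { {0, 4, 7}, {1, 5} };
        System.out.println(phraseFreq(positions)); // prints 2
    }
}
```

To see the value Lucene itself computes for a given document, Searcher.explain(query, docId) with a PhraseQuery for "kit ligand" should report it in the tf part of the explanation.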
RE: Store/Index Email Address in Lucene
Miz,

The StandardAnalyzer recognizes email addresses as they are: it pays attention to the '@' symbol and keeps each address together as one token. Just store each email address in a field and search it normally. This assumes you are going to store the different emails in separate fields.

There is an alternative strategy if you need it: create a string consisting of all the emails separated by whitespace. Make sure the field is tokenized, and then you only have to search one field for any of the emails.

Your call.

John G.

-----Original Message-----
From: miztaken [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 03, 2008 5:31 AM
To: java-user@lucene.apache.org
Subject: Store/Index Email Address in Lucene
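John's second strategy (one tokenized field holding all the addresses, separated by whitespace) only needs a little string preparation before indexing. The plain-Java sketch below turns a semicolon-separated CC/TO string into that form; the ";" separator comes from miztaken's example, and the addresses in main are made up because the originals are obfuscated in the archive. The result would then be indexed as a single tokenized field (for example with Field.Index.TOKENIZED in Lucene 2.3).

```java
import java.util.ArrayList;
import java.util.List;

public class RecipientField {

    /**
     * Splits a CC/TO style list on ';', trims each entry, drops empty ones,
     * and joins the rest with single spaces so one tokenized field can hold
     * every recipient.
     */
    public static String toWhitespaceList(String recipients) {
        List<String> parts = new ArrayList<String>();
        for (String entry : recipients.split(";")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical addresses standing in for the obfuscated ones.
        String cc = "alice@example.com; bob@example.org; john hopkings; carol@example.net";
        System.out.println(toWhitespaceList(cc));
        // prints alice@example.com bob@example.org john hopkings carol@example.net
    }
}
```

With StandardAnalyzer on that field, each address should come out as a single token, as John notes, while a bare name such as "john hopkings" becomes two ordinary word tokens.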
Re: Memory Usage
> (there are around 6,000,000 posts on the message board database)
> Date encoded as yyMMdd: appears to be using around 30M
> Date encoded as yyMMddHHmmss: appears to be using more than 400M!
> I guess I would have understood if I was seeing the usage double for sure, or even a little more; no idea how you guys encode the indexes, if at all, but it's gone up over tenfold, which I can't explain.

Sort memory cost is based on the total number of unique terms for the given field (multiplied by the number of locales involved, if you have to do that too, but for temporal sorting you don't).

This is easier than you think: just use two fields (date, time) and sort by both. That way the Date field's unique term count grows by only one term per day. The Time field can be set to minutes (if you can get away with that), which means it has at most 1,440 unique terms, an insignificant total.

We use this at Aconex and have indexes with millions of records (a weekly 'work' searcher refreshed every 5 seconds, and an archive searcher held in memory, with a MultiSearcher over the two), and it works a treat. We regularly need to return million-plus results from a search (don't ask) using this sort of sorting, and the overall search time is only a few seconds.

On a related note, work hard to avoid locale-sensitive sorting on any other fields if you can; for large result sets the CPU penalty is horrific (even once you get past the synchronization bottleneck in the CollationKey stuff).

cheers,
Paul Smith
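Paul's two-field suggestion amounts to encoding each post's timestamp twice at indexing time: a day term (yyMMdd, at most one new unique term per day) and a minute-of-day term (HHmm, at most 24 × 60 = 1,440 unique terms). By contrast, a single yyMMddHHmmss field can approach one unique term per post, which is what lets a 6,000,000-post index inflate the sort cache. The sketch below shows only the encoding in plain Java; the field names "date" and "time" and the use of UTC are assumptions.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class SplitTimestamp {

    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
    private static String format(String pattern, Date d) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        f.setTimeZone(UTC);
        return f.format(d);
    }

    /** Term for a hypothetical "date" field: one unique value per day. */
    public static String dayTerm(Date d) {
        return format("yyMMdd", d);
    }

    /** Term for a hypothetical "time" field: at most 1,440 unique values. */
    public static String minuteTerm(Date d) {
        return format("HHmm", d);
    }

    public static void main(String[] args) {
        Calendar c = Calendar.getInstance(UTC);
        c.clear();
        c.set(2008, Calendar.JULY, 4, 0, 2, 0); // 4 Jul 2008, 00:02 UTC
        Date posted = c.getTime();
        System.out.println(dayTerm(posted) + " " + minuteTerm(posted)); // prints 080704 0002
    }
}
```

Sorting newest-first would then use both fields, date first and time second, each reversed. In Lucene 2.3 that would presumably be a Sort built from two SortField(field, SortField.STRING, true) entries, but check the SortField javadoc for the version you run.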
Re: Lucene Error : java.io.FileNotFoundException
I have checked all the jars and tried replacing them with the same versions, but I still get the same error. Please let me know what else to check.

yug

Michael McCandless-2 wrote:
> It looks like under JBoss you are accidentally using Lucene 1.4, not 2.3.2.
>
> Mike
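One way to act on Mike's diagnosis is to ask the running code where it actually loaded Lucene from. The trace shows Lucene trying to open a file named plain segments, whereas Lucene 2.1 and later write segments_N files; that fits the idea that an older Lucene jar elsewhere on the JBoss classpath is shadowing 2.3.2. The sketch below uses only standard JDK reflection; running it from inside the portlet with the thread's context class loader is an assumption about where you would call it.

```java
import java.security.CodeSource;

public class WhichJar {

    /**
     * Reports where className was loaded from by the given class loader
     * (null means the bootstrap loader), plus the jar manifest's
     * Implementation-Version when one is present (it may be null).
     */
    public static String locate(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            CodeSource cs = c.getProtectionDomain().getCodeSource();
            if (cs == null || cs.getLocation() == null) {
                return className + " has no code source (probably a bootstrap class)";
            }
            Package p = c.getPackage();
            String version = (p == null) ? null : p.getImplementationVersion();
            return className + " loaded from " + cs.getLocation()
                + " (Implementation-Version: " + version + ")";
        } catch (ClassNotFoundException e) {
            return className + " not found";
        }
    }

    public static void main(String[] args) {
        // Inside a portlet request, the context class loader is typically the web application's loader.
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        System.out.println(locate("org.apache.lucene.index.IndexReader", cl));
    }
}
```

If the reported location is not the 2.3.2 jar you deployed, that other jar is the one to remove or isolate from the portal's class loading.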
Re: Store/Index Email Address in Lucene
Hi miztaken,

Check out:

http://openmailarchiva.svn.sourceforge.net/viewvc/openmailarchiva/Server/trunk/src/com/stimulus/archiva/search/EmailFilter.java?view=markup

I think it's what you want.

Regards,

Jamie

-- 
Stimulus Software - MailArchiva
Email Archiving And Compliance
too many clauses exception
Hi,

I am stuck with one more exception. When I use a wildcard such as a*, I get a "too many clauses" exception. It says the maximum clause count is set to 1024. Is there any way to increase this count? Can you please help me overcome this?

Thanks in advance.
-Gaurav
too many clauses exception
Hi,

I am stuck with an exception in Lucene (too many clauses). When I use a wildcard such as a*, I get a "too many clauses" exception. It says the maximum clause count is set to 1024. Is there any way to increase this count? Can you please help me overcome this?

Thanks in advance.
-Gaurav
Re: too many clauses exception
This is easy; use:

    BooleanQuery.setMaxClauseCount(4096);

-- 
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!

On Thu, Jul 3, 2008 at 11:23 PM, Gaurav Sharma <[EMAIL PROTECTED]> wrote:
> When i am using a wild card such as a* i am getting too many clauses exception. It saying maximum clause count is set to 1024. Is there any way to increase this count.
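The limit is hit because, in Lucene 2.x, a prefix or wildcard query such as a* is rewritten into a BooleanQuery with one TermQuery clause per matching term in the index, so a short prefix can easily need more than the default 1024 clauses. The plain-Java model below imitates that expansion over a sorted term dictionary; its IllegalStateException only stands in for Lucene's BooleanQuery.TooManyClauses and is not the real class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixExpansion {

    /**
     * Collects every term in the sorted dictionary that starts with prefix,
     * one "clause" per term, failing once more than maxClauseCount are needed.
     */
    public static List<String> expand(NavigableSet<String> terms, String prefix, int maxClauseCount) {
        List<String> clauses = new ArrayList<String>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            if (clauses.size() == maxClauseCount) {
                throw new IllegalStateException("maxClauseCount is set to " + maxClauseCount);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        NavigableSet<String> dict = new TreeSet<String>(
            Arrays.asList("apple", "apricot", "avocado", "banana"));
        System.out.println(expand(dict, "a", 1024)); // prints [apple, apricot, avocado]
        try {
            expand(dict, "a", 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints maxClauseCount is set to 2
        }
    }
}
```

This is also why raising setMaxClauseCount is a trade-off: the error goes away, but every matching term still becomes a clause that costs memory and CPU at search time.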
Multifield Search with OR and AND on different doc Fields
My requirement is to search on SEVEN fields, say F1, F2, F3, F4, F5, F6, F7, where F1, F2, F3, F4 are in the documents of one index and F5, F6, F7 are in the documents of a different index. I need to perform a search with:

((F1=9 AND F2=4) AND (F3=keyword OR F4=keyword)) OR (F5=9 AND F6=4 AND F7=keyword)

For a normal search I was doing this:

    String[] sFields = { ID1, ID2, TITLE, CONTENT };
    String[] sQuery = { id1, id2, sKeyword, sKeyword };
    Occur[] flag = { BooleanClause.Occur.MUST, BooleanClause.Occur.MUST,
                     BooleanClause.Occur.MUST, BooleanClause.Occur.MUST };
    Query oQuery = oMultiParser.parse(sQuery, sFields, flag, oAnalyzer);
    Hits hits = indexSearcher.search(oQuery);

How can I modify the above query so that it searches across the different indexes?
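Because no document contains both groups of fields, the whole condition can be expressed as one query with two optional top-level groups, each made of required clauses; a document from either index matches when its own group matches. The sketch below only builds that query string in QueryParser syntax, using the placeholder names F1 to F7 from the question and assuming, as in the question's example, that F5 and F6 take the same values as F1 and F2, and that each value is a single term needing no escaping. Parsing it with QueryParser and running it through a MultiSearcher over both indexes is the assumed way to use it.

```java
public class CrossIndexQuery {

    /**
     * Builds ((F1 AND F2) AND (F3 OR F4)) OR (F5 AND F6 AND F7) in QueryParser
     * syntax: '+' marks a required clause, unmarked clauses are optional, and a
     * query made only of optional clauses still needs at least one to match.
     */
    public static String build(String id1, String id2, String keyword) {
        String firstIndex = String.format("(+F1:%s +F2:%s +(F3:%s F4:%s))", id1, id2, keyword, keyword);
        String secondIndex = String.format("(+F5:%s +F6:%s +F7:%s)", id1, id2, keyword);
        return firstIndex + " " + secondIndex;
    }

    public static void main(String[] args) {
        System.out.println(build("9", "4", "lucene"));
        // prints (+F1:9 +F2:4 +(F3:lucene F4:lucene)) (+F5:9 +F6:4 +F7:lucene)
    }
}
```

Keeping the two groups as SHOULD clauses at the top level is what gives the outer OR; making them MUST instead would require a single document to carry all seven fields, which never happens here.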
Re: Store/Index Email Address in Lucene
Hi there,

Thanks for the comment. So basically it would be clumsy to add a new field for each email address, wouldn't it? How about getting the unique tokens from the string of email addresses using the EmailFilter.java class and storing them in a single field?
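miztaken's idea (expand the address list into unique tokens and store them in one field) can be sketched in plain Java. This does not reproduce what MailArchiva's EmailFilter actually emits; the token scheme is a hypothetical one chosen for illustration: for each address, the full address, the part before '@', and the domain, lower-cased and de-duplicated in first-seen order, with bare names split into words.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class EmailTokens {

    /**
     * Hypothetical token scheme: for every ';'-separated entry containing '@',
     * emit the full address, its local part and its domain. Entries without
     * '@' (e.g. a bare name) are split on whitespace instead.
     */
    public static Set<String> uniqueTokens(String recipients) {
        Set<String> tokens = new LinkedHashSet<String>(); // keeps first-seen order, drops repeats
        for (String entry : recipients.split(";")) {
            String e = entry.trim().toLowerCase(Locale.ROOT);
            if (e.isEmpty()) {
                continue;
            }
            int at = e.indexOf('@');
            if (at > 0 && at < e.length() - 1) {
                tokens.add(e);                   // full address
                tokens.add(e.substring(0, at));  // local part
                tokens.add(e.substring(at + 1)); // domain
            } else {
                for (String word : e.split("\\s+")) {
                    tokens.add(word);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical addresses; the real ones are obfuscated in the archive.
        String to = "Alice@example.com; bob@example.com; john hopkings; alice@example.com";
        System.out.println(uniqueTokens(to));
        // prints [alice@example.com, alice, example.com, bob@example.com, bob, john, hopkings]
    }
}
```

Joined with spaces, such a set could be stored in one tokenized field, so a search for a whole address, a user name or a domain would all hit the same field.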
Re: Memory Usage
Thanks very much for this; I'll give it a shot.

Keith.

On 4 Jul 2008, at 00:02, Paul Smith wrote:
> This is easier than you think, just use 2 fields (date, time) and sort by both.