Re: How to search special characters in Lucene

2009-04-21 Thread uday kumar maddigatla
Hi, thanks for your reply. I'm able to see the DutchAnalyzer. When I index my documents, I pass an instance of DutchAnalyzer as an argument to the IndexWriter class. After this when I search for the http://www.nabble.com/file/p23170710/IndexFiles.java IndexFiles.java contains the Danish elements

Re: Why is CustomScoreQuery limited to ValueSourceQuery type?

2009-04-21 Thread Steven Bethard
On 4/21/2009 10:09 AM, Doron Cohen wrote: > It could, but (historically and) currently it doesn't... :) > I actually have code for this. > Would you like to open a JIRA issue for this - I'll attach my wrapper there? Done. https://issues.apache.org/jira/browse/LUCENE-1608 Steve > On Tue, Apr 21, 20

Getting Top n term for a given field for a given time period

2009-04-21 Thread Preetham Kajekar
Hi, I have a lucene index which has 20 mil documents. Each document has a timestamp field and a source field. I am interested in finding the top n sources for a given hour (based on the timestamp). I know we can get the top n sources fields easily using the IndexReader API, but was wondering

Re: Appropriate analyzer

2009-04-21 Thread AlexElba
Try using RegexQuery. Artyom Sokolov wrote: > > Hello. > > Currently I'm trying to find something like an analyzer to solve the > problem. > > Actually, what I need is the following: search on a query string step by step, > trimming the last char on each step. Small example: > > In the index we have: abc, ab

Re: semi-infinite loop during merging

2009-04-21 Thread Christiaan Fluit
Christiaan Fluit wrote: It seems that it gets up to the point to commit, but the "IW: commitMerge done" message is never reached. Furthermore, no exceptions are printed to the output, so handleMergeException does not seem to have been invoked. Should I add more debug statements elsewhere?

Re: readModifiedUTF8String stuck

2009-04-21 Thread MakMak
Mike, Ran CheckIndex. This is what it prints out: cantOpenSegments: false numBadSegments: 0 numSegments: 14 segmentFormat: FORMAT_HAS_PROX [Lucene 2.4] segmentsFileName: segments_2od totLoseDocCount: 0 clean: true toolOutOfDate: false So I guess everything is fine ..! Our application is

Re: semi-infinite loop during merging

2009-04-21 Thread Christiaan Fluit
Michael McCandless wrote: One question: are you using IndexWriter.close(false)? I wonder if there's some path whereby the merges fail to abort (and simply keep retrying) if you do that... No, I don't. More inlined below... On Thu, Apr 16, 2009 at 5:42 AM, Christiaan Fluit wrote: I spent a

Re: MergeException

2009-04-21 Thread Christiaan Fluit
Michael McCandless wrote: On Tue, Apr 21, 2009 at 4:26 PM, Christiaan Fluit wrote: I have experienced similar problems (see the "semi-infinite loop during merging" thread - still working out the problem): the merger gets into an infinite loop and causes my drive to be filled with temporary file

Appropriate analyzer

2009-04-21 Thread Artyom Sokolov
Hello. Currently I'm trying to find something like an analyzer to solve the problem. Actually, what I need is the following: search on a query string step by step, trimming the last char on each step. Small example: In the index we have: abc, abcdef, xyz. When searching for abcdefgh, the most relevant result should be a
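
The trimming idea described in this post can be sketched in plain Java, without any Lucene classes. This is only an illustration under assumptions: the class and method names (PrefixTrimSearch, longestIndexedPrefix) are hypothetical, the index is modelled as a simple set of terms, and "most relevant" is assumed to mean the longest prefix of the query that exists as a term.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of Lucene: models the index as a set of terms.
class PrefixTrimSearch {

    /**
     * Tries the full query first, then drops the last character on each step,
     * and returns the first (longest) prefix that is an indexed term.
     */
    static Optional<String> longestIndexedPrefix(String query, Set<String> indexedTerms) {
        for (int end = query.length(); end > 0; end--) {
            String candidate = query.substring(0, end);
            if (indexedTerms.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // no prefix of the query is indexed
    }

    public static void main(String[] args) {
        Set<String> terms = new LinkedHashSet<>(Arrays.asList("abc", "abcdef", "xyz"));
        System.out.println(longestIndexedPrefix("abcdefgh", terms));
    }
}
```

With the sample terms from the post, this sketch stops at abcdef, because the longer candidates abcdefgh and abcdefg are not in the set. Ranking the shorter match abc below it would need extra logic that is not shown here.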

Re: readModifiedUTF8String stuck

2009-04-21 Thread Michael McCandless
On Mon, Apr 20, 2009 at 6:07 PM, MakMak wrote: >   I made a standalone tool like you suggested which prints out the size of > each doc in the index, none of the docs are more than 1MB !!! The queries > are the same. They repeat throughout the test. We give about 6GB of heap to > the application a

Re: MergeException

2009-04-21 Thread Michael McCandless
On Tue, Apr 21, 2009 at 4:26 PM, Christiaan Fluit wrote: > I have experienced similar problems (see the "semi-infinite loop during > merging" thread - still working out the problem): the merger gets into an > infinite loop and causes my drive to be filled with temporary files that are > not delete

Re: MergeException

2009-04-21 Thread Michael McCandless
Are you opening the index with Luke while indexing is still running? Also, I don't understand why Luke would be causing any merging to occur. Or: did you ask Luke to optimize your index? Mike On Tue, Apr 21, 2009 at 11:01 AM, Martine Woudstra wrote: > Hi all, > > I'm using Lucene 2.4.1. for bu

Re: MergeException

2009-04-21 Thread Christiaan Fluit
I have experienced similar problems (see the "semi-infinite loop during merging" thread - still working out the problem): the merger gets into an infinite loop and causes my drive to be filled with temporary files that are not deleted, until it runs out of space. Sometimes it exits with a Merge

Re: Servlets Sharing Resources

2009-04-21 Thread Paul Libbrecht
Various servlets or various webapps? Various servlets is trivial, indeed using ServletContext.getAttribute(). Various webapps is more difficult: - you need to set cross context so that context.getContext("/otherpath") is accessible (a config of context in Tomcat) - you need classes to be shared

Share Index on NFS

2009-04-21 Thread Harini Raghavan
Hi Everyone, We are planning to distribute searches on the index and have a single indexing node. We want to mount the index on NFS so that it can be shared by the indexer and searcher nodes. To optimize several of our search workflows, we are caching the IndexSearcher and refreshing it every h

RE: Servlets Sharing Resources

2009-04-21 Thread David Seltzer
> But honestly, you'll just spend the next few hours googling, pulling out > hair, buying a book on jboss > and then curse it, and do it this way in the end.. Spoken like a man who's been there. Hehe... Who hasn't... I'm going to try storing a persistent manager class in the ServletContext so I c

Re: Servlets Sharing Resources

2009-04-21 Thread patrick o'leary
Not every servlet container will support the same cross context methodology. Most would say your approach is an EJB with a life cycle outside of the interaction layer. But honestly, you'll just spend the next few hours googling, pulling out hair, buying a book on jboss and then curse it, and do

Re: Why is CustomScoreQuery limited to ValueSourceQuery type?

2009-04-21 Thread Doron Cohen
It could, but (historically and) currently it doesn't... :) I actually have code for this. Would you like to open a JIRA issue for this - I'll attach my wrapper there? Doron On Tue, Apr 21, 2009 at 7:58 PM, Steven Bethard wrote: > On 4/21/2009 12:47 AM, Doron Cohen wrote: > > CustomScoreQuery expec

Re: Why is CustomScoreQuery limited to ValueSourceQuery type?

2009-04-21 Thread Steven Bethard
On 4/21/2009 12:47 AM, Doron Cohen wrote: > CustomScoreQuery expects the VSQs to have a score for document matching the > (main) subQuery - this does not hold for arbitrary queries. Sure, but it could easily assign 0.0 scores for sub-queries that didn't match, no? Steve > On Sat, Apr 18, 2009 at

RE: Servlets Sharing Resources

2009-04-21 Thread David Seltzer
That certainly seems like the simple way to solve the problem. I was just wondering if I was overlooking a simple way to do this via web.xml servlet-mapping. I was trying to avoid having everything hit the same doGet(). -Original Message- From: patrick o'leary [mailto:pj...@pjaol.com] Sen

Re: Servlets Sharing Resources

2009-04-21 Thread patrick o'leary
Why not have one servlet and, based on a parameter or URL, serve two different outputs? if (request.getParameter("asXML") != null) showXML(); else showOtherStuff(); Save yourself the hassle of dealing with JNDI, contexts, Spring or singletons. On Tue, Apr 21, 2009 at 12:01 PM, David Seltzer wrote:

Re: Servlets Sharing Resources

2009-04-21 Thread Mindaugas Žakšauskas
Hi, Generally speaking, yes - this is the most straightforward way of storing application-bound data. Somewhat related explanation available here: http://www.coderanch.com/t/358143/Servlets/java/servlet-context-vs-session Regards, Mindaugas On Tue, Apr 21, 2009 at 5:23 PM, David Seltzer wrote:

Re: Servlets Sharing Resources

2009-04-21 Thread mark harwood
Spring is pretty useful for managing and sharing resources - see what looks like a related example here: http://croarkin.blogspot.com/2008/05/injecting-spring-bean-into-servlet.html Cheers, Mark - Original Message From: David Seltzer To: java-user@lucene.apache.org Sent: Tuesday,

RE: Servlets Sharing Resources

2009-04-21 Thread David Seltzer
Thanks Mindaugas, So in Tomcat, is there a way to store a variable outside an individual Servlet in the ServletContext? The API shows ServletContext.setAttribute and ServletContext.getAttribute. Would that be a way to make an object application-bound? -Dave -Original Message- From: M

Re: Servlets Sharing Resources

2009-04-21 Thread Mindaugas Žakšauskas
Hi, Servlets are stateless and they must extend javax.servlet.http.HttpServlet, therefore I'm afraid the idea of manager class is probably unrealistic. The stuff you want to achieve normally works by either placing objects into the HTTP session (user-bound) or attaching them to your application c

Servlets Sharing Resources

2009-04-21 Thread David Seltzer
Hi All, Sorry for the slightly off-topic question, but I've just run into a gap in my understanding of Servlet programming. The question: Is it possible for two servlets to share access to an instance of IndexSearcher or an IndexReader? I'm thinking about setting up a Search servlet to provide XM

MergeException

2009-04-21 Thread Martine Woudstra
Hi all, I'm using Lucene 2.4.1 for building an ngram index. Indexing works well until I try to open the index built so far with Luke. A MergeException is thrown, see below. Opening an index with Luke during indexing never caused problems with Lucene 2.3. Anyone familiar with this problem? Thanks

Re: changing term freq in indexing time

2009-04-21 Thread Eran Sevi
Hi, You might want to take a look at Payloads. If you know the frequency of the words in each world in advance, then during tokenization for each world you could save the frequency as the payload. During searches you could use BoostingTermQuery to take the frequency into account. Eran. On Tue, Ap

Re: How to search special characters in Lucene

2009-04-21 Thread Erick Erickson
Take a look at DutchAnalyzer. The problem you'll have is if you're indexing this document along with a bunch of documents from other languages. You could search the mail archive for extensive discussions of indexing/ searching documents from several languages. Best Erick On Tue, Apr 21, 2009 at

Re: Proximity and Percentage match search in Lucene

2009-04-21 Thread Rads2029
Hi all, does anybody have a solution to the below query? regards, radha Rads2029 wrote: > > What I need is the following : > If my document field is ( ab,bc,cd,ef) and Search tokens are (ab,bc,cd). > > Given the following : > I should get a hit even if all of the search tokens aren't pres

RE: Faceting, Sort and DocIDSet

2009-04-21 Thread David Seltzer
Karsten, You're right, 300 facets would be a lot. Hehe. I have one facet with about three hundred potential values. What I've done is create a FacetManager that, in another thread, sets up a map of ~300 OpenBitSets. One bitset for each possible value of the facet. Then, rather than using an iter
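
The one-bitset-per-facet-value layout described in this post can be sketched as follows. This is a sketch under assumptions, not the poster's code: java.util.BitSet stands in for Lucene's OpenBitSet, the class and method names (FacetBitSets, buildFacetBitSets, countFacets) are hypothetical, and counting by intersecting each facet bitset with the bitset of matching documents is an assumed continuation, since the message is cut off before it says how the bitsets are used.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; java.util.BitSet stands in for OpenBitSet.
class FacetBitSets {

    /** Builds one bitset per facet value; bit i is set when document i has that value. */
    static Map<String, BitSet> buildFacetBitSets(String[] facetValueByDoc) {
        Map<String, BitSet> bitSets = new LinkedHashMap<>();
        for (int docId = 0; docId < facetValueByDoc.length; docId++) {
            bitSets.computeIfAbsent(facetValueByDoc[docId], v -> new BitSet()).set(docId);
        }
        return bitSets;
    }

    /** Counts, for each facet value, how many of the matching documents carry it. */
    static Map<String, Integer> countFacets(Map<String, BitSet> facetBitSets, BitSet matchingDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : facetBitSets.entrySet()) {
            // Clone first so the cached facet bitset is not modified by and().
            BitSet intersection = (BitSet) entry.getValue().clone();
            intersection.and(matchingDocs);
            counts.put(entry.getKey(), intersection.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] facetValueByDoc = {"red", "blue", "red", "green", "blue"};
        Map<String, BitSet> facets = buildFacetBitSets(facetValueByDoc);

        BitSet hits = new BitSet();
        hits.set(0, 3); // documents 0, 1 and 2 match the query
        System.out.println(countFacets(facets, hits));
    }
}
```

The facet bitsets only need to be built once per index snapshot, which fits the idea of preparing them in a background thread. Each query then costs one intersection per facet value.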

Re: changing term freq in indexing time

2009-04-21 Thread liat oren
Hi Doron, Thank you very much for the elaborated answer! About the Synonyms, I can't use Wordnet as I have my own list of synonyms. I will look at contrib/memory and see what it does. You understood correctly the process of using the inverse doc. About the two problems you mentioned: scalability

RE: IndexWriter update method

2009-04-21 Thread Newman, Billy
Yeah, I was hoping to change the code to use the update method after I upgraded from 1.4.3, but it doesn't look feasible. I will just continue to find the doc and delete it, then re-insert it. Thanks for all the help guys! -Original Message- From: Doron Cohen [mailto:cdor...@gmail.com] Se

Re: changing term freq in indexing time

2009-04-21 Thread Doron Cohen
Hi Liat, there are two packages under Lucene's contrib that deal with synonyms - that is contrib/memory and contrib/wordnet - which you may find useful. I never used these two but they seem relevant to what you are trying to achieve. Anyhow, it seems you compute the synonyms for word w are those

Re: ebook resources - including lucene in action

2009-04-21 Thread Anshum
All that is said is so right. Moreover, though I doubt, in case someone can not really shell out that much money, they could have asked Otis/Erik or the list to provide with some sort of a discount code for the same. Each cent paid for the book is very well deserved. There should be sharing of such

RE: ebook resources - including lucene in action

2009-04-21 Thread Lukas, Ray
Erik is right!! We should band together and bring legal action against these dirtballs.. Do you like it when someone steals your work, takes credit for it, and turns a profit off of it? More than giving their lives to write this content, they are also contributors to the very software that we use,

Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2009-04-21 Thread Thomas Pönitz
Hi, I have the same problem as discussed here: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200511.mbox/%3c200511021310.18686...@last.fm%3e I want to specify termvectors directly instead of constructing a dummy string like "a a a b b c" that will be transformed to a[3] b[2] c[1].
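
The dummy-string workaround mentioned in this post can be illustrated in plain Java. This is a minimal sketch with hypothetical names (TermVectorText, toDummyText, countTerms) and no Lucene calls: it expands a term-to-frequency map into text such as "a a a b b c" and then counts whitespace-separated tokens, which is assumed here to approximate what a whitespace-based analyzer would do with that text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration of the workaround; not a Lucene API.
class TermVectorText {

    /** Repeats each term as many times as its frequency, separated by spaces. */
    static String toDummyText(Map<String, Integer> termFreqs) {
        StringJoiner text = new StringJoiner(" ");
        for (Map.Entry<String, Integer> entry : termFreqs.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                text.add(entry.getKey());
            }
        }
        return text.toString();
    }

    /** Counts whitespace-separated tokens, recovering the term frequencies. */
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> termFreqs = new LinkedHashMap<>();
        termFreqs.put("a", 3);
        termFreqs.put("b", 2);
        termFreqs.put("c", 1);

        String dummy = toDummyText(termFreqs);
        System.out.println(dummy);
        System.out.println(countTerms(dummy));
    }
}
```

The round trip shows why the workaround is wasteful: the text grows with the sum of all frequencies, only to be tokenized back into the counts that were already known.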

Re: changing term freq in indexing time

2009-04-21 Thread liat oren
Ok, I will explain the full 'problem' and then explain how I approach it: Let's divide it into three steps: 1. I have a 'dictionary' of words - for every word, I have a list of worlds, which are ids of text documents that the word appears in. So, for example, for the word 'dog', I have '1 1600 360

Re: changing term freq in indexing time

2009-04-21 Thread Doron Cohen
Depending on the problem you are trying to solve there may be other solutions to it, not requiring setting wrong (?) values for term frequencies. If you can explain what you are trying to solve, people on the list may be able to suggest such alternatives. - Doron On Sun, Apr 19, 2009 at 2:39 PM, l

Re: IndexWriter update method

2009-04-21 Thread Doron Cohen
IndexWriter.deleteDocuments(Query query) may be handy too (but note that i

Using Payloads

2009-04-21 Thread Murat Yakici
Hi, I started playing with the experimental payload functionality. I have written an analyzer which adds a payload (some sort of a score/boost) for each term occurrence. The payload/score for each term is dependent on the document that the term comes from (I guess this is the typical use case)

Re: Query scoring

2009-04-21 Thread liat oren
Sorry, you can see the script below: Thanks // Index Method **/ public void index(DoubleMap doubleMap, String dirPath, String originalPath) throws IOException { File f = new File(dirPath); IndexWriter writer = null; if(f.exists(

Re: Why is CustomScoreQuery limited to ValueSourceQuery type?

2009-04-21 Thread Doron Cohen
CustomScoreQuery expects the VSQs to have a score for document matching the (main) subQuery - this does not hold for arbitrary queries. On Sat, Apr 18, 2009 at 2:35 AM, Steven Bethard wrote: > CustomScoreQuery only allows the secondary queries to be of type > ValueSourceQuery instead of allowing