Re: Query did not return results

2009-04-24 Thread blazingwolf7
I am using the standard analyzer. This problem only happens when I set the query to BooleanClause.Occur.SHOULD instead of BooleanClause.Occur.MUST while creating the query. John Wang wrote: > > What analyzers are you using for both query and indexing? Can you also post > some code on how you indexed?
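The MUST/SHOULD distinction being discussed can be illustrated with a small self-contained sketch. This is plain Java, not the actual Lucene BooleanQuery API; the class, the field-prefixed term strings, and the match rule are illustrative assumptions: with MUST every clause has to match the document, with SHOULD one matching clause is enough.

```java
import java.util.*;

// Hypothetical sketch (not the Lucene API) of how MUST vs SHOULD
// clauses decide whether a document matches a boolean query.
public class BooleanOccurSketch {

    // must == true : every clause must match the document.
    // must == false: at least one matching clause (SHOULD) is enough.
    public static boolean matches(Set<String> docFields, List<String> clauses, boolean must) {
        int hits = 0;
        for (String c : clauses) {
            if (docFields.contains(c)) hits++;
        }
        return must ? hits == clauses.size() : hits > 0;
    }

    public static void main(String[] args) {
        // Hypothetical document that has the phrase only in its "content" field.
        Set<String> doc = new HashSet<>(Arrays.asList("content:terror india"));
        List<String> clauses = Arrays.asList(
            "title:terror india", "content:terror india", "site:terror india");
        System.out.println("SHOULD matches: " + matches(doc, clauses, false)); // true
        System.out.println("MUST matches:   " + matches(doc, clauses, true));  // false
    }
}
```

Under these textbook semantics a partially matching document is found with SHOULD but not with MUST, so the behavior reported above (SHOULD returning nothing) suggests something else, such as analysis, is going wrong.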

Re: Query did not return results

2009-04-24 Thread John Wang
What analyzers are you using for both query and indexing? Can you also post some code on how you indexed? -John On Fri, Apr 24, 2009 at 8:02 PM, blazingwolf7 wrote: > > Hi, > > I created a query that will find a match inside documents. Example of text > match "terror india" > And documents with this

Query did not return results

2009-04-24 Thread blazingwolf7
Hi, I created a query that will find a match inside documents. Example of text match "terror india". And documents with this exact match do exist. My query generated is like this: (title:"terror india"^4 content:"terror india"^3 site:"terror india") But why does it not return any results? can

Re: kamikaze

2009-04-24 Thread John Wang
Hi Michael: We are using it internally here at LinkedIn for both our search engine as well as our social graph engine. And we have a team developing actively on it. Let us know how we can help you. -John On Fri, Apr 24, 2009 at 1:56 PM, Michael Mastroianni < mmastroia...@glgroup.com> wrote:

Re: semi-infinite loop during merging

2009-04-24 Thread Michael McCandless
OK I opened https://issues.apache.org/jira/browse/LUCENE-1611. Christiaan, could you try out that patch to see if it fixes the semi-infinite merging? Thanks. (You'll need to back-port to 2.4.1, but it's a very small patch so hopefully not a problem). Mike On Fri, Apr 24, 2009 at 5:11 PM, Micha

Re: readModifiedUTF8String stuck

2009-04-24 Thread Michael McCandless
On Fri, Apr 24, 2009 at 4:46 PM, MakMak wrote: > > - We had a 2.3.2 index earlier. We have reindexed using 2.4.1 now. So the hang still happens with 2.4.1? > - SAN is ruled out. This occurs even with local file system. OK. Have you confirmed things are really hung, vs just taking a long time?

Re: semi-infinite loop during merging

2009-04-24 Thread Michael McCandless
On Fri, Apr 24, 2009 at 5:02 PM, Christiaan Fluit wrote: > Rollback does not work for me, as my IW is in auto-commit mode. It gives an > IllegalStateException when I invoke it. > > A workaround that does work for me is to close and reopen the IndexWriter > immediately after an OOME occurs. Ahh,

Piece of code needed

2009-04-24 Thread Andy
--- On Sat, 4/25/09, andykan1...@yahoo.com wrote: From: andykan1...@yahoo.com Subject: Piece of code needed To: java-user@lucene.apache.org Date: Saturday, April 25, 2009, 1:37 AM Hi everybody, I know it may seem stupid, but I'm in the middle of a research project and I need a piece of code in luc

Piece of code needed

2009-04-24 Thread andykan1984
Hi everybody, I know it may seem stupid, but I'm in the middle of a research project and I need a piece of code in Lucene to give me a weight matrix of a text collection and a given query: W(i,j) = f(i,j) x idf(i), and for the query: W(i,q) = (0.5 + (0.5 x freq(i,q)) / Max(freq(i,q))) x idf(i), where: f
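The two formulas above can be turned into a small self-contained sketch. This is plain Java with no Lucene involved; since the message is cut off before defining its terms, the idf definition log(N / n_i) used here is the textbook one and an assumption:

```java
// Sketch of the weight formulas from the message above.
// Assumed (textbook) definitions, since the message is truncated:
//   idf(i)   = log(N / n_i), N = total docs, n_i = docs containing term i
//   W(i,j)   = f(i,j) * idf(i)                          (document weight)
//   W(i,q)   = (0.5 + 0.5*freq(i,q)/maxFreq(q)) * idf(i) (query weight)
public class WeightMatrixSketch {

    public static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / docFreq);
    }

    // Term weight of term i in document j: raw frequency times idf.
    public static double docWeight(double fij, double idfi) {
        return fij * idfi;
    }

    // Augmented-frequency query weight: 0.5 floor plus 0.5 scaled by
    // the most frequent term in the query.
    public static double queryWeight(double freqIq, double maxFreqQ, double idfi) {
        return (0.5 + 0.5 * freqIq / maxFreqQ) * idfi;
    }

    public static void main(String[] args) {
        double idfi = idf(8, 2); // term appears in 2 of 8 docs
        System.out.println("doc weight:   " + docWeight(3.0, idfi));
        System.out.println("query weight: " + queryWeight(2.0, 2.0, idfi));
    }
}
```

Filling a full matrix is then just a loop over terms and documents calling docWeight.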

Re: semi-infinite loop during merging

2009-04-24 Thread Christiaan Fluit
Michael McCandless wrote: - even though the commitMerge returns false, it should probably not get into an infinite loop. Is this an internal Lucene problem or is there something I can/should do about it myself? Yes, something is wrong with Lucene's handling of OOME. It certainly should not lea

kamikaze

2009-04-24 Thread Michael Mastroianni
Hi -- Has anyone here used kamikaze much? I'm interested in using it in situations where I'll have several docidsets of >2M, plus several in the 10s of thousands. On a prototype basis, I got something running nicely using OpenBitSet, but I can't use that much memory for my real application.

Re: readModifiedUTF8String stuck

2009-04-24 Thread MakMak
- We had a 2.3.2 index earlier. We have reindexed using 2.4.1 now. - SAN is ruled out. This occurs even with the local file system. - One more point: this occurs with very high load on the application, about 2-3 requests per second; the search part of each request is within milliseconds. the page size

Re: readModifiedUTF8String stuck

2009-04-24 Thread Michael McCandless
On Tue, Apr 21, 2009 at 6:25 PM, MakMak wrote: >   Ran CheckIndex. This is what it prints out: > > cantOpenSegments: false > numBadSegments: 0 > numSegments: 14 > segmentFormat: FORMAT_HAS_PROX [Lucene 2.4] > segmentsFileName: segments_2od > totLoseDocCount: 0 > clean: true > toolOutOfDate: false

Re: Index in text format

2009-04-24 Thread Andrzej Bialecki
Otis Gospodnetic wrote: No. But you could look at an existing index, pull out one Document at a time, pull out any stored Field values from each Document, and write those to a text file. You'd have to write the code for this yourself. Actually, the latest version of Luke (http://www.getopt.

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2009-04-24 Thread Michael McCandless
I don't think there's an easy way to jump straight from term + freq per doc to a Lucene index. Mike On Tue, Apr 21, 2009 at 7:14 AM, Thomas Pönitz wrote: > Hi, > > I have the same problem as discussed here: > http://mail-archives.apache.org/mod_mbox/lucene-java-user/200511.mbox/%3c200511021310.1

Re: Share Index on NFS

2009-04-24 Thread Michael McCandless
Can you describe how you change the index? EG are you committing very frequently? It's odd to get 1000+ files in the index in 10 minutes unless you are committing frequently. If so, you may need a smarter deletion policy that stays in touch w/ the readers to know precisely which commit point the

Re: Getting Top n term for a given field for a given time period

2009-04-24 Thread Michael McCandless
Make a RangeFilter that visits only docs in your time period, then run a search w/ a custom HitCollector that looks at the source of each doc and tallies up the results? For performance, you'll probably need to load the source using FieldCache (FieldCache.DEFAULT.getStrings(...)). Or, use Solr's
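Mike's recipe (restrict hits to the time range, then tally terms per hit) can be sketched in plain Java, independent of the Lucene RangeFilter/HitCollector API; the data layout here, a timestamp-to-terms map standing in for the index, is made up for illustration:

```java
import java.util.*;

// Sketch of "filter by time range, then tally terms" -- not Lucene code.
public class TopTermsSketch {

    // docs maps a document timestamp to the terms of that document.
    // Returns the n most frequent terms among docs in [from, to].
    public static List<String> topTerms(Map<Long, List<String>> docs,
                                        long from, long to, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<Long, List<String>> e : docs.entrySet()) {
            long ts = e.getKey();
            if (ts < from || ts > to) continue;      // the RangeFilter step
            for (String t : e.getValue())            // the HitCollector tally
                counts.merge(t, 1, Integer::sum);
        }
        List<String> terms = new ArrayList<>(counts.keySet());
        terms.sort((a, b) -> counts.get(b) - counts.get(a));
        return terms.subList(0, Math.min(n, terms.size()));
    }

    public static void main(String[] args) {
        Map<Long, List<String>> docs = new HashMap<>();
        docs.put(1L, Arrays.asList("lucene", "search"));
        docs.put(2L, Arrays.asList("lucene"));
        docs.put(10L, Arrays.asList("solr"));       // outside the window
        System.out.println(topTerms(docs, 0L, 5L, 2));
    }
}
```

In real Lucene the tally step would read the field value per hit, which is why Mike suggests FieldCache rather than loading stored fields per document.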

Re: exponential boosts

2009-04-24 Thread Steven Bethard
On 4/24/2009 3:16 AM, Doron Cohen wrote: > On Fri, Apr 24, 2009 at 12:28 AM, Steven Bethard wrote: > >> On 4/23/2009 2:08 PM, Marcus Herou wrote: >>> But perhaps one could use a FieldCache somehow ? >> Some code snippets that may help. I add the PageRank value as a field of >> the documents I inde
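A minimal sketch of what an exponential boost from a cached per-document value could look like. This is plain Java; the formula score * base^pagerank is an illustrative assumption, not the exact code from the thread:

```java
// Sketch: combine a raw text score with a per-document value (e.g. a
// PageRank read from a FieldCache-style array) via an exponential boost.
// base > 1 amplifies high-PageRank documents; base == 1 is a no-op.
public class ExponentialBoostSketch {

    public static double boostedScore(double textScore, double pageRank, double base) {
        return textScore * Math.pow(base, pageRank);
    }

    public static void main(String[] args) {
        // Hypothetical cached values, one per document, as FieldCache would hold.
        double[] pageRank = {0.0, 1.0, 2.0};
        for (int doc = 0; doc < pageRank.length; doc++) {
            System.out.println("doc " + doc + ": " + boostedScore(1.0, pageRank[doc], 2.0));
        }
    }
}
```

The point of the FieldCache suggestion is that the per-document value is loaded once per reader, so the boost is a cheap array lookup at scoring time.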

Re: Low-memory searcher

2009-04-24 Thread mark harwood
See IndexReader.setTermInfosIndexDivisor() for a way to help reduce memory usage without needing to re-index. If you have indexed fields with omitNorms off (the default), you will be paying a 1 byte per field per document memory cost and may need to look at re-indexing. Cheers Mark - Orig
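The 1-byte-per-field-per-document norms cost Mark mentions is easy to estimate up front. A tiny sketch, where the document and field counts are hypothetical numbers, not taken from the thread:

```java
// Back-of-the-envelope estimate of norms memory: roughly 1 byte per
// indexed field per document when norms are enabled (omitNorms == false).
public class NormsMemorySketch {

    public static long normsBytes(long numDocs, int numFieldsWithNorms) {
        return numDocs * numFieldsWithNorms;
    }

    public static void main(String[] args) {
        // Hypothetical: 50M docs x 5 fields -> 250,000,000 bytes (~238 MiB).
        long bytes = normsBytes(50_000_000L, 5);
        System.out.println(bytes + " bytes = " + bytes / (1024 * 1024) + " MiB");
    }
}
```

At that scale the norms alone can dwarf a 512m heap, which is why omitting norms on fields that don't need length normalization is a common fix.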

Stable version of Lucene 2.9

2009-04-24 Thread Paul Taylor
Hi, I would like to upgrade to Lucene 2.9. I can see the daily builds on Hudson; should I just take the last build that worked, or are there any particular builds that have been tested and hence are possibly more stable? thanks Paul -

Low-memory searcher

2009-04-24 Thread Douglas Campos
Hi! Is there any way to reduce the memory footprint when doing a search over a very large index (20G)? I've been getting OOMs with a 512m heap! cheers -- Douglas Campos Theros Consulting +55 11 9267 4540 +55 11 3020 8168

Re: Wordnet indexing error

2009-04-24 Thread Otis Gospodnetic
Nothing that marries WordNet with Lucene other than that syns stuff exists in Lucene contrib (but it may exist on SourceForge, in Google Code, etc.). There are several WordNet Java libraries you could use to combine WN and Lucene: http://www.simpy.com/user/otis/search/wordnet Otis -- Semat

no segments* file found: files: Error on opening index

2009-04-24 Thread Paul Taylor
Hi, I was using a RAMDirectory and this was working fine, but I have now moved over to a filesystem directory to preserve space. The directory is just initialized once: directory = new RAMDirectory(); directory = FSDirectory.getDirectory(Platform.getPlatformLicenseFolder()+ "/" + TAG_BROWSER

Re: Index in text format

2009-04-24 Thread Otis Gospodnetic
No. But you could look at an existing index, pull out one Document at a time, pull out any stored Field values from each Document, and write those to a text file. You'd have to write the code for this yourself. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Origin

Re: SpanQuery wildcards?

2009-04-24 Thread Ivan Vasilev
Thanks guys for the answers! Steven, I tried with ".*" instead of "*" but it did not work as desired. The ".*" does not replace any symbol(s) in the query. I tested with different Analyzers. Depending on the Analyzer it is omitted or ".*" is treated just as normal symbols. Mark, your clas

Re: How to search special characters in LUcene

2009-04-24 Thread Erick Erickson
I'm puzzled why you say "By the above out put we can say that StandardAnalyzer is enough to get rid of danish elements." It does NOT get rid of the accents, according to your own output. If your goal is to go ahead and index multiple language documents in a single index then search it, I'd recom

Re: Error: there are more terms than documents...

2009-04-24 Thread Doron Cohen
On Thu, Apr 23, 2009 at 11:52 PM, wrote: > I figured it out. We are using Hibernate Search and in my ORM class I > am doing the following: > > @Field(index=Index.TOKENIZED,store=Store.YES) > protected String objectId; > > So when I persisted a new object to our database I was inadvertently > cre

Re: exponential boosts

2009-04-24 Thread Doron Cohen
On Fri, Apr 24, 2009 at 12:28 AM, Steven Bethard wrote: > On 4/23/2009 2:08 PM, Marcus Herou wrote: > > But perhaps one could use a FieldCache somehow ? > > Some code snippets that may help. I add the PageRank value as a field of > the documents I index with Lucene like this: > >Document docum

Re: How to search special characters in LUcene

2009-04-24 Thread uday kumar maddigatla
Hi, Thanks for your reply. After going through the site you gave, I understood that StandardAnalyzer is enough to handle these special characters. I'm attaching one class called AnalysisDemo.java. By executing that class I'm able to say the above sentence (i.e. StandardAnalyzer is enough