Term Limit?

2009-04-03 Thread deminix
http://lucene.apache.org/java/2_4_1/fileformats.html The bottom of the file format page states that there is a 32-bit limit on term numbers. I fail to see where in the file formats documentation that actually holds. Is the bottom of the page simply out of date? I'm also wondering whether the c

Re: Autonomy search technology

2009-04-03 Thread Vladimir Ignatov
Here at yellopages/rus (yell.ru) we switched from FAST to Lucene. I am HAPPY about it. Sometimes FAST was a nightmare to work with. Non-working features, obscure bugs, thousands of undocumented settings, slow, lazy and dumb tech support... Not to mention its ridiculous price and very limiting licen

Re: Autonomy search technology

2009-04-03 Thread Shashi Kant
Hmm... not sure I would call Autonomy a "superb product". IMHO it is anything but. In fact, it is what one calls bloatware. I have had some experience with Autonomy and it is hardly something you should consider using unless you are eager to shoot yourself in the foot. I fundamentally disagree with P

Re: Autonomy search technology

2009-04-03 Thread patrick o'leary
I think you need to ask the question: what do you want? A person asked me once: which is better, a gold fountain pen or a plastic Bic pen? The answer: it depends. If you want the most fluid writing instrument, which gives you a certain level of accomplishment as you use it, and looks superb then th

simultaneous indexing and searching causing intermittently long searches.

2009-04-03 Thread Dan OConnor
All, I have several questions regarding query response time and I would appreciate any help that can be provided. We have a system that indexes approximately 200,000 documents per day at a fairly constant rate and holds them in a cfs-style file system directory index for 8 days. The index is

Re: Autonomy search technology

2009-04-03 Thread John Wang
Not quite. For example, the number of fields is static throughout the corpus. The number of zones is per document. E.g. let's say you have 1 million docs, some docs have 2 paragraphs, some 1, and some 1. You want to limit your search to paragraph 13. How many fields do you create? What if you add a document with 500

Re: sloppyFreq question

2009-04-03 Thread Chris Hostetter
: Sorry, here's the example I meant to show. Doc 1 and doc 2 both contain the : terms "hey look, the quick brown fox jumped very high", but in Doc 1 all the : terms are indexed at the same position. In doc 2, the terms are indexed in : adjacent positions (normal way). For the query "the quick brow

RE: Autonomy search technology

2009-04-03 Thread Digy
As far as I can remember, "Zone" in Verity is similar to "field" in Lucene, and Verity performs searches on all "zones" by default. DIGY -Original Message- From: Matthew Runo [mailto:mr...@zappos.com] Sent: Friday, April 03, 2009 9:08 PM To: java-user@lucene.apache.org Subject: Re: Autono

Re: Autonomy search technology

2009-04-03 Thread John Wang
Maybe it is a Verity specific term :) zone search = searching only a part of a document. e.g. 1000 docs in the corpus, query only second paragraph of all docs. @Lukas: That is not what I am saying at all. Lucene's feature set is not a superset of those of autonomy/verity/endeca ..., neither is th

Re: Autonomy search technology

2009-04-03 Thread Matthew Runo
Would you be willing to explain what "zone search" is? I did a quick google search, but came up empty handed. Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Apr 3, 2009, at 10:08 AM, John Wang wrote: Verity VDK, which was bought by autonom

Re: Autonomy search technology

2009-04-03 Thread Lukáš Vlček
So it means that the Autonomy search solution is doing better out of the box than any solution based on Lucene right now? And bringing a Lucene-based solution to the same level would require additional investments and non-trivial development (probably not small). In other words if a client is using autonom

Re: Autonomy search technology

2009-04-03 Thread John Wang
Verity VDK, which was bought by Autonomy, has zone search, something Lucene currently does not support. We have implemented it on top of Lucene and are thinking about contributing it. -John On Fri, Apr 3, 2009 at 8:56 AM, Lukáš Vlček wrote: > Hi, > anybody has experience with Autonomy search technolog

Re: Speed of fuzzy searches

2009-04-03 Thread Erik Hatcher
On Apr 3, 2009, at 10:58 AM, Grant Ingersoll wrote: Now, we have an implementation of JaroWinkler in the spell checker (in fact, we have pluggable distance measures there), perhaps it makes sense to think about how FuzzyQuery could leverage this pluggability? My suggestion is to make it p

Re: Speed of fuzzy searches

2009-04-03 Thread Grant Ingersoll
In a really weird "what is old, is new again" sort of thing, I am researching spellchecking, and came across: http://www.lucidimagination.com/search/document/cc46ac41bd4ee661/ngramspeller_contribution_re_combining_open_office_spellchecker_with_lucene#4f731c4209e3d7d0 which suggests speeding up

Re: HeapedScorerDoc using all my memory

2009-04-03 Thread Michael McCandless
I'm also confused, because ScorerDocQueue should not be used during indexing. It's used only when scoring boolean "OR" queries. Are you doing searching in the same JVM as indexing? Mike On Fri, Apr 3, 2009 at 9:21 AM, John Byrne wrote: > Unfortunately I'm not sure of the exact number. It happe

Re: Speed of fuzzy searches

2009-04-03 Thread Matt Schraeder
After doing some research I broke down and just updated my Zend Framework. I installed it not long ago so I didn't think much of it, but then I realized I'm running version 1.6.1 and that Zend is currently on 1.7.8. Upon upgrading, the complex fuzzy search that was taking 30 seconds now takes

Lucene Filtering

2009-04-03 Thread addman
How do you create a Lucene Filter to check if a field has a value? It is part of a ChainedFilter that I am creating. -- View this message in context: http://www.nabble.com/Lucene-Filtering-tp22868930p22868930.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: HeapedScorerDoc using all my memory

2009-04-03 Thread Erick Erickson
Good Luck! I love (a little sarcasm there) being presented with problem statements like "it doesn't work. You can't do anything on the machine where the problem is. We can't give you any information about what's happening. How long will it take you to fix it?" Best er...@infullsympathy.com On Fr

Re: HeapedScorerDoc using all my memory

2009-04-03 Thread John Byrne
Unfortunately I'm not sure of the exact number. It happened on a machine I have no access to, and I was just e-mailed a few details of the problem! We have a JMS queue, where each message is a file to be indexed. There were somewhere between 2000 and 10,000 messages processed when it happened.

Re: HeapedScorerDoc using all my memory

2009-04-03 Thread Erick Erickson
Hmm, that's odd. How many is "a large number of documents"? And what is your index size when things go wonky? (approximately) I can say that other people create very large indexes without this happening, but the only thing that says is that this isn't a *known* problem. Is there any chance you'

RE: Retrieving TokenStream from Tokenized Non-Stored Field

2009-04-03 Thread David Seltzer
Ah, ok. Well that explains the behavior then. Thanks! -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Thursday, April 02, 2009 4:14 PM To: java-user@lucene.apache.org Subject: Re: Retrieving TokenStream from Tokenized Non-Stored Field Actually you hav

Re: HeapedScorerDoc using all my memory

2009-04-03 Thread John Byrne
The maximum JVM memory is 2GB. Apparently 1.2GB is being used up by this class. All IndexWriter settings are left as default. I haven't tried any changes yet, because the problem so far has only happened in a production environment that I can't mess with. I am planning to try reproducing it

Re: HeapedScorerDoc using all my memory

2009-04-03 Thread Erick Erickson
How much memory are you allocating for the JVM? And what are your various IndexWriter settings (e.g. MaxBufferedDocs, MaxMergeDocs, etc.)? Have you tried different settings in setRAMBufferSizeMB? Best Erick On Fri, Apr 3, 2009 at 7:13 AM, John Byrne wrote: > Hi, I'm having a problem where the J

HeapedScorerDoc using all my memory

2009-04-03 Thread John Byrne
Hi, I'm having a problem where the JVM runs out of memory while indexing a large number of files. An analysis of the heapdump shows that most of the memory was taken up with "org/apache/lucene/util/ScorerDocQueue$HeapedScorerDoc". I can't find any leaks in my code so far, and I was wondering,