http://lucene.apache.org/java/2_4_1/fileformats.html
The file formats page states at the bottom that there is a 32-bit limit on term numbers. I fail to see where in the file formats documentation that is actually true. Is the bottom of the page simply out of date? I'm also wondering whether the c
Here at yellowpages/rus (yell.ru) we are switching from FAST to Lucene. I am HAPPY about it. At times FAST was a nightmare to work with: non-working features, obscure bugs, thousands of undocumented settings, slow, lazy and dumb tech support... Not to mention its ridiculous price and very limiting license
Hmm... not sure I would call Autonomy a "superb product". IMHO it is anything but. In fact, it is what one calls bloatware. I have had some experience with Autonomy, and it is hardly something you should consider using unless you are eager to shoot yourself in the foot. I fundamentally disagree with P
I think you need to ask the question: what do you want?
A person once asked me which is better, a gold fountain pen or a plastic Bic pen?
The answer - it depends.
If you want the most fluid writing instrument, one which gives you a certain level of accomplishment as you use it and looks superb, then th
All,
I have several questions regarding query response time, and I would appreciate any help that can be provided.
We have a system that indexes approximately 200,000 documents per day at a fairly constant rate and holds them in a cfs-style file system directory index for 8 days. The index is
Not quite. For example, the number of fields is static throughout the corpus, while the number of zones is per document. E.g. let's say you have 1 million docs; some docs have 2 paragraphs, some 1, and some many more. You want to limit your search to paragraph 13. How many fields do you create? What if you add a document with 500
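For what it's worth, one way to approximate zone search without a field per zone is to index each paragraph as its own Lucene document, keyed by the parent document id plus a paragraph number, and constrain queries on the paragraph number; the paragraph count can then vary per document without changing the field set. A rough sketch against the 2.4-era API follows - the field names are made up and this is not the implementation John mentions later in the thread.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class ZoneSearchSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);
        // One Lucene document per paragraph ("zone") of the original document.
        String[] paragraphs = {"first paragraph text", "second paragraph text"};
        for (int i = 0; i < paragraphs.length; i++) {
            Document para = new Document();
            para.add(new Field("parentId", "doc-42",
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            para.add(new Field("paraNum", Integer.toString(i + 1),
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            para.add(new Field("text", paragraphs[i],
                    Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(para);
        }
        writer.close();

        // "Query only the second paragraph of all docs":
        BooleanQuery q = new BooleanQuery();
        q.add(new TermQuery(new Term("text", "second")), BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("paraNum", "2")), BooleanClause.Occur.MUST);
        IndexSearcher searcher = new IndexSearcher(dir);
        System.out.println("hits: " + searcher.search(q, null, 10).totalHits);
        searcher.close();
    }
}

The obvious trade-off is that hits come back at the paragraph level, so you need the parentId field to group or deduplicate them back into whole documents.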
: Sorry, here's the example I meant to show. Doc 1 and doc 2 both contain the
: terms "hey look, the quick brown fox jumped very high", but in Doc 1 all the
: terms are indexed at the same position. In doc 2, the terms are indexed in
: adjacent positions (the normal way). For the query "the quick brown
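To make the Doc 1 case concrete, here is a minimal sketch (assuming the Lucene 2.4-era analysis API) of a TokenFilter that stacks every token at the same position by zeroing the position increment; the class name is made up.

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class SamePositionFilter extends TokenFilter {
    private boolean first = true;

    public SamePositionFilter(TokenStream input) {
        super(input);
    }

    public Token next(Token reusableToken) throws IOException {
        Token t = input.next(reusableToken);
        if (t != null) {
            // The first token keeps its increment; every later token is
            // stacked onto the same position as the first one.
            t.setPositionIncrement(first ? 1 : 0);
            first = false;
        }
        return t;
    }
}

Wrapping a field's TokenStream with this filter at indexing time produces Doc 1's layout, while indexing the raw stream produces Doc 2's.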
As far as I can remember, "Zone" in Verity is similar to "field" in Lucene, and Verity performs searches on all "zones" by default.
DIGY
-Original Message-
From: Matthew Runo [mailto:mr...@zappos.com]
Sent: Friday, April 03, 2009 9:08 PM
To: java-user@lucene.apache.org
Subject: Re: Autono
Maybe it is a Verity-specific term :)
Zone search = searching only a part of a document, e.g. 1000 docs in the corpus, query only the second paragraph of all docs.
@Lukas: That is not what I am saying at all. Lucene's feature set is not a superset of those of Autonomy/Verity/Endeca..., and neither is th
Would you be willing to explain what "zone search" is? I did a quick
google search, but came up empty handed.
Thanks for your time!
Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833
On Apr 3, 2009, at 10:08 AM, John Wang wrote:
Verity VDK, which was bought by Autonomy
So this means that the Autonomy search solution does better out of the box than any solution based on Lucene right now? And bringing a Lucene-based solution to the same level would require additional investment and non-trivial development (probably not small). In other words, if a client is using Autonomy
Verity VDK, which was bought by Autonomy, has zone search, something Lucene currently does not support.
We have implemented it on top of Lucene and are thinking about contributing.
-John
On Fri, Apr 3, 2009 at 8:56 AM, Lukáš Vlček wrote:
> Hi,
> does anybody have experience with the Autonomy search technology
On Apr 3, 2009, at 10:58 AM, Grant Ingersoll wrote:
Now, we have an implementation of JaroWinkler in the spell checker
(in fact, we have pluggable distance measures there), perhaps it
makes sense to think about how FuzzyQuery could leverage this
pluggability?
My suggestion is to make it p
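For reference, a sketch of the pluggable distance measures Grant mentions, assuming the contrib spellchecker's StringDistance interface (a single getDistance(String, String) method returning a float); the custom measure below is purely illustrative.

import org.apache.lucene.search.spell.JaroWinklerDistance;
import org.apache.lucene.search.spell.StringDistance;

public class DistanceDemo {
    // A toy measure, just to show the shape of the interface:
    // exact match scores 1.0, anything else scores 0.0.
    static class ExactDistance implements StringDistance {
        public float getDistance(String s1, String s2) {
            return s1.equals(s2) ? 1.0f : 0.0f;
        }
    }

    public static void main(String[] args) {
        StringDistance jw = new JaroWinklerDistance();
        StringDistance exact = new ExactDistance();
        System.out.println(jw.getDistance("lucene", "lucine"));    // close to 1.0
        System.out.println(exact.getDistance("lucene", "lucine")); // 0.0
    }
}

FuzzyQuery, by contrast, has Levenshtein-style edit distance baked in, which is what makes the idea of routing it through the same interface appealing.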
In a really weird "what is old, is new again" sort of thing, I am
researching spellchecking, and came across: http://www.lucidimagination.com/search/document/cc46ac41bd4ee661/ngramspeller_contribution_re_combining_open_office_spellchecker_with_lucene#4f731c4209e3d7d0
which suggests speeding up
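As a quick illustration of the general idea (this is not code from the linked contribution), an n-gram speller decomposes each word into overlapping character n-grams and indexes those, so candidate corrections are simply words that share enough grams with the misspelling. A trivial sketch of the decomposition:

import java.util.ArrayList;
import java.util.List;

public class NGrams {
    // Break a word into its overlapping character n-grams.
    public static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 3)); // [luc, uce, cen, ene]
    }
}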
I'm also confused, because ScorerDocQueue should not be used during indexing.
It's used only when scoring boolean "OR" queries.
Are you doing searching in the same JVM as indexing?
Mike
On Fri, Apr 3, 2009 at 9:21 AM, John Byrne wrote:
> Unfortunately I'm not sure of the exact number. It happe
After doing some research I broke down and just updated my Zend
Framework. I just installed it not long ago so I didn't think much of
it, but then I realized I'm running version 1.6.1 and that Zend is
currently on 1.7.8. Upon upgrading, the complex fuzzy search that was taking 30 seconds now takes
How do you create a Lucene Filter to check whether a field has a value? It is part of a ChainedFilter that I am creating.
--
View this message in context:
http://www.nabble.com/Lucene-Filtering-tp22868930p22868930.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
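One way to do it, sketched against the 2.4-era Filter API (so treat this as a starting point, not a drop-in answer): walk every term of the field with a TermEnum and mark each document that contains at least one of them.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;

public class FieldHasValueFilter extends Filter {
    private final String field;

    public FieldHasValueFilter(String field) {
        this.field = field;
    }

    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        OpenBitSet bits = new OpenBitSet(reader.maxDoc());
        TermEnum terms = reader.terms(new Term(field, ""));
        TermDocs termDocs = reader.termDocs();
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)) {
                    break; // walked past the last term of this field
                }
                termDocs.seek(t);
                while (termDocs.next()) {
                    bits.set(termDocs.doc());
                }
            } while (terms.next());
        } finally {
            termDocs.close();
            terms.close();
        }
        return bits;
    }
}

The resulting DocIdSet can then be combined with the others in your ChainedFilter. Note that it only sees indexed values; a field that is stored but not indexed will not be detected.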
Good Luck! I love (a little sarcasm there) being presented with
problem statements like "it doesn't work. You can't do anything
on the machine where the problem is. We can't give you any
information about what's happening. How long will it take you
to fix it?"
Best
er...@infullsympathy.com
On Fr
Unfortunately I'm not sure of the exact number. It happened on a machine
I have no access to, and I was just e-mailed a few details of the
problem! We have a JMS queue, where each message is a file to be
indexed. There were somewhere between 2,000 and 10,000 messages processed
when it happened.
Hmmm, that's odd. How many is "a large number of documents"? And what is your index size when things go wonky? (Approximately.)
I can say that other people create very large indexes without this happening, but all that tells us is that this isn't a *known* problem.
Is there any chance you'
Ah, ok. Well that explains the behavior then. Thanks!
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Thursday, April 02, 2009 4:14 PM
To: java-user@lucene.apache.org
Subject: Re: Retrieving TokenStream from Tokenized Non-Stored Field
Actually you hav
The maximum JVM memory is 2GB. Apparently 1.2GB is being used up by this
class.
All IndexWriter settings are left as default.
I haven't tried any changes yet, because the problem so far has only
happened in a production environment that I can't mess with. I am
planning to try reproducing it
How much memory are you allocating for the JVM? And what are your various IndexWriter settings (e.g. maxBufferedDocs, maxMergeDocs, etc.)?
Have you tried different settings with setRAMBufferSizeMB?
Best
Erick
On Fri, Apr 3, 2009 at 7:13 AM, John Byrne wrote:
> Hi, I'm having a problem where the J
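For anyone following along, here is a minimal sketch (2.4-era API assumed; the numbers are placeholders, not recommendations) of the IndexWriter knobs Erick is asking about:

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class WriterTuning {
    public static IndexWriter openWriter(String path) throws IOException {
        Directory dir = FSDirectory.getDirectory(path);
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Flush by RAM usage rather than by document count; 64 MB is arbitrary.
        writer.setRAMBufferSizeMB(64.0);
        writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
        // Bound how large merged segments can grow (in docs) and how many
        // segments accumulate before a merge kicks in.
        writer.setMaxMergeDocs(Integer.MAX_VALUE);
        writer.setMergeFactor(10);
        return writer;
    }
}

None of this explains a heap dominated by ScorerDocQueue, though, which is a search-time class - hence Mike's question about whether searching runs in the same JVM as indexing.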
Hi, I'm having a problem where the JVM runs out of memory while indexing
a large number of files. An analysis of the heapdump shows that most of
the memory was taken up with
"org/apache/lucene/util/ScorerDocQueue$HeapedScorerDoc".
I can't find any leaks in my code so far, and I was wondering,