OK - it seems as if there is a blow-up in FieldPhraseList if a document
has a large number of occurrences of a term that is in the query. In
one example, I searched for "1", and this occurs just under 2000 times
in one of my test documents (as the value of HTML attributes).
Admittedly a weird
I am running over a 100 million row NoSQL set and unfortunately building 1
million indexes. Each row I get may or may not be for the index I just wrote
to, so I can't keep IndexWriter open very long. I am currently simulating how
long it would take me to build all the indexes and it looks like
I did that, and the benchmark indicates FVH is 10x faster than
Highlighter now. I ran with a subset of the Wikipedia data since I
didn't want to deal with the whole thing. I'm trying to reconcile these
weirdly varying results. One difference is that the benchmark doesn't
use PhraseQueries -
(11/06/22 2:03), Anupam Tangri wrote:
Hi,
We are using Lucene 3.2 for our project, where I needed to highlight search
matches. I earlier used the default highlighter, which did not work correctly
all the time.
So, I started using FVH, which worked beautifully till I started
searching multiple t
Does anyone know how to set up a simple RAMDirectory? I just created it, but it
is failing with this:
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file
found in org.apache.lucene.store.RAMDirectory@1d5a7f6
lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@64804:
IMO, a "reversed word index" does not work in this case, because he's looking
for a match in the middle of a word (see curi*).
Another idea is to build word chunks and save them in a second index, plus the
docID from the first index.
e.g. "security" becomes "security ecurity curity ... ity"
This is much faster to
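
The word-chunk idea above can be sketched roughly as follows. This is only an illustration in plain Java, not code from the thread: the class name SuffixChunks, the method chunks, and the minLength cutoff are assumptions made for the example, and the step that writes the chunks into a second index is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): produces the "word chunks"
// that would be written to the second index.
final class SuffixChunks {

    // Returns every suffix of word that has at least minLength characters,
    // longest first. minLength is an assumed cutoff to limit index size.
    static List<String> chunks(String word, int minLength) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start + minLength <= word.length(); start++) {
            out.add(word.substring(start));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [security, ecurity, curity, urity, rity, ity]
        System.out.println(chunks("security", 3));
    }
}
```

With chunks like these indexed, a search for curi* could be run as an ordinary prefix query against the chunk terms, since "curity" starts with "curi".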
Sweet, thanks. Hmmm, yeah, you should have just given me a
http://lmgtfy.com/ link ;) ...darn, should have found that one myself.
I am storing pretty small data.
Thanks,
Dean
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, June 21, 2011 5:39 AM
I wonder if you would be better off creating a second index with the words
reversed. It depends on your application profile, I guess, and what you want, but
an additional index may not be too bad in some cases to speed up the search.
Dean
-Original Message-
From: G.Long [mailto:jde...@gma
> So an action that starts a new searcher and closes the old one (like
> replication)
> should release cache from fieldCache through garbage collection?
Absolutely. It won't be immediate, because the JVM has some
heuristics it uses to initiate garbage collection. You could try
attaching to the Solr i
Because when I use [ 20110601 TO * ], Lucene does not return my results greater
than 20110601, but when I use
[20110601 TO ], it works fine. Why is this? How do I get everything
larger than 20110601? I have another
case of sequence numbers and want to get everything above a certain numbe
Hi,
We are using Lucene 3.2 for our project, where I needed to highlight search
matches. I earlier used the default highlighter, which did not work correctly
all the time.
So, I started using FVH, which worked beautifully till I started
searching multiple terms. On digging I came to know FVH has
Reopening is based entirely on the latest segments_N file present in the index.
Lucene loads that file and checks if it refers to any new segments not
already open, and if so opens those new ones. Any segments in common
with what the reader already has open (i.e. same segment name) are
simply reused
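
As a rough illustration of the reuse rule described above (and not Lucene's actual implementation), the bookkeeping can be modelled in plain Java. The names ReopenSketch, SegmentHandle, and reopen are hypothetical, and a real reader tracks far more state than a segment name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of reopen bookkeeping; none of these are Lucene classes.
final class ReopenSketch {

    // Stand-in for an already-open per-segment reader.
    static final class SegmentHandle {
        final String name;
        SegmentHandle(String name) { this.name = name; }
    }

    // open: segments the current reader holds, keyed by segment name.
    // latestSegments: segment names listed in the newest segments_N file.
    // Segments with the same name are reused; unknown names are opened fresh;
    // segments no longer listed are simply not carried over.
    static Map<String, SegmentHandle> reopen(Map<String, SegmentHandle> open,
                                             List<String> latestSegments) {
        Map<String, SegmentHandle> next = new LinkedHashMap<>();
        for (String name : latestSegments) {
            SegmentHandle existing = open.get(name);
            next.put(name, existing != null ? existing : new SegmentHandle(name));
        }
        return next;
    }
}
```

In this model the decision depends only on segment names, which is why swapping a different index into the same directory can confuse a reader that is reopened rather than opened fresh.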
Say,
Which of the solutions did you find to work better?
Can you please say which package I should change it to if I choose to do it
that way?
Thanks,
Moshe Lichman.
--
View this message in context:
http://lucene.472066.n3.nabble.com/ComplexPhraseQueryParser-with-multiple-fields-tp2879290p30902
Hey there,
I have a question about the behaviour of IndexReader.reopen.
I have a Tomcat server holding a Lucene index over an IndexSearcher. If I
move the index.folder to index.folder.old and another index, let's say
index.folder.2, to index.folder and then I reopen readers, something weird
happens if
Thank you for the tip :)
I'll try it.
Regards,
Gary
On 21/06/2011 17:38, Ian Lea wrote:
See the javadocs for QueryParser.setAllowLeadingWildcard(boolean
allowLeadingWildcard). And from the FAQ, see
http://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_
See the javadocs for QueryParser.setAllowLeadingWildcard(boolean
allowLeadingWildcard). And from the FAQ, see
http://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
Be sure to heed the warnings about performance.
--
Ian.
On Tue, Jun 21, 2011 at 4:
Hi :)
I've got the following text indexed with SimpleAnalyzer:
"security is a real problem."
If I try to search for secu*, it will find the document. But if I try to
search for curi*, there are no results.
I read that it's not possible to add a * wildcard at the beginning of the
query so wh
You might want to look at ManifoldCF too.
http://incubator.apache.org/connectors/
Karl
-Original Message-
From: ext Marlen [mailto:zmach...@facinf.uho.edu.cu]
Sent: Tuesday, June 21, 2011 9:49 AM
To: java-user@lucene.apache.org
Subject: need help
I need to create a search engine that s
Hello Cheta,
Check this site : http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
Vinaya
-Original Message-
From: Marlen [mailto:zmach...@facinf.uho.edu.cu]
Sent: Tuesday, June 21, 2011 7:19 PM
To: java-user@lucene.apache.org
Subject: need help
I need to create a search engi
I need to create a search engine that searches an intranet and my FTP.
I want to use Lucene as the search engine. My question is: which best
fits my needs, Nutch or Solr?
Thanks a lot
cheta
On 13/06/2011 9:17, Ian Lea wrote:
Hello
Lucene can be used for searching pretty much anything.
Currently I'm using version 3.2.
I already used 4.x some months ago, but there was too much change at that time,
so I decided to go with 3.0.x and updated to 3.1 and now to 3.2.
I'm still dealing with my fieldCache OOM issue and want to understand
why things are as they are.
I have already removed/s
Hmmm, I'm not going to even try to talk about the code itself, but I will add
a couple of clarifications:
Jetty has nothing to do with it. It's in Lucene, and it's used for sorting and
sometimes faceting. The cache is associated with a reader on a machine
used to search. When replication happens,
Does this help?
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/util/IndexableBinaryStringTools.html
If not, here's a note from Ryan McKinley on another thread (googling
lucene storing binary data brought it up)...
**
You can store binary data using a binary field type -- then y
I'm trying to understand the logic behind fieldCache.
Who wrote this piece of code, or has good knowledge about it?
Why is it under the hood of Jetty?
I see FieldCache$StringIndex with
- f_dccollection
- f_dcyear
- f_dctype
but also
- dctitle --> f_dctitle --> f_dccreator
- title --> f_
Complicated with all those indexes.
3 suggestions:
1. Just give it more memory.
2. Profile it to find out what is actually using the memory.
3. Cut down the number of indexes. See recent threads on pros and
cons of multiple indexes vs one larger index.
--
Ian.
On Mon, Jun 20, 2011 at 2: