On Thu, Aug 16, 2012 at 11:27 AM, zhoucheng2008 wrote:
>
> +(title:21 title:a title:day title:once title:a title:month)
Looks like you have a fairly big boolean query going on here, and some
of the terms you're using are really common ones like "a".
Are you using AND or OR for the default operator?
Also try jmap -heap <pid> to check whether it is running out of memory,
or jstat -gcutil <pid> 1000.
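If attaching jmap fails or is unavailable, heap pressure can also be read in-process. A minimal sketch using the standard MemoryMXBean (plain JDK, nothing Lucene-specific; the class name is mine):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    // Returns heap utilization as a fraction of the configured maximum.
    static double heapUtilization() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        return (double) heap.getUsed() / heap.getMax();
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%%%n", 100 * heapUtilization());
    }
}
```

This reports roughly what `jmap -heap` shows, without attaching an external tool to the process.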
On Thu, Aug 16, 2012 at 10:09 AM, zhoucheng2008 wrote:
> The query has been stuck for more than an hour. The total size is less than
> 1G, and the number of docs is around 100,000. Hardware is ok as it works well
> with other much more demanding projects.
Use jstack <pid> to check for any deadlock.
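The check jstack does by hand can also be run in-process via ThreadMXBean. A hedged sketch (plain JDK; the class name is mine):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    // Returns the names of threads stuck in a monitor or ownable-synchronizer
    // deadlock, or an empty array if none are detected.
    static String[] deadlockedThreadNames() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads();
        if (ids == null) return new String[0];
        ThreadInfo[] infos = bean.getThreadInfo(ids);
        String[] names = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            names[i] = infos[i] == null ? "?" : infos[i].getThreadName();
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : deadlockedThreadNames()) {
            System.out.println("deadlocked: " + name);
        }
    }
}
```

jstack is still more informative for a stuck-but-not-deadlocked query, since it shows where each thread is actually spending its time.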
On Thu, Aug 16, 2012 at 10:09 AM, zhoucheng2008 wrote:
> The query has been stuck for more than an hour. The total size is less than
> 1G, and the number of docs is around 100,000. Hardware is ok as it works well
> with other much more demanding projects.
>
The query has been stuck for more than an hour. The total size is less than 1G,
and the number of docs is around 100,000. Hardware is ok as it works well with
other much more demanding projects.
-- --
From: "Li Li"
Date: 2012-08-16 (Thu)
How slow is it? Are all your searches slow, or only that query? How
many docs are indexed, and how big are the indexes? What's the hardware
configuration?
You should describe it clearly to get help.
On 2012-8-16 9:28 AM, "zhoucheng2008" wrote:
> Hi,
>
>
> I have the string "$21 a Day Once a Month" to
OK, I have no problem with filtering/copying to a new index. I would need to
figure out how to extend that class correctly, but at least it gives me a good
starting point.
On 08/15/2012 02:48 PM, Uwe Schindler wrote:
You cannot modify the term dictionary of an index, see my other eMail.
You cannot modify the term dictionary of an index, see my other eMail. You
have to filter it by copying to a new index or reindexing. Document
modifications are not supported in Lucene and other inverted indexes.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
If you found the terms to remove with e.g. HighFreqTerms, you can use the
abstract class FilterIndexReader (FilterAtomicReader in Lucene 4.0) to code
a filter for the term dictionary (just return a filtered TermEnum) on
merging. Just wrap an IndexReader with this FilterIndexReader that hides the
terms.
On 08/15/2012 02:34 PM, Ahmet Arslan wrote:
Is there an easy way to figure out
the most common tokens and then remove those tokens from the
documents.
Probably this :
http://lucene.apache.org/core/3_6_1/api/all/org/apache/lucene/misc/HighFreqTerms.html
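HighFreqTerms reads counts straight off the term dictionary; the underlying idea, applied to raw tokens, is just a frequency count plus a sort. A plain-Java sketch (not the Lucene class; names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopTerms {
    // Counts token frequencies and returns the top n tokens, most frequent first.
    static List<String> topTokens(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String t : tokens) {
            Integer c = counts.get(t);
            counts.put(t, c == null ? 1 : c + 1);
        }
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        // Sort descending by frequency; ties broken alphabetically for determinism.
        entries.sort((a, b) -> {
            int cmp = b.getValue().compareTo(a.getValue());
            return cmp != 0 ? cmp : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<String>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "day", "a", "month", "a", "day");
        System.out.println(topTokens(tokens, 2)); // prints [a, day]
    }
}
```

HighFreqTerms is much cheaper on an existing index because the docFreq per term is already stored; this sketch only shows the shape of the computation.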
ah, that's a good part 1. Then the Q w
On 08/15/2012 02:29 PM, Erick Erickson wrote:
I don't see how you could without indexing everything first
since you can't know what the most frequent terms are until
you've processed all your documents
exactly
If you know these terms in advance, it seems like you could
just call them stopwords and use the common stopword processing.
> Is there an easy way to figure out
> the most common tokens and then remove those tokens from the
> documents.
Probably this :
http://lucene.apache.org/core/3_6_1/api/all/org/apache/lucene/misc/HighFreqTerms.html
I don't see how you could without indexing everything first
since you can't know what the most frequent terms are until
you've processed all your documents
If you know these terms in advance, it seems like you could
just call them stopwords and use the common stopword
processing.
If you have to e
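The "known in advance" route is just set membership during analysis (Lucene's StopFilter does this inside the analyzer chain). A minimal plain-Java sketch, with illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StopwordFilter {
    // Drops any token found in the stopword set; comparison is case-insensitive,
    // assuming the stopword set itself is stored lowercase.
    static List<String> filter(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<String>();
        for (String t : tokens) {
            if (!stopwords.contains(t.toLowerCase())) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filter(List.of("21", "a", "Day", "Once", "a", "Month"),
                Set.of("a", "once"))); // prints [21, Day, Month]
    }
}
```

In a real analyzer you would do this per field during tokenization rather than as a post-processing pass.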
Is there an easy way to figure out the most common tokens and then
remove those tokens from the documents.
use case: imagine one is indexing a mailing list (such as this
java-user) and is extracting all e-mail addresses in the messages and
adding them to a doc.
What that means is that one wi
I am using Lucene to produce several indexes from HTML sites.
To work with them I convert the Lucene database into SQL via a small
program. The main problem is that I take a small part of the collected
data fields (datasource, plainTextContent, title, description and keyword).
But there are in mos
Problem not fixed! I contacted infra on IRC already.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Wednesday, August 15, 2012 4:26 PM
> To: java-user@luce
I hope the problem is fixed now; this mail is just to check! It was hard to
unsubscribe because of the strange eMail. I have no idea at all...
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Uwe Schin
I got it, too. As a moderator of this list, I will look into finding the
root cause and forcefully unsubscribe the failing address!
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Bernd Fehling [mailto:b
I guess that ulimit could be a default setting of XenServer when it was first
set up.
We started with about 27G.
I already raised ulimit -n when setting up XenServer because this was also
limited.
By the way, am I the only one getting this nasty DELIVERY FAILURE message from
someone on this list?
So my blog post, last section, helped? I think the ulimits came from there.
What distribution do you use where ulimit was actually limited, or was it
some sysadmin doing this? :-)
We should maybe refer to this blog post from the docs, or create a copy of the
page inside Lucene's distribution!
Uwe
---
Hi Uwe,
index size is:
-rw-r--r-- 1 solr users 82G 15. Aug 07:50 _2rhe.fdt
-rw-r--r-- 1 solr users 303M 15. Aug 07:50 _2rhe.fdx
-rw-r--r-- 1 solr users 1,2k 15. Aug 07:36 _2rhe.fnm
-rw-r--r-- 1 solr users 39G 15. Aug 09:04 _2rhe.frq
-rw-r--r-- 1 solr users 757M 15. Aug 09:05 _2rhe.nrm
-rw-r--r--
You don't get a heap-related OOM in your stack trace, it is "Map failed",
caused by MMapDirectory. You don't have enough virtual memory to map the
index into the address space. I think your heap is way too large (-Xmx25g is
way too big for any existing index and drives GC crazy). How big is your index?
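One quick way to answer "how big is your index" without leaving Java (equivalent to du -sh on the index directory) is to sum the file sizes. A rough sketch, top-level files only; the class name is mine:

```java
import java.io.File;

public class IndexSize {
    // Sums the sizes (in bytes) of all regular files directly inside dir.
    static long indexSizeBytes(File dir) {
        long total = 0;
        File[] files = dir.listFiles();
        if (files == null) return 0;  // not a directory, or an I/O error
        for (File f : files) {
            if (f.isFile()) total += f.length();
        }
        return total;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(indexSizeBytes(dir) + " bytes");
    }
}
```

The total on disk matters for MMapDirectory because the whole index is mapped into virtual address space, independent of the Java heap.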
I'm trying to run CheckIndex as a separate tool on a large index to get
nice info about the number of terms, number of tokens, ... but I always get an
OOM exception.
I already have JAVA_OPTS: -d64 -Xmx25g -Xms25g -Xmn6g
Any idea how to use CheckIndex on a huge index?
Opening index @ /srv/www/solr/sol