Hi,
We are considering using glusterfs to replicate indexed data from one box to
another. I searched Google and found that some people do seem to use glusterfs
for this purpose. We are using lucene 3.6.
I tested read/write in parallel (thread to search and another thread to index),
and fou
space
efficient when there are a tiny number of documents.
Mike McCandless
http://blog.mikemccandless.com
On Fri, Aug 9, 2013 at 11:55 AM, Zhang, Lisheng
wrote:
> Hi Mike,
>
> Any more comments on this issue?
>
> Thanks and best regards, Lisheng
>
> -Original Message-
Hi Mike,
Any more comments on this issue?
Thanks and best regards, Lisheng
-Original Message-
From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
Sent: Friday, August 02, 2013 7:55 AM
To: java-user@lucene.apache.org
Subject: RE: lucene 4.3 seems to be much slower in indexing
-luceneDir
Thanks and best regards, Lisheng
-Original Message-
From: Zhang, Lisheng
Sent: Thursday, August 01, 2013 11:16 AM
To: 'java-user@lucene.apache.org'
Subject: RE: lucene 4.3 seems to be much slower in indexing than lucene
3.6?
Hi Mike,
First I really appreciate your
To: Lucene Users
Subject: Re: lucene 4.3 seems to be much slower in indexing than lucene
3.6?
On Tue, Jul 30, 2013 at 6:13 PM, Zhang, Lisheng
wrote:
> Hi Mike,
>
> I did more tests with realistic text from different languages (typical
> text for 8 different languages, English one is nov
can each
time create a searcher on the fly, but it seems lucene is moving further away
from that?
Your guidance would be very appreciated,
Lisheng
-Original Message-
From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
Sent: Saturday, July 27, 2013 11:06 PM
To: java-user@lucene.apache.org
Sub
cCandless
http://blog.mikemccandless.com
On Fri, Jul 26, 2013 at 2:55 PM, Zhang, Lisheng
wrote:
>
> Hi,
>
> I did some basic performance testing, just use random number to generate
> text for indexing,
> below I attached source java code. The command I used are:
>
> java TestReal43 inde
, I can assure you lucene 4.3 is way more
efficient than 3.6. Well, after understanding and tweaking a few things ;)
Second, can you help us understand what is indexed and how? Like what
kind of fields? Which merge policy? ...
Thanks,
Nicolas
On Fri, Jul 26, 2013 at 11:55 AM, Zhang, Lisheng
Hi,
I used the following code to detect data corruption in lucene 4.3.0:
import org.apache.lucene.index.CheckIndex;
...
CheckIndex checkIndex = new CheckIndex(getLuceneDirectory(folderPath));
CheckIndex.Status status = checkIndex.checkIndex();
Hi,
I did some basic performance testing, just using random numbers to generate text
for indexing; I attached the source java code below. The commands I used are:
java TestReal43 index -docCount 500 -start 1 -optimize true -luceneDir mmap
java TestReal36 index -docCount 500 -start 1 -optimize true
I am very sorry, I should have sent this to the solr user group, not lucene!!
Best regards, Lisheng
-Original Message-
From: Zhang, Lisheng
Sent: Friday, February 08, 2013 12:17 PM
To: 'java-user@lucene.apache.org'
Subject: Solr query parser, needs to call
setAutoGeneratePhraseQu
Hi,
In our application we need to call method
setAutoGeneratePhraseQueries(true)
on lucene QueryParser; this is the way it used to work in earlier versions,
and it seems to me that is the more natural way?
But in current solr 3.6.1, the only way to do so is to set
LUCENE_30
in solrconfig.xml (i
is corrupted but later healed itself)?
Thanks very much for helps, Lisheng
-Original Message-
From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
Sent: Thursday, August 02, 2012 10:56 AM
To: java-user@lucene.apache.org
Subject: lucene Indexer failed to close, but later indexing
Hi,
We are using lucene 2.3.2 on linux/ubuntu (we will upgrade lucene soon),
recently we got exception:
read past EOF
java.io.IOException: read past EOF
at
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:130)
at
org.apache.lucene.index.CompoundFileReader$CSIn
blog post about the Java 7 bugs, too, they are closely related:
> blog.thetaphi.de
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de
>
>
>
> "Zhang, Lisheng" schrieb:
>
> Hi,
>
> We have been using lucene 2.3.2 for ye
helps, Lisheng
-Original Message-
From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
Sent: Saturday, June 30, 2012 2:17 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene indexed data corruption error
Thanks for such a quick help!
The java we use is:
java -version
java
ee my blog
post about the Java 7 bugs, too, they are closely related: blog.thetaphi.de
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
"Zhang, Lisheng" schrieb:
Hi,
We have been using lucene 2.3.2 well for years (yes, we should upgrade).
Recently
Hi,
We have been using lucene 2.3.2 well for years (yes, we should upgrade).
Recently we encountered a data corruption error when committing the IndexWriter:
///
background merge hit exception: _14b:c61262 _1ag:c11225 _1gb:c9411 _1gv:c905
_1gw:c50 _1gx:c50 _1gy:c50 _1gz:c50 _1h0:c31 into _1h1 [opti
6; so the total score is 0.5*0.5+0.8*0.7.
So inside CustomScoreQuery essentially we need to fetch the payloads of good
and morning separately (maybe using TermPositions?), and use them to score the
document. Is this what you meant ?
Thanks,
Arnon.
From: "Zhang, Lisheng"
To: java-user
Hi,
A few days ago I asked a similar question:
1) in the coming lucene 4.0, there is a feature sort of like a payload at the
document level:
>lucene 4 has a feature called IndexDocValues which is essentially a
> payload per document per field.
>
> you can read about it here:
> http://www.searchworkings.org/b
-Original Message-
From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
Sent: Thursday, December 01, 2011 11:34 AM
To: Zhang, Lisheng
Cc: java-user@lucene.apache.org
Subject: Re: Boost more recent document
On Thu, Dec 1, 2011 at 8:30 PM, Zhang, Lisheng
wrote:
> Hi Simon,
>
>
, so I would like to try CustomScoreQuery without
cache
first?
Thanks very much for helps, Lisheng
-Original Message-
From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
Sent: Thursday, December 01, 2011 11:21 AM
To: Zhang, Lisheng
Cc: java-user@lucene.apache.org
Subject: Re
r selected cache.
Thanks very much for all your great help; please point out if you see anything
wrong in the above statements?
Best regards, Lisheng
-Original Message-----
From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
Sent: Wednesday, November 30, 2011 1:40 PM
To: java-user@lucene.
, 2011 at 9:08 PM, Zhang, Lisheng
wrote:
> Thanks very much for your helps! I got the point, only problem is that
> I cannot afford to to use FieldCache because in our app we have many
> lucene index data folders, is there another simple way?
>
> Thanks again, Lisheng
>
>
...@googlemail.com]
Sent: Wednesday, November 30, 2011 11:40 AM
To: java-user@lucene.apache.org
Subject: Re: Boost more recent document
On Wed, Nov 30, 2011 at 6:59 PM, Zhang, Lisheng
wrote:
> Hi,
>
> We need to boost document which is more recent (each doc has time stamp
> attribute). I
Hi,
We need to boost documents which are more recent (each doc has a time stamp
attribute). It seems that we cannot use doc boost at index time because it will
be condensed into one byte (which cannot differentiate 365 days), so we may use
payloads (saving the time stamp as payload) to boost at search time.
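As an aside, whatever mechanism carries the time stamp (payload or otherwise), the boost itself can be a simple decay function of document age. A minimal pure-Java sketch; the class name, method name, and half-life constant are illustrative, not from any Lucene API:

```java
// Illustrative recency boost, NOT a Lucene API: an exponential decay by
// document age, assuming the time stamp is carried per document (e.g. as
// a payload) and read back at search time.
public class RecencyBoost {
    static final double HALF_LIFE_DAYS = 365.0; // assumed tuning constant

    // Returns a multiplier in (0, 1]; today's docs get 1.0,
    // one-year-old docs get 0.5, two-year-old docs get 0.25.
    static double boost(long ageDays) {
        return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
    }

    public static void main(String[] args) {
        System.out.println(boost(0));   // 1.0
        System.out.println(boost(365)); // 0.5
    }
}
```

A multiplicative boost like this differentiates ages smoothly, which is exactly what a one-byte index-time boost cannot do.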
rd to estimate this in the abstract, I'm afraid you'll just
have to try it.
Best
Erick
On Mon, Nov 14, 2011 at 6:40 PM, Zhang, Lisheng
wrote:
> Our indexed data are around 200~300MB size (each folder), so it is
> still small?
>
> Could you roughly estimate how big the indexed data
Our indexed data are around 200~300MB per folder, so is that still small?
Could you roughly estimate how big the indexed data (10GB?) needs to be
before creating an IndexReader each time becomes a serious issue?
Thanks very much for helps!
Lisheng
-Original Message-
Fro
Thanks for your reply!
The reason why we cannot reuse the IndexReader is that our server holds many
(>4000) independent index folders, each corresponding to a separate URL. At any
time any folder can be queried, so we cannot hold all of them in memory.
In lucene 2.3.2 query is fast even if we rec
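One common compromise between "reopen every time" and "hold all 4000 open" is a bounded LRU cache of per-folder handles. This is a sketch, not from the original thread; `SearcherCache` is a hypothetical name, and the value type stands in for whatever per-folder searcher/reader handle is used:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a small LRU cache of the most recently queried folders, so only
// hot folders keep a searcher open and cold ones are reopened on demand.
public class SearcherCache<V> extends LinkedHashMap<String, V> {
    private final int capacity;

    public SearcherCache(int capacity) {
        super(16, 0.75f, true); // accessOrder=true gives LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // In real code, close the evicted searcher here before dropping it.
        return size() > capacity;
    }
}
```

A query for folder X first looks in the cache; only on a miss is a new searcher opened and inserted, evicting the least recently used entry.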
2.3.2 to 3.1.0
On Mon, Nov 14, 2011 at 11:09 AM, Zhang, Lisheng
wrote:
> We plan to upgrade lucene from 2.3.2 to 3.1.0, from reading "Lucene In
> Action" I learned
> that we should "warm up" the IndexSearcher and not expect the first few queries
> to be fast.
Make s
We plan to upgrade lucene from 2.3.2 to 3.1.0. From reading "Lucene in Action"
I learned
that we should "warm up" the IndexSearcher and not expect the first few queries
to be fast.
But due to our special app we cannot "warm up" (each query has to use a new
IndexSearcher),
in lucene 2.3.2 this se
you
closed the IndexWriter (this was fixed in 2.4.0). This means even if
you close the writer and a crash occurs the index could become
corrupt.
Did you have an OS/machine crash on this index?
Mike McCandless
http://blog.mikemccandless.com
On Sat, Oct 29, 2011 at 12:15 PM, Zhang, Lisheng
w
http://blog.mikemccandless.com
On Fri, Oct 28, 2011 at 4:57 PM, Zhang, Lisheng
wrote:
>
> We are using lucene 2.3.2 (yes we should upgrade) and recently we had
> Exception when opening
> index:
>
> ###
> java.io.IOException: read past EOF "urn:schemas-microsoft-com:off
We are using lucene 2.3.2 (yes, we should upgrade) and recently we had an
Exception when opening
index:
###
java.io.IOException: read past EOF
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
at
org.apache.lucene.store.BufferedIndexInput.readByte(
Hi,
Another solution would be to define the locale when creating the SortField
object; if using the English locale, the sorting should be case insensitive?
Best regards, Lisheng
-Original Message-
From: Senthil V S [mailto:vss...@gmail.com]
Sent: Tuesday, October 11, 2011 12:34 PM
To: java-user@lucen
Hi,
I know that we need to delete/index a document in order to update any part of
it, but recently we need to index a field which changes rather frequently, so
reindexing the whole document each time would be impractical for performance
reasons.
This field is a small integer so I may just trea
helps, Lisheng
-Original Message-
From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
Sent: Saturday, February 26, 2011 5:00 PM
To: java-user@lucene.apache.org
Subject: Lucene search result produced wrong result (due to java
Collation)?
Hi,
Today I have noticed that sometimes
Hi,
Today I noticed that lucene sort sometimes produces strange results with plain
English names, like (String ASC):
l yy
liu yu
I traced into the lucene source code; it seems to be a java English Collator
problem (I set Locale.English on the SortField); below I reproduce the issue
with a trivial piece of code
(pu
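The effect is easy to reproduce without Lucene at all, since java.text.Collator is what performs the comparison when a Locale is set on a SortField. A small demo; note that the exact ordering the Collator produces for this pair depends on the JDK's English collation rules, so only the raw-String result is asserted:

```java
import java.text.Collator;
import java.util.Locale;

// Demo of why locale-aware sorting can disagree with plain String order
// for names like "l yy" vs "liu yu". String.compareTo sorts by raw char
// codes, so the space (0x20) makes "l yy" sort first; a Collator applies
// English collation rules instead, which may order the pair differently.
public class CollationDemo {
    public static void main(String[] args) {
        String a = "l yy", b = "liu yu";
        System.out.println("String.compareTo: " + Integer.signum(a.compareTo(b)));
        Collator en = Collator.getInstance(Locale.ENGLISH);
        System.out.println("Collator:         " + Integer.signum(en.compare(a, b)));
    }
}
```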
Hi Kumar,
1) For your question in the last mail: for the tool luke, go to the site
http://www.getopt.org/luke/
and click "launch luke now", then point it to your lucene data folder. Also the
book
"Lucene in Action" is a great source (go to amazon.com and search for this
book) where
(almost) everything is
Hi,
I think using Field.Index.NOT_ANALYZED means ignoring the StandardAnalyzer, so we
index "sql. server" as one word. You may use luke to see how this field is
indexed.
In this case we can only search the whole term (without even a case change); if
using the
StandardAnalyzer to analyze "sql. server" w
Hi,
Do you know any good open source tool to extract text from MS outlook MSG
files?
1) Apache Tika seems not to support *.msg yet.
2) Apache POI recently started to support *.msg (3.7, 10/2010), but I ran into
several problems (cannot process Japanese well, null pointer exception ..)?
Thank
8 AM, Lance Norskog wrote:
>>> 2 billion is a hard limit. Usually people split indexes into multiple
>>> indexes long before this, and use the parallel multi reader (I think) to
>>> read from all of the sub-indexes.
>>>
>>> On Mon, Nov 1, 2010 at 2:16 PM, Zha
and use the parallel multi reader (I think) to
>> read from all of the sub-indexes.
>>
>> On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng
>> wrote:
>>>
>>> Hi,
>>>
>>> Now lucene uses integer as document id, so it means we cannot have mo
Hi,
Now lucene uses an integer as the document id, so it means we cannot have more
than 2^31-1 documents within one collection? Even if we use MultiSearcher
the document id is still an integer, so it seems this is still a problem?
We have been using lucene for some time and our document count is growing
ra
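For reference, the ceiling comes from Java's int type, and sizing sub-indexes around it is simple arithmetic. A small sketch; the shard-capacity numbers are made up for illustration:

```java
// The per-index ceiling is Java's int range:
// Integer.MAX_VALUE = 2^31 - 1 = 2,147,483,647 documents. A common
// workaround is splitting into sub-indexes; this just computes how many
// sub-indexes a projected corpus would need.
public class DocIdLimit {
    static final long MAX_DOCS = Integer.MAX_VALUE; // 2147483647

    static long shardsNeeded(long totalDocs, long docsPerShard) {
        return (totalDocs + docsPerShard - 1) / docsPerShard; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(MAX_DOCS);                                      // 2147483647
        System.out.println(shardsNeeded(5_000_000_000L, 1_000_000_000L));  // 5
    }
}
```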
Hi,
Have you compared the java versions on these two boxes? Also the PHP versions?
Did you run the indexer from the command line or from a browser?
I used the Zend java bridge before and found that a too-old java version may
cause problems.
Best regards, Lisheng
-Original Message-
From: dian puma [mailto:dianp.
indexes that
significantly improve search speed.
I'm not sure... but I think indexWriter.getReader() for near-realtime search
was added in 2.9, so you can keep your writer always open and very cheaply get
a new reader on each search request.
On Fri, Sep 24, 2010 at 09:47, Zhang, Lisheng
wrote:
be appreciated,
Lisheng
-Original Message-
From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
Sent: Thursday, September 23, 2010 6:11 PM
To: java-user@lucene.apache.org
Subject: In lucene 2.3.2, needs to stop optimization?
Hi,
We are using lucene 2.3.2, now we need to index e
Hi,
We are using lucene 2.3.2, and now we need to index each document as
fast as possible, so users can search it almost immediately.
So I am considering stopping IndexWriter optimization in real time;
then at a relatively off-peak time like late night we may call the IndexWriter
optimize method explicitly onc
:03 AM
To: java-user@lucene.apache.org
Subject: Re: Building maven artifacts
Hi,
I don't know. I tried to set up something like this:
But error is the same. Maybe there are any other parameters?
2010/7/16 Zhang, Lisheng
> Hi,
>
> I never this kind of build before, but just
Hi,
I have never done this kind of build before, but just from the error message
I guess it could mean the two variables:
${project.artifactId}
${project.version}
are not defined (otherwise exact jar file name would be printed out)?
Could it be some environment setup issue?
Best regards, Lisheng
-Origi
Hi,
I remember testing earlier lucene 1.4 and 2.4, and found the following:
# it is OK for multiple searchers to search the same collection.
# it is OK for one IndexWriter to edit and multiple searchers to search
at the same time.
# it is generally NOT OK for multiple IndexWriters to
Hi,
It looks good to me, but I did not test it. When testing,
we may print out both
initialQuery.toString() // query produced by QueryParser
finalQuery.toString() // query after your new function
as a comparison, besides testing the query result.
Best regards, Lisheng
-Original Message-
F
have not done that myself before, but feel it should work.
Best regards, Lisheng
-Original Message-
From: Christopher Condit [mailto:con...@sdsc.edu]
Sent: Friday, April 30, 2010 2:08 PM
To: java-user@lucene.apache.org
Cc: Zhang, Lisheng
Subject: RE: Modify TermQueries or Tokens
Hi Li
rked for me well.
Best regards, Lisheng
-Original Message-
From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
Sent: Friday, April 30, 2010 1:41 PM
To: java-user@lucene.apache.org
Subject: RE: Modify TermQueries or Tokens
Hi,
Lucene already have class WildcardQuery, I think
Hi,
Lucene already has the class WildcardQuery; I think you can add "*" on either
side (or both) when creating the Term:
http://lucene.apache.org/java/3_0_1/api/core/index.html
But notice by default QueryParser cannot parse *queryString.
Best regards, Lisheng
-Original Message-
From: Christo
Zhang, Lisheng wrote:
> Hi,
>
> I have been using Java/lucene for a few years and it works well for me.
>
&g
Hi,
I have been using Java/lucene for a few years and it works well for me.
Recently we started to use PHP/lucene from Zend, and I found some problems,
especially that for each query it immediately loads the whole term id/score
(and other info) array into memory; this would cause memory exhaustion if t
? Maybe time to consider an upgrade.
>
> Anyway, if you're getting that exception when creating a searcher I
> guess you are using a constructor that takes an IndexReader and a
> further guess would be that something has closed it.
>
> --
> Ian.
>
>
> On Tue,
Hi,
We are using lucene 1.4.3, sometimes we encounter an error when creating
Searcher object with IOException: "Already closed".
I searched lucene message archive but did not see conclusive answer, any
help would be very appreciated.
Best regards, Lisheng
--
Hi,
The simplest way is to add one condition (assuming the field is f1):
f1:notebook f1:"note book"
which means (notebook OR "note book"); the 2nd condition is a phrase
search.
Best regards, Lisheng
-Original Message-
From: Alex Bredariol Grilo [mailto:abgr...@gmail.com]
Sent: Friday, September 25,
Hi,
I read through the lucene thread/process safety discussion for concurrent
indexing; my understanding is that each indexing operation through IndexWriter
will lock the whole index directory.
Now we need to index a community blog where many people add/update,
so queuing all those indexing requests would be a
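The queuing approach can be sketched in plain Java: serialize all writes through a single-threaded executor so only one task at a time ever touches the (conceptual) IndexWriter. `SingleWriterQueue` is a hypothetical name, and the index task here is a stand-in Runnable:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of "queue all indexing requests": since one IndexWriter must own
// the directory lock, funnel every add/update through a single-threaded
// executor so writes never contend on the lock.
public class SingleWriterQueue {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    public void submitIndexRequest(Runnable indexTask) {
        writer.submit(indexTask); // tasks run one at a time, in order
    }

    public void shutdown() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        SingleWriterQueue q = new SingleWriterQueue();
        AtomicInteger indexed = new AtomicInteger();
        for (int i = 0; i < 100; i++) q.submitIndexRequest(indexed::incrementAndGet);
        q.shutdown();
        System.out.println(indexed.get()); // 100
    }
}
```

Callers (the many bloggers adding/updating) never block each other beyond queue insertion; latency depends only on queue depth.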
Hi,
Does this issue have anything to do with the line:
> TopScoreDocCollector collector = new TopScoreDocCollector(10);
if we do:
> TopScoreDocCollector collector = new TopScoreDocCollector(2);
instead (only seeing the top two documents), could the memory usage be less?
Best regards, Lisheng
-Or
9 10:39 PM
> To: java-user@lucene.apache.org
> Subject: Re: Possible bug in QueryParser when using CJKAnalyzer (lucene
> 2.4.1)
>
>
> I'm not sure this is the same case, but there is a report and patch for
> CJKTokenizer in JIRA:
>
> https://issues.apache.org/jira/br
LUCENE-973
Koji
Zhang, Lisheng wrote:
> Hi,
>
> When I use lucene 2.4.1 QueryParser with CJKAnalyzer, somehow
> it always generates an extra space, for example, if the input is "ABC",
> the query would be:
>
> myfield"AB BC " // should be myfield:"AB BC&
Hi,
When I use lucene 2.4.1 QueryParser with CJKAnalyzer, somehow
it always generates an extra space, for example, if the input is "ABC",
the query would be:
myfield"AB BC " // should be myfield:"AB BC"
If I create PhraseQuery directly it does work. From Luke I know indexing
works OK. In lucene
Hi,
I did not see the setConstantScoreRewrite method
in the RangeQuery class?
Best regards, Lisheng
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, May 20, 2009 11:10 AM
To: java-user@lucene.apache.org
Subject: Re: RangeQuery & TooManyClause
Hi,
I know lucene 2.9 will be the next release; do we have
a release date yet (roughly: 6 months away, or longer)?
Knowing this would help us schedule our work, thanks
for your help!
Lisheng
ts.
Mike
On Wed, Apr 8, 2009 at 4:47 PM, Zhang, Lisheng
wrote:
> Hi,
>
> Client said they did not index, all they do is searching (create
> Searcher objects), I looked at 1.4.3 and think this issue can
> happen in:
>
> private static IndexReader open(final Directory director
is on upgrading to 2.4.
Mike
On Wed, Apr 8, 2009 at 3:40 PM, Zhang, Lisheng
wrote:
> Hi,
>
> Sorry that my initial message is not clear, I read lucene source code (both
> 1.4.3
> and 2.4.0), and understood more.
>
> The problem is that when using lucene 1.4.3 sometimes when sea
It seems that in 2.4.0 we will never have this issue, because this error can
only happen with concurrent writing.
Is this true?
Thanks very much for helps, Lisheng
> -Original Message-
> From: Zhang, Lisheng
> Sent: Wednesday, April 08, 2009 9:08 AM
> To:
Hi,
We are using lucene 1.4.3; sometimes when two threads try to search,
one thread gets an error when creating the MultiSearcher:
Lock obtain timed out:
Lock@/tmp/lucene-ba94511756a2670adeac03a50532c63c-commit.lock
I read lucene FAQ and searched previous discussions, it seems that this
error should be
Hi,
What's the best free tool for encoding detection? For example we have
an ASCII file README.txt which needs to be indexed, but we need to
know its encoding before we can convert it to a Java String.
I saw some free tools on the market, but have no experience with any
of them yet? What is the be
ake a look at. I have used it in the past and it
worked reasonably well. Let me know what else you find and how it works for
you.
http://www.olivo.net/software/lc4j/
Good luck!
Jochen Frey
On Fri, Mar 27, 2009 at 9:54 AM, Zhang, Lisheng <
lisheng.zh...@broadvision.com> wrote:
> Hi,
>
&
Hi,
Are you aware of any free software for language detection (given certain
text, determine whether it is French or Japanese)? I saw Bob Carpenter's
previous mail, which explained the principle nicely, but could not locate free
tools?
Thanks very much for the help, Lisheng
--
Hi,
What is the best tool (free software) to extract text from
Microsoft Office 2007:
Word 2007, Excel 2007, Power Point 2007
so that we can index them by lucene?
Thanks very much for helps, Lisheng
tical problems that
should be fixed as mentioned in the JIRA task.
don't know if this helps...
On Tue, Feb 17, 2009 at 9:54 PM, Zhang, Lisheng <
lisheng.zh...@broadvision.com> wrote:
> Hi,
>
> Are there free Hebrew and Hindi language analyzers for
> lucene? I searched a
Hi,
Are there free Hebrew and Hindi language analyzers for
lucene? I searched archive and found some discussions,
but did not see clear pointers to downloadable classes.
Thanks very much for helps, Lisheng
Hi,
Inside (priority:beauty ..) there is an AND,
is that operator what you want?
Best regards, Lisheng
-Original Message-
From: Jamie [mailto:ja...@stimulussoft.com]
Sent: Friday, January 16, 2009 3:02 PM
To: java-user@lucene.apache.org
Subject: Search Across All Fields
Hi Everyone
I
lter.java
http://www.nabble.com/file/p20265229/SpanishStemmer.java SpanishStemmer.java
http://www.nabble.com/file/p20265229/stopWords.java stopWords.java
if you improve it, tell me.
Zhang, Lisheng wrote:
>
> Hi,
>
> Is there any Spanish analyzer available for lucene applications?
&g
Hi,
Is there any Spanish analyzer available for lucene applications?
I did not see any in lucene 2.4.0 contribute folders.
Thanks very much for helps, Lisheng
Hi,
I have a very large number of documents indexed; one field is Brand
(untokenized). Now I need to find the most popular brand (the brand
used by the most docs). One way is:
1) open an IndexReader.
2) call terms() to get all terms, then filter out the terms in the field Brand.
3) call termDocs(Term) to g
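Steps 2)-3) boil down to a max-scan over (term, docFreq) pairs. A pure-Java sketch with faked counts; in real code the pairs would come from IndexReader.terms()/termDocs(), and the brand names here are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the aggregation step: once each Brand term's document
// count is available, the most popular brand is a single pass for the max.
public class MostPopularBrand {
    static String mostPopular(Map<String, Integer> brandDocCounts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : brandDocCounts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("acme", 120);     // faked docFreq values
        counts.put("globex", 340);
        counts.put("initech", 75);
        System.out.println(mostPopular(counts)); // globex
    }
}
```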
oo long then probably I will look into implementing a custom analyzer...
Zhang, Lisheng wrote:
>
> Hi,
>
> In case you do not want to toss away any stop words and even
> want to preserve case, WhitespaceAnalyzer can be used; also using
> WhitespaceTokenizer would serve as a test (but nee
s there any other way to handle this situation... Especially in
the
above mentioned case, the user is expecting around 5 records and the
query
is fetching more than 550 records.8-O
Thanks.
Zhang, Lisheng wrote:
>
> Hi,
>
> Do you mean that your query phrase is "Health Safety"
Hi Sirish,
A few hours ago I sent a reply to your message. If my
understanding is correct, you indexed a doc with the text
Health and Safety
and you used the phrase
Health Safety
to create a phrase query. If that is the case, this is
normal, since you used the StandardAnalyzer to tokenize the
input tex
Hi,
Do you mean that your query phrase is "Health Safety",
but docs with "Health and Safety" are returned?
If that is the case, the reason is that the StandardAnalyzer
filters out "and" (also "or", "in" and others) as stop
words during indexing, and the QueryParser filters those
words out also.
Best reg
Hi, thanks for the help!
Yes, along the lines you mentioned we can reduce the amount
of calculation, but we still need to loop through to count
all docs, so the time may still be O(n); I am wondering if we
can avoid the loop and get the count directly?
Best regards, Lisheng
-Original Message-
From: M
Hi,
We have been using lucene for years and it serves us well.
Sometimes when we issue a query, we only want to know
how many hits it yields, and do not want any docs back. Is it possible
to completely avoid the score calculation and get just the total count back?
I understand the score calculation needs a loop over all m
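The idea of counting without scoring can be sketched as a collector that only increments a counter and never asks for a score (later Lucene versions ship this as TotalHitCountCollector; the class below is a hypothetical stand-alone illustration with a faked match stream):

```java
// Sketch of "count hits without scores": the collect() callback receives
// each matching doc id and deliberately never computes or reads a score.
public class HitCounter {
    private int count;

    // Called once per matching doc id.
    public void collect(int docId) {
        count++;
    }

    public int getCount() {
        return count;
    }

    public static void main(String[] args) {
        HitCounter c = new HitCounter();
        for (int doc = 0; doc < 220_000; doc++) c.collect(doc); // faked matches
        System.out.println(c.getCount()); // 220000
    }
}
```

The loop over matching docs remains (counting is inherently O(matches)), but skipping the per-doc score arithmetic removes most of the per-hit cost.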
Hi,
I encountered one problem in lucene 1.4.3: I called
Searcher.search(, new Sort("myfiled");
In "myfiled", most values look like the number "123456" or something
similar, but one field contains the value "Just a TRY", and then
I got the error:
java.lang.ClassCastException at
org.apache.lucene.search.FieldDocSo
search for all documents should be at least linear to
the total number of documents.
Sören
Zhang, Lisheng wrote:
> Hi,
>
> I indexed first 220,000, all with a special keyword, I did a simple
> query and only fetched 5 docs, with Hits.length()=220,000.
>
> Then I indexed 44
: linear?
On Tuesday 05 December 2006 03:49, Zhang, Lisheng wrote:
> I found that search time is about linear: 2nd time is about 2 times
> longer than 1st query.
What exactly did you measure, only the search() or also opening the
IndexSearcher? The latter depends on index size, thus you sho
Hi,
I first indexed 220,000 docs, all with a special keyword; I did a simple
query and fetched only 5 docs, with Hits.length()=220,000.
Then I indexed 440,000 docs with the same keyword, queried
again and fetched a few docs, with Hits.length()=440,000.
I found that the search time is about linear: the 2nd
01, 2006 2:34 PM
To: java-user@lucene.apache.org
Subject: Re: Search with accents
Yes...here's how I create my QueryParser:
QueryParser parser = new QueryParser("text", new BrazilianAnalyzer());
2006/8/1, Zhang, Lisheng <[EMAIL PROTECTED]>:
> Hi,
>
> Have you used the
Hi,
Have you used the same BrazilianAnalyzer when
searching?
Best regards, Lisheng
-Original Message-
From: Eduardo S. Cordeiro [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 01, 2006 1:40 PM
To: java-user@lucene.apache.org
Subject: Search with accents
Hello there,
I have a brazilia
Hi,
Currently we are using PDFBox to process PDF files and
POI to process DOC/XLS files, before sending the strings to lucene
for indexing.
Does anyone know if PDFBox or POI can process multi-
byte characters like Japanese in various encodings (whatever is
specified in the PDF or DOC)?
Thanks very much for
er.search() throws
exceptions other than IOException (which is handled by
MultiSearcherThread). These will result in NullPointerExceptions as well.
Not much help I guess, but perhaps some more insight.
/Ronnie
Zhang, Lisheng wrote:
> Hi,
>
> I have not received any feedback yet, any
Hi,
I have not received any feedback yet, any comments
would be greatly appreciated!
Lisheng
-Original Message-
From: Zhang, Lisheng
Sent: Thursday, December 01, 2005 12:30 PM
To: 'java-user@lucene.apache.org'
Subject: NullPointerException in ParallelMultiSearcher
Hi,
We
Hi,
We have been using lucene v1.4.3 for some time; in general it is working well.
We often try to search multiple collections at the same time, so we
are using ParallelMultiSearcher, but sometimes we got the following
exception:
java.lang.NullPointerException
at
org.apache.lucene.search
Hi,
We recently encountered a strange behavior in
lucene v1.4.3 QueryParser: we call
QueryParser.parse("-1", "myidfield", new StandardAnalyzer());
and get the returned query as:
-myidfield:1 // apparently we want "myidfield:-1"
Currently we can use TermQuery (avoiding QueryParser)
to bypass this p
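Besides switching to TermQuery, another workaround is escaping the leading minus before parsing, which is what Lucene's static QueryParser.escape() helper does for all special characters. A dependency-free sketch of just the minus case; `escapeMinus` is a hypothetical helper, not a Lucene method:

```java
// The "-1" problem happens because the parser treats a leading '-' as the
// NOT (prohibit) operator. Escaping the character before parsing keeps it
// part of the term, so the parser produces myidfield:-1 instead of -myidfield:1.
public class EscapeDemo {
    static String escapeMinus(String s) {
        return s.replace("-", "\\-");
    }

    public static void main(String[] args) {
        System.out.println(escapeMinus("-1")); // \-1
    }
}
```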
Hi,
I would like to know the JavaCC version used to build lucene 1.4?
I could not find this information in the downloaded files (they only mention
the JavaCC site).
Thanks very much for helps, Lisheng