Can we use glusterfs to replicate indexed data from one box to another ?

2013-08-19 Thread Zhang, Lisheng
Hi, We are considering using glusterfs to replicate indexed data from one box to another. I searched Google and found that some people do seem to use glusterfs for this purpose. We are using lucene 3.6. I tested read/write in parallel (one thread to search and another thread to index), and fou

RE: lucene 4.3 seems to be much slower in indexing than lucene 3.6?

2013-08-09 Thread Zhang, Lisheng
space efficient when there are a tiny number of documents. Mike McCandless http://blog.mikemccandless.com On Fri, Aug 9, 2013 at 11:55 AM, Zhang, Lisheng wrote: > Hi Mike, > > Any more comments on this issue? > > Thanks and best regards, Lisheng > > -Original Message-

RE: lucene 4.3 seems to be much slower in indexing than lucene 3.6?

2013-08-09 Thread Zhang, Lisheng
Hi Mike, Any more comments on this issue? Thanks and best regards, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Friday, August 02, 2013 7:55 AM To: java-user@lucene.apache.org Subject: RE: lucene 4.3 seems to be much slower in indexing

RE: lucene 4.3 seems to be much slower in indexing than lucene 3.6?

2013-08-02 Thread Zhang, Lisheng
-luceneDir Thanks and best regards, Lisheng -Original Message- From: Zhang, Lisheng Sent: Thursday, August 01, 2013 11:16 AM To: 'java-user@lucene.apache.org' Subject: RE: lucene 4.3 seems to be much slower in indexing than lucene 3.6? Hi Mike, First I really appreciate your

RE: lucene 4.3 seems to be much slower in indexing than lucene 3.6?

2013-07-31 Thread Zhang, Lisheng
To: Lucene Users Subject: Re: lucene 4.3 seems to be much slower in indexing than lucene 3.6? On Tue, Jul 30, 2013 at 6:13 PM, Zhang, Lisheng wrote: > Hi Mike, > > I did more tests with realistic text from different languages (typical > text for 8 different languages, English one is nov

RE: lucene 4.3 seems to be much slower in indexing than lucene 3.6?

2013-07-30 Thread Zhang, Lisheng
can each time create searcher on the fly, but seems lucene goes further away from that? Your guidance would be very appreciated, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Saturday, July 27, 2013 11:06 PM To: java-user@lucene.apache.org Sub

RE: lucene 4.3 seems to be much slower in indexing than lucene 3.6?

2013-07-27 Thread Zhang, Lisheng
cCandless http://blog.mikemccandless.com On Fri, Jul 26, 2013 at 2:55 PM, Zhang, Lisheng wrote: > > Hi, > > I did some basic performance testing, just use random number to generate > text for indexing, > below I attached source java code. The command I used are: > > java TestReal43 inde

RE: lucene 4.3 seems to be much slower in indexing than lucene 3.6?

2013-07-26 Thread Zhang, Lisheng
, i can assure you lucene 4.3 is way more efficient than 3.6. Well after understanding and tweaking a few things ;) second can you help us understanding what is indexed and how? like what kind of fields? which merge policy ?... Thanks, Nicolas On Fri, Jul 26, 2013 at 11:55 AM, Zhang, Lisheng

RE: Detect a corrupted index

2013-07-26 Thread Zhang, Lisheng
Hi, I used the following code to detect data corruption in lucene 4.3.0: / import org.apache.lucene.index.CheckIndex; ... CheckIndex checkIndex = new CheckIndex(getLuceneDirectory(folderPath)); CheckIndex.Status status = checkIndex.checkIndex();

lucene 4.3 seems to be much slower in indexing than lucene 3.6?

2013-07-26 Thread Zhang, Lisheng
Hi, I did some basic performance testing, just using random numbers to generate text for indexing; below I attached the java source code. The commands I used are: java TestReal43 index -docCount 500 -start 1 -optimize true -luceneDir mmap java TestReal36 index -docCount 500 -start 1 -optimize true

RE: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

2013-02-08 Thread Zhang, Lisheng
I am very sorry, I should have sent to solr user group, not lucene!! Best regards, Lisheng -Original Message- From: Zhang, Lisheng Sent: Friday, February 08, 2013 12:17 PM To: 'java-user@lucene.apache.org' Subject: Solr query parser, needs to call setAutoGeneratePhraseQu

Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

2013-02-08 Thread Zhang, Lisheng
Hi, In our application we need to call method setAutoGeneratePhraseQueries(true) on lucene QueryParser, this is the way used to work in earlier versions and it seems to me that is the much natural way? But in current solr 3.6.1, the only way to do so is to set LUCENE_30 in solrconfig.xml (i

RE: lucene Indexer failed to close, but later indexing still OK?

2012-08-06 Thread Zhang, Lisheng
is corrupted but later healed itself)? Thanks very much for helps, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Thursday, August 02, 2012 10:56 AM To: java-user@lucene.apache.org Subject: lucene Indexer failed to close, but later indexing

lucene Indexer failed to close, but later indexing still OK?

2012-08-02 Thread Zhang, Lisheng
Hi, We are using lucene 2.3.2 on linux/ubuntu (we will upgrade lucene soon), recently we got exception: read past EOF #012java.io.IOException: read past EOF at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:130) at org.apache.lucene.index.CompoundFileReader$CSIn

RE: Lucene indexed data corruption error

2012-06-30 Thread Zhang, Lisheng
blog post about the Java 7 bugs, too, they are closely related: > blog.thetaphi.de > -- > Uwe Schindler > H.-H.-Meier-Allee 63, 28213 Bremen > http://www.thetaphi.de > > > > "Zhang, Lisheng" schrieb: > > Hi, > > We have been using lucene 2.3.2 for ye

RE: Lucene indexed data corruption error

2012-06-30 Thread Zhang, Lisheng
helps, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Saturday, June 30, 2012 2:17 PM To: java-user@lucene.apache.org Subject: RE: Lucene indexed data corruption error Thanks for such a quick help! The java we use is: java -version java

RE: Lucene indexed data corruption error

2012-06-30 Thread Zhang, Lisheng
ee my blog post about the Java 7 bugs, too, they are closely related: blog.thetaphi.de -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de "Zhang, Lisheng" schrieb: Hi, We have been using lucene 2.3.2 for years well (yes, we should upgrade). Recently

Lucene indexed data corruption error

2012-06-30 Thread Zhang, Lisheng
Hi, We have been using lucene 2.3.2 well for years (yes, we should upgrade). Recently we encountered a data corruption error when committing IndexWriter: /// background merge hit exception: _14b:c61262 _1ag:c11225 _1gb:c9411 _1gv:c905 _1gw:c50 _1gx:c50 _1gy:c50 _1gz:c50 _1h0:c31 into _1h1 [opti

RE: Score per position

2011-12-13 Thread Zhang, Lisheng
6; so the total score is 0.5*0.5+0.8*0.7. So inside CustomScoreQuery essentially we need to fetch the payloads of good and morning separately (maybe using TermPositions?), and use them to score the document. Is this what you meant ?   Thanks, Arnon. From: "Zhang, Lisheng" To: java-user

RE: Score per position

2011-12-08 Thread Zhang, Lisheng
Hi, A few days ago I asked a similar question: 1) in the coming lucene 4.0, there is a feature sort of like a payload at the document level: >lucene 4 has a feature called IndexDocValues which is essentially a > payload per document per field. > > you can read about it here: > http://www.searchworkings.org/b

RE: Boost more recent document

2011-12-01 Thread Zhang, Lisheng
-Original Message- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Thursday, December 01, 2011 11:34 AM To: Zhang, Lisheng Cc: java-user@lucene.apache.org Subject: Re: Boost more recent document On Thu, Dec 1, 2011 at 8:30 PM, Zhang, Lisheng wrote: > Hi Simon, > >

RE: Boost more recent document

2011-12-01 Thread Zhang, Lisheng
, so I would like to try CustomScoreQuery without cache first? Thanks very much for helps, Lisheng -Original Message- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Thursday, December 01, 2011 11:21 AM To: Zhang, Lisheng Cc: java-user@lucene.apache.org Subject: Re

RE: Boost more recent document

2011-11-30 Thread Zhang, Lisheng
r selected cache. Thanks very much for all your great helps, please point out if you see wrong in above statements? Best regards, Lisheng -Original Message----- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Wednesday, November 30, 2011 1:40 PM To: java-user@lucene.

RE: Boost more recent document

2011-11-30 Thread Zhang, Lisheng
, 2011 at 9:08 PM, Zhang, Lisheng wrote: > Thanks very much for your helps! I got the point, only problem is that > I cannot afford to to use FieldCache because in our app we have many > lucene index data folders, is there another simple way? > > Thanks again, Lisheng > >

RE: Boost more recent document

2011-11-30 Thread Zhang, Lisheng
...@googlemail.com] Sent: Wednesday, November 30, 2011 11:40 AM To: java-user@lucene.apache.org Subject: Re: Boost more recent document On Wed, Nov 30, 2011 at 6:59 PM, Zhang, Lisheng wrote: > Hi, > > We need to boost document which is more recent (each doc has time stamp > attribute). I

Boost more recent document

2011-11-30 Thread Zhang, Lisheng
Hi, We need to boost document which is more recent (each doc has time stamp attribute). It seems that we cannot use doc boost at index time because it will be condensed into one byte (cannot differentiate 365 days), so we may use payload (save time stamp as payload) to boost at search time.
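
The one-byte limit mentioned above comes down to simple arithmetic: a byte has only 256 distinct values, fewer than the 365 days in a year. Below is a minimal sketch of a search-time recency boost computed from a stored time stamp. The class `RecencyBoost`, the method `boost`, and the half-life decay are assumptions made for illustration, not Lucene API and not necessarily what the thread settled on.

```java
public class RecencyBoost {
    /** Number of distinct values a single byte can hold (256). */
    public static final int BYTE_VALUES = 1 << Byte.SIZE;

    /**
     * Hypothetical search-time boost: 1.0 for a brand-new document,
     * halving every halfLifeDays. Illustration only, not a Lucene API.
     */
    public static double boost(long docMillis, long nowMillis, double halfLifeDays) {
        // Age in days; documents stamped in the future are treated as age 0.
        double ageDays = Math.max(0L, nowMillis - docMillis) / 86_400_000.0;
        return Math.pow(0.5, ageDays / halfLifeDays);
    }

    public static void main(String[] args) {
        System.out.println("byte values: " + BYTE_VALUES + ", days per year: 365");
        long now = System.currentTimeMillis();
        long dayMs = 86_400_000L;
        for (int age : new int[] {0, 30, 364, 365}) {
            System.out.printf("age %3d days -> boost %.4f%n",
                    age, boost(now - age * dayMs, now, 30.0));
        }
    }
}
```

Because the boost is computed from the full time stamp at search time, documents 364 and 365 days old still get different values, which a one-byte index-time boost could not guarantee.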

RE: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Zhang, Lisheng
rd to estimate this in the abstract, I'm afraid you'll just have to try it. Best Erick On Mon, Nov 14, 2011 at 6:40 PM, Zhang, Lisheng wrote: > Our indexed data are around 200~300MB size (each folder), so it is > still small? > > Could you roughly estimate how big the indexed data

RE: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Zhang, Lisheng
Our indexed data are around 200~300MB size (each folder), so it is still small? Could you roughly estimate how big the indexed data size (10GB?) needs to be, so that creating IndexReader each time could become a serious issue? Thanks very much for helps! Lisheng -Original Message- Fro

RE: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Zhang, Lisheng
Thanks for your reply! The reason why we cannot reuse IndexReader is that our server holds many (>4000) independent index folders, each one corresponds to a separate URL. At any time any folder can be queried, so we cannot hold all of them into memory. In lucene 2.3.2 query is fast even if we rec

RE: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Zhang, Lisheng
2.3.2 to 3.1.0 On Mon, Nov 14, 2011 at 11:09 AM, Zhang, Lisheng wrote: > We plan to upgrade lucene from 2.3.2 to 3.1.0, from reading "Lucene In > Action" I learned > that we should "warm up" IndexSearcher and donot expect initial a few queries > to be fast. Make s

Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Zhang, Lisheng
We plan to upgrade lucene from 2.3.2 to 3.1.0. From reading "Lucene In Action" I learned that we should "warm up" IndexSearcher and should not expect the first few queries to be fast. But due to our special app we cannot "warm up" (each query has to use a new IndexSearcher), in lucene 2.3.2 this se

RE: data corruption in lucene index 2.3.2

2011-10-29 Thread Zhang, Lisheng
you closed the IndexWriter (this was fixed in 2.4.0). This means even if you close the writer and a crash occurs the index could become corrupt. Did you have an OS/machine crash on this index? Mike McCandless http://blog.mikemccandless.com On Sat, Oct 29, 2011 at 12:15 PM, Zhang, Lisheng w

RE: data corruption in lucene index 2.3.2

2011-10-29 Thread Zhang, Lisheng
http://blog.mikemccandless.com On Fri, Oct 28, 2011 at 4:57 PM, Zhang, Lisheng wrote: > > We are using lucene 2.3.2 (yes we should upgrade) and recently we had > Exception when opening > index: > > ### > java.io.IOException: read past EOF "urn:schemas-microsoft-com:off

data corruption in lucene index 2.3.2

2011-10-28 Thread Zhang, Lisheng
We are using lucene 2.3.2 (yes we should upgrade) and recently we had Exception when opening index: ### java.io.IOException: read past EOF at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146) at org.apache.lucene.store.BufferedIndexInput.readByte(

RE: Case insensitive sortable column

2011-10-11 Thread Zhang, Lisheng
Hi, Another solution would be to define locale when creating SortField object, if using English locale the sorting should be case insensitive? Best regards, Lisheng -Original Message- From: Senthil V S [mailto:vss...@gmail.com] Sent: Tuesday, October 11, 2011 12:34 PM To: java-user@lucen
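
To check the locale suggestion above outside of Lucene, here is a small sketch using only the JDK. It assumes that a locale-based sort ends up comparing strings with `java.text.Collator` for that locale; the class and method names are made up for the example. Note that a collator is not strictly case-insensitive: case is only a lower-priority tiebreak, so "banana" and "Banana" end up next to each other instead of in separate upper/lower blocks.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class LocaleSortDemo {
    /** Returns a copy sorted by plain String.compareTo (UTF-16 code unit order). */
    public static String[] sortBinary(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** Returns a copy sorted with the JDK collator for the given locale. */
    public static String[] sortWithLocale(String[] values, Locale locale) {
        String[] copy = values.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] names = {"banana", "Cherry", "apple", "Banana"};
        // Binary order puts all upper-case initials before lower-case ones.
        System.out.println(Arrays.toString(sortBinary(names)));
        // Collator order groups words by letter first, case second.
        System.out.println(Arrays.toString(sortWithLocale(names, Locale.ENGLISH)));
    }
}
```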

Possible partial update of a document?

2011-09-26 Thread Zhang, Lisheng
Hi, I know that we need to delete/index a document in order to update any part of it, but recently we need to index a field which changes rather frequently, so reindexing the whole document each time would be impractical for performance reasons. This field is a small integer so I may just trea

RE: Lucene search result produced wrong result (due to java Collation)?

2011-02-28 Thread Zhang, Lisheng
helps, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Saturday, February 26, 2011 5:00 PM To: java-user@lucene.apache.org Subject: Lucene search result produced wrong result (due to java Collation)? Hi, Today I have noticed that sometimes

Lucene search result produced wrong result (due to java Collation)?

2011-02-26 Thread Zhang, Lisheng
Hi, Today I have noticed that sometimes lucene sort produced strange result in plain English names, like (String ASC) l yy liu yu I traced to lucene source code, it seems to be a java English Collator problem (I set Locale.English to SortField), below I reproduced issue by a trivial code (pu

RE: lucene 3.0.3 | phrase query problem

2011-02-11 Thread Zhang, Lisheng
Hi Kumar, 1) For your question in last mail: for tool luke, go to site http://www.getopt.org/luke/ and click "launch luke now", then pointing to your lucene data folder. Also the book "Lucene in Action" is a great source (go to .amazon.com and search this book) where everything (almost) is

RE: lucene 3.0.3 | phrase query problem

2011-02-09 Thread Zhang, Lisheng
Hi, I think using Field.Index.NOT_ANALYZED means ignoring StandardAnalyzer, so we index "sql. server" as one word. You may use luke to see how this field is indexed. In this case we can only search whole term (without case change even), if using the StandardAnalyzer to analyze "sql. server" w

outlook MSG file text extraction tool?

2011-02-03 Thread Zhang, Lisheng
Hi, Do you know any good open source tool to extract text from MS outlook MSG files? 1) Apache Tika seems not to support *.msg yet. 2) Apache POI recently started to support *.msg (3.7 10/2010), but I run into several problems (cannot process Japanese well, null pointer exception ..)? Thank

RE: How to handle more than Integer.MAX_VALUE documents?

2010-11-03 Thread Zhang, Lisheng
8 AM, Lance Norskog wrote: >>> 2billion is a hard limit. Usually people split indexes into multiple >>> index long before this, and use the parallel multi reader (I think) to >>> read from all of the sub-indexes. >>> >>> On Mon, Nov 1, 2010 at 2:16 PM, Zha

RE: How to handle more than Integer.MAX_VALUE documents?

2010-11-02 Thread Zhang, Lisheng
and use the parallel multi reader (I think) to >> read from all of the sub-indexes. >> >> On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng >> wrote: >>> >>> Hi, >>> >>> Now lucene uses integer as document id, so it means we cannot have mo

How to handle more than Integer.MAX_VALUE documents?

2010-11-01 Thread Zhang, Lisheng
Hi, Now lucene uses integer as document id, so it means we cannot have more than 2^31-1 documents within one collection? Even if we use MultiSearcher the document id is still integer so it seems this is still a problem? We have been using lucene for some time and our document count is growing ra

RE: Need Help: Lucene with PHP/Java Bridge

2010-10-23 Thread Zhang, Lisheng
Hi, Have you compared your java version in these two boxes? Also PHP version? Did you run indexer from command line or from browser? I used Zend java bridge before and found java version too low may cause problem? Best regards, Lisheng -Original Message- From: dian puma [mailto:dianp.

RE: In lucene 2.3.2, needs to stop optimization?

2010-09-24 Thread Zhang, Lisheng
indexes that significantly improve search speed. I'm not sure..but I think indexWriter.getReader() for almost realtime was added to 2.9, so you can keep your writer always open and get very cheaply a new reader on each search request. On Fri, Sep 24, 2010 at 09:47, Zhang, Lisheng wrote:

RE: In lucene 2.3.2, needs to stop optimization?

2010-09-23 Thread Zhang, Lisheng
be appreciated, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Thursday, September 23, 2010 6:11 PM To: java-user@lucene.apache.org Subject: In lucene 2.3.2, needs to stop optimization? Hi, We are using lucene 2.3.2, now we need to index e

In lucene 2.3.2, needs to stop optimization?

2010-09-23 Thread Zhang, Lisheng
Hi, We are using lucene 2.3.2, now we need to index each document as fast as possible, so user can almost immediately search it. So I am considering stop IndexWriter optimization during real time, then in relatively off-time like late night we may call IndexWriter optimize method explicitly onc

RE: Building maven artifacts

2010-07-19 Thread Zhang, Lisheng
:03 AM To: java-user@lucene.apache.org Subject: Re: Building maven artifacts Hi, I don't know. I tried to set up something like this: But error is the same. Maybe there are any other parameters? 2010/7/16 Zhang, Lisheng > Hi, > > I never this kind of build before, but just

RE: Building maven artifacts

2010-07-16 Thread Zhang, Lisheng
Hi, I never did this kind of build before, but just from the error message I guess it could mean two variables: ${project.artifactId} ${project.version} are not defined (otherwise the exact jar file name would be printed out)? Could it be some environment setup issue? Best regards, Lisheng -Origi

Could multiple indexers change same collections at the same time?

2010-06-24 Thread Zhang, Lisheng
Hi, I remember I tested lucene 1.4 and 2.4 earlier, and found the following: # it is OK for multiple searchers to search the same collection. # it is OK for one IndexWriter to edit and multiple searchers to search at the same time. # it is generally NOT OK for multiple IndexWriter to

RE: Modify TermQueries or Tokens

2010-04-30 Thread Zhang, Lisheng
Hi, It looks good to me, but I did not test, when testing, we may print out both initialQuery.toString() // query produced by QueryParser finalQuery.toString() // query after your new function as comparison, besides testing the query result. Best regards, Lisheng -Original Message- F

RE: Modify TermQueries or Tokens

2010-04-30 Thread Zhang, Lisheng
have not done that myself before, but feel it should work. Best regards, Lisheng -Original Message- From: Christopher Condit [mailto:con...@sdsc.edu] Sent: Friday, April 30, 2010 2:08 PM To: java-user@lucene.apache.org Cc: Zhang, Lisheng Subject: RE: Modify TermQueries or Tokens Hi Li

RE: Modify TermQueries or Tokens

2010-04-30 Thread Zhang, Lisheng
rked for me well. Best regards, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Friday, April 30, 2010 1:41 PM To: java-user@lucene.apache.org Subject: RE: Modify TermQueries or Tokens Hi, Lucene already have class WildcardQuery, I think

RE: Modify TermQueries or Tokens

2010-04-30 Thread Zhang, Lisheng
Hi, Lucene already have class WildcardQuery, I think you can add "*" on either side (or both), when creating Term: http://lucene.apache.org/java/3_0_1/api/core/index.html But notice by default QueryParser cannot parse *queryString. Best regards, Lisheng -Original Message- From: Christo

RE: Improving Zend lucene search - general guidance?

2010-02-19 Thread Zhang, Lisheng
p://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! Zhang, Lisheng wrote: > Hi, > > I have been using Java/lucene for a few years and it works well for me. > &g

Improving Zend lucene search - general guidance?

2010-02-19 Thread Zhang, Lisheng
Hi, I have been using Java/lucene for a few years and it works well for me. Recently we started to use PHP/lucene from Zend, and I found some problems, especially that for each query it immediately loads the whole term id/score (other info..) array into memory, which would cause memory exhaustion if t

RE: Lucene 1.4.3 "Already closed" IOException

2009-10-23 Thread Zhang, Lisheng
?  Maybe time to consider an upgrade. > > Anyway, if you're getting that exception when creating a searcher I > guess you are using a constructor that takes an IndexReader and a > further guess would be that something has closed it. > > -- > Ian. > > > On Tue,

RE: Lucene 1.4.3 "Already closed" IOException

2009-10-20 Thread Zhang, Lisheng
Hi, We are using lucene 1.4.3, sometimes we encounter an error when creating Searcher object with IOException: "Already closed". I searched lucene message archive but did not see conclusive answer, any help would be very appreciated. Best regards, Lisheng --

Lucene 1.4.3 "Already closed" IOException

2009-10-20 Thread Zhang, Lisheng
Hi, We are using lucene 1.4.3, sometimes we encounter an error when creating Searcher object with IOException: "Already closed". I searched lucene message archive but did not see conclusive answer, any help would be very appreciated. Best regards, Lisheng ---

RE: Search with whitespaces

2009-09-25 Thread Zhang, Lisheng
Hi, Simplest way is to add one condition (assuming field is f1): f1:notebook f1:"note book" which means (notebook OR "note book"), 2nd condition is phrase search. Best regards, Lisheng -Original Message- From: Alex Bredariol Grilo [mailto:abgr...@gmail.com] Sent: Friday, September 25,
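
The two-clause query above can also be assembled programmatically. The following is a minimal sketch, assuming the query parser's default operator is OR and that the variants contain no characters that need escaping in query syntax; `VariantQuery` and `build` are hypothetical names, and the code only builds the query string without calling Lucene.

```java
import java.util.ArrayList;
import java.util.List;

public class VariantQuery {
    /** Builds field:term clauses, quoting variants that contain whitespace as phrases. */
    public static String build(String field, String... variants) {
        List<String> clauses = new ArrayList<>();
        for (String v : variants) {
            String text = v.trim();
            // A variant with inner whitespace has to be searched as a phrase.
            boolean phrase = text.chars().anyMatch(Character::isWhitespace);
            clauses.add(field + ":" + (phrase ? "\"" + text + "\"" : text));
        }
        // Space-separated clauses are OR-ed when the parser's default operator is OR.
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(build("f1", "notebook", "note book"));
    }
}
```

The resulting string can then be handed to the query parser together with the same analyzer that was used at index time.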

Concurrent indexing thread/process safety issue - can we have document-lock instead of directory-lock in future?

2009-09-15 Thread Zhang, Lisheng
Hi, I read through the lucene thread/process safety issue for concurrent indexing, my understanding is that each indexing through IndexWriter will lock the whole index directory. Now we need to index a community blog where many people add/update, so queuing all those indexing requests would be a

RE: Lucene memory usage

2009-06-10 Thread Zhang, Lisheng
Hi, Does this issue has anything to do with the line: > TopScoreDocCollector collector = new TopScoreDocCollector(10); if we do: > TopScoreDocCollector collector = new TopScoreDocCollector(2); instead (only see top two documents), could memory usage be less? Best regards, Lisheng -Or

RE: Possible bug in QueryParser when using CJKAnalyzer (lucene 2.4.1)

2009-06-02 Thread Zhang, Lisheng
9 10:39 PM > To: java-user@lucene.apache.org > Subject: Re: Possible bug in QueryParser when using CJKAnalyzer (lucene > 2.4.1) > > > I'm not sure this is the same case, but there is a report and patch for > CJKTokenizer in JIRA: > > https://issues.apache.org/jira/br

RE: Possible bug in QueryParser when using CJKAnalyzer (lucene 2.4.1)

2009-06-02 Thread Zhang, Lisheng
LUCENE-973 Koji Zhang, Lisheng wrote: > Hi, > > When I use lucene 2.4.1 QueryParser with CJKAnalyzer, somehow > it always generates an extra space, for example, if the input is "ABC", > the query would be: > > myfield"AB BC " // should be myfield:"AB BC&

Possible bug in QueryParser when using CJKAnalyzer (lucene 2.4.1)

2009-06-01 Thread Zhang, Lisheng
Hi, When I use lucene 2.4.1 QueryParser with CJKAnalyzer, somehow it always generates an extra space, for example, if the input is "ABC", the query would be: myfield"AB BC " // should be myfield:"AB BC" If I create PhraseQuery directly it does work. From Luke I know indexing works OK. In lucene

RE: RangeQuery & TooManyClausesException : Lucene 2.4

2009-05-20 Thread Zhang, Lisheng
Hi, I did not see method setConstantScoreRewrite method in RangeQuery class? Best regards, Lisheng -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, May 20, 2009 11:10 AM To: java-user@lucene.apache.org Subject: Re: RangeQuery & TooManyClause

Lucene 2.9

2009-05-18 Thread Zhang, Lisheng
Hi, I know lucene 2.9 would be the next release, do we have the release date yet (roughly, 6 months away, or longer)? Knowing this would help us to schedule our work, thanks for helps! Lisheng - To unsubscribe, e-mail: java-us

RE: Lucene 1.4.3: Error when creating Searcher

2009-04-08 Thread Zhang, Lisheng
ts. Mike On Wed, Apr 8, 2009 at 4:47 PM, Zhang, Lisheng wrote: > Hi, > > Client said they did not index, all they do is searching (create > Searcher objects), I looked at 1.4.3 and think this issue can > happen in: > > private static IndexReader open(final Directory director

RE: Lucene 1.4.3: Error when creating Searcher

2009-04-08 Thread Zhang, Lisheng
is on upgrading to 2.4. Mike On Wed, Apr 8, 2009 at 3:40 PM, Zhang, Lisheng wrote: > Hi, > > Sorry that my initial message is not clear, I read lucene source code (both > 1.4.3 > and 2.4.0), and understood more. > > The problem is that when using lucene 1.4.3 sometimes when sea

RE: Lucene 1.4.3: Error when creating Searcher

2009-04-08 Thread Zhang, Lisheng
It seems that in 2.4.0 we will never have this issue because this error can only happen when concurrent writing. Is this true? Thanks very much for helps, Lisheng > -Original Message- > From: Zhang, Lisheng > Sent: Wednesday, April 08, 2009 9:08 AM > To:

Lucene 1.4.3: Error when creating Searcher

2009-04-08 Thread Zhang, Lisheng
Hi, We are using lucene 1.4.3, sometimes when two threads try to search, one thread got error when creating MultiSearcher: Lock obtain timed out: Lock@/tmp/lucene-ba94511756a2670adeac03a50532c63c-commit.lock I read lucene FAQ and searched previous discussions, it seems that this error should be

Encoding detection free software?

2009-03-27 Thread Zhang, Lisheng
Hi, What's the best free tool for encoding detection? For example we have an ASCII file README.txt, which needs to be indexed, but we need to know its encoding before we can convert it to a Java String. I saw some free tools on the market, but have no experience with any of them yet. What is the be

RE: Free software for language detection

2009-03-27 Thread Zhang, Lisheng
ake a look at. I have used it in the past and it worked reasonably well. Let me know what else you find and how it works for you. http://www.olivo.net/software/lc4j/ Good luck! Jochen Frey On Fri, Mar 27, 2009 at 9:54 AM, Zhang, Lisheng < lisheng.zh...@broadvision.com> wrote: > Hi, > &

Free software for language detection

2009-03-27 Thread Zhang, Lisheng
Hi, Are you aware of any free software for language detection (given certain text, see if it is French, or Japanese)? I saw Bob Carpenter's previous mail which explained the principle nicely, but could not locate free tools? Thanks very much for helps, Lisheng --

Text extraction tool for Microsoft Office 2007

2009-02-21 Thread Zhang, Lisheng
Hi, What is the best tool (free software) to extract text from Microsoft Office 2007: Word 2007, Excel 2007, Power Point 2007 so that we can index them by lucene? Thanks very much for helps, Lisheng - To unsubscribe, e-mail:

RE: Hebrew and Hindi analyzers

2009-02-18 Thread Zhang, Lisheng
tical problems that should be fixed as mentioned in the JIRA task. don't know if this helps... On Tue, Feb 17, 2009 at 9:54 PM, Zhang, Lisheng < lisheng.zh...@broadvision.com> wrote: > Hi, > > Are there free Hebrew and Hindi language analyzers for > lucene? I searched a

Hebrew and Hindi analyzers

2009-02-17 Thread Zhang, Lisheng
Hi, Are there free Hebrew and Hindi language analyzers for lucene? I searched archive and found some discussions, but did not see clear pointers to downloadable classes. Thanks very much for helps, Lisheng - To unsubscribe, e-ma

RE: Search Across All Fields

2009-01-16 Thread Zhang, Lisheng
Hi, Inside (priority:beauty ..) there is an AND, is that operator what you want? Best regards, Lisheng -Original Message- From: Jamie [mailto:ja...@stimulussoft.com] Sent: Friday, January 16, 2009 3:02 PM To: java-user@lucene.apache.org Subject: Search Across All Fields Hi Everyone I

RE: Any Spanish analyzer available?

2008-10-31 Thread Zhang, Lisheng
lter.java http://www.nabble.com/file/p20265229/SpanishStemmer.java SpanishStemmer.java http://www.nabble.com/file/p20265229/stopWords.java stopWords.java if you improve it, tell me. Zhang, Lisheng wrote: > > Hi, > > Is there any Spanish analyzer available for lucene applications? &g

Any Spanish analyzer available?

2008-10-23 Thread Zhang, Lisheng
Hi, Is there any Spanish analyzer available for lucene applications? I did not see any in lucene 2.4.0 contribute folders. Thanks very much for helps, Lisheng - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands,

How to find most popular terms quickly?

2008-02-26 Thread Zhang, Lisheng
Hi, I have a very large amount of documents indexed, one field is Brand (untokenized), now I need to find the most popular brand (which brand is used by most Docs), one way is: 1) open IndexReader. 2) call terms() to get all terms, then filter out terms in field Brand. 3) call termDocs(Term) to g

RE: Phrase Query Problem

2007-12-18 Thread Zhang, Lisheng
oo long then probably I will look into implementing a custom analyzer... Zhang, Lisheng wrote: > > Hi, > > In case you donot want to toss away any stop words and even > preserve case, WhiteSpaceAnalyzer can be used, also using > WhiteSpaceTokenizer would serve as a test (but nee

RE: Phrase Query Problem

2007-12-18 Thread Zhang, Lisheng
s there any other way to handle this situation... Especially in the above mentioned case, the user is expecting around 5 records and the query is fetching more than 550 records.8-O Thanks. Zhang, Lisheng wrote: > > Hi, > > Do you mean that your query phrase is "Health Safety"

RE: Phrase Query Problem

2007-12-17 Thread Zhang, Lisheng
Hi Sirish, A few hours ago I sent a reply to your message, if my understanding is correct, you indexed a doc with text as Health and Safety and you used phrase Health Safety to create a phrase query. If that is the case, this is normal since you used StandardAnalyzer to tokenize the input tex

RE: Phrase Query Problem

2007-12-17 Thread Zhang, Lisheng
Hi, Do you mean that your query phrase is "Health Safety", but docs with "Health and Safety" returned? If that is the case, the reason is that StandardAnalyzer filters out "and" (also "or, "in" and others) as stop words during indexing, and the QueryParser filters those words out also. Best reg
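
The stop-word effect described above can be reproduced without Lucene. This is a rough sketch, not StandardAnalyzer itself: the tokenizer, the three-word stop list, and the class name `StopWordDemo` are assumptions for illustration, and token positions are ignored. It shows that once "and" is dropped on both the index side and the query side, the two phrases reduce to the same token sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordDemo {
    // Illustrative stop list only; a real analyzer's list is longer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("and", "or", "in"));

    /** Lower-cases, splits on non-letter/non-digit runs, and drops stop words. */
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Health and Safety")); // indexed text
        System.out.println(analyze("Health Safety"));     // phrase query text
    }
}
```

If stop words must be searchable, an analyzer that keeps them has to be used for both indexing and querying.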

RE: How to avoid score calculation completely?

2007-05-24 Thread Zhang, Lisheng
Hi, Thanks for helps! Yes, along the line you mentioned we can reduce the amount of calculation, but we still need to loop through to count all docs, so time may still be O(n), I am wondering if we can avoid the loop to get count directly? Best regards, Lisheng -Original Message- From: M

How to avoid score calculation completely?

2007-05-23 Thread Zhang, Lisheng
Hi, We have been using lucene for years and it serves us well. Sometimes when we issue a query, we only want to know how many hits it yields and do not want any docs back. Is it possible to completely avoid score calculation to get the total count back? I understand score calculation needs a loop for all m

Sort in Lucene 1.4.3

2007-04-28 Thread Zhang, Lisheng
Hi, I encountered one problem in lucene 1.4.3: I called Searcher.search(, new Sort("myfield")); In "myfield", most values look like a number "123456" or something similar, but one field contains the value "Just a TRY", and then I got an error: java.lang.ClassCastException at org.apache.lucene.search.FieldDocSo

RE: Lucene search performance: linear?

2006-12-05 Thread Zhang, Lisheng
search for all documents should be at least linear to the total number of documents. Sören Zhang, Lisheng schrieb: > Hi, > > I indexed first 220,000, all with a special keyword, I did a simple > query and only fetched 5 docs, with Hits.length()=220,000. > > Then I indexed 44

RE: Lucene search performance: linear?

2006-12-05 Thread Zhang, Lisheng
: linear? On Tuesday 05 December 2006 03:49, Zhang, Lisheng wrote: > I found that search time is about linear: 2nd time is about 2 times > longer than 1st query. What exactly did you measure, only the search() or also opening the IndexSearcher? The later depends on index size, thus you sho

Lucene search performance: linear?

2006-12-04 Thread Zhang, Lisheng
Hi, I first indexed 220,000 docs, all with a special keyword, then did a simple query and only fetched 5 docs, with Hits.length()=220,000. Then I indexed 440,000 docs, with the same keyword, queried again and fetched a few docs, with Hits.length()=440,000. I found that search time is about linear: 2nd

RE: Search with accents

2006-08-01 Thread Zhang, Lisheng
01, 2006 2:34 PM To: java-user@lucene.apache.org Subject: Re: Search with accents Yes...here's how I create my QueryParser: QueryParser parser = new QueryParser("text", new BrazilianAnalyzer()); 2006/8/1, Zhang, Lisheng <[EMAIL PROTECTED]>: > Hi, > > Have you used the

RE: Search with accents

2006-08-01 Thread Zhang, Lisheng
Hi, Have you used the same BrazilianAnalyzer when searching? Best regards, Lisheng -Original Message- From: Eduardo S. Cordeiro [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 01, 2006 1:40 PM To: java-user@lucene.apache.org Subject: Search with accents Hello there, I have a brazilia

Can PDFBox or POI handle multi-byte characters with different encodings?

2006-02-10 Thread Zhang, Lisheng
Hi, Currently we are using PDFBox to process PDF files and POI to process DOC/XLS files, before sending strings to lucene for indexing. Does anyone know if PDFBox or POI can process multi-byte characters like Japanese with various encodings (whatever specified in PDF or DOC)? Thanks very much for

RE: NullPointerException in ParallelMultiSearcher

2005-12-21 Thread Zhang, Lisheng
er.search() throws exceptions other than IOException (which is handled by MultiSearcherThread). These will result in NullpointerExceptions as well. Not much help I guess, but perhaps some more insight. /Ronnie Zhang, Lisheng wrote: > Hi, > > I have not received any feedback yet, any

RE: NullPointerException in ParallelMultiSearcher

2005-12-20 Thread Zhang, Lisheng
Hi, I have not received any feedback yet, any comments would be greatly appreciated! Lisheng -Original Message- From: Zhang, Lisheng Sent: Thursday, December 01, 2005 12:30 PM To: 'java-user@lucene.apache.org' Subject: NullPointerException in ParallelMultiSearcher Hi, We

NullPointerException in ParallelMultiSearcher

2005-12-01 Thread Zhang, Lisheng
Hi, We are using lucene v1.4.3 for some time, in general it is working well. We often try to search multiple collections at the same time, so we are using ParallelMultiSearcher, but sometimes we got the following exception: java.lang.NullPointerException at org.apache.lucene.search

Abnormal behavior in QueryParser

2005-10-07 Thread Zhang, Lisheng
Hi, We recently encountered a strange behavior in lucene v1.4.3 QueryParser: we call QueryParser.parse("-1", "myidfield", new StandardAnalyzer()); and get the returned query as: -myidfield:1 // apparently we want "myidfield:-1" Currently we can use TermQuery to avoid QueryParser to bypass this p

JavaCC version for lucene 1.4

2005-09-27 Thread Zhang, Lisheng
Hi, I would like to know the JavaCC version used to build lucene 1.4? I could not get this information from downloaded files (only mentioned JavaCC site). Thanks very much for helps, Lisheng - To unsubscribe, e-mail: [EMAIL PRO
