Any comments or suggestions are greatly appreciated.
Regards
Ganesh
- Original Message -
From: "Ganesh" <[EMAIL PROTECTED]>
To:
Sent: Thursday, October 23, 2008 3:45 PM
Subject: Re: Multisearcher will maintain index order sorting?
Multisearcher after performing search on second inde
any comments / help on this question ?
thanks,
Aashish
Hi,
I want to use lucene for a simple search engine. If I use the code like
this,
QueryParser parser = new QueryParser(field, analyzer);
Query query = parser.parse(line);
searcher.search(query)
above code doesn't give me regular expr
thks steve, i get it.
2008/10/24 Steven A Rowe <[EMAIL PROTECTED]>
> Hi James,
>
> On 10/23/2008 at 8:30 AM, James liu wrote:
> > public class AnalyzerTest {
> >@Test
> >public void test() throws ParseException {
> >QueryParser parser = new MultiFieldQueryParser(new
> String[]{"ti
Hi,
Is there a Spanish analyzer available for Lucene applications?
I did not see one in the Lucene 2.4.0 contrib folders.
Thanks very much for the help, Lisheng
Glen Newton wrote:
2008/10/23 Michael McCandless <[EMAIL PROTECTED]>:
Mark Miller wrote:
Glen Newton wrote:
2008/10/23 Mark Miller <[EMAIL PROTECTED]>:
It sounds like you might have some thread synchronization issues
outside
of
Lucene. To simplify things a bit, you might try just usin
2008/10/23 Michael McCandless <[EMAIL PROTECTED]>:
>
> Mark Miller wrote:
>
>> Glen Newton wrote:
>>>
>>> 2008/10/23 Mark Miller <[EMAIL PROTECTED]>:
>>>
It sounds like you might have some thread synchronization issues outside
of
Lucene. To simplify things a bit, you might try just u
Mark Miller wrote:
Glen Newton wrote:
2008/10/23 Mark Miller <[EMAIL PROTECTED]>:
It sounds like you might have some thread synchronization issues
outside of
Lucene. To simplify things a bit, you might try just using one
IndexWriter.
If I remember right, the IndexWriter is now pretty effi
Also, could you kill your process with -QUIT (on Linux; maybe there is
something analogous on Windows?) when you see the threads hanging?
That will give a stack dump for every thread.
Mike
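Where sending a signal is awkward (for example on Windows), a similar per-thread stack dump can be produced from inside the JVM. The following is a minimal sketch using only the standard `Thread.getAllStackTraces()` call; the class name `ThreadDump` and its output format are hypothetical, and this is an alternative to the `-QUIT` approach above, not the same mechanism.

```java
import java.util.Map;

// Hypothetical helper: collects a stack trace for every live thread,
// roughly what a -QUIT signal prints on Linux.
final class ThreadDump {
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            // One header line per thread: name and current state.
            sb.append('"').append(thread.getName())
              .append("\" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Calling `ThreadDump.dumpAllThreads()` from a watchdog thread when indexing appears stuck would show where each worker is blocked.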
Grant Ingersoll wrote:
Can you describe your process a bit more? Are you measuring just
the Luce
Glen Newton wrote:
2008/10/23 Mark Miller <[EMAIL PROTECTED]>:
It sounds like you might have some thread synchronization issues outside of
Lucene. To simplify things a bit, you might try just using one IndexWriter.
If I remember right, the IndexWriter is now pretty efficient, and there
isn't
Can you describe your process a bit more? Are you measuring just the
Lucene part or the whole ingestion part as well? If it's the latter,
how do you know the issue is in Lucene? PDF extraction is annoying at
best and highly problematic at its worst. Not saying it isn't Lucene,
but I've
Hi James,
On 10/23/2008 at 8:30 AM, James liu wrote:
> public class AnalyzerTest {
>@Test
>public void test() throws ParseException {
>QueryParser parser = new MultiFieldQueryParser(new String[]{"title",
> "body"}, new StandardAnalyzer());
>Query query1 = parser.parse("中文"
2008/10/23 Mark Miller <[EMAIL PROTECTED]>:
> It sounds like you might have some thread synchronization issues outside of
> Lucene. To simplify things a bit, you might try just using one IndexWriter.
> If I remember right, the IndexWriter is now pretty efficient, and there
> isn't much need to inde
It sounds like you might have some thread synchronization issues outside
of Lucene. To simplify things a bit, you might try just using one
IndexWriter. If I remember right, the IndexWriter is now pretty
efficient, and there isn't much need to index to smaller indexes and
then merge. There is a
You might want to look at my indexing of 6.4 million PDF articles,
full-text and metadata. It resulted in an 83GB index taking 20.5 hours
to run. It uses multiple writers, is massively multithreaded.
More info here:
http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
Che
Hi,
We are trying to index a large collection of PDF documents, with sizes varying
from a few KB to a few GB. We use Lucene 2.3.2 with JDK 1.6.0_01 (with PDFBox for
text extraction) on Windows as well as CentOS Linux. We used the java -Xms
and -Xmx options, both at 1080m, even though we have 4GB on Windows and
32 GB
You can search the archives for some background info. Also, Michael
Busch has a nice presentation from ApacheCon at http://people.apache.org/~buschmi/apachecon/AdvancedIndexingLuceneAtlanta07.ppt
Basically, the payload allows you to associate an arbitrary byte array
with 1 or more terms.
O
Hi all,
Has anyone used the payload functionality in Lucene? I would really
appreciate it if someone could provide an explanation with some code or something.
Thanks,
Anshul
Well, assuming that token_count is an indexed field
in your documents (i.e. not something you're
computing on the fly), just use a RangeQuery for the numeric
part. Actually, you probably want to use
ConstantScoreRangeQuery...
The only thing you have to watch is that Lucene does a
lexical compare,
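To illustrate the lexical-compare caveat: as plain strings, "100" sorts before "20". A common workaround is to index numbers zero-padded to a fixed width so that string order and numeric order agree. The sketch below is plain Java with no Lucene calls; the helper name `PaddedNumbers.pad` and the width of 10 are assumptions chosen for illustration, and it only handles non-negative ints.

```java
import java.util.Locale;

// Hypothetical helper (not part of Lucene): pad non-negative ints so that
// String ordering agrees with numeric ordering.
final class PaddedNumbers {
    // Width 10 covers every non-negative int (max 2147483647).
    static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%010d", value);
    }

    public static void main(String[] args) {
        // Unpadded, "100" sorts before "20" because '1' < '2'.
        System.out.println("100".compareTo("20") < 0);
        // Padded, 20 sorts before 100 as expected.
        System.out.println(pad(20).compareTo(pad(100)) < 0);
    }
}
```

The same padded form would have to be used both when indexing the field and when building the range bounds.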
It looks to me like you've got a space between the
characters in the second example
Best
Erick
2008/10/23 James liu <[EMAIL PROTECTED]>
> public class AnalyzerTest {
> @Test
> public void test() throws ParseException {
> QueryParser parser = new MultiFieldQueryParser(new String[]{"
Compass handles that nicely.
You can query Lucene first and build an IN (...) clause for your SQL db.
Or you can query your SQL db first and handle the result with a bitset in Lucene.
M.
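The first option (Lucene first, then SQL) could look roughly like this. It is a sketch only: the table name `docs` and column name `id` are made up for illustration, and the IDs returned by the Lucene search are assumed to be bound to the `?` placeholders afterwards through a JDBC `PreparedStatement`.

```java
import java.util.Collections;

// Hypothetical helper: builds a parameterized IN (...) query for the
// IDs that a Lucene search returned.
final class InClause {
    static String selectByIds(String table, String idColumn, int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        // One "?" per id, so values are bound rather than concatenated.
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "SELECT * FROM " + table
                + " WHERE " + idColumn + " IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(selectByIds("docs", "id", 3));
    }
}
```

Many databases limit the number of elements in an IN list, so large hit sets may need to be batched.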
On Thu, 23 Oct 2008 14:27:53 +0200, Niels Ott <[EMAIL PROTECTED]>
wrote:
> Hi everybody,
>
> I need to query for documents
Hi everybody,
I need to query for documents not only for search terms but also for
numeric values (or other general types). Let me try to explain with a
hypothetical example.
Assuming there is a value for the number of words in each document (or the
number of person names, or whatever), I would wan
public class AnalyzerTest {
@Test
public void test() throws ParseException {
QueryParser parser = new MultiFieldQueryParser(new String[]{"title",
"body"}, new StandardAnalyzer());
Query query1 = parser.parse("中文");
Query query2 = parser.parse("中 文");
System.out.pri
After performing the search on the second index, MultiSearcher adds the
maxdoc of the first index to each resulting docid. In my case it would be 3. After
incrementing the docid, the document is inserted into the
FieldDocSortedHitQueue. FieldDocSortedHitQueue is an extension of priority
queue shoul
Because when you want to find X in the second index, you should pass docId=3 to
MultiSearcher, and MultiSearcher can find the sub-searcher for this document by
calculating the lengths of all the sub-searchers.
For example, when you get the doc with DocID 3 (the second X), MultiSearcher (see the
code of MultiSearcher doc(int i)), mines
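The offset arithmetic discussed in this thread can be sketched in plain Java. This is an illustration of the idea only, not the actual MultiSearcher source: the class `DocIdOffsets` and its method names are hypothetical, and the two sub-indexes of 3 documents each are taken from the IndexA/IndexB example in the thread.

```java
// Hypothetical sketch of mapping between global and per-index doc IDs.
final class DocIdOffsets {
    // starts[i] is the first global doc ID of sub-index i;
    // the final entry is the total number of documents.
    static int[] starts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
        return starts;
    }

    // Which sub-index owns this global doc ID?
    static int subIndex(int globalDocId, int[] starts) {
        if (globalDocId < 0 || globalDocId >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("doc ID out of range: " + globalDocId);
        }
        int i = 0;
        while (globalDocId >= starts[i + 1]) {
            i++;
        }
        return i;
    }

    // Subtract the owning sub-index's start to recover the local doc ID.
    static int localDocId(int globalDocId, int[] starts) {
        return globalDocId - starts[subIndex(globalDocId, starts)];
    }

    public static void main(String[] args) {
        int[] starts = starts(new int[] {3, 3}); // IndexA and IndexB, 3 docs each
        // Global doc ID 3 is local doc ID 0 of the second index.
        System.out.println(subIndex(3, starts) + " / " + localDocId(3, starts));
    }
}
```

With these offsets, global IDs 0 to 2 belong to the first index and 3 to 5 to the second, which is why the second "X" document is reported as doc 3.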
In IndexA there are 3 docs
DocID, Terms
0,X
1,X Y
2,X Z
In IndexB there are 3 docs
DocID, Terms
0,X
1,X Y
2,X Z
When I sort on indexed order using MultiSearcher and
ParallelMultiSearcher, it returns the result
0,X
3,X
1,X Y
4,X Y
2,X Z
5,X Z
But it should be in the order of 0,1,2,3,4,5. Co
Hi Grant and Jose,
just to give some more details, as Jose said avg_length is precalculated
at indexing time using a specific Similarity class. Basically this can
be done through the lengthNorm method, for each document and field the
total length is stored, when the indexing process is finish
MultiSearcher and ParallelMultiSearcher, when requested to sort on doc
(indexed order), merge the results by the docID of each DB.
Regards
Ganesh
- Original Message -
From: "Paul Smith" <[EMAIL PROTECTED]>
To:
Sent: Thursday, October 23, 2008 10:57 AM
Subject: Re: Multisearcher will m
27 matches