Because some of the queries that I have to convert (without modifying
them, unfortunately) have literally half a page of statements
expressed like that which, if expanded, would equal a several-page-long
Lucene query.
On Wed, Sep 2, 2009 at 6:42 PM, Luis Alves wrote:
> Why can't you use a OR? got
Why can't you use an OR? gotham OR gothic
Is it possible to translate this sort of Perl regex into a lucene query:
/goth(am|ic)/
Where the only results that would be returned would be gotham or gothic?
Thanks,
Mike
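
For patterns as simple as the one above, the OR form suggested in this thread could be generated mechanically instead of written by hand. The following is only a minimal sketch using the JDK, not a Lucene API: the class AlternationExpander and the method toOrQuery are hypothetical names, and it assumes a single alternation group of plain word characters with an optional literal prefix and suffix. Anything more complex is rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): expands one alternation group
// into an OR query string.
class AlternationExpander {
    // Matches prefix(alt1|alt2|...)suffix where every part is word characters only.
    private static final Pattern SIMPLE_GROUP =
            Pattern.compile("(\\w*)\\((\\w+(?:\\|\\w+)*)\\)(\\w*)");

    static String toOrQuery(String regex) {
        Matcher m = SIMPLE_GROUP.matcher(regex);
        if (!m.matches()) {
            // Assumption: anything beyond a single simple group is out of scope here.
            throw new IllegalArgumentException("Unsupported pattern: " + regex);
        }
        List<String> terms = new ArrayList<>();
        for (String alternative : m.group(2).split("\\|")) {
            // Rebuild each full term: prefix + alternative + suffix.
            terms.add(m.group(1) + alternative + m.group(3));
        }
        return String.join(" OR ", terms);
    }

    public static void main(String[] args) {
        System.out.println(toOrQuery("goth(am|ic)")); // prints "gotham OR gothic"
    }
}
```

The expanded string grows with the number of alternatives, which is the size problem raised in this thread, so this only helps where the expansion stays manageable.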
-
To unsub
Have you tried the regex package in lucene's contrib?
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/regex/package-summary.html
There are several implementations. I am not sure whether any of them is
exactly "Perl compatible", but for your example I think it will do the trick.
On Wed, S
Is it possible to translate this sort of Perl regex into a lucene query:
/goth(am|ic)/
Where the only results that would be returned would be gotham or gothic?
Thanks,
Mike
-
To unsubscribe, e-mail: java-user-unsubscr...@lucen
Hi Grant,
I have now followed Daniel's advice and catch the exception with:
try {
    indexWriter.addDocument(doc);
} catch (CorruptIndexException ex) {
    throw new IndexerException("CorruptIndexException on doc: "
            + doc.toString(), ex);
} catch (IOException ex) {
Hi,
If I use Tika to parse HTML and feed the parsed String to a Lucene
analyzer, what happens to the offset information needed for KWIC and for
returning to the original text (like the Google cache view)? How can I keep
track of the offsets between the Tika parser and the Lucene analyzer?
What are the solutions/ideas to do a sort
See "DuplicateFilter" in contrib.
http://markmail.org/message/lsvnpu7mwhht3a4p
Cheers
Mark
- Original Message
From: Ganesh
To: java-user@lucene.apache.org
Sent: Wednesday, 2 September, 2009 12:38:35
Subject: Re: First result in the group
I have a field called category and all docume
I see ... the solution I have in mind is not simple, but it follows the
Collector approach. Index categories as payloads of documents such that
there is one field (cats:all for example) that includes a posting list for
all documents, each has the categories it is associated w/ in its payload:
cats:
I have a field called category, and every document belongs to some
category (say some belong to X, some to Y, etc.). The field values may change
dynamically. I want the search results to be filtered so that only one document
per category is retrieved.
This is similar to the 'group by' feature in a database.
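
The hash-based workaround mentioned in this thread (walk the ranked results and keep only the first hit seen per category) can be sketched roughly as below. This is plain Java with no Lucene calls: the Hit class is a hypothetical stand-in for a search hit whose category field has already been loaded, and the input list is assumed to be in rank order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FirstPerCategory {
    // Hypothetical stand-in for a search hit; not a Lucene class.
    static final class Hit {
        final String id;
        final String category;

        Hit(String id, String category) {
            this.id = id;
            this.category = category;
        }
    }

    // Keeps the first (highest-ranked) hit of each category, preserving rank order.
    static List<Hit> firstPerCategory(List<Hit> rankedHits) {
        Map<String, Hit> firstByCategory = new LinkedHashMap<>();
        for (Hit hit : rankedHits) {
            // putIfAbsent ignores later hits from a category already seen.
            firstByCategory.putIfAbsent(hit.category, hit);
        }
        return new ArrayList<>(firstByCategory.values());
    }

    public static void main(String[] args) {
        List<Hit> ranked = Arrays.asList(
                new Hit("doc1", "X"), new Hit("doc2", "Y"),
                new Hit("doc3", "X"), new Hit("doc4", "Z"));
        for (Hit hit : firstPerCategory(ranked)) {
            System.out.println(hit.id + " (" + hit.category + ")");
        }
    }
}
```

This still visits every hit, so it does not avoid the cost the original question asks about; it only shows the baseline that the other suggestions in the thread try to improve on.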
What do you mean by "first result in the group"? What is a group?
On Wed, Sep 2, 2009 at 1:36 PM, Ganesh wrote:
> Hello all,
>
> I want to retrieve the first result in the group. How to achieve this?
> Currently I am parsing all the results, using a hash and avoiding duplicate
> entries.
>
> Is
So long as you can ensure, external to Lucene, that only one
IndexWriter is open at once on the index, you can disable all of
Lucene's normal locking. But you must be certain: if you accidentally
allow two IndexWriters to be open at once, it will quickly corrupt
the index.
Beyond locking, Lucene
Hello all,
I want to retrieve the first result in the group. How to achieve this?
Currently I am parsing all the results, using a hash and avoiding duplicate
entries.
Is there any better way?
Regards
Ganesh
Hey there, AFAIK this problem on S3 has not been solved, but there
might be other ways to work around it. As you are running on Amazon
anyway, you might want to consider using a locking service like
ZooKeeper (http://hadoop.apache.org/zookeeper/), which
could help you with other
>>I need to start off with this project where we can find the ranking of
>>controversial articles. Could anyone kindly help me how to start?
Check out the Wikipedia "logging" dumps, which contain the reasons for actions
on page titles (including IP blocks and deletes) but without the bulk of the
I am exploring the possibility of creating large Lucene indices via EC2/S3.
So far I have found only the following URL:
http://www.kimchy.org/lucene-and-amazon-s3/
But I still don't know whether the Lucene locking problem (on a distributed FS
like S3/DFS) is fixed or not. Any information is great
hossman wrote:
>
> "the second", and "no"
>
Thanks for that.
Concerning the *theoretical* performance difference for a mid-size index,
roughly what would it be in percent?
Is there any way to make indexReader.docFreqs() reflect the changes faster,
i.e. without the need to optimize()?
-
Kon