For a quick Java approach, give yourself 3 minutes and try using
DBSight to access the database. You can simply use "select * from
mw_searchindex" as a starting point. It'll build the index for you.
However, you may need to plug in a custom analyzer for MediaWiki's
format (or maybe not).
--
C
Helmut Jarausch wrote:
> I know how to set DEFAULT_OPERATOR_AND for an individual QueryParser
> object (after creation)
>
> Since I always want this to be set, is there a means to set a (global)
> option such that any QueryParser object has this de
Helmut Jarausch wrote:
Hi Helmut,
> I know how to set DEFAULT_OPERATOR_AND for an individual QueryParser
> object (after creation)
>
> Since I always want this to be set, is there a means to set a (global)
> option such that any QueryParser object
Liaqat,
Out of curiosity - what are you using to analyze and index Urdu? AraMorph or
something else?
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Liaqat Ali <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, Decembe
Otis, if you're willing to use some non-Java code for your task...
1) Wikipedia uses Lucene for its full-text searches, and the module
is part of MediaWiki. You could use this as follows:
- Install MediaWiki
- Load your Wikipedia dump into MW (and MySQL)
- Build a search index for the Lucene
Hi,
I need to index a Wikipedia dump. I know there is code in contrib/benchmark
for indexing *English* Wikipedia for benchmarking purposes. However, I'd like
to index a non-English dump, and I actually don't need it for benchmarking, I
just want to end up with a Lucene index.
Any suggestions
Michael McCandless wrote:
Ruslan Sivak wrote:
I have an index of about 10mb. Since it's so small, I would like to
keep it loaded in memory, and reload it about every minute or so,
assuming that it has changed on disk. I have the following code,
which works, except it doesn't reload the cha
Bob,
Move the following line in your if block:
Sort sort = new Sort(sortColumn, desc);
That will fix your OOM problem.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Bob Daha <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday,
Have you tried with ~3 or ~4? Just curious...
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Shakti_Sareen <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, December 11, 2007 3:13:47 AM
Subject: help required ... ~ operator
Shakti,
I think you provided the answer:
"sign* NOT Machine"
or
"sign* -Machine"
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Shakti_Sareen <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, December 11, 2007 4:48:58 AM
Subje
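The exclusion query quoted in the reply above ("sign* NOT Machine") asks for documents with at least one term starting with "sign" and no term "machine". Here is a minimal plain-Java sketch of that matching rule, with no Lucene involved. The class and method names are hypothetical, and the tokenizer (lowercase, split on non-word characters) is only an assumed stand-in for what StandardAnalyzer does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of "prefix AND NOT term" matching; not Lucene code.
final class PrefixNotSketch {

    // True if some token starts with requiredPrefix and no token equals excludedTerm.
    public static boolean matches(String text, String requiredPrefix, String excludedTerm) {
        // Assumed stand-in for analysis: lowercase and split on non-word characters.
        List<String> tokens = Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
        String prefix = requiredPrefix.toLowerCase(Locale.ROOT);
        String excluded = excludedTerm.toLowerCase(Locale.ROOT);
        boolean hasPrefix = tokens.stream().anyMatch(t -> t.startsWith(prefix));
        return hasPrefix && !tokens.contains(excluded);
    }

    public static void main(String[] args) {
        // The sample text from the thread: has "signals", has no "machine".
        System.out.println(matches("signals by magnets of different strength", "sign", "Machine"));
        // A made-up counterexample that contains the excluded term.
        System.out.println(matches("signal machine", "sign", "Machine"));
    }
}
```

The sample text from the thread satisfies both conditions, so the first call reports a match and the second does not.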
Ruslan Sivak wrote:
I have an index of about 10mb. Since it's so small, I would like
to keep it loaded in memory, and reload it about every minute or
so, assuming that it has changed on disk. I have the following
code, which works, except it doesn't reload the changes.
protected String
On Wednesday 12 December 2007 03:34:08 Helmut Jarausch wrote:
> Hi,
>
> I know how to set DEFAULT_OPERATOR_AND for an individual QueryParser
> object (after creation)
>
> Since I always want this to be set, is there a means to set a (global)
> option such that any QueryParser object has this defaul
The on-disk index gets updated. Something like this:
The second indexDoc function is what does the actual indexing, but this
should have the relevant content.
public void indexDoc(int userId) throws ClassNotFoundException,
SQLException, CorruptIndexException, IOException
{
IndexWri
Take a look at: https://issues.apache.org/jira/browse/LUCENE-794
This is an extension to the Highlighter that highlights span and
proximity queries. If you rewrite the query it will also do fuzzy
queries. I am sure you can easily steal some of the code to do what you
want.
Keep in mind, beca
I can't speak to the errors, but how is the index being updated? An
IndexWriter buffers changes and periodically flushes them out to
disk, so the writer may not have flushed your data yet, depending
on how your code is written.
Best
Erick
On Dec 11, 2007 5:37 PM, Ruslan Sivak <[EMAIL PROTECTED]> wrote:
>
Look into SpanNearQuery. It has a slop which lets you say how close you
want the terms to be. For a single document, if you are going to be
doing a lot of these searches, I recommend using a MemoryIndex.
Russ
Jose Luna wrote:
Hello,
I am looking for some advice regarding which tools I migh
I have an index of about 10mb. Since it's so small, I would like to
keep it loaded in memory, and reload it about every minute or so,
assuming that it has changed on disk. I have the following code, which
works, except it doesn't reload the changes.
protected String indexName;
protected Ind
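One way to structure the "keep it in memory, reload when it changes on disk" idea described above is a holder that remembers which version it loaded and reloads only when the version source reports a different one. This is a plain-Java sketch, not Lucene code: ReloadingHolder, versionSource, and loader are hypothetical names. With Lucene, the version source might be something like the index version exposed by IndexReader, and the loader would open a fresh in-memory copy; check the API of your Lucene version. Note that a reload can only pick up changes the writer has already flushed or closed.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical holder: caches a loaded value and reloads it when the version changes.
final class ReloadingHolder<T> {
    private final LongSupplier versionSource; // assumed: returns the on-disk version
    private final Supplier<T> loader;         // assumed: loads a fresh in-memory copy
    private long loadedVersion;
    private T current;

    public ReloadingHolder(LongSupplier versionSource, Supplier<T> loader) {
        this.versionSource = versionSource;
        this.loader = loader;
        // Read the version before loading, so a change during the load
        // is detected on the next call to get().
        this.loadedVersion = versionSource.getAsLong();
        this.current = loader.get();
    }

    // Returns the cached copy, reloading first if the version has changed.
    public synchronized T get() {
        long version = versionSource.getAsLong();
        if (version != loadedVersion) {
            current = loader.get();
            loadedVersion = version;
        }
        return current;
    }
}

final class ReloadDemo {
    public static void main(String[] args) {
        long[] version = {1};
        int[] loads = {0};
        ReloadingHolder<String> holder =
            new ReloadingHolder<>(() -> version[0], () -> "snapshot-" + (++loads[0]));
        System.out.println(holder.get()); // version unchanged: cached copy
        version[0] = 2;                   // simulate a change on disk
        System.out.println(holder.get()); // version changed: reloaded copy
    }
}
```

Checking the version on each access avoids a fixed one-minute timer; a scheduled task calling get() would work just as well if searches are infrequent.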
Use Luke to explore the index. The content is present in the content field.
However, it is not stored, so you can only search on it.
On Aug 1, 2007 9:59 AM, Srinivasarao Vundavalli <[EMAIL PROTECTED]>
wrote:
> Hi,
> In which field does Nutch store the content of a document
> while in
Thanks for pointing me to the right class to use.
On Dec 11, 2007 3:23 AM, Doron Cohen <[EMAIL PROTECTED]> wrote:
> Yes that's right, my mistake.
>
> In fact even after reading your comment I was puzzled
> because PhraseScorer indeed requires *all* phrase-positions
> to be satisfied in order to m
Hello,
I am looking for some advice regarding which tools I might use to solve
my problem. I apologize ahead of time for the long explanation.
Problem Description: I would like to index a set of very large HTML
documents. I would then be able to run two different kinds of queries:
proximi
Hi,
I know how to set DEFAULT_OPERATOR_AND for an individual QueryParser
object (after creation)
Since I always want this to be set, is there a means to set a (global)
option such that any QueryParser object has this default operator.
Many thanks for a hint,
Helmut Jarausch
Lehrstuhl fuer Nume
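If no global switch is available, one common workaround for the question above is to create every parser through a single factory method that applies the default operator. The sketch below shows only the pattern. StubParser is a hypothetical stand-in written for this example, not the Lucene QueryParser class, and the real setter name and constant depend on your Lucene version.

```java
// Hypothetical stand-in for a query parser; NOT the Lucene class.
final class StubParser {
    enum Operator { OR, AND }

    private final String field;
    private Operator defaultOperator = Operator.OR; // assumed library default

    public StubParser(String field) {
        this.field = field;
    }

    public void setDefaultOperator(Operator op) {
        this.defaultOperator = op;
    }

    public Operator getDefaultOperator() {
        return defaultOperator;
    }

    public String getField() {
        return field;
    }
}

// Single place where parsers are created, so the AND default is never forgotten.
final class Parsers {
    private Parsers() {}

    public static StubParser newAndParser(String field) {
        StubParser parser = new StubParser(field);
        parser.setDefaultOperator(StubParser.Operator.AND);
        return parser;
    }

    public static void main(String[] args) {
        StubParser parser = newAndParser("contents");
        System.out.println(parser.getField() + " -> " + parser.getDefaultOperator());
    }
}
```

A subclass whose constructor sets the operator would achieve the same thing; the factory just keeps the configuration out of the parser type itself.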
I had a similar problem (I think). Look at using a WildcardFilter
(below), possibly wrapped in a CachingWrapperFilter, depending on whether you
want to reuse it. I overrode the method QueryParser.getWildcardQuery
to customize it. In your case you would probably have to specifically
detect for the presenc
OK, I'm still struggling with this, and a QueryFilter didn't help me one bit
:-(
I'm trying to query for books by "Charles Dickens" that start with "m". I
have constructed a QueryFilter for the author search and a PrefixQuery for
the title search. A simplified version of my code is below.
'
Hi all,
I am using StandardAnalyzer() to index the data.
Actual data is: "signals by magnets of different strength"
I want to search for "sign* NOT Machine". How can I do that?
I am using QueryParser.
Please help on this issue.
Thanks
Shakti Sareen
DISCLAIMER:
This email (incl
Yes that's right, my mistake.
In fact even after reading your comment I was puzzled
because PhraseScorer indeed requires *all* phrase-positions
to be satisfied in order to match. The answer is that
the OR logic is taken care of by MultipleTermPositions,
so the scorer does not need to be aware of a
Hi all,
I am using StandardAnalyzer() to index the data.
Actual data is: "signals by magnets of different strength"
When I parse the query "signals strength"~2, I get a hit.
But when I parse the query "strength signals"~2, I do not get a
hit.
Why? It should work
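A rough model of why the reversed phrase needs more slop: shift each term's document position by its offset in the query phrase, and the slop needed is the spread (max minus min) of the shifted positions. The sketch below is plain Java rather than Lucene's scorer, and it assumes that each query term occurs once in the document and that the stop words "by" and "of" were dropped without leaving position gaps (which is consistent with ~2 matching the in-order phrase). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of sloppy phrase matching for terms that occur once each.
final class PhraseSlopSketch {

    // Smallest slop at which the phrase would match, or -1 if a term is missing.
    public static int requiredSlop(List<String> docTokens, List<String> phraseTerms) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < phraseTerms.size(); offset++) {
            int position = docTokens.indexOf(phraseTerms.get(offset));
            if (position < 0) {
                return -1;
            }
            // Where the phrase would have to start for this term to line up.
            int shifted = position - offset;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        // Assumed token stream after stop-word removal, without position gaps.
        List<String> doc = Arrays.asList("signals", "magnets", "different", "strength");
        System.out.println(requiredSlop(doc, Arrays.asList("signals", "strength"))); // 2
        System.out.println(requiredSlop(doc, Arrays.asList("strength", "signals"))); // 4
    }
}
```

Under these assumptions the in-order phrase matches at ~2, while the reversed phrase would need ~4, because swapping the terms costs extra moves on top of closing the gap.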