Hi David,
Thanks for your suggestion.
I'll give it a try.
David Spencer wrote:
You could try downloading a copy of Wikipedia and processing the
entries yourself. I don't know how well represented other languages
are, but there's a lot of English.
Ahmet Aksoy wrote:
Hi,
I have a project which will be used to provide automatic
dictionary help in different languages.
Thanks Otis,
I copied the index and I am playing around with the copy. I first had to change
the code to force-unlock the directory. From what you just said, the index
doesn't know about any of the new segments in my directory, so deleting them
shouldn't hurt.
-Original Message-
You should be able to re-try the merge (from the beginning - there is
no way to restart it at any point other than the beginning). The merge
and the new index are "finalized" at the very end of the merge, so if it
failed before that, your Lucene index (the segments file) still doesn't
know about them.
It is, as long as you use an Analyzer (when indexing, and when parsing your
query strings) that doesn't strip/convert whatever characters you consider
an "end of line" (newline? linefeed?) during tokenization.
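The idea can be sketched in plain Java (this is not Lucene's actual Analyzer/TokenStream API; the class name and the "$EOL$" marker are made up for illustration): a tokenizer that emits a synthetic end-of-line token instead of folding newlines into ordinary whitespace, so a phrase query can anchor on line ends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a tokenizer that keeps line boundaries as a
// synthetic token ("$EOL$") instead of discarding newlines.
// A real Lucene Analyzer would wrap this logic in a TokenStream.
public class EolTokenizer {
    static final String EOL = "$EOL$"; // assumed marker, not a Lucene convention

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String line : text.split("\n", -1)) {
            for (String tok : line.trim().split("\\s+")) {
                if (!tok.isEmpty()) tokens.add(tok.toLowerCase());
            }
            // mark the end of each line so queries can match at line ends
            tokens.add(EOL);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("first line\nsecond line"));
        // [first, line, $EOL$, second, line, $EOL$]
    }
}
```

Indexing and querying with the same tokenizer then lets a phrase like "pattern $EOL$" stand in for "pattern at end of line".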
: Date: Wed, 11 May 2005 12:41:52 -0400
: From: "Govoni, Darren" <[EMAIL PROTECTED]>
:
Hey guys,
my application died while I was merging two indexes. According to my
understanding, if I just delete the new files that were created after I
started merging, the index won't be affected. Is this true? What will happen
if I just restart the merge from where the application died?
Hi,
I have a project which will be used to provide automatic
dictionary help in different languages.
I'm using Lucene for indexing and searching the words in it.
It is an open source project in java at address
http://belletmen.dev.java.net
Now, I will prepare a function to find the natu
All,
I've just released Zilverline version 1.3.0.
This version has a web service for indexing,
and is localized for the Chinese language.
This version is fully web-based: all settings, collections, and preferences
can be set via the web interface. You don't need to edit any config
files anymore. Also I'm
In your query parser, you'll need to use an Analyzer that knows that
"documenttype" should not be tokenized, and that the raw string entered by
the user should be treated as the query Term value.
You can make your own analyzer that subclasses StandardAnalyzer and only does
the special behavior for t
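A rough plain-Java sketch of that per-field behavior (Lucene also ships a PerFieldAnalyzerWrapper for exactly this purpose; the class below is a made-up illustration, not the real API): the "documenttype" field is kept as one exact term, every other field is lowercased and whitespace-split.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: per-field analysis where "documenttype" is treated
// as a single untokenized keyword and all other fields get a crude
// lowercase + whitespace tokenization (a stand-in for StandardAnalyzer).
public class PerFieldAnalysis {
    public static List<String> analyze(String field, String value) {
        if ("documenttype".equals(field)) {
            // keyword-style: the raw user string becomes the query Term value
            return Arrays.asList(value);
        }
        return Arrays.asList(value.toLowerCase().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(analyze("documenttype", "Annual Report")); // [Annual Report]
        System.out.println(analyze("contents", "Annual Report"));     // [annual, report]
    }
}
```

The important part is that the same per-field choice is applied both at indexing time and when parsing queries, or the terms won't match.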
Well... once you have the list of all "category" names that are in docs
which match your original query, you can either redo the original query
with "and category:" to get the counts, or you can pre-compute (and
save) a BitSet for each category in your index (easy to build using a
HitCollec
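The precomputed-BitSet approach can be sketched with java.util.BitSet alone (doc IDs are plain ints here; in Lucene you would fill each category's BitSet once with a HitCollector): counting hits per category becomes a cheap AND plus cardinality, with no need to rerun the query for every category.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch of precomputed per-category BitSets, one bit per document.
// Given the BitSet of documents matching a query, the per-category hit
// count is just (categoryBits AND queryHits).cardinality().
public class CategoryCounts {
    public static Map<String, Integer> countPerCategory(
            Map<String, BitSet> categoryBits, BitSet queryHits) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, BitSet> e : categoryBits.entrySet()) {
            BitSet and = (BitSet) e.getValue().clone(); // clone: and() mutates
            and.and(queryHits);
            counts.put(e.getKey(), and.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        BitSet news = new BitSet();  news.set(0); news.set(2); news.set(3);
        BitSet blogs = new BitSet(); blogs.set(1); blogs.set(2);
        Map<String, BitSet> cats = new HashMap<>();
        cats.put("news", news);
        cats.put("blogs", blogs);

        BitSet hits = new BitSet(); hits.set(2); hits.set(3); // docs matching the query
        System.out.println(countPerCategory(cats, hits)); // news -> 2, blogs -> 1
    }
}
```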
If you think the content field is more important, you could boost it at
indexing time. If you want to boost at search time, and you are using
QueryParser, you could just use the term^float syntax. I think what
you have down there is OK, too, but I suppose you'd need an if/else so
you boost only the c
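A minimal sketch of building such a search-time-boosted query string with the term^float syntax (the field names and boost values are examples, and the if skips the ^ suffix when a field's weight is 1, as suggested above):

```java
// Builds a QueryParser-style string like "content:lucene^2.0 OR summary:lucene"
// that boosts one field over another at search time.
public class BoostedQuery {
    public static String acrossFields(String term, String[] fields, float[] boosts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(" OR ");
            sb.append(fields[i]).append(':').append(term);
            if (boosts[i] != 1.0f) sb.append('^').append(boosts[i]); // boost only when != 1
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(acrossFields("lucene",
                new String[] {"content", "summary"},
                new float[] {2.0f, 1.0f}));
        // content:lucene^2.0 OR summary:lucene
    }
}
```

The resulting string can then be handed to QueryParser as usual.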
In that case just look at the first N hits and don't even mention the
rest.
Otis
--- Kai Gülzau <[EMAIL PROTECTED]> wrote:
> >Note that it may not make sense filtering by an arbitrary score
> >(normalized or not).
>
> I don't like the Google effect
> with an endless amo
What happens if you swap these 2 lines?
System.out.println("Docs number : " + ir.numDocs());
ir.close();
If I were you, I'd try using minMergeDocs instead of RAMDirectory. It
makes things much simpler. You shouldn't need to optimize the index.
Otis
--- Rifflar
Hello,
It sounds like you missed the Index Format page:
http://lucene.apache.org/java/docs/fileformats.html
That's the best index format documentation currently available.
Otis
--- Sujatha Das <[EMAIL PROTECTED]> wrote:
>
> Hi,
> I couldn't find documentation on these issues,
> so a url as
Hi Seema,
Change your Document.java so that the content field is added, for example:
doc.add(Field.Text("contents", "some dummy text"));
-Original Message-
From: Seema Jain [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 11, 2005 6:20 AM
To: java-user@lucene.apache.org
Subject: Getting subpar
You can also leverage the 'fields' capability in Lucene and perhaps match
fields against columns to do field-based searching.
-Original Message-
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Wed 5/11/2005 12:50 PM
To: java-user@lucene.apache.org
Subject: Re: indexing relational ta
Dick Hollenbeck wrote:
As sources of indexable text we always see HTML, XML, PDF, etc. but I
have not seen much mention of relational tables as a source. Anybody
know why?
I think there's no specific reason - Lucene can index only plain text;
anything else must go through format converters first.
Hi,
I'm trying to perform a query and need to specify a string pattern occurring
at the end of a line.
Is this possible? Thanks.
Darren
As sources of indexable text we always see HTML, XML, PDF, etc. but I
have not seen much mention of relational tables as a source. Anybody
know why?
We have a database with 60,000 records in 6 tables and approximately 15
*text* fields per table. Can we use Lucene to index this with JDBC
being
Now suppose there are two fields, "content" and "summary", but I think a
query on the content field should have higher weight than the summary field.
How can I do it?
I overloaded the parse function and added weights that store each
field's weight:
public static Query parse(String query, String[] fie
When created, an IndexReader opens all the segment files and hangs
onto them. Any updates to the index through an IndexWriter (including
commit and optimize) will not affect already open IndexReaders.
-Yonik
On 5/11/05, Naomi Dushay <[EMAIL PROTECTED]> wrote:
> It's my impression that with optimi
It's my impression that with optimize running so long, there will be a
significant period of time (many minutes) when the old IndexReader will not
be able to find the segment/documents it needs. Am I wrong about that?
- Naomi
> Could you explain why you need to copy the index? It doesn't seem
Hi,
Daniel's suggestion was quite correct. Is the "/" supposed to be turned into
whitespace? In that case, how do I stop it? I do wish to search for the
entire exact word "Blankett/Mall".
Regards,
Björn
Björn Lilja | Technology S
Hi,
I am using the Lucene API for text indexing, searching, and highlighting. I am
using the Lucene sandbox API for highlighting of keywords.
My requirement is to get a subpart of a Lucene query, which is made up of
field-value pairs. How can I get the value of a particular field?
Consider a situation in which I have indexed the terms under two different
fields (say FIELD_TEXT and FIELD_SYNONYM).
What if I wanted to support queries like
"jaguar NEAR london" when I have indexed a document with
"panthers in zoos around London"? So given that Lucene doesn't support
cross-fie
-- Forwarded message --
Date: Fri, 1 Apr 2005 15:34:10 -0500
From: Erik Hatcher <[EMAIL PROTECTED]>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: proximity search in lucene
On Apr 1, 2005, at 2:29 PM, Sujatha Das wrote:
Hi,
Does Lucene support "
Hi,
I couldn't find documentation on these issues,
so a url as response should be just fine.
The inverted index must look like
FIELD-1
term -> (doc,offset)pairs
Is this correct?
Say I am trying to index the documents in a corpus under two
different fields. For instance, I want to store with
every w
>Note that it may not make sense filtering by an arbitrary score
>(normalized or not).
I don't like the Google effect
with an endless amount of paging links. ;)
The user should get only the top percentage of docs/products
he can reasonably handle.
Regards,
Kai Gü