: than just on/off), but the original QP shows the problem with
: setAllowLeadingWildcard(true). The compiled JavaCC code will always create a
: PrefixQuery if the last character is *, regardless of any other wildcard
: characters before it. Therefore the query is based on the Term:
Yep, defini
I'm extracting text from Word using TextMining.org extractors - it works better
than POI because it extracts Word 6/95 as well as 97-2002, which POI cannot do.
However, I'm trying to find out about licence issues with the TM jar. The TM
website seems to be permanently hacked these days.
Anyon
Hi Guys.
Ok, thanks for the replies. You guys are right that it has to do with the
system and not with Lucene. However, what I'm trying to do is to pinpoint
and narrow down the exact place that causes the system to fail, and then
from there try to remedy the problem.
The odd thing is that th
OK. Thank you. We'll have to consider using this approach.
I guess the drawback here is that ":" will no longer work as a field
operator. :-(
We were also considering using the following approach.
String newquery = query.replace(": ", " ");
It seems this way a co
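For reference, Java's String.replace(CharSequence, CharSequence) takes two arguments (target and replacement); a self-contained sketch of the colon-stripping idea, with a hypothetical helper name:

```java
// Hypothetical helper: strip ":" so the query parser no longer sees it as a
// field separator. String.replace replaces every occurrence, not just the first.
public class QueryColonFix {
    public static String stripColons(String query) {
        // Handle ": " first so "title: foo" does not end up with a double space.
        return query.replace(": ", " ").replace(":", " ");
    }
}
```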
Felix Litman wrote:
Yes. thank you. How did you make that modification not to treat ":" as a
field-name terminator?
Is it using this or some other way?
I removed the : handling stuff from QueryParser.jj in the method:
Query Clause(String field) :
I removed this section
---
[
LOOKAHE
What file system is the hard disc? If it is FAT32, one of your index files
is probably getting bigger than 4 GB - the maximum file size on FAT32
Damien
-Original Message-
From: maureen tanuwidjaja [mailto:[EMAIL PROTECTED]
Sent: 23 February 2007 02:07
To: java-user@lucene.apache.or
yes I do have around 75 GB of free space on that HDD...I do not invoke any
index reader...hence the program only calls IndexWriter to optimize the
index, and that's it...
I am also perplexed about why it reports that there is not enough disk space
to do the optimization...
Michael McCandless <[EMAIL
Chris Hostetter wrote:
: So I don't see why using a SpanNear that respects order and a large
: IncrementGap won't solve your problem... although it would return "odd"
i think the use case he's worried about is that he needs to be able to
find matches just on the "start" of a person's name, ie.
Chris Hostetter wrote:
i'm not very familiar with this issue, but are you using
setAllowLeadingWildcard(true) ? ... if not it definitely won't work.
That's not the issue. (I've modified QP to allow "minWildcardPrefix" rather
than just on/off), but the original QP shows the problem with
setAl
Hello,
If you have experience using XML and doing web services requests
Solr is what you need. It's production quality code and evolving
quickly. It has a remarkable amount of extra functionality.
For CORBA type programmers, go with Terracotta. It looks to go a
step further beyond sharing object
Thanks for the suggestions.
I'm using the Lucene packaged with Gate, which is lucene-1.3-final.jar
(ancient I suppose).
I am now seeing the threading problems with GATE, and although I was
hoping to stay with Gate in case we need some of its capabilities,
with the current design we cou
Hello,
This snippet may help to understand TopDocs:
http://mail-archives.apache.org/mod_mbox/lucene-general/200508.mbox/%
[EMAIL PROTECTED]
Also, paging through Lucene results is a 'do-it-yourself' exercise using
hits.length() until someone contributes a good implementation.
Oversimplifying, i
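The "do-it-yourself" paging mentioned above boils down to clamped index arithmetic over the total hit count; a minimal sketch (method and class names are mine):

```java
// Hypothetical sketch of "do-it-yourself" paging: given the total hit count
// (what hits.length() reports), compute which hit indexes belong to a page.
public class HitPager {
    // Returns {startInclusive, endExclusive} for a 0-based page number.
    public static int[] pageRange(int totalHits, int pageSize, int page) {
        int start = Math.min(page * pageSize, totalHits);
        int end = Math.min(start + pageSize, totalHits);
        return new int[] { start, end };
    }
}
```

The caller then fetches only the documents in that index range, so at most one page of documents is ever materialized.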
You might have some luck searching the mailing list for "faceted search", as
I remember there's been quite a discussion on that topic and I *think* it
applies...
Even if you use a HitCollector, you still have to categorize your document,
and all you have is the doc id to work with. But I think yo
I have a query that can return documents that represent different types of
things (e.g. books, movies, coupons, etc)
There is a "object_type" keyword on each document, so I can tell that a
document is a coupon or a book etc...
The problem is that I need to display a count of each item type tha
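One common approach (an assumption here, not the poster's code) is to count types per collected doc id against a preloaded doc-to-type lookup, which is the role FieldCache-style data plays. A pure-Java sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of counting the "object_type" of every matching document. The
// typeByDoc array is an assumption: a doc-id-to-type lookup preloaded from
// the index, which a HitCollector would consult per collected doc id.
public class FacetCounter {
    private final String[] typeByDoc;
    private final Map<String, Integer> counts = new HashMap<>();

    public FacetCounter(String[] typeByDoc) {
        this.typeByDoc = typeByDoc;
    }

    // Called once per matching doc id, playing the role of HitCollector.collect.
    public void collect(int docId) {
        counts.merge(typeByDoc[docId], 1, Integer::sum);
    }

    public Map<String, Integer> counts() {
        return counts;
    }
}
```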
: Actually I don't see how it could not be multi-threaded,
: since it seems normal to me that I run it in a web application which is
: multi-threaded for each user request ?
not every application in the world is a web application.
if you are dealing with multiple threads, you will need to do somet
This sounds like it has absolutely nothing to do with Lucene, and
everything to do with good security permissions -- your Zope/python front
end is most likely running as a user that does not have write permissions
to the directory where your index lives. you'll need to remedy that.
you can write
i'm not very familiar with this issue, but are you using
setAllowLeadingWildcard(true) ? ... if not it definitely won't work.
: Date: Thu, 22 Feb 2007 15:36:43 +1100
: From: Antony Bowesman <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Q
: > What is the point to calculate score if the result set is going to be sorted
: > by some field?
: No point, I believe, unless your sort includes relevance score. I
...which is non-trivial information to deduce, since a SortField can
contain a SortComparatorSource which uses a ScoreDocCompar
I would not do this from scratch...if you are interested in Solr go that
route else I would build off http://issues.apache.org/jira/browse/LUCENE-390
- Mark
Mohammad Norouzi wrote:
Hi all,
I am going to build a Searcher pool. If anyone has experience with
this, I
would be glad to hear his/h
On 22 Feb 2007 at 19.22, Otis Gospodnetic wrote:
I believe it's a SpellChecker implementation deficiency, and Karl
will probably suggest looking at LUCENE-626 as an alternative. And
I'll ask you to please report back how much better than the contrib
SpellChecker Karl's solution is.
:)
The
I believe it's a SpellChecker implementation deficiency, and Karl will probably
suggest looking at LUCENE-626 as an alternative. And I'll ask you to please
report back how much better than the contrib SpellChecker Karl's solution is.
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
Did any one have success implementing "did you mean" feature for multi-word
queries as described in Tom White's excellent "Did you Mean Lucene?" article?
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html
...and more specifically, using the CompositeDidYouMeanParser implementation as
Yes. thank you. How did you make that modification not to treat ":" as a
field-name terminator?
Is it using this or some other way?
String newquery = query.replace(":", " ");
Thank you,
Felix
Antony Bowesman <[EMAIL PROTECTED]> wrote: Not sure if you're still after a
solution, but I ha
Hi Mike,
> I have a collection of XML files that I would like to parse using Digester
> in order to index them for Lucene. A DTD file has been supplied for the XML
> files, but none of those files has a line associating them
> with the DTD file. Can the Digester's register function be used to tel
-Original Message-
From: dmitri <[EMAIL PROTECTED]>
> What is the point to calculate score if the result set is going to be sorted
> by some field?
No point, I believe, unless your sort includes relevance score. I believe
there is a Lucene patch that involves a Matcher (a new concept fo
OK, I was off on a tangent. We've had several discussions where people were
effectively trying to replace an RDBMS with Lucene and finding out that
RDBMSs are very good at what they do ...
But in general, I'd probably approach it by doing the RDBMS work first and
indexing the result. I think th
Actually I don't see how it could not be multi-threaded,
since it seems normal to me that I run it in a web application which is
multi-threaded for each user request ?
Erick, could you please explain your comment?
Thank you.
__
Matt
-Original Mess
Thanks Erick, you've helped a lot, and so has everyone else.
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: 22 February 2007 13:00
To: java-user@lucene.apache.org
Subject: Re: Returning only a small set of results
See TopDocs, HitCollector, etc. You'll have to dig
Thanks Erick
but we have to, because we need to execute very big queries that create
network traffic and are very, very slow. But with Lucene we do it in a few
milliseconds. And now we have indexed the information we need by joining
tables. It works fine; besides, it returns the exact result as we can get
I know this has been discussed several times, but sure don't remember the
answers. Search the mail archive for "multiple languages" and you'll find
some good suggestions. But as I remember, it's not a trivial issue.
But I don't see why the "three different documents" approach wouldn't work.
You c
Well, it's your logic that takes the request from the user and executes the
search. So it's your logic that has to take care of any coordination between
threads that use the same reader. This is a standard multi-threading
resource-sharing issue.
If your application is not multi-threaded, I don't
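The coordination described above can be sketched with a plain read-write lock; the reader type here is generic because this is a pattern sketch, not Lucene API code:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Sketch of the resource-sharing pattern: many searching threads share one
// reader, and a refresh thread swaps in a fresh one occasionally.
public class SharedReader<R> {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private R current;

    public SharedReader(R initial) {
        current = initial;
    }

    // Searches take the read lock, so any number can run concurrently.
    public <T> T search(Function<R, T> query) {
        lock.readLock().lock();
        try {
            return query.apply(current);
        } finally {
            lock.readLock().unlock();
        }
    }

    // A refresh takes the write lock, waiting for in-flight searches to finish.
    public void swap(R fresh) {
        lock.writeLock().lock();
        try {
            current = fresh;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```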
Hi All,
Our application that uses Lucene for indexing will be used to index
documents that each of which contains parts written in different
languages. For example some document could contain English, Chinese and
Brazilian text. So how to index such document? Is there some best
practice to do
See TopDocs, HitCollector, etc. You'll have to dig through the documentation
and try a few experiments to make sense of it all; one-sentence explanations
aren't much help.
But think of Hits as a convenience class for getting the best-scoring 100
documents and use the other classes if you want to
don't do either one. Search this mail archive for discussions of
databases; there are several long threads discussing this along with various
options on how to make this work. See particularly a mail entitled
*Oracle/Lucene integration -status-* and any discussions participated in by
Marcelo O
Is your disk almost full? Under Linux, when you reach about 90% used on
a file system, only the superuser can allocate more space (e.g. create
files, add data to files, etc.).
--MDC
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
My question is: what happens when a re-opening of the reader occurs and at
the same time a user does a query on the index? Are there solutions
for this?
__
Matt
-Original Message-
From: Michael McCandless [mailto:[EMAIL PROTECTED]
Sent: Thursda
<[EMAIL PROTECTED]> wrote:
> I need to merge indexes,
> if I want the user to see the changes (the merged indexes), I heard I
> need to close the index reader and re-open it again.
Yes. More generally, whenever there have been changes to an index
that you want your readers/searchers to see, you
"maureen tanuwidjaja" wrote:
> I had an existing index of size 20.6 GB...I haven't done any
> optimization on this index yet. Now I have an HDD of 100 GB, but apparently
> when I create a program to optimize (which simply calls writer.optimize()
> on this index file), it gives the error
On 22 Feb 2007 at 05.21, maureen tanuwidjaja wrote:
I also would like to know whether searching in the index file eats
lots of memory...I always run out of memory when doing
searching, i.e. it gives the exception java heap space (although I
have put -Xmx768 in the VM argument)...Is there any way
On 22 Feb 2007 at 10.09, Martin Braun wrote:
the only thing I have found in the list before concerning this subject
is http://issues.apache.org/jira/browse/LUCENE-625, but I'm not sure if
it does the things I want.
I am not sure if we get enough queries for a search over an index base
on th
It's really great to have the tool compatible with Lucene 2.1.
It saves a lot of energy.
Thanks once again.
supriya
Andrzej Bialecki wrote:
Hi all,
I'm happy to announce that a new version of Luke - the Lucene Index
Toolbox - is now available. As usual, you can get it from:
http://www.ge
What can you use in place of Hits and how do they differ?
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: 21 February 2007 22:43
To: java-user@lucene.apache.org
Subject: Re: Returning only a small set of results
: A question about efficiency and the internal wor
I have a collection of XML files that I would like to parse using Digester
in order to index them for Lucene. A DTD file has been supplied for the XML
files, but none of those files has a line associating them
with the DTD file. Can the Digester's register function be used to tell it
to use that D
Hello All,
I am implementing a query auto-complete function à la Google. Right now
I am using a TermEnum enumerator on a specific field and list the Terms
found.
That works well for searches with only one Term, but when the user is
typing two or three words the function will autocomplete each Term
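The single-term case can be sketched without Lucene at all: a sorted set plays the role of the term dictionary, and tailSet plays the role of seeking a TermEnum to the prefix. The terms here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch of prefix autocompletion over a sorted term list - the same idea as
// walking a TermEnum from a given prefix until terms stop matching.
public class PrefixSuggester {
    private final TreeSet<String> terms = new TreeSet<>();

    public void add(String term) {
        terms.add(term);
    }

    // All terms starting with `prefix`, in sorted order, up to `max` results.
    public List<String> suggest(String prefix, int max) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix) || out.size() >= max) {
                break;  // past the prefix range, or enough suggestions
            }
            out.add(t);
        }
        return out;
    }
}
```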
This is a very common use case, and Lucene is most likely not
the cause of the problem.
My guess is that (1) the first attempt to write anything to
disk failed. (2) opening the IndexWriter succeeded because
(a) the index exists already (from previous successful run) and
(b) locks are maintained in /tmp or
Hi,
I need to merge indexes,
if I want the user to see the changes (the merged indexes), I heard I
need to close the index reader and re-open it again.
But I will need to do this every x minutes for some reasons,
So I wondered what could happen if user does a query just when a re-open
of the read
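One common answer to this question is reference counting: queries already in flight keep using the old reader, the swap only affects queries that start afterwards, and the old reader is closed once its last user releases it. A sketch with a stand-in reader class (not Lucene's API; the race between reading and pinning is glossed over):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: swap in a reopened reader while queries are running. The old
// reader closes only after the last in-flight query releases it.
public class ReaderHolder {

    // Stand-in for a real IndexReader; tracks how many users still hold it.
    public static class RefReader {
        public final String name;
        private final AtomicInteger refs = new AtomicInteger(1); // holder's own ref
        public boolean closed = false;

        public RefReader(String name) { this.name = name; }
        void incRef() { refs.incrementAndGet(); }
        public void release() {
            if (refs.decrementAndGet() == 0) {
                closed = true;  // a real impl would close the IndexReader here
            }
        }
    }

    private volatile RefReader current;

    public ReaderHolder(RefReader r) { current = r; }

    // A query pins the current reader for its own lifetime.
    // (Simplified: production code must guard the read-then-pin race.)
    public RefReader acquire() {
        RefReader r = current;
        r.incRef();
        return r;
    }

    // Swap in the fresh reader and drop the holder's ref on the old one.
    public void reopen(RefReader fresh) {
        RefReader old = current;
        current = fresh;
        old.release();
    }
}
```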
Hi!
I'm writing a java program that uses Lucene 1.4.3 to index and create a vector
file of words found in Text Files. The purpose is for text mining.
I created a Java .Jar file from my program and my python script calls the Java
Jar executable. This is all triggered by my DTML code.
I'm runnin
Hello
In our application we have to index the database tables; there are two ways
to do this:
1- index each table in a separate directory and then keep all relations in
order to get the right result. In this method, we should use filters to
overcome the problem of searching on another search result.
2.
Hi all,
I am going to build a Searcher pool. If anyone has experience with this, I
would be glad to hear his/her recommendations and suggestions. I want to know
what issues I should consider, given that I am going to use this in a web
application with many user sessions.
thank you very much in a
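For what it's worth, a minimal pool sketch over a BlockingQueue, assuming searchers are expensive to create and used by one thread at a time while borrowed; the element type is generic because this is a pattern sketch, not Lucene code:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a fixed-size searcher pool: sessions borrow a searcher,
// run their query, and give it back.
public class SearcherPool<S> {
    private final BlockingQueue<S> idle;

    public SearcherPool(List<S> searchers) {
        // Fair queue seeded with the pre-built searchers.
        idle = new ArrayBlockingQueue<>(searchers.size(), true, searchers);
    }

    // Blocks until a searcher is free; the caller must give it back when done.
    public S borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a searcher", e);
        }
    }

    public void giveBack(S searcher) {
        idle.offer(searcher);
    }
}
```

For many read-only sessions, sharing one searcher across threads is often simpler; a pool mainly helps when each searcher carries per-use mutable state.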