That would make sense yes.
But the problem is I have a general query field. I don't know whether the user
entered a string or a number, or what he meant... Is 2008 a year (a number) or
part of an address string?
Or maybe he's combining things like "Potter 19,99"
Robert Muir wrote:
On Thu, Mar 26, 2009 at 07:06:26AM -0400, Michael McCandless wrote:
> We'd need to add a few methods to IndexReader,
Eep. IndexReader's too big as it is.
> eg querying whether
> compound file format is in use, whether separate norms are stored,
> "get me total size in bytes of all files" (or
marcel,
I'd suggest parsing/displaying numbers in a locale-sensitive way with
NumberFormat (be sure to supply the correct locale)... and keeping them in the
index in one consistent form (i.e. 19.99)
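A minimal sketch of that approach (the class name and the two-decimal width are my own choices, not from the thread): parse with the user's locale, then render one canonical form for the index.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class PriceNormalizer {
    // Parse "19,99" (German), "19.99" (US), etc. using the UI locale,
    // then emit a single canonical form for the index.
    public static String normalize(String input, Locale locale) throws ParseException {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        double value = nf.parse(input).doubleValue();
        // Locale.ROOT guarantees a '.' decimal separator, so the index
        // always holds "19.99" regardless of the user's locale.
        return String.format(Locale.ROOT, "%.2f", value);
    }
}
```

At display time, run the stored value back through NumberFormat with the viewer's locale.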
On Thu, Mar 26, 2009 at 6:03 PM, Marcel Overdijk
wrote:
>
> Thanks for your reply.
>
> It's indeed a webap
On Thu, Mar 26, 2009 at 6:28 PM, Matt Schraeder wrote:
> I'm new to Lucene and just beginning my project of adding it to our web
> app. We are indexing data from a MS SQL 2000 database and building
> full-text search from it.
>
> Everything I have read says that building the index is a resource h
Marcel,
First of all, do you really want the user to search price:19.99?
Maybe you should use some logic like price>=19.99?
If so, you should use a range query to handle this case.
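One caveat with range queries in the Lucene of that era: terms are compared lexicographically as strings, so "9.99" sorts after "19.99". A common workaround is to index a fixed-width, zero-padded form; here is a plain-Java sketch (the width of 10 and the class name are illustrative assumptions, not from the thread):

```java
import java.util.Locale;

public class PricePadder {
    // Fixed-width, zero-padded form so that lexicographic term order
    // matches numeric order, e.g. "0000019.99" < "0000199.90".
    public static String pad(double price) {
        return String.format(Locale.ROOT, "%010.2f", price);
    }
}
```

Index and query the padded form; keep the human-readable value in a stored field for display.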
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www
There are many things you need to synchronize with database. Besides
just changed fields, you may need to deal with deleted database records,
etc.
In general, it's not efficient to pull over data that's changing
often, and may not have much effect on search. It'll overload Lucene
unnecessarily.
Thanks for your reply.
It's indeed a webapp with a html front-end.
I agree that letting the end user enter a Lucene query might not be what you want.
Probably I will be using an "all" index which indexes all fields of my
entity. So in the book example including book title, isbn, price,
author.firstname, aut
What does the front end look like? Is it a web page or a custom app? And
do you expect your users to actually enter the field name? I'd be reluctant
to allow any but the geekiest of users to enter the Lucene syntax (i.e. the
field
names). Users shouldn't know anything about the underlying structure
You've got a great grasp of the issues, comments below. But before you
do, a lot of this kind of thing is incorporated in SOLR, which is built on
Lucene. Particularly updating an index and then using it.
So you might take a look over there. It even has a DataImportHandler...
WARNING: I've only been mo
First of all I'm new into Lucene. I'm experimenting right now with it in
combination with Hibernate Search.
What I'm wondering is whether I can index numbers related to i18n.
E.g. I have a Book entity with a price attribute.
A book with a price of 19.99 can be found while searching for price:19.99.
I'm new to Lucene and just beginning my project of adding it to our web
app. We are indexing data from a MS SQL 2000 database and building
full-text search from it.
Everything I have read says that building the index is a resource heavy
operation so we should use it sparingly. For the most part
OK I opened LUCENE-1573 for this.
Mike
On Thu, Mar 26, 2009 at 8:48 AM, Jeremy Volkman wrote:
> The indexer thread was part of a worker pool. I "stopped" the pool which
> interrupted all of the worker threads. So, the interruption came from my
> code.
>
> I didn't notice whether one CPU was pegg
Another thing is to limit the max # merge threads CMS will run at
once. It defaults to 3 now.
Mike
On Thu, Mar 26, 2009 at 2:08 PM, Jason Rutherglen
wrote:
> I used the NoMergePolicy to build the index as I noticed the indexing is
> faster, meaning the system simply creates large multi-megabyte
I used the NoMergePolicy to build the index as I noticed the indexing is
faster, meaning the system simply creates large multi-megabyte segments in
the ram buffer, flushes them out and doesn't worry about merging which
causes massive disk thrashing. I am pondering some benchmarks to find the
optima
Hi,
I'm not aware of anything in LingPipe that would do the Q&A part, though LP
(and GATE) may have the building blocks for what you need. For example, they
both must have sentence boundary detection/sentence chunking, which might be
one of the first sub-tasks you'd need to do to begin findin
I don't think you can write to the same index (file) from multiple
locations at the same time and expect predictable behaviour.
Aficionados will correct me if I'm wrong, but I think a pessimistic
locking file system (think NTFS) would simply not allow this, while
optimistic locking (think ext3) would resu
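Lucene itself serializes writers with its own write.lock file rather than relying on the file system, but the OS-level behaviour is easy to demonstrate with java.nio file locks (the class below is mine, not Lucene code): a second attempt to lock a file that is already exclusively locked fails.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

public class LockDemo {
    // Returns true if a second exclusive-lock attempt on a file
    // that is already locked fails, as expected.
    public static boolean secondLockFails() throws Exception {
        File f = File.createTempFile("lockdemo", ".lock");
        f.deleteOnExit();
        try (FileChannel c1 = new RandomAccessFile(f, "rw").getChannel();
             FileLock l1 = c1.lock()) {
            try (FileChannel c2 = new RandomAccessFile(f, "rw").getChannel()) {
                try {
                    FileLock l2 = c2.tryLock();
                    if (l2 != null) { l2.release(); return false; }
                    return true; // lock held elsewhere
                } catch (OverlappingFileLockException e) {
                    return true; // this JVM already holds an overlapping lock
                }
            }
        }
    }
}
```

This only shows the locking primitive; whether concurrent writers corrupt an index also depends on how the application coordinates them above the file system.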
OK thanks for bringing closure.
Mike
On Thu, Mar 26, 2009 at 8:37 AM, Chetan Shah wrote:
>
> Ok. I was able to conclude that I am getting OOME due to my usage of the HTML
> Parser to get the HTML title and HTML text. I display 10 results per page
> and therefore end up calling the org.apache.luc
Thank you guys for the reply. Solr seems to be a good solution for
distributed indexes but the app is already written with a Lucene index.
So I had a question on Ian's answer as to going for 2 indexes.
My app is on a weblogic cluster with two servers. The app is installed on
both the servers.
Wha
The indexer thread was part of a worker pool. I "stopped" the pool which
interrupted all of the worker threads. So, the interruption came from my
code.
I didn't notice whether one CPU was pegged, however I did take a series of
JVM stack dumps and each one showed the finishMerges thread in the RUNN
Ok. I was able to conclude that I am getting OOME due to my usage of the HTML
Parser to get the HTML title and HTML text. I display 10 results per page
and therefore end up calling the org.apache.lucene.demo.html.HTMLParser 10
times.
I modified my code to store the title and html summary in the
OK I like this theory, and I think it can cause a spin loop in doWait
(do you see one CPU pegged?), and starvation in the merging thread.
Do you know who called Thread.interrupt() in your case? Does your
code do that explicitly somewhere?
IndexWriter is not doing the right thing when the thread
Hi Michael,
I originally wasn't thinking correctly about the doWait() method releasing
the monitor. I was thinking about it more as a sleep method instead (which
would not release the monitor).
Regardless, I think I've pinpointed the problem. In my stacktrace, "Indexing
Thread" had been interrupt
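The behaviour described here is easy to reproduce with plain Java: a thread blocked in Object.wait() (which is what IndexWriter's doWait() amounts to) is woken with an InterruptedException when interrupted, rather than returning normally. A minimal sketch (the class and names are mine, not from IndexWriter):

```java
public class InterruptDemo {
    // Shows that interrupting a thread blocked in Object.wait()
    // raises InterruptedException instead of a normal return.
    public static String run() throws InterruptedException {
        final Object lock = new Object();
        final StringBuilder outcome = new StringBuilder();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait(10_000); // analogous to doWait() inside IndexWriter
                    outcome.append("timed-out");
                } catch (InterruptedException e) {
                    outcome.append("interrupted");
                }
            }
        });
        waiter.start();
        // Wait until the thread is actually parked in wait().
        Thread.State s;
        while ((s = waiter.getState()) != Thread.State.TIMED_WAITING
                && s != Thread.State.TERMINATED) {
            Thread.sleep(10);
        }
        waiter.interrupt(); // what stopping the worker pool did
        waiter.join();
        return outcome.toString();
    }
}
```

Code that catches the exception but keeps re-entering the wait loop without clearing or honouring the interrupt status is exactly how a spin loop like the one suspected here can arise.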
Marvin Humphrey wrote:
> On Wed, Mar 25, 2009 at 06:15:35AM -0400, Michael McCandless wrote:
>
>> I'm torn. MergePolicy (and MergeScheduler) are "expected" to be
>> something expert users could alter; their API is designed to be
>> exposed & stable. I think they should be visible in the javadocs
Are there any other threads running? Can you post their stack traces too?
Are you sure nothing is happening? EG, if you look in the index, do
you see files slowly increasing in size (indicating there is a merge
running).
These two traces are actually normal. The ArticleIngestor thread is
tryin
Hi
I was wondering if something like LingPipe or GATE (for text extraction)
might be an idea? I've started looking at it and I'm just thinking it may
be applicable (I may be wrong).
Cheers
Amin
On Wed, Mar 25, 2009 at 4:18 PM, Grant Ingersoll wrote:
> Hi MFM,
>
> This comes down to a preprocess