: > immediately? ... in the totally generic case, this isn't a safe
: This was implemented as an easy way to control the maximum search time
: for typical queries. I'm open to suggestions on how to improve it. One
The only thing I can think of that would truly time out *any* query is a
separate Ti
Hello
I have been using Lucene for a while now and, as most people seemingly do, I
used to store only some important fields of a document in the index. But
recently I thought: why not store the whole document's bytes as an untokenized
field in the index in order to ease the retrieval process? For example
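A minimal sketch of that idea, assuming the Lucene 2.x Field API (the field
names here are illustrative). Since the raw document only needs to be
retrieved, never searched, it can be stored and not indexed at all
(Field.Index.NO) rather than merely untokenized:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    String title = "...", body = "...", wholeDocumentText = "...";

    Document doc = new Document();
    // Searchable fields: analyzed, and stored only if needed for display.
    doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field("body", body, Field.Store.NO, Field.Index.TOKENIZED));
    // The whole original document: retrievable via doc.get("raw"),
    // but invisible to every query.
    doc.add(new Field("raw", wholeDocumentText, Field.Store.YES, Field.Index.NO));

For raw bytes rather than text, Lucene 2.x also has a binary stored-field
constructor, Field(String name, byte[] value, Field.Store store), if I
remember correctly.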
I will try to explain it like an algorithm what I am trying to do:
1. There are 70 dump files, each of which has 10,000 record tags of the kind
I pasted in my earlier mails. I split every dump file and create 10,000 XML
files, each with a single record tag and its child tags. This is because there
are some
parsing is
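Sketched with SAX, that splitting step might look like the following; the
"record" element name, the output file naming, and the simplifying assumptions
(records are not nested, attributes and XML escaping are ignored) are mine:

    import java.io.File;
    import java.io.FileWriter;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    // Splits one dump file into one small file per record element.
    public class DumpSplitter extends DefaultHandler {
        private StringBuilder buf;   // non-null while inside a record
        private int count;

        public void startElement(String uri, String local, String qName,
                                 Attributes atts) {
            if ("record".equals(qName)) buf = new StringBuilder();
            if (buf != null) buf.append('<').append(qName).append('>');
        }

        public void characters(char[] ch, int start, int len) {
            if (buf != null) buf.append(ch, start, len);
        }

        public void endElement(String uri, String local, String qName)
                throws SAXException {
            if (buf == null) return;
            buf.append("</").append(qName).append('>');
            if ("record".equals(qName)) {
                try {
                    FileWriter out = new FileWriter("record-" + (count++) + ".xml");
                    out.write(buf.toString());
                    out.close();
                } catch (java.io.IOException e) {
                    throw new SAXException(e);
                }
                buf = null;   // wait for the next record
            }
        }

        public static void main(String[] args) throws Exception {
            SAXParserFactory.newInstance().newSAXParser()
                            .parse(new File(args[0]), new DumpSplitter());
        }
    }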
But I think you have a problem here with searching the Lucene
index and deleting duplicate titles. Say you have the
following titles:
title one
title one is a nice file
title one is a really nice file
Further assume you're about to add a duplicate "title one"
Searching on "title one" will give y
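The truncated point being: a search on an analyzed title field matches all
three of those titles, since each contains the terms "title" and "one". One
hedged way around it, with the Lucene 2.x API, is to index the title a second
time as a single untokenized term and test for duplicates with a TermQuery
(the "title_exact" field name is illustrative):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    // "title_exact" must have been indexed with Field.Index.UN_TOKENIZED,
    // so the entire title is one term and only exact matches count.
    IndexSearcher searcher = new IndexSearcher("/path/to/index");
    Hits hits = searcher.search(new TermQuery(new Term("title_exact", "title one")));
    boolean duplicate = hits.length() > 0;  // true only for "title one" itself
    searcher.close();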
I'm not sure what the lock issue is. What version of Lucene are you
using? And what is your filesystem like? There are some known locking
issues with some versions of Lucene and some filesystems,
particularly NFS mounts as I remember... It would help if you told
us the entire stack trace rather t
Markus,
What you were thinking is fine - search and, if found, delete first, then add.
Lucene allows duplicates and offers no automated way of avoiding them.
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share
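A sketch of that search-and-delete-then-add pattern. This assumes Lucene 2.1's
IndexWriter.deleteDocuments(Term) (on older versions the delete goes through
IndexReader.deleteDocuments instead) and an untokenized "id" key field of my
own invention:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    String id = "doc-42";  // your unique key
    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
    writer.deleteDocuments(new Term("id", id));  // no-op if the doc isn't there
    Document doc = new Document();
    doc.add(new Field("id", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("body", "new contents", Field.Store.NO, Field.Index.TOKENIZED));
    writer.addDocument(doc);
    writer.close();

Lucene 2.1 also wraps the same two steps into a single call,
IndexWriter.updateDocument(Term, Document), if I remember correctly.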
Yep I did that, and now my code looks as follows.
The time taken for indexing one file is now => Elapsed Time in Minutes ::
0.3531, which is really great, but after processing 4 dump files (which means
40,000 small XMLs), I get:
caught a class java.io.IOException
40114 with message: Lock obta
Nicolas,
Thank you for the reply! The problem is that my categories are not static;
they are generated at runtime, and they are added and removed all the time.
For that reason I have no way to pre-generate the filters.
Dima
On 3/18/07, Nicolas Lalevée <[EMAIL PROTECTED]> wrote:
On Sunday 18 March 20
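Pre-generation isn't strictly required, though. Since the categories come and
go at runtime, one hedged option (class and field names are mine, using the
Lucene 2.x Filter API) is to build each filter lazily on first use and evict
it when its category is removed. QueryFilter additionally caches its per-reader
BitSet internally:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.QueryFilter;
    import org.apache.lucene.search.TermQuery;

    // Lazily built, explicitly invalidated cache of per-category filters.
    public class CategoryFilters {
        private final Map<String, Filter> cache = new HashMap<String, Filter>();

        public synchronized Filter get(String category) {
            Filter f = cache.get(category);
            if (f == null) {
                f = new QueryFilter(new TermQuery(new Term("type", category)));
                cache.put(category, f);
            }
            return f;
        }

        // Call when a category is deleted or its membership changes.
        public synchronized void invalidate(String category) {
            cache.remove(category);
        }
    }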
Move the index writer creation, optimization and closing outside of your
loop. I would also use a SAX parser. Take a look at the demo code
to see an example of indexing.
Cheers,
Grant
On Mar 18, 2007, at 12:31 PM, Lokeya wrote:
Erick Erickson wrote:
Grant:
I think that "Parsing 70 file
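A minimal sketch of the restructuring Grant describes; parseToDocument is a
hypothetical stand-in for your SAX-based parsing:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    // Open the writer once, add all documents, then optimize and close
    // once at the end -- never per file.
    File[] xmlFiles = new File("/path/to/xmldir").listFiles();
    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
    try {
        for (int i = 0; i < xmlFiles.length; i++) {
            // parseToDocument is a placeholder for your own XML-to-Document code
            writer.addDocument(parseToDocument(xmlFiles[i]));
        }
        writer.optimize();  // a single optimize, after the whole batch
    } finally {
        writer.close();     // always releases the write lock, even on failure
    }

Closing the writer in finally also guarantees the write lock is released when
a run dies halfway, which matters for the "Lock obtain" error reported
elsewhere in this thread.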
I'm using contrib/benchmark to do some tests for my ApacheCon talk
and have some questions.
1. In looking at micro-standard.alg, it seems like not all braces are
closed. Is a line ending a separator too?
2. Is there any way to dump out what params are supported by the
various tasks? I am e
Erick Erickson wrote:
>
> Grant:
>
> I think that "Parsing 70 files totally takes 80 minutes" really
> means parsing 70 metadata files containing 10,000 XML
> files each.
>
> One metadata file is split into 10,000 XML files, each of which looks like
> this:
> oai:CiteSee
On 3/17/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
Ack! ... this is what happens when i only skim a patch and then write with
my odd mix of authority and childlike speling
I'm telling ya, man, ya gotta get Firefox, use Gmail (or at least a
web-interfaced e-mail client) and turn on th
Grant:
I think that "Parsing 70 files totally takes 80 minutes" really
means parsing 70 metadata files containing 10,000 XML
files each.
Lokeya:
Can you confirm my supposition? And I'd still post the code
Grant requested if you can.
So, you're talking about indexing 10,000 XML files in
You might also want to search the mail archive for
"faceted search" and/or Categories. This topic has been discussed
under that heading, I believe...
Erick
On 3/18/07, Dima May <[EMAIL PROTECTED]> wrote:
I have a Lucene-related question/problem.
My search results can potentially get very l
Can you post the relevant indexing code? Are you doing things like
optimizing after every file? Both the parsing and the indexing sound
really long. How big are these files?
Also, I assume your machine is at least somewhat current, right?
On Mar 18, 2007, at 1:00 AM, Lokeya wrote:
Thank
Chris Hostetter wrote:
Ack! ... this is what happens when i only skim a patch and then write with
my odd mix of authority and childlike speling
: * it creates a single (static) timer thread, which counts the "ticks",
: every couple hundred ms (configurable). It uses a volatile int counter,
:
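As I read that description, the mechanism is roughly the following (a sketch,
not the patch itself; the class and field names are mine):

    import java.util.Timer;
    import java.util.TimerTask;

    // A single static daemon timer advances a volatile tick counter;
    // many concurrent searches can read it cheaply to notice a timeout.
    public class Ticker {
        static volatile int ticks = 0;
        static final int RESOLUTION_MS = 200;  // "every couple hundred ms", configurable

        static {
            Timer timer = new Timer(true);  // daemon: won't block JVM exit
            timer.scheduleAtFixedRate(new TimerTask() {
                public void run() { ticks++; }  // only this thread writes, so ++ is safe
            }, RESOLUTION_MS, RESOLUTION_MS);
        }
    }

A collector can then record Ticker.ticks when the search starts and abort
(e.g. by throwing a RuntimeException) once its tick budget is exceeded.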
On Sunday 18 March 2007 at 06:55, Dima May wrote:
>
>
> I have a Lucene-related question/problem.
>
> My search results can potentially get very large, 200,000+. I want to
> categorize my results. So for example if I have an indexed field "type"
> that has such things as CDs, books, videos, powe
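For that kind of per-category breakdown over 200,000+ hits, a common
suggestion in the faceted-search threads Erick mentions is to count BitSet
intersections rather than retrieve documents. A sketch against the Lucene 2.x
Filter API; the path, query, and category names are illustrative:

    import java.util.BitSet;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.QueryFilter;
    import org.apache.lucene.search.TermQuery;

    // Count how many of the query's hits fall in each category without
    // ever fetching the documents themselves.
    IndexReader reader = IndexReader.open("/path/to/index");
    Query query = new TermQuery(new Term("body", "jazz"));  // the user's query
    BitSet queryBits = new QueryFilter(query).bits(reader);
    String[] types = {"CDs", "books", "videos"};            // current categories
    for (int i = 0; i < types.length; i++) {
        BitSet catBits = new QueryFilter(
                new TermQuery(new Term("type", types[i]))).bits(reader);
        BitSet both = (BitSet) queryBits.clone();
        both.and(catBits);  // hits that are also in this category
        System.out.println(types[i] + ": " + both.cardinality());
    }
    reader.close();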