Re: Searching any part of a string

2008-06-27 Thread Chris Hostetter
: Thanks for the suggestions. I've used indexed n-grams before to implement spell-checking; I think in this case I may take a look at WildcardTermEnum and RegexTermEnum. It seems like a good solution because I am doing my own results ordering so Lucene's scoring is irrelevant in this case. I

Re: Question: Can lucene do parallel indexing?

2008-06-27 Thread Phil Myers
> If I'm using a computer that has multiple cores, or if I want to use several computers to speed up the indexing process, how should I do that? Is there some kind of support for that in the API?
Yes. There are some comments on this near the end of this page: http://wiki.apache.org/lucene
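For the multi-core half of the question, one common pattern is to share a single writer between several worker threads (Lucene's IndexWriter is thread-safe). The sketch below only illustrates that pattern: `DocumentSink` and `ParallelIndexer` are hypothetical names standing in for the real writer and are not part of the Lucene API, and the documents are plain strings for brevity.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a thread-safe writer (for example, a wrapper around IndexWriter). */
interface DocumentSink {
    void addDocument(String doc);
}

final class ParallelIndexer {
    /** Hands each document to the shared sink from a fixed pool of worker threads. */
    static void indexAll(List<String> docs, DocumentSink sink, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String doc : docs) {
            // Each task adds one document; the sink must tolerate concurrent calls.
            pool.execute(() -> sink.addDocument(doc));
        }
        // Stop accepting work and wait for the queued tasks to drain.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
    }
}
```

A thread count close to the number of cores is a reasonable starting point; spreading the work over several machines is a separate problem that this sketch does not address.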

Question: Can lucene do parallel indexing?

2008-06-27 Thread David Lee
If I'm using a computer that has multiple cores, or if I want to use several computers to speed up the indexing process, how should I do that? Is there some kind of support for that in the API? David Lee

QueryWrapperFilter performance

2008-06-27 Thread Jordon Saardchit
Hello All, Sort of new to Lucene but have a general question regarding performance. I've got a single index of rather large size (about 7 million docs). I've run a couple of different queries against it, which are described below. * WildcardQuery: (*term*) Which returns roughly 12000 hits i
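A pattern such as *term* has no fixed prefix, so matching it means inspecting every term in the field rather than seeking to a narrow range of the term dictionary. The sketch below shows that linear scan in plain Java; `InfixTermScan` is a hypothetical name, the term list is an in-memory stand-in for the index's term dictionary, and none of this is Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class InfixTermScan {
    /** Returns the terms that contain the given infix, visiting every term once. */
    static List<String> matching(List<String> terms, String infix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            // No prefix to narrow the search, so each term has to be checked.
            if (term.contains(infix)) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

The cost grows with the number of distinct terms in the field, not with the number of hits, which is worth keeping in mind when comparing timings on an index of this size.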

RE: Can you create a Field that is a copy of another Field?

2008-06-27 Thread Bill.Chesky
Hmmm, I think maybe I am missing something. In your design is the 'data' field indexed, i.e. searchable? Or is it an unindexed, stored field? I was thinking that both 'data' and 'data_type' were indexed and searchable. Maybe the confusion stems from the fact that for the Document correspon

Re: Does Lucene Java 2.3.2 supports parsing of Microsoft office 2007 documents...

2008-06-27 Thread Hasan Diwan
Kumar: Assuming you want to index a pre-parsed document... 2008/6/27 Erick Erickson <[EMAIL PROTECTED]>: >> If it supports, what should be done in Lucene demo 2.3.2 to search queries >> on file with above mentioned extensions? The new ODF-compatible Office 2007 is not supported by POI. However, yo

Re: Can you create a Field that is a copy of another Field?

2008-06-27 Thread Matthew Hall
Yup, you're pretty much there. The only part I'm a bit confused about is what you've said in your data field there, I'm thinking you mean that for the data_type: "State", you would have the data entry of "California", right? If so, then yup, you are spot on ^^ We use this technique all the

Re: Sorting issues

2008-06-27 Thread Erick Erickson
I can't count how many times I've said "It must be a bug in the compiler", but I *can* count how rarely I've been right. Glad you're on a path to resolution. Erick On Fri, Jun 27, 2008 at 3:09 PM, <[EMAIL PROTECTED]> wrote: > Thanks Eric, > > I did find the problem using Luke, I see that all o

Re: Sorting issues

2008-06-27 Thread Robert . Hastings
Thanks Eric, I did find the problem using Luke, I see that all of the documents have the same category field, so I must not be adding the field correctly when I index them. Bob Bob "Erick Erickson" <[EMAIL PROTECTED]> 06/27/2008 01:58 PM Please respond to java-user@lucene.apache.org To

Re: Sorting issues

2008-06-27 Thread Erick Erickson
I can't really help since I've never had to go into the guts of Lucene and see where sorting is applied, so I don't know where to point you. But the sorting has always worked for me, and I don't remember anyone else posting a similar issue in the last year or so. Which means that the first thing

Re: Sorting issues

2008-06-27 Thread Robert . Hastings
Actually, I do a global search and the order comes out: 1, 2, 8, 3, 5, 6, 7,8, 4, 9. I'm having trouble finding in the code where the sort actually gets applied. Can you help me out there? Bob "Erick Erickson" <[EMAIL PROTECTED]> 06/27/2008 12:19 PM Please respond to java-user@lucene.apac

RE: Can you create a Field that is a copy of another Field?

2008-06-27 Thread Bill.Chesky
Erick, Thanks for the response. I'm very sure the TokenStream is expensive. Not always, but in some cases, yes, it can take a long time to complete. However, I do like your approach. I'm going to try a different approach suggested by another poster first, but this is very interesting. Thank

RE: Can you create a Field that is a copy of another Field?

2008-06-27 Thread Bill.Chesky
Matthew, Thanks for the reply. This looks very interesting. If I'm understanding correctly your db_key, data and data_type are Fields within the Document, correct? So is this how you envision it? Document: State=California Field: 'db_key'='1395' (primary key into relational table, correct

Re: Read index into RAM?

2008-06-27 Thread Anshum
Hi Darren, Assuming that you use a *nix/*nux machine, the best way to work that out would be to have your index moved to a tmpfs. Steps to have that done : 1. Mount a tmpfs (It uses RAM by default) 2. Copy your index to your new mount point 3. Open your index readers pointing to the new directory

Re: Read index into RAM?

2008-06-27 Thread Erick Erickson
I posted this reply to this question last time you posted it.
From the docs...
RAMDirectory
public RAMDirectory(Directory dir) throws IOException
Creates a new RAMDirectory instance from a different Directoryim

Read index into RAM?

2008-06-27 Thread Darren Govoni
Hi, Is it possible to read a disk-based index into RAM (entirely) and have all searches operate on it there? I saw some RAMDirectory examples, but it didn't look like it will transfer a disk index into RAM. thanks D

Re: Can you create a Field that is a copy of another Field?

2008-06-27 Thread Erick Erickson
How sure are you that the TokenStream is that expensive? But assuming you are AND that the values for these properties aren't that big, the simple-minded approach that comes to my simple mind is to just iterate through the stream yourself, assemble a string from the returned tokens and pass the str

Re: Sorting issues

2008-06-27 Thread Erick Erickson
That's surprising. Could you post a brief example of your index and search code? It sounds like you're saying docs 1, 2, 3 all have category aaa docs 4, 5, 6 all have category bbb docs 7, 8, 9 all have category ccc But if you search for category:bbb you don't get docs 4, 5, and 6 Is this a fair

Re: Doubt on IndexWriter.close()

2008-06-27 Thread Michael McCandless
Which version of Lucene are you using? Recent versions do not allow addDocument to be called after close. Mike java_is_everything <[EMAIL PROTECTED]> wrote: > > Hi all. > > IndexWriter.close() API states that :: > > "Flushes all changes to an index and closes all associated files.". > > What doe

Re: Searching any part of a string

2008-06-27 Thread Mark Ferguson
Hi Erick, Thanks for the suggestions. I've used indexed n-grams before to implement spell-checking; I think in this case I may take a look at WildcardTermEnum and RegexTermEnum. It seems like a good solution because I am doing my own results ordering so Lucene's scoring is irrelevant in this case.
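For readers unfamiliar with the indexed n-gram approach mentioned here, the idea is to index every fixed-length substring of a term so that an infix lookup becomes an ordinary term match. The following is a minimal sketch of the n-gram generation step only; `NGrams` is a hypothetical helper, not a Lucene class, and how the grams are then indexed and queried is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class NGrams {
    /** Returns every contiguous substring of length n, in order of position. */
    static List<String> of(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of width n across the text; shorter texts yield no grams.
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }
}
```

For example, `NGrams.of("lucene", 3)` yields `luc`, `uce`, `cen`, `ene`, so a search for `cen` can be answered by an exact match on an indexed gram. The trade-off is a larger index in exchange for avoiding a scan over all terms.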

Re: Can you create a Field that is a copy of another Field?

2008-06-27 Thread Matthew Hall
I'm not sure if this is helpful, but I do something VERY similar to this in my project. So, for the example you are citing I would design my index as follows: db_key, data, data_type Where the data_type is some sort of value representing the thing that's on the left hand side of your property

Sorting issues

2008-06-27 Thread Robert . Hastings
I just implemented a sorting feature on our application where the user can change the sort on a query and reexecute the search. It works fine on text fields where most of the documents have different field values. However, on fields that are categories, that is, there are only four distinct va

RE: Can you create a Field that is a copy of another Field?

2008-06-27 Thread Bill.Chesky
Grant, Thanks for the reply. What we're trying to do is kind of esoteric and hard to explain without going into a lot of gory details so I was trying to keep it simple. But I'll try to summarize. We're trying to index entities in a relational database. One of the entities we're trying to in

Re: Can we know "number-of-documents-that-will-be-flushed"?

2008-06-27 Thread Michael McCandless
Yes, it will. The javadocs for that method are rather confusing; I'll correct them. Mike On Fri, Jun 27, 2008 at 6:44 AM, java_is_everything <[EMAIL PROTECTED]> wrote: > > Hi Mike. Thanks for the reply. > > Just one doubt. Will it work if the indexwriter directory is "not" a > RAMDirectory? > > Loo

Re: Problem with search an exact word and stemming

2008-06-27 Thread Matthew Hall
Also, please note that I thought about it and realized that I misspoke when I sent out my original suggestion. You don't want an untokenized field in your case, you want an unstemmed one instead. This will allow you to get the functionality you are looking for, at least I believe so ^^ Anyh

Re: Lucene CFS naming significance

2008-06-27 Thread mick l
> That's not true. Unless you always have the same "data" every time you build the index, and if you build it every time from the beginning (not rewriting the docs) >>Lucas, The same tables are being converted to an index each time, but there will just be extra rows. I do rebuild the index each

Re: Does Lucene Java 2.3.2 supports parsing of Microsoft office 2007 documents...

2008-06-27 Thread Erick Erickson
Lucene doesn't actually support any of the document types. What happens is that some program is used to parse the files into an indexable stream and that stream is indexed. That used to be POI in the old days. I confess I haven't used the latest demo, but I assume that under the covers there's som

Doubt on IndexWriter.close()

2008-06-27 Thread java_is_everything
Hi all. IndexWriter.close() API states that :: "Flushes all changes to an index and closes all associated files.". What does "closes all associated files" mean, since we are apparently able to still addDocument() even after calling IndexWriter.close() ? Looking forward to a reply. Ajay garg

Re: Can you create a Field that is a copy of another Field?

2008-06-27 Thread Grant Ingersoll
On Jun 27, 2008, at 12:01 AM, <[EMAIL PROTECTED]> wrote: Hello Lucene Gurus, I'm new to Lucene so sorry if this question is basic or naïve. I have a Document to which I want to add a Field named, say, "foo" that is tokenized, indexed and unstored. I am using the "

Re: Lucene CFS naming significance

2008-06-27 Thread Lucas F. A. Teixeira
Folks, Could anyone tell me the significance of the naming of the cfs files in the Lucene index e.g. _1pp.cfs, _2kk.cfs etc. > Just names that won't repeat in the same folder. I have observed many differently named files being created temporarily while the index is being built, but the same set

Does Lucene Java 2.3.2 supports parsing of Microsoft office 2007 documents...

2008-06-27 Thread Kumar Gaurav
Dear all, Currently I am using Lucene Java 2.3.2 demo to parse Microsoft 2003 and 2007 docs and PDF files. It is able to parse files with *.pdf, *.doc, *.xls etc. But it does not search in files of Microsoft 2007 docs. It shows indexing *.docx and other Microsoft 2007 doc files. Does Lu

Re: Can we know "number-of-documents-that-will-be-flushed"?

2008-06-27 Thread java_is_everything
Hi Mike. Thanks for the reply. Just one doubt. Will it work if the indexwriter directory is "not" a RAMDirectory? Looking forward to a reply. Ajay Garg Michael McCandless-2 wrote: > > IndexWriter.numRamDocs() should give you that. > > Mike > > java_is_everything <[EMAIL PROTECTED]> wrote:

Lucene CFS naming significance

2008-06-27 Thread mick l
Folks, Could anyone tell me the significance of the naming of the cfs files in the Lucene index e.g. _1pp.cfs, _2kk.cfs etc. I have observed many differently named files being created temporarily while the index is being built, but the same set of named files are in place once the index has finish

Re: Preventing index corruption

2008-06-27 Thread Michael McCandless
If you open your IndexWriter with autoCommit=false, then no changes will be visible in the index until you call commit() or close(). Added documents can still be flushed to disk as new segments when the RAM buffer is full, but these segments are not referenced (by a new segments_N file) until commi

Re: Preventing index corruption

2008-06-27 Thread John Byrne
Hi, Rather than disabling the merging, have you considered putting the documents in a separate index, possibly in memory, and then deciding when to merge them with the main index yourself? That way, you can change your mind and simply not merge the new documents if you want. To do this, yo

Re: Can we know "number-of-documents-that-will-be-flushed"?

2008-06-27 Thread Michael McCandless
IndexWriter.numRamDocs() should give you that. Mike java_is_everything <[EMAIL PROTECTED]> wrote: > > Hi all. > > Is there a way to know "number-of-documents-that-will-be-flushed", just > before giving a call to flush() method? > I am currently using Lucene 2.2.0 API. > > Looking forward to repli

Re: Problem with search an exact word and stemming

2008-06-27 Thread renou oki
Thanks for the reply. I will try to add another data field. I thought about this solution but I was not very sure. I thought there was an easier solution for that... best regards Renou 2008/6/26 Matthew Hall <[EMAIL PROTECTED]>: > You could also add another data field to the index, with an