Re: doc.getFields argument error

2007-01-18 Thread David
Yes, it is a bug in PyLucene. Thanks! 2007/1/19, Chris Hostetter <[EMAIL PROTECTED]>: : fields = doc.getFields() : TypeError: getFields() takes exactly one argument (0 given) : : how to fix this error? this sounds like a problem with the port ... when you have questio

Re: doc.getFields argument error

2007-01-18 Thread Chris Hostetter
: fields = doc.getFields() : TypeError: getFields() takes exactly one argument (0 given) : : how to fix this error? this sounds like a problem with the port ... when you have questions about using pylucene please start by mailing the pylucene community before mailing the

Re: Counting hits in a document

2007-01-18 Thread Chris Hostetter
: It was late this afternoon and I was square-eyed, so I didn't add the : detail. The app we're working on first returns a summary list of all the : books that match a query, no hit information. Next, the user clicks on a : returned title and we show the hits by chapter. That is, a list of chapte

doc.getFields argument error

2007-01-18 Thread David
Hi all: in the Lucene JavaDoc, the getFields signature is: public final List getFields(). I use PyLucene and wrote the following code: for i, doc in hits: fields = doc.getFields() for field in fields:

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis
It is 4 in the morning here in Greece, so I will try it tomorrow... at some point I have to sleep! I will report the results tomorrow. Thanks! Vagelis markrmiller wrote: > > A...I brushed over your example too fast...looked like normal > counting to me...I see now what you mean. So OMIT_NORM

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Mark Miller
A...I brushed over your example too fast... it looked like normal counting to me... I see now what you mean. So OMIT_NORMS probably did work. Are you getting the results through Hits? Hits will normalize the scores. Use TopDocs or a HitCollector instead. - Mark Vagelis Kotsonis wrote: But i don't want to get th
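Mark's point about Hits can be pictured with a small plain-Java model. It assumes the Lucene 2.x behavior Mark describes: when the best raw score is above 1.0, Hits divides every score by it, so a document that matched 3 query terms comes back as 1.0 instead of 3. The class and method names are made up for illustration, and no Lucene classes are used.

```java
import java.util.Arrays;

final class HitsNormalizationModel {

    // Scale every score by 1/max, but only when the top score is above 1.0,
    // mirroring the rescaling Mark attributes to Hits (a model, not Lucene source).
    static float[] normalizeLikeHits(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float factor = max > 1.0f ? 1.0f / max : 1.0f;
        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = rawScores[i] * factor;
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Raw "count of matching terms" scores for three documents.
        float[] raw = {3f, 2f, 1f};
        // Roughly [1.0, 0.67, 0.33]: the original counts are no longer visible.
        System.out.println(Arrays.toString(normalizeLikeHits(raw)));
        // Scores that never exceed 1.0 pass through untouched.
        System.out.println(Arrays.toString(normalizeLikeHits(new float[] {0.5f, 0.25f}))); // [0.5, 0.25]
    }
}
```

Collecting scores through TopDocs or a HitCollector, as Mark suggests, avoids this rescaling step.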

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis
But I don't want to get the frequency of each term in the doc. What I want is 1 if the term exists in the doc and 0 if it doesn't. After this, I want all these 1s and 0s to be summed to give me a number to use as a score. If I set the TF value to 1 or 0, as I described above, I get the right num

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Mark Miller
Don't return 1 for tf... just return the tf unchanged: return freq. For everything else, return 1. After that, OMIT_NORMS should work. If you want to try a custom reader: public class FakeNormsIndexReader extends FilterIndexReader { byte[] ones = SegmentReader.createFakeNorms(max

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis
I feel kind of stupid... I don't get what hossman says in his post. I got the point about OMIT_NORMS, and I tried to apply it by calling Field.setOmitNorms(true) before adding a field to the index. After that I re-indexed my collection, but it still makes no difference. Tell me if I got it rig

Re: Counting hits in a document

2007-01-18 Thread Mark Miller
Mark: Very most excellent. I'll give it a look in the morning. I hope that the class doesn't need the raw text since I don't have it any more, but your comment "Give it a query it will give you the spans" makes me hopeful. Should have been more specific: Just give it a query and an appropriat

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Mark Miller
Sorry you're having trouble finding it! Allow me... bingo: http://www.gossamer-threads.com/lists/lucene/java-user/43251?search_string=sorting%20by%20per%20doc%20hit;#43251 It probably doesn't have great keywords for finding it. That should get you going, though. Let me know if you have any questions. - Mark

Re: Counting hits in a document

2007-01-18 Thread Erick Erickson
Hoss: It was late this afternoon and I was square-eyed, so I didn't add the detail. The app we're working on first returns a summary list of all the books that match a query, no hit information. Next, the user clicks on a returned title and we show the hits by chapter. That is, a list of chapter

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis
Before I asked this question I had been searching the list for over 2 hours and didn't find anything that helped me understand how to do what I want. After you sent the message I made a quick pass through all your messages, but I didn't find anything. I also searched for FakeNormsIndexReader and s

Re: Counting hits in a document

2007-01-18 Thread Mark Miller
Just threw together a highlighter that can handle spans (combining a rewrite with dumpSpans from LIA) and used this: http://issues.apache.org/bugzilla/attachment.cgi?id=15568 Nice spans extractor from Mark (not me). Give it a query and it will give you the spans. - Mark Erick Erickson wrote: H

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Mark Miller
I just did the same thing. If you search the list you'll find the thread where Hoss gave me the info you need. It really comes down to making a FakeNormsIndexReader. The problem you are having is a result of the field size normalization. - mark Vagelis Kotsonis wrote: Hi all. I am trying to
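The field size normalization Mark mentions can be sketched the same way. This assumes DefaultSimilarity's length norm of 1/sqrt(number of terms in the field) and, for simplicity, treats every other factor (tf, idf, boosts, coord) as 1; it also ignores that Lucene stores norms as a single lossy byte. The names are hypothetical.

```java
final class LengthNormModel {

    // DefaultSimilarity-style length norm: longer fields get smaller norms.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Score = (number of matched query terms) x (field norm);
    // every other scoring factor is assumed to be 1 in this model.
    static double score(int matchedTerms, int fieldLength, boolean omitNorms) {
        double norm = omitNorms ? 1.0 : lengthNorm(fieldLength);
        return matchedTerms * norm;
    }

    public static void main(String[] args) {
        // "A A C F D": 5 tokens, 2 distinct query terms matched (A and D).
        System.out.println(score(2, 5, false)); // about 0.894, not 2
        System.out.println(score(2, 5, true));  // 2.0
    }
}
```

With norms omitted, or faked to 1.0 (which is what a FakeNormsIndexReader aims for), the same document scores exactly 2.0 in this model.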

Re: Searching/indexing date/time values or numeric values?

2007-01-18 Thread Doron Cohen
for all documents where: > : > > : > CREATEDATE > "1/1/2007" > : > > : > The above is obviously a pseudo-query expression. I did find something > : > called Range searches on the query syntax documentation page and it says > : > the sorting is done le
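To see why the pseudo-query CREATEDATE > "1/1/2007" cannot work on raw M/d/yyyy strings, here is a plain-Java sketch that selects terms by String ordering, i.e. the lexicographic comparison the query syntax page refers to. The helper name termsAfter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LexicographicDates {

    // Terms strictly greater than the bound under plain String (lexicographic) ordering.
    static List<String> termsAfter(String[] terms, String lowerBound) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (t.compareTo(lowerBound) > 0) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] slashDates = {"2/15/2006", "12/31/2006", "1/1/2007"};
        // Both 2006 dates sort after "1/1/2007" as strings, so they leak into the range.
        System.out.println(termsAfter(slashDates, "1/1/2007")); // [2/15/2006, 12/31/2006]

        String[] compactDates = {"20060215", "20061231", "20070101"};
        // Fixed-width, year-first strings sort chronologically, so nothing leaks.
        System.out.println(termsAfter(compactDates, "20070101")); // []
    }
}
```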

custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis
Hi all. I am trying to run some experiments with an algorithm that scores results by counting how many words of the submitted query appear in a document. For example, if I enter the query A B D A, the similarities I want to get for the documents are as follows: A A C F D (2 - found A and D) A B D S S A (3 -
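The scoring Vagelis describes (count the distinct query terms present in a document) is easy to state outside Lucene. This plain-Java sketch assumes whitespace tokenization and, as the example implies, counts the repeated A in the query only once; it is the target function, not a Lucene Similarity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class DistinctTermOverlap {

    // Score = number of distinct query terms that occur at least once in the document.
    // Repeats in the query or in the document count only once.
    static int score(String query, String doc) {
        Set<String> queryTerms = new HashSet<>(Arrays.asList(query.trim().split("\\s+")));
        Set<String> docTerms = new HashSet<>(Arrays.asList(doc.trim().split("\\s+")));
        queryTerms.retainAll(docTerms); // keep only the query terms the doc contains
        return queryTerms.size();
    }

    public static void main(String[] args) {
        System.out.println(score("A B D A", "A A C F D"));   // 2 (A and D)
        System.out.println(score("A B D A", "A B D S S A")); // 3 (A, B and D)
    }
}
```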

Re: how to make RangeQuery action as > < != operators?

2007-01-18 Thread Chris Hostetter
1) "!=" is really not a range operation at all. 2) if you look at the javadocs for RangeQuery you will see... Constructs a query selecting all terms greater than lowerTerm but less than upperTerm. There must be at least one term and either term may be null, in which case there is no bound o
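Hoss's description of RangeQuery (either bound may be null, but not both) maps neatly onto Java's NavigableSet views, which gives a quick way to reason about > and < without Lucene. The class below is an illustrative model over a set of index terms, not Lucene code. For !=, note that a Lucene query made only of a prohibited clause such as -field:value matches nothing on its own, so it has to be combined with a positive clause.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

final class RangeOperators {

    // All terms between lower and upper; a null bound means "no bound on that side".
    // As in the javadoc Hoss quotes, at least one bound must be given.
    static SortedSet<String> range(NavigableSet<String> terms, String lower, String upper,
                                   boolean inclusive) {
        if (lower == null && upper == null) {
            throw new IllegalArgumentException("at least one bound is required");
        }
        if (lower == null) {
            return terms.headSet(upper, inclusive);   // "<" (or "<=")
        }
        if (upper == null) {
            return terms.tailSet(lower, inclusive);   // ">" (or ">=")
        }
        return terms.subSet(lower, inclusive, upper, inclusive);
    }

    // "!=" is not a range: it is every term except one.
    static SortedSet<String> notEqual(NavigableSet<String> terms, String value) {
        TreeSet<String> rest = new TreeSet<>(terms);
        rest.remove(value);
        return rest;
    }

    public static void main(String[] args) {
        // Zero-padded so that string order matches numeric order.
        NavigableSet<String> prices = new TreeSet<>(Arrays.asList("010", "020", "030", "040"));
        System.out.println(range(prices, "020", null, false)); // [030, 040]
        System.out.println(range(prices, null, "020", false)); // [010]
        System.out.println(notEqual(prices, "020"));           // [010, 030, 040]
    }
}
```

The values are zero-padded because terms are compared as strings, which is the same issue discussed in the date/time thread.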

Re: Searching/indexing date/time values or numeric values?

2007-01-18 Thread Chris Hostetter
ing is done lexicographically. I guess that means it's sorted : > by letter. I would then need to store all my date/time values in a : > format like mmdd hh:mm:ss. : > And search, CREATEDATE:[20070101 00:00:00 TO 20070118 00:00:00], where : > the second date/time value is something li
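A sketch of producing sortable date/time terms, assuming java.time is available. It uses a single fixed-width yyyyMMddHHmmss token instead of the "date space time" form in the quoted message; keeping the value free of spaces also means a whitespace-splitting analyzer will not break it into two tokens. If I remember correctly, Lucene's own DateTools class produces strings of this shape, but the code below uses only the JDK and the class name is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class SortableDateTime {

    // Fixed width, most significant field first, no separators: string order == time order.
    private static final DateTimeFormatter SORTABLE =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static String encode(LocalDateTime t) {
        return SORTABLE.format(t);
    }

    public static void main(String[] args) {
        String lower = encode(LocalDateTime.of(2007, 1, 1, 0, 0, 0));
        String upper = encode(LocalDateTime.of(2007, 1, 18, 0, 0, 0));
        String created = encode(LocalDateTime.of(2007, 1, 9, 14, 30, 5));
        System.out.println(created); // 20070109143005
        // A plain string comparison now answers the range question correctly.
        boolean inRange = created.compareTo(lower) >= 0 && created.compareTo(upper) <= 0;
        System.out.println(inRange); // true
    }
}
```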

Re: Counting hits in a document

2007-01-18 Thread Chris Hostetter
The Spans interface has a skipTo for jumping to a specific documentId (or the first matching document with a higher documentId) once you've done that, then the doc(), start(), and end() calls will tell you info about the match (which doc it's in, where that match starts, and where it ends) ... use
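The skipTo/doc/start/end pattern Hoss describes can be sketched against a tiny in-memory stand-in. SimpleSpans and ListSpans are hypothetical and only mimic the method shapes named in the message; with Lucene's real Spans you would obtain the instance from a SpanQuery, and start/end are token positions rather than character offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in with the method shapes mentioned in the thread.
interface SimpleSpans {
    boolean next();
    boolean skipTo(int target);
    int doc();
    int start();
    int end();
}

// Backed by rows of {doc, start, end}, sorted by doc and then start.
final class ListSpans implements SimpleSpans {
    private final int[][] spans;
    private int pos = -1;

    ListSpans(int[][] spans) { this.spans = spans; }

    public boolean next() { return ++pos < spans.length; }

    // Move forward to the first span whose doc is >= target.
    public boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (doc() < target);
        return true;
    }

    public int doc() { return spans[pos][0]; }
    public int start() { return spans[pos][1]; }
    public int end() { return spans[pos][2]; }
}

final class SpansInOneDoc {

    // Collect {start, end} for every match inside docId.
    static List<int[]> matchesIn(SimpleSpans spans, int docId) {
        List<int[]> out = new ArrayList<>();
        if (!spans.skipTo(docId)) return out;
        while (spans.doc() == docId) {
            out.add(new int[] {spans.start(), spans.end()});
            if (!spans.next()) break;
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] data = {{1, 0, 1}, {3, 2, 3}, {3, 7, 8}, {5, 1, 2}};
        for (int[] m : matchesIn(new ListSpans(data), 3)) {
            System.out.println("start=" + m[0] + " end=" + m[1]);
        }
    }
}
```

Counting the hits in one document is then just the size of the returned list.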

Re: rewriting wildcard query before highlighting

2007-01-18 Thread Daniel Naber
On Thursday 18 January 2007 14:48, Mark Miller wrote: > Would it be more efficient to make a RAM index with just > the doc to be highlighted and then pass the reader of that into the > rewrite method before highlighting a query that expands? Yes, that's a valid approach, especially using MemoryIn
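Why rewriting against a one-document index helps can be shown with a plain-Java model of wildcard expansion: the same pattern is expanded once against a whole-index vocabulary and once against a single document's terms. The regex translation, class name and term lists are assumptions for illustration; Lucene's WildcardQuery walks the index's term dictionary rather than using java.util.regex.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class WildcardExpansion {

    // Translate a wildcard (* = any run of characters, ? = exactly one character)
    // into a regex, quoting every other character literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // The terms a wildcard would expand to when rewritten against this vocabulary.
    static Set<String> expand(String wildcard, Set<String> vocabulary) {
        Pattern p = toRegex(wildcard);
        Set<String> matches = new TreeSet<>();
        for (String term : vocabulary) {
            if (p.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Set<String> wholeIndex = new TreeSet<>(Arrays.asList(
                "lucene", "lucid", "luck", "lucky", "luxury", "search", "spans"));
        Set<String> oneDoc = new TreeSet<>(Arrays.asList("lucene", "search"));
        System.out.println(expand("luc*", wholeIndex)); // [lucene, lucid, luck, lucky]
        System.out.println(expand("luc*", oneDoc));     // [lucene]
    }
}
```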

Counting hits in a document

2007-01-18 Thread Erick Erickson
Hi again. I've been struggling for the last couple of days and getting nowhere, so it's time to swallow my pride and say "Help". OK, let's say I have a document indexed and I do NOT have access to the raw text. I need to find the offset of all the hits for a query on a single document. Advice

Re: Integrated File parser available?

2007-01-18 Thread Erik Hatcher
On Jan 18, 2007, at 3:22 AM, Supheakmungkol SARIN wrote: I'd like to know whether there exists any integrated JAVA API that we can use to parse most of today's popular file formats? Currently I have been using one API for one file format and it's not so convenient. If you wrapped all your
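Erik's reply is cut off here, so the following is not his suggestion, only one common way to hide several format-specific libraries behind a single entry point: a hypothetical TextExtractor interface plus a registry keyed by file extension. Only a plain-text extractor is registered in the example; wrappers for PDF, Word or HTML libraries would be added the same way.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical common interface; each real format library would be wrapped behind it.
interface TextExtractor {
    String extractText(byte[] content);
}

final class ExtractorRegistry {
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Pick the extractor by file extension (case-insensitive) and delegate to it.
    String extract(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot >= 0 ? fileName.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
        TextExtractor extractor = byExtension.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("no extractor registered for " + fileName);
        }
        return extractor.extractText(content);
    }

    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        // Plain text needs no external library.
        registry.register("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
        byte[] data = "hello lucene".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.extract("notes.TXT", data)); // hello lucene
    }
}
```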

meta information of hits

2007-01-18 Thread Tomas Fischer
Hi! I got lost. Short version: Is it possible to index tons of files, execute a query for the word 'foo', look at *each* hit in the 10 best files, and receive some meta information? Extended version: I have HTML-like files, which I want to index with Lucene. FileA: foo bar foo FileB: foo

rewriting wildcard query before highlighting

2007-01-18 Thread Mark Miller
Looking for opinions: If I pass the reader of a large index to the query rewrite method just to highlight a single doc, it seems the query I generate will expand to much more than I need. Would it be more efficient to make a RAM index with just the doc to be highlighted and then pass the reader of

Re: search in all fields

2007-01-18 Thread karl wettin
On 18 Jan 2007, at 09.54, David wrote: Hi all: I am studying Lucene and I want to build search over all the fields. I found that MultiFieldQueryParser can search multiple fields, but we must specify the fields. Maybe we can add a field named all_field that contains all the fields when indexing, but it makes t

Re: Websphere and Dark Matter

2007-01-18 Thread Rollo du Pre
Thanks for this information; it was very useful. Rol. Nadav Har'El wrote: On Tue, Jan 16, 2007, Rollo du Pre wrote about "Re: Websphere and Dark Matter": I was hoping it would, yes. Does WebSphere not release memory back to the OS when it no longer needs it? I'm concerned that if the memory

Re: how to make RangeQuery action as > < != operators?

2007-01-18 Thread Kapil Chhabra
In my case, I know the upper and lower limits of my field, so it is easy for me to run such queries, e.g. for >: RangeQuery(value, upperLimit, inc); for <: RangeQuery(lowerLimit, value, inc); for !=: Use the in this case, e.g.: -field:value Regards, kapilChhabra David wrote: Hi all: I need to m

how to make RangeQuery action as > < != operators?

2007-01-18 Thread David
Hi all: I need to make range queries act as >, < and != operators. The RangeQuery class only supports RangeQuery(begin, end, inclusive), so how can I support >, < and !=? I'd appreciate your help! -- David

Re: search in all fields

2007-01-18 Thread John Song
Here is my experience of getting good search relevancy: pre-processing is paramount. Pre-process your data; using Perl is much more powerful and flexible than putting all the logic in a customized analyzer. And if you want to search multiple fields, create a field called "all" and cat all the
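A sketch of the catch-all field John describes: concatenate every field value into one string and index it under a field such as "all". The helper is hypothetical and uses only the JDK. On David's size concern: in Lucene 2.x such a field can be indexed without being stored (Field.Store.NO), so it adds postings to the index but no second stored copy of the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CatchAllField {

    // Join every non-empty field value with a space so that the last token of one
    // field and the first token of the next do not run together.
    static String buildAllField(Map<String, String> fields) {
        StringJoiner all = new StringJoiner(" ");
        for (String value : fields.values()) {
            if (value != null && !value.isEmpty()) {
                all.add(value);
            }
        }
        return all.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene in Action");
        doc.put("author", "Hatcher");
        doc.put("summary", "");
        System.out.println(buildAllField(doc)); // Lucene in Action Hatcher
    }
}
```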

search in all fields

2007-01-18 Thread David
Hi all: I am studying Lucene and I want to build search over all the fields. I found that MultiFieldQueryParser can search multiple fields, but we must specify the fields. Maybe we can add a field named all_field that contains all the fields when indexing, but that makes the index file larger. So how to make sea

Integrated File parser available?

2007-01-18 Thread Supheakmungkol SARIN
Dear all, I'd like to know whether there exists any integrated JAVA API that we can use to parse most of today's popular file formats. Currently I have been using one API for each file format, and it's not very convenient. Thanks in advance for your response. Best regards, S.S.