Lucene newbie question - Term Positions

2007-10-07 Thread Developer Developer
Hello, I have a simple Lucene 2.2 index. I want to list all the terms and their positions in a document. How can I do it? Can you please provide some sample code? Thanks!
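
A stdlib-only sketch of what "terms and their positions" means for one document, assuming a simple lowercasing letter/digit tokenizer (not StandardAnalyzer). In Lucene itself, per-document positions would normally come from a term vector indexed with positions (Field.TermVector.WITH_POSITIONS), read back through IndexReader.getTermFreqVector and cast to TermPositionVector; please check the 2.2 javadocs for exact signatures. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Stdlib-only sketch: mimics what a term vector "with positions" records for one
// field of one document. The tokenizer is a simplified stand-in, not StandardAnalyzer.
class TermPositionsSketch {

    // Lowercases the text and splits on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // a leading separator yields one empty string; skip it
            }
        }
        return tokens;
    }

    // Maps each distinct term (sorted) to the 0-based positions where it occurs.
    static Map<String, List<Integer>> termPositions(String text) {
        Map<String, List<Integer>> positions = new TreeMap<>();
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            positions.computeIfAbsent(tokens.get(pos), k -> new ArrayList<>()).add(pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> p = termPositions("The quick fox saw the lazy fox");
        for (Map.Entry<String, List<Integer>> e : p.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue()); // e.g. "fox [2, 6]"
        }
    }
}
```

Positions count tokens, not characters, which is also how Lucene's phrase matching treats them.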

Re: Lucene newbie question - Term Positions

2007-10-07 Thread Developer Developer
> storing the entire document, then when you
> needed to count terms, just using one of the tokenizers and
> counting them yourself
>
> Best
> Erick
>
> On 10/7/07, Developer Developer <[EMAIL PROTECTED]> wrote:
> >
> > Hello,
> >
> > I have si
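
Erick's suggestion (store the whole text, then re-tokenize and count it yourself) could look roughly like this. The splitting rule is an assumption; in practice you would run the same analyzer used at index time so the counts agree with the index. TermCounter is a hypothetical name.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the "store the whole text, count terms yourself" approach.
// The splitting rule is an assumption, not what any Lucene analyzer does exactly.
class TermCounter {

    static Map<String, Integer> countTerms(String storedText) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : storedText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                counts.merge(t, 1, Integer::sum); // increment the count for this term
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Imagine this string came back from a stored field of a search hit.
        System.out.println(countTerms("To be, or not to be")); // prints {be=2, not=1, or=1, to=2}
    }
}
```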

Use of Field(String name, TokenStream tokenStream)

2007-10-07 Thread Developer Developer
Hello Friends, I am observing that a Field constructed from a TokenStream, i.e. Field fl = new Field(String name, TokenStream tokenStream), is not converted to lower case when stored in the index. The terms in the index are exactly the same as those in the tokenStream. When I do a phrase search, the Phras
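
A likely explanation, hedged: a Field built from a TokenStream is indexed exactly as the stream delivers it, with no analyzer applied, so any lowercasing has to happen inside the stream itself (for example by wrapping it in Lucene's LowerCaseFilter). A phrase query parsed with a lowercasing analyzer then looks for lowercase terms that were never indexed. The stdlib sketch below only mirrors the filter-as-decorator idea with hypothetical classes; it is not Lucene's TokenStream API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical mini token pipeline mirroring the TokenStream/TokenFilter idea.
// These are NOT Lucene classes; they only show why a pre-built stream must already
// contain lowercased terms if queries are analyzed with lowercasing.
class LowercaseFilterSketch {

    // Decorator that lowercases every token produced by the wrapped stream.
    static final class LowerCaseTokens implements Iterator<String> {
        private final Iterator<String> input;

        LowerCaseTokens(Iterator<String> input) {
            this.input = input;
        }

        @Override
        public boolean hasNext() {
            return input.hasNext();
        }

        @Override
        public String next() {
            return input.next().toLowerCase(Locale.ROOT);
        }
    }

    // Collects all remaining tokens into a list.
    static List<String> drain(Iterator<String> tokens) {
        List<String> out = new ArrayList<>();
        while (tokens.hasNext()) {
            out.add(tokens.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("Lucene", "In", "Action");
        // Indexed as-is: terms keep their case, so a lowercased phrase query misses.
        System.out.println(drain(raw.iterator()));
        // Wrapped in the filter: terms match what a lowercasing query analyzer emits.
        System.out.println(drain(new LowerCaseTokens(raw.iterator())));
    }
}
```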

Re: Crawling in Nutch

2007-12-11 Thread Developer Developer
Use Luke to explore the index. The content is present in the "content" field. However, it is not stored, so you can only search on it.
On Aug 1, 2007 9:59 AM, Srinivasarao Vundavalli <[EMAIL PROTECTED]> wrote:
> Hi,
> Where does (in which field) nutch stores the content of a document
> while in

Accessing parsed content in Nutch

2007-12-12 Thread Developer Developer
I believe Nutch stores the parsed content somewhere. Can you please let me know how I can access the parsed content for a given URL? Thanks!

Merging Lucene documents

2008-01-06 Thread Developer Developer
Hello Friends, I have a unique requirement: merging two or more Lucene-indexed documents into just one indexed document. For example: Document newDocument = doc1 + doc2 + doc3; In order to do this, I am planning to extract TokenStreams from each document (i.e. doc1, doc2 and doc3), and use them to
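
One way to sketch the token-level merge, assuming each document's tokens are already available as lists: append them into one sequence, keep positions increasing, and leave a position gap between source documents so that a phrase cannot match across a boundary. Token, merge and POSITION_GAP are hypothetical names, and the gap value is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: merge the tokens of several documents into one token sequence.
// Positions keep increasing, with a gap between documents so phrase queries
// cannot match across a document boundary. All names here are hypothetical.
class MergeTokensSketch {

    static final int POSITION_GAP = 100; // assumption: larger than your longest phrase

    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    static List<Token> merge(List<List<String>> docs) {
        List<Token> merged = new ArrayList<>();
        int base = 0;
        for (List<String> doc : docs) {
            for (int i = 0; i < doc.size(); i++) {
                merged.add(new Token(doc.get(i), base + i));
            }
            base += doc.size() + POSITION_GAP; // next document starts after the gap
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("new", "york"),
                Arrays.asList("city", "hall"));
        System.out.println(merge(docs)); // prints [new@0, york@1, city@102, hall@103]
    }
}
```

Without the gap, a query for "york city" would match the merged document even though no single source document contains that phrase.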

Re: Merging Lucene documents

2008-01-06 Thread Developer Developer
> is is an index process and (presumably) can take some time,
> you could either concatenate the strings together in memory and index
> the string or write it to a file on disk and then index *that*.
>
> If this is way off base, perhaps a bit more explanation of the problem
> you're tr

Re: Question regarding adding documents

2008-01-07 Thread Developer Developer
Here is another approach:
    StandardAnalyzer st = new StandardAnalyzer();
    StringReader reader = new StringReader("text to index...");
    TokenStream stream = st.tokenStream("content", reader);
Then use the Field constructor such as *Field
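
Conceptually, analyzer.tokenStream(field, reader) pulls characters from the Reader and emits terms; as far as I know, the Field(String, TokenStream) constructor then indexes those terms as-is and does not store the original text. The stdlib sketch below imitates only the tokenizing step with a simplified rule (runs of letters/digits, lowercased). It is not StandardAnalyzer's grammar, and ReaderTokenizerSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch of what "analyzer.tokenStream(field, reader)" conceptually does:
// read characters from a Reader and emit lowercased runs of letters/digits.
// Simplified stand-in only; StandardAnalyzer has a richer grammar and stop words.
class ReaderTokenizerSketch {

    static List<String> tokenize(Reader reader) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isLetterOrDigit(c)) {
                current.append((char) c); // extend the current token
            } else if (current.length() > 0) {
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0); // start a new token
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT)); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize(new StringReader("Text to index..."))); // prints [text, to, index]
    }
}
```

Because the field receives already-analyzed tokens, whatever normalization your queries expect has to happen at this stage.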

Re: Merging Lucene documents

2008-01-07 Thread Developer Developer
> ill
> be an utter pain in the neck because you'll be waiting forever for each
> iteration of your index since you have to go back to the site each time.
> *Then* worry about elegance ...
>
> But I will say, about offsets: If you're overriding next(), you can make
> them a
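
On the offsets point: when several texts are concatenated, each token's start and end offsets need to be shifted by the length of everything before it, including any separator, or highlighting against the combined text will point at the wrong span. A stdlib sketch under those assumptions, with hypothetical names (OffsetToken, SEPARATOR) and a plain whitespace tokenizer:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: shift each token's character offsets into the concatenated text.
// OffsetToken and SEPARATOR are hypothetical; the tokenizer splits on whitespace only.
class OffsetShiftSketch {

    static final String SEPARATOR = " "; // assumption: texts are joined with one space

    static final class OffsetToken {
        final String term;
        final int start; // inclusive character offset
        final int end;   // exclusive character offset

        OffsetToken(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Whitespace-separated tokens of one text, with offsets relative to that text.
    static List<OffsetToken> tokensOf(String text) {
        List<OffsetToken> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++; // skip separators
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++; // consume the token
            }
            if (i > start) {
                out.add(new OffsetToken(text.substring(start, i), start, i));
            }
        }
        return out;
    }

    // Tokens of all texts, with offsets shifted into the joined string.
    static List<OffsetToken> merged(List<String> texts) {
        List<OffsetToken> out = new ArrayList<>();
        int shift = 0;
        for (String text : texts) {
            for (OffsetToken t : tokensOf(text)) {
                out.add(new OffsetToken(t.term, t.start + shift, t.end + shift));
            }
            shift += text.length() + SEPARATOR.length(); // next text begins after the separator
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("hello world", "lucene");
        String joined = String.join(SEPARATOR, texts);
        for (OffsetToken t : merged(texts)) {
            // The shifted offsets should slice the joined text back to the term.
            System.out.println(t.term + " -> " + joined.substring(t.start, t.end));
        }
    }
}
```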

Re: Self Join Query

2008-01-08 Thread Developer Developer
Please provide more details. Can you not use a BooleanQuery and filters if need be?
On Jan 8, 2008 7:23 AM, sachin <[EMAIL PROTECTED]> wrote:
>
> I need to write lucene query something similar to SQL self joins.
>
> My current implementation is very primitive. I fire first query, get the
> res
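
A rough stdlib sketch of the two-pass idea over an in-memory list standing in for an index: the first pass collects join keys, the second turns them into a BitSet of allowed document numbers, which is the role a filter plays when restricting the final query. The Doc class and its fields (id, managerId, dept) are hypothetical example data, not anything from the original thread.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of a "self join" done in two passes, ending in a BitSet that could
// restrict a second search the way a filter does. Doc is hypothetical data.
class SelfJoinSketch {

    // Hypothetical document: an employee with an optional manager id.
    static final class Doc {
        final String id;
        final String managerId;
        final String dept;

        Doc(String id, String managerId, String dept) {
            this.id = id;
            this.managerId = managerId;
            this.dept = dept;
        }
    }

    // Doc numbers of employees whose manager works in managerDept.
    static BitSet reportsToDept(List<Doc> docs, String managerDept) {
        Set<String> managerIds = new HashSet<>();
        for (Doc d : docs) {
            if (d.dept.equals(managerDept)) {
                managerIds.add(d.id); // pass 1: collect the join keys
            }
        }
        BitSet allowed = new BitSet(docs.size());
        for (int n = 0; n < docs.size(); n++) {
            if (managerIds.contains(docs.get(n).managerId)) {
                allowed.set(n); // pass 2: mark documents that satisfy the join
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("1", "", "sales"),
                new Doc("2", "1", "it"),
                new Doc("3", "2", "it"),
                new Doc("4", "1", "hr"));
        System.out.println(reportsToDept(docs, "sales")); // prints {1, 3}
    }
}
```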

Re: How to model hierarchy info to be searched related to a document

2008-01-13 Thread Developer Developer
Roger, why can't you have one document for every combination of dimension and level? Add the cube name, id and description as fields to all documents too. Although it would be redundant information, you can live with it, I suppose? I think you are developing an application to search a cube? What do
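
A sketch of that denormalized model, assuming the hierarchy is available as a map from each dimension to its levels: one flat document per (dimension, level) pair, each repeating the cube's name, id and description. Field names are hypothetical; with Lucene each map entry would become one Field on a Document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: one flat "document" (field -> value map) per (dimension, level) pair,
// with cube-level fields deliberately repeated. All field names are hypothetical.
class CubeFlattenSketch {

    static List<Map<String, String>> flatten(String cubeId, String cubeName, String cubeDescription,
                                             Map<String, List<String>> levelsByDimension) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map.Entry<String, List<String>> dim : levelsByDimension.entrySet()) {
            for (String level : dim.getValue()) {
                Map<String, String> doc = new LinkedHashMap<>();
                doc.put("cubeId", cubeId); // redundant on purpose
                doc.put("cubeName", cubeName);
                doc.put("cubeDescription", cubeDescription);
                doc.put("dimension", dim.getKey());
                doc.put("level", level);
                docs.add(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dims = new LinkedHashMap<>();
        dims.put("Time", Arrays.asList("Year", "Quarter", "Month"));
        dims.put("Geography", Arrays.asList("Country", "City"));
        for (Map<String, String> doc : flatten("c1", "Sales", "Sales cube", dims)) {
            System.out.println(doc);
        }
    }
}
```

A hit on any level then carries the cube's identity with it, so no second lookup is needed to show which cube matched.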

Re: How to model hierarchy info to be searched related to a document

2008-01-14 Thread Developer Developer
I am not sure why you are afraid of adding more fields to the document. Having 20-30 fields in a document is not a bad thing. Do you have any constraints that limit the number of fields in the document?
On Jan 14, 2008 7:59 AM, Roger Camargo <[EMAIL PROTECTED]> wrote:
> Thanks for ans

Re: How to model hierarchy info to be searched related to a document

2008-01-14 Thread Developer Developer
Yeah, I think what you need is one Field where you store a list of propertytag and value combinations, and also be able to search the field on values and identify that a value belongs to a particular propertytag. Something like: propertytag1,value propertytag2,value propertytag3,value etc. To be fran
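
A hedged sketch of that single-field encoding: each pair becomes one term such as color:red, so searching for a value under a specific tag is an exact term match, and the tag of any matching term can be recovered by splitting on the first ':'. The ':' separator and the example tags are assumptions; tags must not contain ':', and with Lucene you would want these terms indexed without tokenizing them (':' also needs escaping in QueryParser syntax).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: pack tag/value pairs into one multi-valued field as "tag:value" terms.
// The ':' separator is an assumption; tags must not contain it.
class TaggedFieldSketch {

    static final char SEP = ':';

    // One term per tag/value pair, in the map's iteration order.
    static List<String> encode(Map<String, String> props) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            terms.add(e.getKey() + SEP + e.getValue());
        }
        return terms;
    }

    // Returns the tags whose value equals the given value.
    static List<String> tagsWithValue(List<String> terms, String value) {
        List<String> tags = new ArrayList<>();
        for (String t : terms) {
            int i = t.indexOf(SEP); // first separator ends the tag
            if (t.substring(i + 1).equals(value)) {
                tags.add(t.substring(0, i));
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("color", "red");
        props.put("size", "large");
        props.put("trim", "red");
        List<String> terms = encode(props);
        System.out.println(terms);                        // prints [color:red, size:large, trim:red]
        System.out.println(terms.contains("size:large")); // prints true
        System.out.println(tagsWithValue(terms, "red"));  // prints [color, trim]
    }
}
```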

Re: Retain the index

2008-01-25 Thread Developer Developer
Check whether there are any lock files in your index directory after the process has completed. There should be no lock files if the index writer was closed correctly.
On Jan 25, 2008 8:59 AM, Erick Erickson <[EMAIL PROTECTED]> wrote:
> This should not be happening. I've got to assume that you have
> more t
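
A small stdlib check for leftover lock files, assuming their names end in ".lock". I believe Lucene 2.1 and later create write.lock in the index directory, while older versions kept locks in the system temp directory, so look there too if nothing shows up. The class name and the default "index" path are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch: list leftover lock files in an index directory after indexing finishes.
// Assumption: lock files end in ".lock"; their exact name and location depend on the Lucene version.
class LockFileCheck {

    static List<String> lockFiles(File indexDir) {
        List<String> found = new ArrayList<>();
        File[] files = indexDir.listFiles();
        if (files == null) {
            return found; // not a directory, or not readable
        }
        for (File f : files) {
            if (f.isFile() && f.getName().endsWith(".lock")) {
                found.add(f.getName());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "index"); // hypothetical default path
        List<String> locks = lockFiles(dir);
        if (locks.isEmpty()) {
            System.out.println("No lock files: the writer appears to have been closed.");
        } else {
            System.out.println("Leftover lock files (writer probably not closed): " + locks);
        }
    }
}
```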

How does Lucene handle content-type?

2007-04-05 Thread Developer Developer
I am using wget to download content from the web with the --save-headers option, which saves the HTTP headers into the downloaded files. Does Lucene make use of the content type while indexing, or do I have to parse the header, determine the content type and determine the right set of action
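
As far as I know, Lucene itself does not look at content types: it indexes whatever text you hand it, so detecting the type and choosing a parser is up to the application. With wget's --save-headers, the HTTP headers precede the body and are separated from it by a blank line, so the Content-Type can be read as in this stdlib sketch. Class and method names are hypothetical; for real files, opening them with an ISO-8859-1 reader is a reasonable assumption since header bytes are ASCII.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;

// Sketch: read the Content-Type from a file saved with wget --save-headers.
// Headers come first, then a blank line, then the body.
class SavedHeaderContentType {

    // Returns the Content-Type value (e.g. "text/html; charset=ISO-8859-1"), or null if absent.
    static String contentType(Reader savedFile) throws IOException {
        BufferedReader in = new BufferedReader(savedFile);
        String line;
        while ((line = in.readLine()) != null) { // readLine also strips a trailing \r
            if (line.trim().isEmpty()) {
                break; // blank line separates headers from the body
            }
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().toLowerCase(Locale.ROOT).equals("content-type")) {
                return line.substring(colon + 1).trim();
            }
        }
        return null;
    }

    // Keeps only the media type, dropping parameters such as charset.
    static String mediaType(String contentType) {
        int semi = contentType.indexOf(';');
        String type = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) throws IOException {
        String saved = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=ISO-8859-1\r\n\r\n<html>...</html>";
        String ct = contentType(new StringReader(saved));
        System.out.println(ct);            // prints text/html; charset=ISO-8859-1
        System.out.println(mediaType(ct)); // prints text/html
    }
}
```

The media type can then select the parser (HTML, PDF, plain text) that extracts the text you pass to Lucene.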