Counting and Categorisation

2007-02-08 Thread Kainth, Sachin
This email is meant for Chris Hostetter and of course anyone else who may know about this, I wonder if I can ask you a question. I have been reading of how you at CNET have implemented categorisation and counting so that if i type "Kodak Easyshare" in the reviews section you not only get a big li

'a', 's' and 't' don't index properly

2007-02-08 Thread Kainth, Sachin
> Hello, > > I have a database of tracks, artists and albums and I'm indexing these > 3 attributes plus also the first letter of the track thus (incidentally > I'm using dotlucene but the implementation of dotlucene is similar to > the Java one): > >Document Doc = new Document(); >String Al

RE: Counting and Categorisation

2007-02-08 Thread Kainth, Sachin
solr is as it seems to be more suited to my application? Thanks Sachin -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 08 February 2007 13:48 To: java-user@lucene.apache.org Subject: Re: Counting and Categorisation On Feb 8, 2007, at 8:28 AM, Kainth, Sachin wrote

RE: 'a', 's' and 't' don't index properly

2007-02-08 Thread Kainth, Sachin
hat they do. Oh, and get a copy of Luke if you haven't already. It'll let you examine your index, see the results of using various analyzers etc. Best Erick On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: > > > Hello, > > > > I have a database of tracks, a

RE: 'a', 's' and 't' don't index properly

2007-02-08 Thread Kainth, Sachin
rom the javadoc... public final class *SimpleAnalyzer*extends Analyzer An Analyzer that filters LetterTokenizer with LowerCaseFilter. On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: > > Thanks Erik, > > Do you know of an analyzer which doesn't remove the characters 'a

categorisation

2007-02-08 Thread Kainth, Sachin
Chris has given an example of how to perform categorisation of lucene searches: String[] mfgs = ...; String query = "+category:cameras +price:[0 to 10]"; Query q = QueryParser.parse(query); Hits results = searcher.search(q, mySort) BitSet all = (new QueryFilter(q)).bits(reader) int[

RE: Analyzers

2007-02-08 Thread Kainth, Sachin
Can you give me an example of how this might be done? -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 08 February 2007 17:34 To: java-user@lucene.apache.org Subject: Re: Analyzers Use PerFieldAnalyzerWrapper. On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]>

RE: Analyzers

2007-02-08 Thread Kainth, Sachin
/Application site: http://www.dbsight.net demo: http://search.dbsight.com On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: > Hi all, > > I wanted to know if it is possible to store some fields in an index > with one analyzers and other fields with another analyzer? > > Chee

RE: Empty search

2007-02-09 Thread Kainth, Sachin
e.org Subject: Re: Empty search 8 feb 2007 kl. 18.46 skrev Kainth, Sachin: > Is it my imagination or does lucene produce an error if you present it > with an empty string to search for? I presume you are referring to the QueryParser? It sounds about right that it would throw an except

RE: categorisation

2007-02-09 Thread Kainth, Sachin
It makes sense to me only if you tell me that all the bits in the BitSet "all" will be 1. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 08 February 2007 18:37 To: java-user@lucene.apache.org Subject: Re: categorisation On Feb 8, 2007, at 12:36

RE: Empty search

2007-02-09 Thread Kainth, Sachin
You are right I didn't think about it at all to be honest. -Original Message- From: karl wettin [mailto:[EMAIL PROTECTED] Sent: 09 February 2007 10:46 To: java-user@lucene.apache.org Subject: Re: Empty search 9 feb 2007 kl. 11.34 skrev Kainth, Sachin: > Yep it is the querypar

RE: categorisation

2007-02-09 Thread Kainth, Sachin
Ahhh it all makes sense to me now :-) -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 09 February 2007 12:01 To: java-user@lucene.apache.org Subject: Re: categorisation On Feb 9, 2007, at 5:40 AM, Kainth, Sachin wrote: > It makes sense to me only if you tell

RE: categorisation

2007-02-09 Thread Kainth, Sachin
But does that not imply that a second search is made against the index by the line: BitSet all = (new QueryFilter(q)).bits(reader) -Original Message- From: Kainth, Sachin [mailto:[EMAIL PROTECTED] Sent: 09 February 2007 12:05 To: java-user@lucene.apache.org Subject: RE: categorisation

RE: categorisation

2007-02-09 Thread Kainth, Sachin
Are you saying that without solr I will have caching problems under load? -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 09 February 2007 14:06 To: java-user@lucene.apache.org Subject: Re: categorisation On Feb 9, 2007, at 7:07 AM, Kainth, Sachin wrote: >

RE: categorisation

2007-02-09 Thread Kainth, Sachin
What does solr provide and how can I use it with dotLucene? -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 09 February 2007 14:11 To: java-user@lucene.apache.org Subject: Re: categorisation On Feb 9, 2007, at 9:08 AM, Kainth, Sachin wrote: > Are you saying t

Lucene Web Service

2007-02-09 Thread Kainth, Sachin
Hello all, Does anyone know if there is a .NET version of Lucene Web Service? Cheers This email and any attached files are confidential and copyright protected. If you are not the addressee, any dissemination of this communication is strictly prohibited. Unless otherwise expressly agreed in w

RE: Lucene Web Service

2007-02-09 Thread Kainth, Sachin
: Lucene Web Service Hi You could try SOLR http://lucene.apache.org/solr/ This is obviously Java but you can access it using .NET... Hope this helps Patrick On 09/02/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: > Hello all, > > Does anyone know if there is a .NET version of Luc

Solr issue

2007-02-12 Thread Kainth, Sachin
Hello all, When running the example in the solr release has anyone come up with the following issue when going to http://localhost:8983/solr/admin/: HTTP ERROR: 500 Unable to compile class for JSP Generated servlet error: 12-Feb-2007 16:24:17 org.apache.jasper.compiler.Compiler generateClass SEV

RE: Solr issue

2007-02-12 Thread Kainth, Sachin
-jar start.jar Regards, Marius Hanganu On 2/12/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: > > Hello all, > > When running the example in the solr release has anyone come up with > the following issue when going to http://localhost:8983/solr/admin/: > > HTTP ER

RE: Please Help me

2007-02-13 Thread Kainth, Sachin
I believe that this happens because "AND", "OR" and "NOT" are all reserved words for joining together other search terms and therefore if you don't want the exception thrown then you must capture any "AND", "OR" and "NOT"s that are entered on their own and not pass them to the QueryParser. -O
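
A rough sketch of the check described above, in Java since that is this list's language. It only covers the case of a single reserved word entered on its own. The class and method names are made up for illustration and are not part of Lucene, and rejecting blank input as well is an extra assumption.

```java
import java.util.Set;

/** Hypothetical helper, not part of Lucene: screens user input before parsing. */
final class BareOperatorGuard {

    // Operator keywords named in the message above; matching is case-sensitive
    // on the assumption that only the upper-case forms act as operators.
    private static final Set<String> RESERVED = Set.of("AND", "OR", "NOT");

    /** Returns false for null, blank, or a lone reserved word; true otherwise. */
    static boolean isSafeToParse(String userInput) {
        if (userInput == null) {
            return false;
        }
        String trimmed = userInput.trim();
        return !trimmed.isEmpty() && !RESERVED.contains(trimmed);
    }

    public static void main(String[] args) {
        for (String input : new String[] {"AND", "  NOT ", "kodak AND easyshare", "and"}) {
            System.out.println("\"" + input + "\" -> " + isSafeToParse(input));
        }
    }
}
```

The caller would run this check first and only hand the string to the QueryParser when it returns true.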

RE: Please Help me

2007-02-13 Thread Kainth, Sachin
I have a similar request. Does anyone know if Lucene is capable of implementing polyhierarchical taxonomies? -Original Message- From: Saroja Kanta Maharana [mailto:[EMAIL PROTECTED] Sent: 13 February 2007 13:45 To: java-user@lucene.apache.org Subject: Re: Please Help me Hi All, A

Caching

2007-02-14 Thread Kainth, Sachin
Hi all, I have read that Lucene performs caching of search results so that if you perform the same search in succession the second result should be returned faster. What I wanted to ask is whether this caching is any good or whether it's a good idea to add some sort of caching layer on top of Luc

RE: Caching

2007-02-14 Thread Kainth, Sachin
that these are the caches that are built at the first query. So, say storing the results of a query somewhere and returning that stored copy for the *next* query that is identical is not something I'd expect Lucene to do. Best Erick On 2/14/07, Kainth, Sachin <[EMAIL PROTECTED]>

Fields

2007-02-19 Thread Kainth, Sachin
Hi all, I have a few question regarding indexing documents. 1. With my experience of indexing documents with lucene so far I have done things like: Doc.Add(Field.Text("album", Album)); Where Album is a string representing an album name. Now with this sort of indexing what I do is a search such

RE: Fields

2007-02-19 Thread Kainth, Sachin
--- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 19 February 2007 16:05 To: java-user@lucene.apache.org Subject: Re: Fields See below. On 2/19/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: > > Hi all, > > I have a few question regarding indexing documents. > > 1. W

Search in all fields

2007-02-19 Thread Kainth, Sachin
Hi All, I want to be able to do a search for a term in all fields in a document. One way this can be done is to put every element of a document in the default field (or I guess any other single named field) as well as separate fields in which those elements belong. So for example if for my docu

RE: Search in all fields

2007-02-20 Thread Kainth, Sachin
one and only one document, so unless you need complex queries, I'd just think about rewriting simple queries with ANDs as a SpanNearQuery. Best Erick On 2/19/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: > > Hi All, > > I want to be able to do a search for a term in all f

RE: Search for a term in all fields

2007-02-21 Thread Kainth, Sachin
//www.nabble.com/Search-in-all-fields-tf3254569.html : Date: Tue, 20 Feb 2007 12:29:25 - : From: "Kainth, Sachin" <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apache.org : Subject: Search for a term in all fields : : Hi all, : : How do I

pagination

2007-02-21 Thread Kainth, Sachin
Hello, I was wondering if Lucene provides any mechanism which helps in pagination. In other words is there a way to return the first 10 of 500 results and then the next 10 and so on. Cheers

RE: Search for a term in all fields

2007-02-21 Thread Kainth, Sachin
ll terms you specify in a query as field:term. Having some "special character" in the index doesn't help you because you still have to specify the field. And your two choices are still either a BooleanQuery that mentions all fields or indexing the data into a single field. Best Erick

RE: pagination

2007-02-21 Thread Kainth, Sachin
e.org Subject: Re: pagination See TopDocs, HitCollector, etc. Don't iterate through a Hits objects to get docs beyond, say, 100 since it's designed to efficiently return the first 100 documents but re-executes the queries each 100 or so times you advance to the next document. Erick On

Returning only a small set of results

2007-02-21 Thread Kainth, Sachin
Hi all, A question about efficiency and the internal workings of the Hits class. When we make a call to IndexSearcher's search method thus: Hits hits = searcher.Search(query); Do we actually, physically get back all the results of the query even if there are 20 million results or for efficiency

RE: Returning only a small set of results

2007-02-22 Thread Kainth, Sachin
What can you use in place of Hits and how do they differ? -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: 21 February 2007 22:43 To: java-user@lucene.apache.org Subject: Re: Returning only a small set of results : A question about efficiency and the internal wor

RE: Returning only a small set of results

2007-02-22 Thread Kainth, Sachin
't go to the other classes unless you start getting performance problems with Hits. The main take-away from Hits is that it'll re-execute the query every 100 documents you read from it or so, so the only time you care is when you find yourself assembling large numbers of documents... Erick O

Index maintainance

2007-02-23 Thread Kainth, Sachin
Hi all, Just wondering how one would perform index maintenance. I know how to add new documents: writer = new IndexWriter(IndexDirectory, new PorterAnalyzer(), false); (incidentally, I wrote PorterAnalyzer myself for the PorterStemFilter since I couldn't find an analyzer using it) But what I do

RE: Index maintainance

2007-02-23 Thread Kainth, Sachin
I've just been looking at IndexReader and it seems you can do it using that, but I don't know which concrete implementation of IndexReader to use. -Original Message- From: Michael McCandless [mailto:[EMAIL PROTECTED] Sent: 23 February 2007 15:07 To: java-user@lucene.apache.org Subject: R

Index modification

2007-02-23 Thread Kainth, Sachin
Hi all, I am using the IndexModifier class to perform index modification. I have deleted 1 document from an index and the output indicates that 1 document does indeed get deleted. However, running the program again reveals that the document deleted has appeared again in the index. This despite

Date searches

2007-02-26 Thread Kainth, Sachin
Hi all, I have an index in which dates are represented as ranges of two integers (there are two fields, one for each integer). The two integers are years. AD dates are represented as a positive integer and BC dates as a negative one. There are three possible types of ranges. These are listed below

Date Searches

2007-02-26 Thread Kainth, Sachin
Anybody? > __ > From: Kainth, Sachin > Sent: 26 February 2007 13:36 > To: 'java-user@lucene.apache.org' > Subject: Date searches > > Hi all, > > I have an index in which dates are represented a

RE: Date Searches

2007-02-26 Thread Kainth, Sachin
not numeric ranges. Is there a way to use numeric ranges? -Original Message- From: Seeta Somagani [mailto:[EMAIL PROTECTED] Sent: 26 February 2007 15:23 To: java-user@lucene.apache.org Subject: RE: Date Searches This might help. http://www.catb.org/~esr/faqs/smart-questions.html -
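
One common workaround when range queries compare terms as strings rather than as numbers is to index each year as a fixed-width string whose alphabetical order matches its numeric order. The sketch below is only an illustration of that idea, not something taken from this thread: the class name, the offset of 100000 and the width of 6 are all assumptions, chosen so that BC years (negative) sort before AD years.

```java
import java.util.Locale;

/** Hypothetical helper, not part of Lucene: encodes signed years as sortable terms. */
final class SortableYear {

    // Assumed offset and width: years from -100000 to 899999 fit in six digits.
    private static final int OFFSET = 100_000;
    private static final int MAX_SHIFTED = 999_999;

    /** Shifts the year to be non-negative, then zero-pads it to six digits. */
    static String encode(int year) {
        int shifted = year + OFFSET;
        if (shifted < 0 || shifted > MAX_SHIFTED) {
            throw new IllegalArgumentException("year out of supported range: " + year);
        }
        return String.format(Locale.ROOT, "%06d", shifted);
    }

    /** Reverses encode(): parses the digits and removes the offset. */
    static int decode(String term) {
        return Integer.parseInt(term) - OFFSET;
    }

    public static void main(String[] args) {
        for (int year : new int[] {-500, -44, 1, 2007}) {
            System.out.println(year + " -> " + encode(year));
        }
    }
}
```

Both year fields would be indexed with encode(), and the bounds of a range query would be encoded the same way before the query is built.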

Spanned indexes

2007-03-01 Thread Kainth, Sachin
Hi all, Is it possible in Lucene for an index to span multiple files? If so what is the recommendation in this case? Is it better to span after the index reaches a particular size? Furthermore, does Lucene ever span a single record between two or more index files in this case or does it ensure

RE: indexing pdfs

2007-03-08 Thread Kainth, Sachin
Hi Aswin, You can try pdfbox to convert the pdf documents to text and then use Lucene to index the text. The code for turning a pdf to text is very simple: private static string parseUsingPDFBox(string filename) { // document reader PDDocument doc = PDDocument.loa

RE: indexing pdfs

2007-03-08 Thread Kainth, Sachin
kumar [mailto:[EMAIL PROTECTED] Sent: 08 March 2007 11:35 To: java-user@lucene.apache.org Subject: Re: indexing pdfs Is the only way index pdfs is to convert it into a text and then only index it ??? On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: > > Hi Aswin, > > You can tr

RE: indexing pdfs

2007-03-08 Thread Kainth, Sachin
link pls ashwin On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: > > Well you don't need to actually save the text to disk and then index > the saved index file, you can directly index that text in-memory. > > The only other way I have heard of is to use Ifilters. I

Multiple segments

2007-03-08 Thread Kainth, Sachin
Hi all, I have been performing some tests on index segments and have a problem. I have read the file formats document on the official website and from what I can see it should be possible to create as many segments for an index as there are documents (though of course this is not a great idea). H

RE: Plural word search

2007-03-08 Thread Kainth, Sachin
Hi Tony, Lucene certainly does support it. It just requires you to use a tokeniser that performs stemming such as any analyzer that uses PorterStemFilter. Sachin -Original Message- From: Tony Qian [mailto:[EMAIL PROTECTED] Sent: 08 March 2007 16:52 To: java-user@lucene.apache.org Subj

RE: indexing pdfs

2007-03-09 Thread Kainth, Sachin
7 02:48 To: java-user@lucene.apache.org Subject: Re: indexing pdfs hi sachin the link wat u gave me only a zip file and an exe file for downoad. and this zip file also contains no class files.but wouldn't we be requiring a jar file or class file ??? On 3/8/07, Kainth, Sachin <[EMAIL PROTECT

Complete field search

2007-03-13 Thread Kainth, Sachin
Hi all, Is it possible to search whether a term is equal to the entire contents of a field rather than that the field contains a term? So for example if I have a field with this text: "world cup" and I do a search for "cup" I want it to return false but for another field that contains exactly the

IndexReader.GetTermFreqVectors

2007-03-13 Thread Kainth, Sachin
Hi all, The documentation for the above method mentions something called a vectorized field. Does anyone know what a vectorized field is?

RE: IndexReader.GetTermFreqVectors

2007-03-14 Thread Kainth, Sachin
enabled TermVector when creating the Document. i.e. new Field(, TermVector.YES) (see http://lucene.apache.org/ java/docs/api/org/apache/lucene/document/Field.TermVector.html for the full array of options) -Grant On Mar 13, 2007, at 1:24 PM, Kainth, Sachin wrote: > Hi all, >