Re: Searching sets of documents

2008-10-14 Thread Ganesh
Sorry. I misunderstood. As Karsten suggested, perform search for each term and do the logical operation on the collected hits. Regards Ganesh - Original Message - From: <[EMAIL PROTECTED]> To: Sent: Tuesday, October 14, 2008 6:20 PM Subject: Re: Searching sets of documents The p

Re: Question regarding sorting and memory consumption in lucene

2008-10-14 Thread Mark Harwood
Yes, StringIndex's public fields make life awkward. Re initialization - I did think you could try using arrays of byte arrays. The first 256 terms can be addressed using just one byte array; on encountering a 257th term an extra byte array is allocated. References to terms then require indexing into
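
One way to read the "arrays of byte arrays" idea above is sketched below. This is only an illustration under assumptions: the class name BytePlaneOrdinals and its methods are hypothetical, it is not Lucene's StringIndex, and it assumes non-negative term ordinals. Each extra byte array holds the next 8 bits of every document's ordinal and is allocated only when an ordinal first needs it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: per-document term ordinals stored in lazily allocated byte arrays. */
class BytePlaneOrdinals {
    private final int maxDoc;
    private final List<byte[]> planes = new ArrayList<>();

    BytePlaneOrdinals(int maxDoc) {
        this.maxDoc = maxDoc;
        planes.add(new byte[maxDoc]); // a single plane covers ordinals 0..255
    }

    /** Stores a non-negative ordinal, adding a plane the first time one is needed. */
    void set(int doc, int ordinal) {
        int plane = 0;
        int rest = ordinal;
        do {
            if (plane == planes.size()) {
                planes.add(new byte[maxDoc]); // e.g. allocated when the 257th term appears
            }
            planes.get(plane)[doc] = (byte) rest; // keep the low 8 bits in this plane
            rest >>>= 8;
            plane++;
        } while (rest != 0);
        // Clear higher planes in case this doc previously held a larger ordinal.
        for (int p = plane; p < planes.size(); p++) {
            planes.get(p)[doc] = 0;
        }
    }

    /** Reassembles the ordinal by indexing into every allocated plane. */
    int get(int doc) {
        int ordinal = 0;
        for (int p = 0; p < planes.size(); p++) {
            ordinal |= (planes.get(p)[doc] & 0xFF) << (8 * p);
        }
        return ordinal;
    }

    int planeCount() {
        return planes.size();
    }
}
```

The trade-off in this sketch is that every lookup touches each allocated plane, in exchange for paying only one byte per document while the field has at most 256 distinct terms.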

Re: search with accent not match

2008-10-14 Thread lekamm
http://www.blardone.org/2008/10/12/lucene-query-accented-character/ It is specific to PHP, but the same approach can easily be used to solve the problem in Java. I had the same problem as "Christophe from paris", and changing the query to its HTML-encoded equivalent makes my search queries work. So Perh

Re: Searching sets of documents

2008-10-14 Thread Anshum
Hi Spring, If I understood your question correctly, you want to search for folders/docs depending on the condition, right? Why don't you index the folder name as well, so that you could fire a query like Folder:A and (TEXT:x and TEXT:y)? The search would then run only on folder A for the keywords. In

I am not able to run Lucene 2.4 Demo

2008-10-14 Thread prabina pattanayak
Hi All, I am a beginner with Lucene and I am trying to use Lucene 2.4. I have set lucene-core-2.4.0.jar & lucene-demos-2.4.0.jar in my CLASSPATH, and when I try to run: java org.apache.lucene.demo.IndexFiles E:\prabina\lucene-2.4demo\src it shows the error: caught a class java.io.FileNotFou

Re: distinct field values

2008-10-14 Thread Antony Bowesman
Akanksha Baid wrote: I have indexed multiple documents - each of them has 3 fields (id, tag, text). Is there an easy way to determine the set of tags for a given query without iterating through all the hits? For example if I have 100 documents in my index and my set of tags = {A, B, C}. Query

Re: distinct field values

2008-10-14 Thread Anshum
You could go through this implementation. I have been using this (improvised) for a while now. There might be better ways to do this too, so you could check! http://www.gossamer-threads.com/lists/lucene/java-user/35704?search_string=categorycounts;#35704 -- Anshum Gupta Naukri Labs! http://ai-cafe.blo

Re: distinct field values

2008-10-14 Thread Khawaja Shams
Hi, You may also want to take a look at Carrot2: http://demo.carrot2.org/demo-stable/main Lucene documentation references them, but I was disappointed to see that they had an open source version (really old) and one that you can buy. It may work for you. Also, take a look at SOLR's implementatio

Re: Custom Sorting Based on Input Value

2008-10-14 Thread Chris Hostetter
: 3> maybe you could provide a custom sorter by using : SortComparator, although you should look at the warnings : in the API. : : Now I'll wait for Hoss to say "Isn't that what XXX provides" ... I can't think of anything that would solve this problem directly, mainly because I can't think of a

Re: querying without hits

2008-10-14 Thread Chris Hostetter
: Could one of you point me to an example of code for querying without using : the deprecated class Hits ? The demo code included with Lucene releases was updated in Lucene 2.4 so that it does not use the Hits class. -Hoss

Re: distinct field values

2008-10-14 Thread Chris Hostetter
: For example if I have 100 documents in my index and my set of tag = {A, B, C}. : Query Q on the text field returns 15 docs with tag A , 10 with tag B and none : with tag C (total of 25 hits). Is there a way to determine that the set of : tags for query Q = {A, B} without iterating through all 25

Closing Index Reader

2008-10-14 Thread Khawaja Shams
Hello, I am using the reopen method in the IndexReader class. In the case of the IndexReader being updated, I would like to create a new IndexSearcher and close the old IndexReader. When closing an instance of IndexReader, do I have to wait for currently executing searches (through an IndexSearche

Spellchecker Evaluation Criteria

2008-10-14 Thread mattspitz
So, it appears to me that the criterion for a "good suggestion" is the n-gram overlap of a given term, not the edit distance. Thus, if we're looking for "britney", but we mess up and type "birtney", "kortney" will come up before "birtney." Is there a way to force the SpellChecker to use the edit
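
To make the n-gram observation above concrete, here is a small sketch that counts shared bigrams between the typed word and each candidate. It is a simplified illustration, not the SpellChecker's actual scoring; the class BigramOverlap and its methods are hypothetical names introduced only for this example.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: count the distinct 2-grams two words have in common. */
class BigramOverlap {
    /** Returns the set of distinct two-character substrings of the word. */
    static Set<String> bigrams(String word) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            grams.add(word.substring(i, i + 2));
        }
        return grams;
    }

    /** Returns how many distinct bigrams appear in both words. */
    static int overlap(String a, String b) {
        Set<String> shared = bigrams(a);
        shared.retainAll(bigrams(b));
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(overlap("birtney", "britney")); // shares tn, ne, ey
        System.out.println(overlap("birtney", "kortney")); // shares rt, tn, ne, ey
    }
}
```

Under this simple measure the typo "birtney" shares three bigrams with "britney" but four with "kortney", which matches the ordering described in the message.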

Re: search with accent not match

2008-10-14 Thread Chris Hostetter
: http://www.blardone.org/2008/10/12/lucene-query-accented-character/ that post appears to be specifically about a PHP function to convert UTF-8 characters to their HTML equivalents ... which doesn't really seem relevant to the poster's question ... : > I'm use FrenchAnalyzer for index ..

Re: is there an histogram feature in lucene ak Magelan

2008-10-14 Thread Chris Hostetter
: Have a look at SOLR (*lucene.apache.org/solr*). It is based on Lucene and : provides additional functionalities including faceted search. to more generally answer your question: while you may not find a lot of info searching for "lucene histogram" you should find lots of info about achieving

Re: Question regarding sorting and memory consumption in lucene

2008-10-14 Thread Chris Hostetter
: Actually looking at this a little deeper maybe Lucene could/should : automatically be doing this "short" optimisation here? At the moment it can't, the arrays in StringIndex are public. The other thing that would be a bit tricky is the initialization ... I can't think of any easy way to kn

Re: Searching Log Files

2008-10-14 Thread Paul Smith
On 15/10/2008, at 7:37 AM, Chris Gilliam wrote: Hello Everyone, New to Lucene.. We currently have roughly 100 GB of log files. We need to build a search application that can return rows of data from the files and combine the results. Does Lucene index the content in the files? Will i

Terms Matching Query

2008-10-14 Thread Paul Davis
I'm working on indexing JSON documents via Lucene and I've run into a bit of a snag. Currently, I'm indexing JSON documents by adding fields that are path/value pairs. For example, given a JSON document like: { "name": { "first": "Paul", "last": "Davis" }, "jobs": ["hotdog v

Re: WELCOME to java-user@lucene.apache.org

2008-10-14 Thread Chris Gilliam
Hello Everyone, New to Lucene.. We currently have roughly 100 GB of log files. We need to build a search application that can return rows of data from the files and combine the results. Does Lucene index the content in the files? Will it be able to find matching criteria say a date and then

Searching Log Files

2008-10-14 Thread Chris Gilliam
Hello Everyone, New to Lucene.. We currently have roughly 100 GB of log files. We need to build a search application that can return rows of data from the files and combine the results. Does Lucene index the content in the files? Will it be able to find matching criteria say a date and then

Re: distinct field values

2008-10-14 Thread Akanksha Baid
Is there something I could do to Index the documents differently to accomplish this? Currently I am looking at all the hits to generate the set of tags for the query. If I need to implement the same thing within Lucene, I am not sure if I will gain anything performance wise. Or am I wrong about

Re: Unique tokens analyzer

2008-10-14 Thread markharw00d
Related: https://issues.apache.org/jira/browse/LUCENE-725

ThreadSafe SpellChecker?

2008-10-14 Thread mattspitz
I was wondering if the Lucene SpellChecker class was threadsafe, specifically, indexDictionary(). Such that: for (int i = 0; i < numReaders; i++) { //spawn new thread to run: spellchecker.indexDictionary(new LuceneDictionary(readers[i], myField)); } Would work. Thanks, Matt -- Vie

Re: Newbie Question - Lucene Sorting NOT Ignoring NULL values

2008-10-14 Thread Yonik Seeley
You're on the right track I think... perhaps try using RangeFilter directly rather than creating your own class. Something like: Filter filter = RangeFilter.More("lastUpdatedDate",""); searcher.search(query, filter); If that works for you, then the next step would be to look at CachingWrapperFilt

Re: Newbie Question - Lucene Sorting NOT Ignoring NULL values

2008-10-14 Thread Reetha Hariharan
Hi Yonik, Thanks for your reply. In my case, I don't want documents that have a null value for the field I want to sort on. I tried writing my own filter using RangeFilter, but it doesn't work. I used something like the following in my custom filter. public class NotNullRangeFilter ext

Unique tokens analyzer

2008-10-14 Thread Rafael C. de Almeida
Hello, Is there an analyzer that will tokenize the stream such that there are no repeated tokens in the stream? I have a keyword field on my document, so if one keyword already appears on the list there's no point in having it shown again. Does it make sense to have that analyzer? Or indexing

Re: Newbie Question - Lucene Sorting NOT Ignoring NULL values

2008-10-14 Thread Yonik Seeley
On Tue, Oct 14, 2008 at 4:35 AM, Reetha Hariharan <[EMAIL PROTECTED]> wrote: > I am searching using one field, say X and want to sort the results using > another, say Y (Which can have null values). But I am expecting Sort to > ignore all the null values and just sort only records that has values i

Re: Modification of positional information encoding

2008-10-14 Thread Renaud Delbru
Hi Michael, Michael McCandless wrote: Also, this issue was just opened: https://issues.apache.org/jira/browse/LUCENE-1419 which would make it possible for classes in the same package (oal.index) to use their own indexing chain. With that fix, if you make your own classes in oal.index pa

Re: RE: Searching sets of documents

2008-10-14 Thread Steffen Ramlow
Well, this is what I thought too. Thank you. Original Message > Date: Tue, 14 Oct 2008 02:11:53 -0700 (PDT) > From: "Karsten F." <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Subject: RE: Searching sets of documents > > Hi spring, > unit of retrieval in lucene is a

Re: Searching sets of documents

2008-10-14 Thread spring
The problem is the logical combination of documents in folders, not of terms in documents. See original post. Original Message > Date: Tue, 14 Oct 2008 16:29:15 +0530 > From: "Ganesh" <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Subject: Re: Searching sets of documen

Re: Custom Sorting Based on Input Value

2008-10-14 Thread Erick Erickson
I don't know how tight your result must be, but here are a couple of ideas 1> you could boost your target by a huge amount, although forming your query might be "interesting". If you somehow worked the clause fieldA:5^1 say. I suspect that some of your results wouldn't be on top, but it might

Re: Searching sets of documents

2008-10-14 Thread Ganesh
What is your problem? If the foldernames are already stored then it could be retrieved from search. Use DuplicateFilter on field "foldername" to get the unique list of folders. Hope this helps. Regards Ganesh - Original Message - From: <[EMAIL PROTECTED]> To: Sent: Tuesday, Octob

Re: Searching sets of documents

2008-10-14 Thread spring
The folder name and the document name are stored for each document. Original Message > Date: Tue, 14 Oct 2008 14:11:09 +0530 > From: "Ganesh" <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Subject: Re: Searching sets of documents > You should have stored the foldernam

RE: Searching sets of documents

2008-10-14 Thread Karsten F.
Hi spring, the unit of retrieval in Lucene is a document. There are no joins between document sets like in SQL. What you can do is collect all hits for each term query at the folder level and then implement the logical "and" or "or" on your own. For this you could reuse the existing implementation
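
The folder-level combination described above can be sketched with plain Java sets. This is a minimal illustration under assumptions: it presumes the folder names of the hits for each term query have already been collected, and FolderSetOps with its helper methods is a hypothetical name, not part of Lucene.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: combine per-term folder hit sets with logical AND / OR. */
class FolderSetOps {
    /** Convenience helper to build a sorted set of folder names. */
    static Set<String> folders(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    /** Folders that contain hits for both terms (set intersection). */
    static Set<String> and(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.retainAll(right);
        return result;
    }

    /** Folders that contain hits for at least one of the terms (set union). */
    static Set<String> or(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.addAll(right);
        return result;
    }

    public static void main(String[] args) {
        Set<String> foldersWithX = folders("A", "B"); // folders of the hits for term x
        Set<String> foldersWithY = folders("B", "C"); // folders of the hits for term y
        System.out.println(and(foldersWithX, foldersWithY));
        System.out.println(or(foldersWithX, foldersWithY));
    }
}
```

Both operations copy their left operand first, so the collected hit sets can be reused for further combinations.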

Re: LUCENE-1282 worked around in Lucene 2.4?

2008-10-14 Thread Michael McCandless
2.4.0 does have the workaround for that JRE bug. Mike Michael Bell wrote: this is the issue with Java 6's server VM. Yes I know it's fixed in Sun's beta update to Java 1.6, but did the workaround get committed to 2.4? It is not documented in the CHANGELOG. Thanks

Re: querying without hits

2008-10-14 Thread Ganesh
Hello David, Use TopDocs or TopFieldDocs to collect only the required hits. TopDocs topDocs = searcher.search(query,10); int docID = topDocs.scoreDocs[index].doc; Document doc = searcher.doc(docID); Regards Ganesh - Original Message - From: "David Massart" <[EMAIL PROTECTED]>

Re: Searching sets of documents

2008-10-14 Thread Ganesh
You should have stored the foldername or fullpath of the file as part of Lucene document otherwise it is difficult to retrieve. Regards Ganesh - Original Message - From: "叶双明" <[EMAIL PROTECTED]> To: Sent: Tuesday, October 14, 2008 6:13 AM Subject: Re: Searching sets of documents

Newbie Question - Lucene Sorting NOT Ignoring NULL values

2008-10-14 Thread Reetha Hariharan
Hi, I am a newbie. I just configured lucene using hibernate search. But I find that the sorting doesn't ignore null values. I am searching using one field, say X and want to sort the results using another, say Y (Which can have null values). But I am expecting Sort to ignore all the null values

Re: distinct field values

2008-10-14 Thread Anshum
Hi, You could try changing (or extending) TopFieldDocCollector and do your processing there (that is what I tried... and it worked fine). But that would mean changing lucene code a little bit. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody,

LUCENE-1282 worked around in Lucene 2.4?

2008-10-14 Thread Michael Bell
this is the issue with Java 6's server VM. Yes I know it's fixed in Sun's beta update to Java 1.6, but did the workaround get committed to 2.4? It is not documented in the CHANGELOG. Thanks

distinct field values

2008-10-14 Thread Akanksha Baid
I have indexed multiple documents - each of them has 3 fields (id, tag, text). Is there an easy way to determine the set of tags for a given query without iterating through all the hits? For example if I have 100 documents in my index and my set of tags = {A, B, C}. Query Q on the text field r
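
One possible way to avoid walking every hit, sketched here under assumptions, is to keep one bit set of document ids per tag and test each against a bit set of the documents matching the query. Nothing below is Lucene API: TagsForQuery and its methods are hypothetical, and the per-tag bit sets are assumed to have been built ahead of time.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: find the tags that occur among a query's matching documents. */
class TagsForQuery {
    /** Returns every tag whose document set intersects the query's document set. */
    static Set<String> tagsMatching(BitSet queryDocs, Map<String, BitSet> docsByTag) {
        Set<String> tags = new TreeSet<>();
        for (Map.Entry<String, BitSet> entry : docsByTag.entrySet()) {
            if (entry.getValue().intersects(queryDocs)) {
                tags.add(entry.getKey());
            }
        }
        return tags;
    }

    /** Helper: a bit set with doc ids from..to-1 set. */
    static BitSet range(int from, int to) {
        BitSet bits = new BitSet();
        bits.set(from, to);
        return bits;
    }

    public static void main(String[] args) {
        // Made-up layout of 100 docs: ids 0-39 carry tag A, 40-69 tag B, 70-99 tag C.
        Map<String, BitSet> docsByTag = new HashMap<>();
        docsByTag.put("A", range(0, 40));
        docsByTag.put("B", range(40, 70));
        docsByTag.put("C", range(70, 100));

        // Query Q matches 15 docs tagged A and 10 docs tagged B.
        BitSet queryDocs = range(0, 15);
        queryDocs.or(range(40, 50));

        System.out.println(tagsMatching(queryDocs, docsByTag));
    }
}
```

The work in this sketch grows with the number of distinct tags rather than the number of hits, which only pays off when there are few tags relative to matching documents.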