facet vs group search

2012-02-27 Thread jianwen lou
What is the difference between faceted search and group search? I read the Wikipedia page: http://en.wikipedia.org/wiki/Faceted_search and there is a project named bobo-browse that implements faceted search. Can anyone explain the details? Thanks -- twitter.com/loujianwen

Reverse wildcarding

2012-02-27 Thread Michael Bell
(This is an expanded version of the post I made before, in the hope that someone will comment.) I am trying to port the reverse wildcard support from Solr to base Lucene. In broad strokes, I will use a PerFieldAnalyzer map with the IndexWriter such that fields that I want to be indexed both ways will
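For readers unfamiliar with the technique being ported, here is a minimal sketch of the idea in plain Java, with no Lucene classes involved. Each token is also indexed in reversed form behind a marker character, so a leading-wildcard pattern such as *cene can be rewritten as an ordinary prefix match on the reversed form. The class name, the method names and the choice of '\u0001' as marker are assumptions made for illustration, not the poster's code or Solr's actual implementation.

```java
/** Hypothetical illustration of reverse wildcarding; not taken from Solr or Lucene. */
class ReverseWildcardSketch {
    // Assumed marker that keeps reversed tokens apart from normal ones in the same field.
    static final char MARKER = '\u0001';

    /** Form indexed in addition to the original token: marker + reversed text. */
    static String reversedToken(String token) {
        return MARKER + new StringBuilder(token).reverse().toString();
    }

    /**
     * Rewrites a pattern with a single leading '*' (for example "*cene")
     * into a prefix pattern over the reversed tokens.
     */
    static String rewriteLeadingWildcard(String pattern) {
        boolean simple = pattern.startsWith("*")
                && pattern.indexOf('*', 1) < 0
                && pattern.indexOf('?') < 0;
        if (!simple) {
            throw new IllegalArgumentException("only a single leading '*' is handled: " + pattern);
        }
        return reversedToken(pattern.substring(1)) + "*";
    }
}
```

With this scheme the indexed reversed form of "lucene" begins with the prefix produced for "*cene", so the expensive leading-wildcard scan becomes a cheap prefix lookup at the cost of a larger index.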

Re: QueryParser strange behavior

2012-02-27 Thread Damerian
On 27/2/2012 11:45 AM, Ian Lea wrote: Does your analyzer look for a field called content, not contents? -- Ian. On Sat, Feb 25, 2012 at 6:37 AM, Damerian wrote: Hello! I have a small issue with the QueryParser in my program. It uses my custom filter to parse its queries, but I get u

RE: Customizing indexing of large files

2012-02-27 Thread Prakash Reddy Bande
Hi, Thanks all. So the answer is a custom Reader implementation. I was beating around the bush with Tokenizer. Regards, Prakash Bande Director - Hyperworks Enterprise Software Altair Eng. Inc. Troy MI Ph: 248-614-2400 ext 489 Cell: 248-404-0292 -Original Message- From: Steven A Row

RE: Customizing indexing of large files

2012-02-27 Thread Steven A Rowe
PatternReplaceCharFilter would probably work, or maybe a custom CharFilter? *CharFilter has the advantage of preserving original text offsets, for highlighting. Steve > -Original Message- > From: Glen Newton [mailto:glen.new...@gmail.com] > Sent: Monday, February 27, 2012 12:57 PM > To

Re: Customizing indexing of large files

2012-02-27 Thread Glen Newton
Hi, Understood. Write a custom FileReader that filters out the text you do not want. This will do it streaming. Glen On Mon, Feb 27, 2012 at 12:46 PM, Prakash Reddy Bande wrote: > Hi, > > Description is multiline, in addition there is other text also. So, > essentially what I need is to jump t
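The suggestion above could be sketched roughly as follows. This is a minimal, hypothetical Reader wrapper, not code from the thread: it assumes DATA_BEGIN and DATA_END each sit on a line of their own, which the truncated sample file does not confirm, and the class name SkipDataBlockReader and the helper filter are invented for illustration. It drops the markers and everything between them while streaming the rest.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper: streams text while skipping DATA_BEGIN..DATA_END blocks. */
class SkipDataBlockReader extends Reader {
    private final BufferedReader in;
    private String pending = "";      // current kept line, newline included
    private int pos = 0;              // next unread character in pending
    private boolean skipping = false; // true while inside a data block

    SkipDataBlockReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (pos >= pending.length()) {
            if (!fill()) return -1;   // end of input
        }
        int n = Math.min(len, pending.length() - pos);
        pending.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    /** Loads the next line that lies outside a data block; false at end of input. */
    private boolean fill() throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String t = line.trim();
            if (skipping) {
                if (t.equals("DATA_END")) skipping = false;
                continue;
            }
            if (t.equals("DATA_BEGIN")) {
                skipping = true;
                continue;
            }
            pending = line + "\n";    // line endings are normalized to "\n"
            pos = 0;
            return true;
        }
        return false;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Convenience for trying the reader on an in-memory string. */
    static String filter(String text) {
        try (Reader r = new SkipDataBlockReader(new StringReader(text))) {
            StringBuilder out = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) out.append(buf, 0, n);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Such a reader could be wrapped around a FileReader and handed to the Field(String name, Reader reader) constructor mentioned elsewhere in the thread. Because it removes text before analysis, character offsets no longer match the original file, which is the trade-off against the CharFilter approach.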

RE: Customizing indexing of large files

2012-02-27 Thread Prakash Reddy Bande
Hi, Description is multiline, in addition there is other text also. So, essentially what I need is to jump to DATA_END as soon as I hit DATA_BEGIN. I am creating the field using the constructor Field(String name, Reader reader) and using StandardAnalyzer. Right now I am using FileReader which

Re: Customizing indexing of large files

2012-02-27 Thread Glen Newton
I'd suggest writing a Perl script or insert-favourite-scripting-language-here script to pre-filter this content out of the files before it gets to Lucene/Solr. Or you could just grep for "Data" and "Description" (or is "Description" multi-line?) -Glen Newton On Mon, Feb 27, 2012 at 11:55 AM, Prakas

Customizing indexing of large files

2012-02-27 Thread Prakash Reddy Bande
Hi, I want to customize the indexing of some specific kind of files I have. I am using 2.9.3 but upgrading is possible. This is how my file's data looks * Data for 2010 Description: This section has a general description of the data. DATA_BEGIN Month P1

Re: Can I detect incorrect language selection after creating an index?

2012-02-27 Thread Glen Newton
Do the check _before_ indexing. Use https://code.google.com/p/language-detection/ to verify the language of the text document before you put it in the index. -Glen Newton http://zzzoot.blogspot.com/ On Mon, Feb 27, 2012 at 10:53 AM, Ilya Zavorin wrote: > Suppose I have a bunch of text documents

Can I detect incorrect language selection after creating an index?

2012-02-27 Thread Ilya Zavorin
Suppose I have a bunch of text documents in language X but I index them using an analyzer for language Y. Once the index is created, is it possible to perform some sort of simple "sanity" check to see if the original language selection was wrong? I presume I can try searching for some common wo

RE: Most recent document within a group ...

2012-02-27 Thread Dragon Fly
I'll give it a try, thanks. > Date: Mon, 27 Feb 2012 08:29:57 -0500 > Subject: Re: Most recent document within a group ... > From: erickerick...@gmail.com > To: java-user@lucene.apache.org > > Just try it. Sorting doesn't load the document, it does load > the unique values for the sort field. Wh

Re: Most recent document within a group ...

2012-02-27 Thread Erick Erickson
Just try it. Sorting doesn't load the document, it does load the unique values for the sort field. Which is why indexing dates benefits from using the coarsest resolution you can, i.e. don't store millisecond resolution if all you care about is the day something was published. In fact, sorting doe
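To make the resolution point concrete, here is a small sketch in plain Java that reduces a millisecond timestamp to a day-level key before it is indexed. The class and method names are hypothetical, and UTC is an assumed time zone. Lucene's own DateTools.dateToString with Resolution.DAY serves the same purpose; plain java.time is used here only so the example runs without Lucene on the classpath.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: coarsens timestamps so the sort field has few unique values. */
class DayResolution {
    // Day-level key in UTC (assumed zone); sorts correctly as a plain string.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    static String toDayKey(long epochMillis) {
        return DAY.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

All documents published on the same day share one key, so the number of unique values held for sorting is bounded by the number of distinct days rather than the number of documents.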

RE: Most recent document within a group ...

2012-02-27 Thread Dragon Fly
Erick, what if the search returns 100,000 hits? I'm trying to avoid loading a large number of documents from disk (i.e. a slow operation) and then pick up the top one. I know how to execute a search (sorted by date). Is there a way to just load the first hit from disk? I don't know which Luce

Re: QueryParser strange behavior

2012-02-27 Thread Ian Lea
Does your analyzer look for a field called content, not contents? -- Ian. On Sat, Feb 25, 2012 at 6:37 AM, Damerian wrote: > Hello! > > I have a small issue with the QueryParser in my program. > It uses my custom filter to parse its queries, but I get unexpected results > when I am having