Re: Beginner's questions

2013-03-26 Thread Sashidhar Guntury
Hi, I think this Stack Overflow question might be of some help to you: http://stackoverflow.com/questions/2842500/updating-lucene-index Note that the constructor has changed and you might have to specify the append mode in the IndexWriterConfig. Take a look at this - http://lucene.a

Beginner's questions

2013-03-26 Thread Paul
Hi All, I've just begun to get my feet wet with Lucene and have a few simple questions: 1. Must the index writer read and index files on disk, or can I create documents in memory and ask the writer to index them? 2. I think I've seen examples of the behavior I asked about in (1). In these exam

[WEBINAR] - "Lucene/Solr 4 – A Revolution in Enterprise Search Technology"

2013-03-26 Thread Erik Hatcher
Excuse the blatant marketing, though for the benefit of the community... Join me tomorrow/today (March 27) for a webinar on what's new and improved in Lucene and Solr 4. It's the last call to register. Help me break the webinar syst

Re: Filter based on the sum of values of two fields

2013-03-26 Thread Yann-Erwan Perio
On Sun, Mar 24, 2013 at 10:46 AM, Wei Wang wrote: Hi, > For example, assume we have fields F1 and F2, we would like to find > all documents with condition F1+F2 > 5.0. This filter may be combined > with other filters to form a BooleanFilter. > > The question is, is there any way to construct an

Re: Simplifying Lucene 4 storage formats

2013-03-26 Thread Michael McCandless
I think you can get the most bang for your buck using the high-level controls to disable the parts of the index you don't need ... all codecs respect these. EG, index your fields with omitNorms=true, so no boost & length normalization is stored in the index / loaded at search time. Index with Ind

Re: Filter based on the sum of values of two fields

2013-03-26 Thread Wei Wang
Can someone give a hint on this? Or is this a tough problem? Thanks in advance. On Sun, Mar 24, 2013 at 2:46 AM, Wei Wang wrote: > Hello, > > We have documents with many numerical fields. In some search scenarios, > we would like to create a filter based on the sum of the values of two > field

Simplifying Lucene 4 storage formats

2013-03-26 Thread Vitaly Funstein
This is probably a pretty general inquiry, but I'm just exploring this as an option at the moment. It seems that Lucene 4 adds some freedom to define how data is actually written to underlying storage by exposing the codec API. However, I find the learning curve for understanding what bits to chan

Re: DocValues memory usage

2013-03-26 Thread Michael McCandless
DiskDocValuesFormat is the right thing to use: it loads certain things into RAM, eg the compressed bits that tell it the addresses of the bytes on disk, but then leaves the actual bytes on disk. I believe the old DirectSource was more extreme as it left the addresses on disk too, so there were 2 s

Re: DocValues memory usage

2013-03-26 Thread Duke
I ran the same experiment and got the same result. Then I used a per-field codec with DiskDocValuesFormat; it works like DirectSource in 4.0.0, but I'm not confident about this usage. Can anyone say more about the removal of the DirectSource API? On 2013-3-26, at 22:59, Peter Keegan wrote: > Inspir

DocValues memory usage

2013-03-26 Thread Peter Keegan
Inspired by this presentation of DocValues: http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene I decided to try them out in 4.2. I created a 1M document index with one DocValues field: BinaryDocValuesField conceptsDV = new BinaryDocValuesField("con

Re: Problems to get suggestions from an intermediate word using AnalyzingSuggester

2013-03-26 Thread Michael McCandless
AnalyzingSuggester only matches by prefix, by design. You can try AnalyzingInfixSuggester, which is currently two alternative patches on https://issues.apache.org/jira/browse/LUCENE-4845 And please post back any feedback you have on the issue ... as the issue stands I don't think either approach

Re: Get BitSet from Filter object in 4.1

2013-03-26 Thread Simon Willnauer
You can do Filter#getDocIdSet(reader, acceptedDocs).bits(); however, this method might return null if the filter cannot be represented as bits, or for other reasons such as performance. simon On Tue, Mar 26, 2013 at 10:37 AM, Ramprakash Ramamoorthy wrote: > Team, > > We are migrating from 2.3

Get BitSet from Filter object in 4.1

2013-03-26 Thread Ramprakash Ramamoorthy
Team, We are migrating from 2.3 to 4.1, and we have implemented a method which does this: BitSet searchTermBits = searchQueryFilter.bits(reader); searchQueryFilter is of type Filter and reader is an IndexReader object. How would I achieve the same using 4.1? Any pointers wou

Re: Compression and Highlighter

2013-03-26 Thread Simon Willnauer
^5 ;) On Mon, Mar 25, 2013 at 11:02 PM, Bushman, Lamont wrote: > Thank you very much for the help Simon. I am amazed I was able to accomplish > what I wanted. I didn't store the body in the Index. And I used Highlighter > to return the best fragments by parsing my original document. > __

Problems to get suggestions from an intermediate word using AnalyzingSuggester

2013-03-26 Thread Andres Garcia
Hi all, My use case is very simple: given a string, I would like to suggest all the possible URLs that contain that string (given the limitations of the tokenizer and suggester). So far I have created a custom analyzer and tokenizer to parse URLs, and that analyzer is used to create an AnalyzingSu