Re: [ANN] General Availability of LucidWorks Enterprise

2010-12-15 Thread Andy
Congrats! A couple questions: 1) Which version of Solr is this based on? 2) How is LWE different from standard Solr? How should one choose between the two? Thanks. --- On Wed, 12/15/10, Grant Ingersoll wrote: > From: Grant Ingersoll > Subject: [ANN] General Availability of LucidWorks Enterp

Re: Re: Scale up design

2010-12-15 Thread Ganesh
Thanks for your information. My current stats: 250 GB of data, a 40 GB index, and 60 million records, all working fine with 1 GB of RAM. We store a minimal amount of data in the index and sort on date. Even on a single system, the database is sharded. We are planning to build hosted sol

lucene locking

2010-12-15 Thread Donld Hill
I have an app that seems to be locking on some search calls. I am including the stack traces for the blocked and blocking threads. We are using the following jars: lucene-snowball-2.1.0.jar and lucene-2.1.0.jar. The indexes are located on the local disk. We are running on multiple JVMs against the

Re: Where does Lucene recognise it has encountered a new term for the first time?

2010-12-15 Thread Li Li
I don't understand your problem well, but knowing when a new term occurs is a hard problem, because when a new document is added it goes into a new segment. I think you can only do this in the last merge of the optimization stage. You can read the code in SegmentMerger.mergeTermInfos(). I

Scoring problem with MultiPhraseQuery?

2010-12-15 Thread Mike Cawson
I'm using MultiPhraseQuery to implement a fuzzy phrase query. E.g. the user enters "blue lorry" and I expand 'blue' to 'turquoise' and 'glue', and 'lorry' to 'truck', 'van', 'lory' and 'lorrie'. I can then construct a MultiPhraseQuery with those lists of terms. The search works correctly but the
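
The expansion step described above can be sketched in plain Java without any Lucene dependency. This is only an illustration under stated assumptions: the class name, the variant table, and the choice to keep the original word as the first alternative are all hypothetical, not the poster's code. Each inner list stands for the alternatives at one phrase position, which is the shape one would then hand to MultiPhraseQuery one position at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: expands each word of a phrase into a list of alternatives.
final class FuzzyPhraseExpansion {

    // Assumed variant table, taken from the example words in the message above.
    static final Map<String, List<String>> VARIANTS = new LinkedHashMap<>();
    static {
        VARIANTS.put("blue", Arrays.asList("turquoise", "glue"));
        VARIANTS.put("lorry", Arrays.asList("truck", "van", "lory", "lorrie"));
    }

    // Returns one list of alternatives per phrase position.
    // Assumption: the original word is kept as the first alternative.
    static List<List<String>> expand(String phrase) {
        List<List<String>> positions = new ArrayList<>();
        for (String word : phrase.trim().toLowerCase().split("\\s+")) {
            List<String> alternatives = new ArrayList<>();
            alternatives.add(word);
            List<String> variants = VARIANTS.get(word);
            if (variants != null) {
                alternatives.addAll(variants);
            }
            positions.add(alternatives);
        }
        return positions;
    }

    // Number of distinct exact phrases the expanded query can match.
    static long phraseCount(List<List<String>> positions) {
        long count = 1;
        for (List<String> alternatives : positions) {
            count *= alternatives.size();
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> positions = expand("blue lorry");
        System.out.println(positions);
        System.out.println(phraseCount(positions));
    }
}
```

With the table above, "blue lorry" expands to 3 alternatives at the first position and 5 at the second, so 15 exact phrases are covered by a single query.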

Where does Lucene recognise it has encountered a new term for the first time?

2010-12-15 Thread Mike Cawson
I’m using Lucene to index database records and text documents. I want to provide efficient fuzzy queries over the data so I’m using a secondary Lucene index for all of the distinct terms encountered in the primary index. Each ‘document’ in the secondary index is a term from the primary index wi

Re: Multivalued scoring

2010-12-15 Thread Erick Erickson
As Ryan mentions, you really should consider piling them all into a single index. Yes, it seems really wasteful to re-index author, URL with every last photo, but try it and see if the size is acceptable. Or, more accurately, whether performance is acceptable. Best Erick On Wed, Dec 15, 2010 at 1

RE: Multivalued scoring

2010-12-15 Thread Ryan Aylward
Would you be able to create a single index with all photos? Your searches would go against the photo index. At that point, you would have the most relevant photos regardless of album. You could then introduce a sort to your Lucene search to ensure all photos from a given album are grouped togeth

[ANN] General Availability of LucidWorks Enterprise

2010-12-15 Thread Grant Ingersoll
Lucid Imagination is pleased to announce the general availability of our Apache Solr/Lucene powered LucidWorks Enterprise (LWE). LWE is designed to make it easier for people to get up to speed on search by providing easier management, integration with libraries commonly used in building search

Re: Custom scoring for searhing geographic objects

2010-12-15 Thread Grant Ingersoll
Have a look at http://lucene.apache.org/java/3_0_2/scoring.html on how Lucene's scoring works. You can override the Similarity class in Solr as well via the schema.xml file. On Dec 15, 2010, at 10:28 AM, Pavel Minchenkov wrote: > Hi, > Please give me advice on how to create custom scoring. I ne

Re: Forcing specific index file names

2010-12-15 Thread Earl Hood
On Wed, Dec 15, 2010 at 1:41 PM, Chris Hostetter wrote: > files with the same names should be the same, files with different names > should be very different -- but if your binary diff tool is finding > commonalities between files in new segments as the index grows over time, > and you feel like yo

Re: Forcing specific index file names

2010-12-15 Thread Chris Hostetter
: In my testing, when the filenames are the same, doing an xdelta on the : files (mainly the file that contains most of the data, the .cfs file), : there is a significant reduction in the size of the patch file created. As noted elsewhere in this thread, the filenames themselves are significant

Re: Custom scoring for searhing geographic objects

2010-12-15 Thread Doron Cohen
Also, when taking the Similarity suggestion below, note two things in Lucene's default behavior that you seem to wish to avoid. The first is IDF, but only for multi-term queries; otherwise ignore this comment. For multi-term queries to consider only term frequency and doc length, you may want to
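
As a rough, Lucene-free illustration of scoring on term frequency and document length alone, the sketch below treats IDF as a constant 1. The square-root shapes follow the tf and lengthNorm formulas of Lucene's DefaultSimilarity as I understand them; the class and method names are hypothetical, and boosts, coord, query normalization, and the lossy norm encoding are all left out.

```java
// Hypothetical sketch: score = sum over matching query terms of tf * lengthNorm.
final class TfLengthScore {

    // Term-frequency factor: grows with the square root of the in-document frequency.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Length factor: documents with fewer terms score higher.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // termFreqs holds the in-document frequency of each query term (0 = no match).
    // IDF is deliberately treated as 1, so index-wide rarity has no effect.
    static double score(int[] termFreqs, int numTerms) {
        double sum = 0.0;
        for (int freq : termFreqs) {
            if (freq > 0) {
                sum += tf(freq) * lengthNorm(numTerms);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // One query term matching once in a 4-term document.
        System.out.println(score(new int[] {1}, 4));
        // The same single match in a 100-term document scores lower.
        System.out.println(score(new int[] {1}, 100));
    }
}
```

In real Lucene code the equivalent change would go into a custom Similarity subclass rather than a standalone class like this one.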

Multivalued scoring

2010-12-15 Thread Dennis Hendriksen
Hi, We are using a Lucene 3.x index to search for photo albums based on textual properties such as photo album title/author/URL and photo captions/URLs. The goal is to find the most relevant photo albums for a user query and display the best matching photos for these albums. In our current solution w

Re: Custom scoring for searhing geographic objects

2010-12-15 Thread Ian Lea
Sounds to me like Lucene should do a pretty good job without any extra work on your part. See the javadocs for org.apache.lucene.search.Similarity for details on how it works. You can change things by providing your own implementation. There is also the org.apache.lucene.search.function package but

Custom scoring for searhing geographic objects

2010-12-15 Thread Pavel Minchenkov
Hi, Please give me advice on how to create custom scoring. I need the resulting documents to be ordered by how popular each term in the document is (popular = how many times it appears in the index) and by the length of the document (fewer terms = higher in the search results). For example, index contai

Re: Forcing specific index file names

2010-12-15 Thread Earl Hood
On Wed, Dec 15, 2010 at 7:49 AM, Doron Cohen wrote: > Perhaps I'll change my mind after understanding the scenario that creates > this, but for now I'd rather not ignore the file name differences. It may be possible to control the data generation process, so the filenames are consistent. Chan

Re: Forcing specific index file names

2010-12-15 Thread Doron Cohen
> I could make an exception in the patch creation program to detect > that there is a Lucene directory, and diff the .cfs files, even if > they have different names, but was seeing if I can avoid that > so the patch program can be agnostic about the contents of the > directory tree. > Doing only th

Re: Scale up design

2010-12-15 Thread Toke Eskildsen
On Wed, 2010-12-15 at 09:42 +0100, Ganesh wrote: > What is the advantage of going for 64 Bit. Larger maximum heap, more memory in the machine. > People claim performance and usage of more RAM. Yes, pointers normally take up 64 bits on a 64-bit machine. Depending on the application, the overhead can
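
To see what heap ceiling a given JVM actually has, a small check along these lines may help. It is a minimal sketch using only standard calls (Runtime.maxMemory() and the os.arch system property); the class and method names are made up for illustration, and the reported figure depends on the JVM and on any -Xmx setting.

```java
// Hypothetical helper: reports the architecture and maximum heap of the running JVM.
final class HeapReport {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // One-line summary of the architecture and the heap ceiling.
    static String describe() {
        return "os.arch=" + System.getProperty("os.arch")
                + ", max heap=" + maxHeapMegabytes() + " MB";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with a small -Xmx and once with a large one on the same machine shows whether the JVM in use can go past the 32-bit limits discussed in this thread.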

Re: Scale up design

2010-12-15 Thread Ganesh
What is the advantage of going for 64-bit? People claim better performance and the ability to use more RAM. On a 32-bit OS the JVM handles 1 to 1.5 GB of RAM; on a 64-bit OS, is a single JVM still unable to use more than 1.5 GB of RAM? What if we host multiple JVM instances on a single system? Please help me with some mor