Re: Recent Paging Change?

2009-02-10 Thread Otis Gospodnetic
Hi, I don't recall any paging changes. Perhaps you can run things with something like YouKit and see where queries were spending the most time in the old version and where they are spending time in the new version? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: Performance degradation caused by choice of range fields

2009-02-10 Thread Otis Gospodnetic
Hi, Did you commit (reopen the searcher) during the performance degradation period, and did any of your queries use sort? If so, perhaps your JVM is accumulating those thrown-away FieldCache objects and then GC has more and more garbage to clean up, causing pauses and lowering your overall throughput

Re: optimization failed

2009-02-10 Thread Otis Gospodnetic
Hi Qingdi, Hm, I've never encountered this problem. You didn't mention your Solr version. If I were you I would grab the nightly build tomorrow, because tonight's Solr nightly build should include the very latest Lucene jars. Of course, this means running Solr 1.4-dev. Otis

Re: Is there a way to query for this value?

2009-02-10 Thread Otis Gospodnetic
Hi Ian, I'll assume this actually did get indexed as a single token, so there is no problem there. As for query string escaping, perhaps this method from Lucene's QueryParser will help: /** * Returns a String where those characters that QueryParser * expects to be escaped are escaped by
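The escaping that method performs amounts to putting a backslash in front of every QueryParser special character. The sketch below reimplements that idea so it runs standalone. The class name `QueryEscapeSketch` is hypothetical, and the set of special characters is an assumption based on later Lucene releases (older versions may not treat `|`, `&` or `/` specially, though escaping them anyway should be harmless).

```java
// Hypothetical standalone sketch of Lucene-style query escaping:
// prefix each QueryParser special character with a backslash.
public final class QueryEscapeSketch {

    // Assumed special-character set (modelled on later Lucene releases).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    private QueryEscapeSketch() {}

    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // escape the special character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The DOI from Ian's message.
        String doi = "10.1002/(SICI)1096-9136(199604)13:4<390::AID-DIA121>3.0.CO;2-4";
        System.out.println(escape(doi));
    }
}
```

In real code, calling the static `QueryParser.escape(String)` from the Lucene jar you actually deploy is preferable to hand-rolling the character list, since it stays in sync with that version's grammar.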

score filter

2009-02-10 Thread Cheng Zhang
Hello, Is there a way to set a score filter? I tried "+score:[1.2 TO *]" but it did not work. Many thanks, Kevin
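The score is computed at query time rather than stored in the index, so there is no `score` field for a range query to match. One possible workaround, sketched here under that assumption, is to request the score with each hit (for example `fl=*,score`) and drop low-scoring hits on the client. `ScoreFilterSketch` and its `Hit` type are hypothetical. Raw Lucene scores are not normalized across queries, so a fixed cutoff such as 1.2 can mean very different things for different queries, and filtering after the fact also makes the reported hit count and paging inaccurate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side score filter: keeps hits at or above a cutoff.
public final class ScoreFilterSketch {

    // Hypothetical holder for an id plus the score returned with fl=*,score.
    public static final class Hit {
        public final String id;
        public final float score;

        public Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    private ScoreFilterSketch() {}

    // Keep only hits whose score is at least minScore, preserving order.
    public static List<Hit> filterByScore(List<Hit> hits, float minScore) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (h.score >= minScore) {
                kept.add(h);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit("a", 2.5f), new Hit("b", 1.2f), new Hit("c", 0.7f));
        for (Hit h : filterByScore(hits, 1.2f)) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```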

Is there a way to query for this value?

2009-02-10 Thread Ian Connor
I have tried to escape the characters as best I can, but cannot seem to find an escaping that works. The value is:
10.1002/(SICI)1096-9136(199604)13:4<390::AID-DIA121>3.0.CO;2-4
It is a DOI (see http://doi.org), so it is a valid value to search on. However, when I query this through Ruby or even the admin
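Separately from QueryParser escaping, a query sent over HTTP (from Ruby or any other client) must also be URL-encoded; otherwise characters such as `+`, which decodes to a space, change meaning before Solr ever parses them. A minimal sketch using the JDK's `URLEncoder`, assuming the value has already been escaped for QueryParser; the `doi` field name, the base URL and the class name are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: URL-encode an already Lucene-escaped query for q=...
public final class SolrQueryUrlSketch {

    private SolrQueryUrlSketch() {}

    // Percent-encode the query so HTTP transport does not alter it.
    public static String encodeQueryParam(String luceneQuery) {
        return URLEncoder.encode(luceneQuery, StandardCharsets.UTF_8);
    }

    // Build a select URL; baseUrl is an assumption, e.g. the example Jetty on port 8983.
    public static String selectUrl(String baseUrl, String luceneQuery) {
        return baseUrl + "/select?q=" + encodeQueryParam(luceneQuery);
    }

    public static void main(String[] args) {
        // Hypothetical "doi" field; value already escaped for QueryParser.
        String escaped = "doi:10.1002\\/\\(SICI\\)1096\\-9136";
        System.out.println(selectUrl("http://localhost:8983/solr", escaped));
    }
}
```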

Recent Paging Change?

2009-02-10 Thread wojtekpia
Has there been a recent change (since Dec 2/08) in the paging algorithm? I'm seeing much worse performance (75% drop in throughput) when I request 20 records starting at record 180 (page 10 in my application). Thanks. Wojtek
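Page 10 at 20 rows per page corresponds to `start = (10 - 1) * 20 = 180`, and to return those 20 hits a top-N search has to keep the best `start + rows = 200` hits it sees, not just 20. The sketch below models that with a bounded min-heap. It is a simplified illustration, not Lucene's actual collector code, `PagingSketch` is a hypothetical name, and it does not by itself explain a regression between versions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of top-N collection for paging (not Lucene's collector).
public final class PagingSketch {

    private PagingSketch() {}

    // Offset for a 1-based page number.
    public static int startFor(int page, int rows) {
        return (page - 1) * rows;
    }

    // Return scores for the page [start, start + rows), highest first.
    public static List<Float> page(float[] scores, int start, int rows) {
        int n = start + rows; // heap size grows with the page offset
        PriorityQueue<Float> heap = new PriorityQueue<>(Math.max(1, n)); // min-heap
        for (float s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (n > 0 && s > heap.peek()) {
                heap.poll(); // evict the current worst of the kept hits
                heap.add(s);
            }
        }
        List<Float> top = new ArrayList<>(heap);
        top.sort(Collections.reverseOrder()); // best first
        if (start >= top.size()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(top.subList(start, Math.min(top.size(), start + rows)));
    }

    public static void main(String[] args) {
        int start = startFor(10, 20); // page 10, 20 rows per page
        System.out.println("start=" + start + ", heap size=" + (start + 20));
    }
}
```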

Re: Vertical Partitioning advice

2009-02-10 Thread Mark Kranz
I ended up pursuing the ParallelWriter (http://issues.apache.org/jira/browse/LUCENE-600), so we can map different fields to different indexes. This appears to keep the indexes in sync, although I still need to do more testing. However, some ugly hackery was needed to get it to extend SolrIndexWrit
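Mapping fields to different indexes can be sketched as a routing table from field name to partition. Lucene's ParallelReader only lines documents up when every sub-index receives the same documents in the same order, so the sketch emits an entry, possibly with no fields, for every partition of every document. The partition names, the routing map and `VerticalSplitSketch` are hypothetical; this is not the LUCENE-600 code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vertical split of one document's fields across sub-indexes.
public final class VerticalSplitSketch {

    private VerticalSplitSketch() {}

    // Returns partition -> (field -> value); every partition is present.
    public static Map<String, Map<String, String>> split(
            Map<String, String> doc,
            Map<String, String> fieldToPartition,
            List<String> partitions) {
        Map<String, Map<String, String>> out = new LinkedHashMap<>();
        for (String p : partitions) {
            out.put(p, new LinkedHashMap<>()); // keep doc counts aligned across partitions
        }
        for (Map.Entry<String, String> e : doc.entrySet()) {
            String p = fieldToPartition.get(e.getKey());
            if (p == null || !out.containsKey(p)) {
                throw new IllegalArgumentException("unmapped field: " + e.getKey());
            }
            out.get(p).put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "42");
        doc.put("title", "Solr");
        doc.put("popularity", "7");
        // Hypothetical routing: stable fields in "main", frequently updated ones in "volatile".
        Map<String, String> routing = Map.of("id", "main", "title", "main", "popularity", "volatile");
        System.out.println(split(doc, routing, List.of("main", "volatile")));
    }
}
```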

optimization failed

2009-02-10 Thread Qingdi
Our index size is about 60G. Most of the time, the optimization works fine. But this morning, the optimization kept creating new segment files until all the free disk space (300G) was used up. Here is what the files generated during optimization look like:

Re: Solr Cell (ExtractingRequestHandler) and plain text files

2009-02-10 Thread Erik Hatcher
On Feb 10, 2009, at 10:57 AM, Grant Ingersoll wrote: So, this seems to be an issue with Tika and its MIME type detection of plain text. For some discussion on it, see http://www.lucidimagination.com/search/document/64e27546d23e67b9/mime_type_identification_of_plain_text_files and also http

Re: Vertical Partitioning advice

2009-02-10 Thread Grant Ingersoll
ParallelReader is definitely out there on the Lucene landscape. See http://www.lucidimagination.com/search/page:2?q=ParallelReader for some background discussion, including Doug's original post on it and some others' views of the use case. The key is that the small index has to be rebuilt in

Re: Solr Cell (ExtractingRequestHandler) and plain text files

2009-02-10 Thread Grant Ingersoll
So, this seems to be an issue with Tika and its MIME type detection of plain text. For some discussion on it, see http://www.lucidimagination.com/search/document/64e27546d23e67b9/mime_type_identification_of_plain_text_files and also https://issues.apache.org/jira/browse/TIKA-154, which has

Re: Solr Cell (ExtractingRequestHandler) and plain text files

2009-02-10 Thread Grant Ingersoll
OK, I have reproduced this. Let me debug for a moment, and then we can likely file a JIRA issue. On Feb 9, 2009, at 10:17 PM, Erik Hatcher wrote: One other person has reported this to me off-list, and I just encountered it myself. ExtractingRequestHandler does not handle plain text files properl

Re: Feedback needed on sharding/distributed search

2009-02-10 Thread Yonik Seeley
On Tue, Feb 10, 2009 at 10:13 AM, Rajiv2 wrote: > 1. What is the benefit of using sharding/distributed search over keeping the index intact? Primarily response time of single requests. If your response times are fast enough with a single index, then simply replicate that index for fau
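Part of why sharding helps single-request latency is that each shard searches a smaller index in parallel, after which the coordinating node only has to merge the shards' short, already-sorted result lists. That merge step can be sketched as a k-way merge with a heap. `ShardMergeSketch` and `ShardHit` are hypothetical types, and the sketch assumes per-shard scores are directly comparable; Solr's distributed search sends sub-requests to the hosts listed in its `shards` parameter and merges the responses, but its internals are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-way merge of per-shard top-k lists into a global top-k.
public final class ShardMergeSketch {

    // Hypothetical hit as returned by one shard.
    public record ShardHit(String id, float score) {}

    private ShardMergeSketch() {}

    // Each inner list must already be sorted by descending score.
    public static List<ShardHit> mergeTopK(List<List<ShardHit>> perShard, int k) {
        // Heap entries are {shardIndex, positionInShard}; highest score polled first.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> Float.compare(perShard.get(b[0]).get(b[1]).score(),
                                    perShard.get(a[0]).get(a[1]).score()));
        for (int s = 0; s < perShard.size(); s++) {
            if (!perShard.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }
        List<ShardHit> merged = new ArrayList<>();
        while (!heap.isEmpty() && merged.size() < k) {
            int[] top = heap.poll();
            List<ShardHit> shard = perShard.get(top[0]);
            merged.add(shard.get(top[1]));
            if (top[1] + 1 < shard.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance within that shard
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<ShardHit>> shards = List.of(
            List.of(new ShardHit("a1", 3.0f), new ShardHit("a2", 1.0f)),
            List.of(new ShardHit("b1", 2.5f), new ShardHit("b2", 2.0f)));
        System.out.println(mergeTopK(shards, 3));
    }
}
```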

Feedback needed on sharding/distributed search

2009-02-10 Thread Rajiv2
Hello, we’re currently in the midst of re-designing our search hardware architecture and I have some questions about sharding and distributed search. 1. What is the benefit of using sharding/distributed search over keeping the index intact? 2. What is the best approach to determinin

Re: Moving from single core to multicore

2009-02-10 Thread Michael Lackhoff
On 10.02.2009 02:39 Chris Hostetter wrote: > : Now all that is left is a more cosmetic change I would like to make: I tried to place the solr.xml in the example dir to get rid of the "-Dsolr.solr.home=multicore" for the start and changed the first entry from "core0" to "solr" and moved

Re: solr boosting

2009-02-10 Thread Marc Sturlese
Thanks Hoss, that was really useful information. hossman wrote: > : As I understood Lucene's boost, if you search for "John Le Carre" it will give a better score to results that contain just the searched string than to results that have, for example, 50 words and the search is c
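The behaviour described here comes from length normalization: in Lucene 2.x, `DefaultSimilarity.lengthNorm` returns `1 / sqrt(numTerms)`, so the same match in a 3-term field contributes more than in a 50-term field. The sketch below reproduces only that formula; `LengthNormSketch` is a hypothetical name, and it ignores that Lucene also multiplies in field/document boosts and encodes the norm into a single lossy byte.

```java
// Hypothetical reproduction of DefaultSimilarity's length-norm formula.
public final class LengthNormSketch {

    private LengthNormSketch() {}

    // 1 / sqrt(numTerms): shorter fields get a larger norm.
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        System.out.printf("3 terms (\"John Le Carre\"): %.3f%n", lengthNorm(3));
        System.out.printf("50 terms: %.3f%n", lengthNorm(50));
    }
}
```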