Re: Getting better snippets in highlighting component

2013-03-29 Thread Jorge Luis Betancourt Gonzalez
Hi Jack: Thanks for the reply. Exactly, I know it is common to encounter this kind of TOC in a lot of files. I'm playing with the regex fragmenter to be a little more selective about the generated snippets, but no luck so far. - Original message - From: "Jack Krupansky" To: solr-user@lucene

Re: Getting better snippets in highlighting component

2013-03-29 Thread Jack Krupansky
It looks like a table of contents. The dots are followed by the page number, followed by the text from the next table of contents entry, and repeat. Even Google doesn't do anything special for this. For example, search for "chapter 1 chapter 2 pdf": [PDF] 2013 Publication 505 - Internal Reven
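
The pattern described above (dot leaders followed by a page number, repeated) can be detected with a simple heuristic. The following is only a sketch in Python, not anything Solr's highlighter does: the regex, the function names, and the min_hits threshold are all assumptions chosen for illustration, and an application could use something like it to drop TOC-like snippets after highlighting.

```python
import re

# Hypothetical heuristic: a run of 4+ dots (optionally space-separated)
# followed by a page number suggests a table-of-contents entry.
TOC_LEADER = re.compile(r"(?:\.\s?){4,}\s*\d+")


def looks_like_toc(snippet: str, min_hits: int = 2) -> bool:
    """Return True when the snippet contains at least min_hits dot-leader runs."""
    return len(TOC_LEADER.findall(snippet)) >= min_hits


def filter_snippets(snippets):
    """Keep only the snippets that do not look like table-of-contents text."""
    return [s for s in snippets if not looks_like_toc(s)]
```

A single dot-leader run is tolerated by default, since ordinary text occasionally contains one; only repeated runs are treated as a table of contents.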

Getting better snippets in highlighting component

2013-03-29 Thread Jorge Luis Betancourt Gonzalez
Hi all: I'm building a document search platform, basically indexing a lot of PDF files. Some of these files have an index, which means that when I query for "normativos" in my application (built using Symfony2+PHP+Solarium) I get a few results like this:

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-29 Thread adityab
@Mark attached are the full logs from both master and slave. Hope this might be some help. console_master.log console_slave.log Ignore the mbeans call in

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-29 Thread Mark Miller
That's pretty weird stuff. As a workaround, you might stop replicating your conf files - that takes a sketchier path at the moment. The key to solving this is to figure out how the heck the slave is increasing its gen… that should require a commit. In this case, *lots* of them. Commits that don

RE: Basic auth on SolrCloud /admin/* calls

2013-03-29 Thread Vaillancourt, Tim
Here we go: https://issues.apache.org/jira/browse/SOLR-4470 Tim -Original Message- From: Vaillancourt, Tim [mailto:tvaillanco...@ea.com] Sent: Friday, March 29, 2013 3:25 PM To: solr-user@lucene.apache.org Subject: RE: Basic auth on SolrCloud /admin/* calls Agreed, we don't have client

RE: Basic auth on SolrCloud /admin/* calls

2013-03-29 Thread Vaillancourt, Tim
Agreed, we don't have clients hitting Solr directly, it is used like a backend database in our usage by intermediaries, similar to say MySQL. Although restricting the access to Solr to fewer hosts is something, I still feel an application has no business being able to perform admin level calls,

Re: Too many fields to Sort in Solr

2013-03-29 Thread adityab
Joel, thanks for your excellent idea using docValues. It's working exactly as you described. So far my unit test case has no issues and I see a low memory footprint. I will be sending the build for performance testing, which should give comparable numbers. Now I see another replication issue in 4.2. there is

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-29 Thread adityab
Something is really wrong with replication. Check the attached document, which has the screenshot. I re-indexed the master after adding new fields to the schema file (it's part of config file replication). The UI shows the master as gen '6', whereas in the slave's log the master gen is '7'. The attached docu

Re: DocValues vs stored fields?

2013-03-29 Thread Marcin Rzewucki
By the way: even if a field has DocValues with the "on disk" option enabled, it has to have stored="true" to be retrievable. Why? On 29 March 2013 20:51, Otis Gospodnetic wrote: > Hi, > > The current field update mechanism is not really a field update > mechanism. It just looks like that from the

4.2 Admin UI

2013-03-29 Thread Chris R
I've noticed in the Admin UI that on some of my nodes the Core Selector combo box doesn't populate. Known issue? Chris

Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-29 Thread Chris R
Yes, removing the absolute value cured the problem, but I feel like there should be a better option than the "default". Given multiple collections, there should be some ability within the API to lay down the directory structure in a different way e.g. ./collection/shard as opposed to the current a

Re: per-fieldtype similarity not working

2013-03-29 Thread mike.vogel
Any example or suggestion for how to patch the wrapper so that coord method is called for the field type with the custom similarity? -- View this message in context: http://lucene.472066.n3.nabble.com/per-fieldtype-similarity-not-working-tp3987050p4052470.html Sent from the Solr - User mailing

Re: DocValues vs stored fields?

2013-03-29 Thread Marcin Rzewucki
Hi Otis, Currently, the whole record has to be stored on disk in order to update a single field. Are you trying to say that it won't be necessary with the use of DocValues? Sounds great! Regards. On 29 March 2013 20:51, Otis Gospodnetic wrote: > Hi, > > The current field update mechanism is not re

Re: Basic auth on SolrCloud /admin/* calls

2013-03-29 Thread Mark Miller
This has always been the case with Solr. Solr's security model is that clients should not have access to it - only trusted intermediaries should have access to it. Otherwise, it should be locked down at a higher level. That's been the case from day one and still is. That said, someone did do so

Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-29 Thread Mark Miller
Those are paths? /data/solr off the root? When using the collections api, you really don't want to set an absolute data dir - it should be relative, I'd just take the default. Then, even though many shards share that solrconfig and data dir, they will all find a nice home relative to the instan

Re: DocValues vs stored fields?

2013-03-29 Thread Otis Gospodnetic
Hi, The current field update mechanism is not really a field update mechanism. It just looks like that from the outside. DocValues should make true field updates implementable. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Fri, Mar 29, 2013 at 3:30 PM, Marcin Rzewucki wrote

Re: DocValues vs stored fields?

2013-03-29 Thread Marcin Rzewucki
Hi, Atomic updates (single field updates) do not depend on DocValues. They were implemented in Solr 4.0 and work fine (but all fields have to be retrievable). DocValues are supposed to be more efficient than FieldCache. Why are they not enabled by default? Maybe because they are not for all fields and beca
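
For readers unfamiliar with the atomic updates mentioned above, here is a minimal sketch in Python of what such an update body looks like. It uses the "set" modifier from Solr 4's JSON update syntax; the helper name, the document id, and the field name are hypothetical, and the sketch only builds the payload without sending it anywhere.

```python
import json


def atomic_set(doc_id, field, value, id_field="id"):
    """Build a JSON body that replaces one field of one document.

    Only the unique key and the changed field are sent; Solr rebuilds the
    rest of the document from its stored fields, which is why all fields
    have to be retrievable.
    """
    return json.dumps([{id_field: doc_id, field: {"set": value}}])


# Hypothetical document id and field name, for illustration only.
body = atomic_set("doc1", "price", 99)
```

The resulting string would be posted to the core's update handler with a JSON content type.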

Query Elevation exception on shard queries

2013-03-29 Thread Ravi Solr
Hello, We have a Solr 3.6.2 multicore setup, where each core is a complete index for one application. In our site search we use a sharded query to query two cores at a time. The issue is, if one core has docs for an elevated query but the other core doesn't, Solr throws a 500 error. I would rea

Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-29 Thread Chris R
So, upgraded to 4.2 this morning. I had gotten to the point where I was okay with the collection creation process in 4.1 using the API instead of the solr.xml file in 4.0, but now 4.2 doesn't seem to want to create the instanceDir? e.g. the Dashboard reports the following when my solr.data.dir is set to /d

Re: Synonyms problem

2013-03-29 Thread Plamen Mihaylov
Thank you a lot, Walter. I removed most of the filters and now it returns the same number of results. It looks simply this way:

RE: Basic auth on SolrCloud /admin/* calls

2013-03-29 Thread Vaillancourt, Tim
Yes, I should have mentioned this is under 4.2 Solr. I sort of expected what I'm doing might be unsupported, but basically my concern is under the current SOLR design, any client with connectivity to SOLR's port can perform Admin-level API calls like create/drop Cores or Collections. I'm only

Solr metrics in Codahale metrics and Graphite?

2013-03-29 Thread Walter Underwood
What are folks using for this? wunder -- Walter Underwood wun...@wunderwood.org

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-29 Thread adityab
+1 I have observed this same issue: no change on the master, yet the slave is bumped up to a higher index number. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-Slave-Index-version-is-higher-than-Master-tp4049827p4052445.html Sent from the Solr - User mailing list archive

Add fuzzy to edismax specs?

2013-03-29 Thread Walter Underwood
I've implemented this for the second time, so it is probably time to contribute it. I find it really useful. I've extended the query spec parser for edismax to also accept a tilde and to generate a FuzzyQuery. I used this at Netflix (on 1.3 with dismax), and re-implemented it for 3.3 here at Ch

Re: Synonyms problem

2013-03-29 Thread Walter Underwood
There are several problems with this config. Indexing uses the phonetic filter, but query does not. This almost guarantees that nothing will match. Numbers could match, if the filter passes them. Query time has two stopword filters with different lists. Indexing only has one. This isn't fatal,

Re: Synonyms problem

2013-03-29 Thread Plamen Mihaylov
Guys, This is a commented line where expand is false. I moved the synonym filter after tokenizer, but the result is the same. Actual configuration:

Re: Synonyms problem

2013-03-29 Thread Steve Rowe
The XPath expressions used to collect the charFilter sequence, the tokenizer, and the token filter sequence are evaluated independently of each other - see line #244 through #251:

dataimport

2013-03-29 Thread A. Lotfi
Hi, When I hit the Execute button in the Query tab I only see: Last Update: 12:34:58 Indexing since 01s Requests: 1 (1/s), Fetched: 0 (0/s), Skipped: 0, Processed: 0 (0/s) Started: about an hour ago. I did not see any green entry saying Indexing Completed. Thanks

Re: Synonyms problem

2013-03-29 Thread Walter Underwood
Also, all the filters need to be after the tokenizer. There are two synonym filters specified, one before the tokenizer and one after. I'm surprised that works at all. Shouldn't that be a fatal error when loading the config? wunder On Mar 29, 2013, at 9:33 AM, Thomas Krämer | ontopica wrote: >

Re: Synonyms problem

2013-03-29 Thread Thomas Krämer | ontopica
Hi Plamen, You should set expand to true during ... Greetings, Thomas On 29.03.2013 17:16, Plamen Mihaylov wrote: > Hey guys, > > I have the following problem - I have a website with sport players, where > using Solr indexing their data. I have defined synonyms like: NY, New York. >

Re: DocValues vs stored fields?

2013-03-29 Thread Timothy Potter
Hi Jack, I've just started to dig into this as well, so sharing what I know but still some holes in my knowledge too. DocValues == Column Stride Fields (best resource I know of so far is Simon's preso from Lucene Rev 2011 - http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docva

Synonyms problem

2013-03-29 Thread Plamen Mihaylov
Hey guys, I have the following problem - I have a website with sport players, where I'm using Solr to index their data. I have defined synonyms like: NY, New York. When I search for New York - there are 145 results found, but when I search for NY - there are 142 results found. Why is there a diff and

Re: Cannot find word with accent

2013-03-29 Thread Jack Krupansky
The French Light Stemmer Filter is folding the accents: Try the Solr Admin UI Analysis page and you can see that the accents go away at the last step in analysis. This behavior is hardwired into the Lucene FrenchLightStemmer norm method. It would be nice if somebody added an attribute to di
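
To make the accent-folding effect concrete, here is a rough illustration in Python. It is not the Lucene FrenchLightStemmer code: it simply strips combining marks after Unicode decomposition, which is an assumption chosen to show why an accented query term and an unaccented indexed term can end up identical after analysis.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# After folding and lowercasing, both spellings collapse to the same token.
query_token = fold_accents("général").lower()
indexed_token = fold_accents("General").lower()
```

With both tokens reduced to the same string, a search for the accented word matches a document that contains only the unaccented form.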

Cannot find word with accent

2013-03-29 Thread Van Tassell, Kristian
I'm trying to find documents with this word: général It returns one hit for a document containing "General". If I search for g*ral I get 230 hits, of which some contain the word général. I'm not sure where to begin looking, I believe everything is encoded correctly. The text_fr (French) fieldT

Re: Parallel Indexing With Solr?

2013-03-29 Thread Furkan KAMACI
Can you tell me more about "You can index from a MapReduce job "? I use Nutch and it tells Solr to index and reindex. I know that I can use MapReduce jobs on the Nutch side; however, can I use MapReduce jobs on the Solr side (i.e. for indexing etc.)? 2013/3/29 Otis Gospodnetic > Yes. You can index from

DocValues vs stored fields?

2013-03-29 Thread Jack Krupansky
I’m still a little fuzzy on DocValues (maybe because I’m still grappling with how it does or doesn’t still relate to “Column Stride Fields”), so can anybody clue me in as to how useful DocValues is/are? Are DocValues simply an alternative to “stored fields”? If so, and if DocValues are so great

Re: Parallel Indexing With Solr?

2013-03-29 Thread Otis Gospodnetic
Yes. You can index from any app that can hit Solr with multiple threads. You can use StreamingUpdateSolrServer, at least in older Solrs, to handle multi-threading for you. You can index from a MapReduce job. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Fri, Mar 29, 2013
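
The "hit Solr with multiple threads" approach can be sketched as follows in Python. This is a generic client-side pattern under stated assumptions, not StreamingUpdateSolrServer: send_batch is a hypothetical callable that would post one batch of documents to Solr and return how many it sent, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(docs, size):
    """Yield consecutive slices of docs, each with at most size items."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def index_in_parallel(docs, send_batch, batch_size=100, workers=4):
    """Send batches concurrently and return the total number of docs sent.

    send_batch is a hypothetical callable supplied by the caller; it is
    expected to post one batch and return the count of documents it sent.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sent_counts = list(pool.map(send_batch, batches(docs, batch_size)))
    return sum(sent_counts)
```

Passing the sender in as a parameter keeps the threading logic independent of whichever HTTP or client library actually talks to Solr.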

trying to index postgresql database using solrj

2013-03-29 Thread taniamm2002
I'm new to Solr and my question may be easy, but I can't understand why. I've got a table which I have already indexed in Solr (so I already have the fields of this table in the schema.xml). So I added 2 new rows in my database and now I try to index this table again, but this time from my java apl

Re: Too many fields to Sort in Solr

2013-03-29 Thread Joel Bernstein
OK, that makes sense. How are DocValues working for you? On Fri, Mar 29, 2013 at 9:02 AM, adityab wrote: > Hi Joel, > Might have an answer for this. Initially my servers were on 3.5 and then i > moved to Solr 4.0. at this time i use the solrconfig.xml that was in the > example and updated is wi

Re: Solr fuzzy search with WordDemiliterFilter

2013-03-29 Thread Jack Krupansky
The use of the fuzzy query operator will suppress the Word Delimiter Filter at query time. That's just the way it works. You can't use both fuzzy query and WDF when WDF is splitting apart words, numbers, and case changes, and throwing away special characters as well. To put it simply, at query

Re: SOAP for Solr indexing mechanism

2013-03-29 Thread Otis Gospodnetic
Nope. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Fri, Mar 29, 2013 at 4:54 AM, Furkan KAMACI wrote: > Is there any support for communication over SOAP for Solr indexing > mechanism?

Solr fuzzy search with WordDemiliterFilter

2013-03-29 Thread ilay raja
Hi, I need to apply fuzzy search in my production setup. It improves the search results for spelling issues. However, it is not applying the analyzer filters configured in schema.xml. I know fuzzy and wildcard search won't apply the filters. But is there a way to plug in the filters or write this logic at t

Re: Combining Solr Indexes at SolrCloud

2013-03-29 Thread Isaac Hebsh
Let's say you have machine A and machine B, and you want to shut down B. If all the shards on B have replicas (on A), you can shut down B instantly. If there is a shard on B that has no replica, you should create one on machine A (using Core API), let it replicate the whole shard contents, and then you a
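
The check described above (which shards on B still lack a replica on A) can be expressed as a small helper. This is a sketch in Python over a hand-written placement map; the function name and the data layout are assumptions for illustration and do not come from any Solr API.

```python
def shards_needing_replica(placement, leaving, staying):
    """List shards hosted on the leaving machine but not on the staying one.

    placement maps a shard name to the set of machines holding a replica.
    """
    return sorted(
        shard
        for shard, hosts in placement.items()
        if leaving in hosts and staying not in hosts
    )


# Hypothetical cluster layout, for illustration only.
placement = {"shard1": {"A", "B"}, "shard2": {"B"}, "shard3": {"A"}}
pending = shards_needing_replica(placement, leaving="B", staying="A")
```

An empty result would mean B can be shut down right away; any shard listed needs a replica created on A first.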

Suggestions for Customizing Solr Admin Page

2013-03-29 Thread Furkan KAMACI
I want to customize the Solr Admin Page. I think that I will need more complicated things to manage my cloud. I will separate my Solr cluster into nodes that only index and nodes that only respond to queries. I will index my documents by category, into different collections. In my admin page I will

Re: Too many fields to Sort in Solr

2013-03-29 Thread adityab
Hi Joel, Might have an answer for this. Initially my servers were on 3.5 and then I moved to Solr 4.0. At this time I use the solrconfig.xml that was in the example and updated it with parameters I changed in 3.5 for the environment. There was no "" in the 4.0 example solrconfig.xml file. We conti

Realtime updates solrcloud

2013-03-29 Thread roySolr
Hello Guys, I want to use the realtime updates mechanism of SolrCloud. My setup is as follows: 3 Solr engines, 3 ZooKeeper instances (ensemble). The setup works great: recovery, leader election, etc. The problem is the realtime updates; they are slow after the servers get some traffic. I try to expla

Need Help in Patching OPENNLP

2013-03-29 Thread karthicrnair
Hi All, I am very new to Solr and Java technology. I wonder if someone can give me a way to patch the OpenNLP platform into Solr. I am simply blocked at the initial step, applying the patch to Solr 4.2. Any pointer would be highly appreciated. Thanks, Karthic -- View this message in co

Re: solrj sample code for solrcloud

2013-03-29 Thread Erick Erickson
Here's some indexing code, should get you started... http://searchhub.org/dev/2012/02/14/indexing-with-solrj/ It's against 3.x as I remember, so there might be a bit of updating to do. Best Erick On Thu, Mar 28, 2013 at 2:49 AM, Jeong-dae Ha wrote: > Does anyone have solrj indexing and searchi

Re: Parallel Indexing With Solr?

2013-03-29 Thread Gora Mohanty
On 29 March 2013 14:56, Furkan KAMACI wrote: > Does Solr allows parallelism (parallel computing) for indexing? What do you mean by parallel computing in this context? Solr can use multiple threads for indexing if that is what you are asking. Regards, Gora

Parallel Indexing With Solr?

2013-03-29 Thread Furkan KAMACI
Does Solr allow parallelism (parallel computing) for indexing?

SOAP for Solr indexing mechanism

2013-03-29 Thread Furkan KAMACI
Is there any support for communication over SOAP for Solr indexing mechanism?

Combining Solr Indexes at SolrCloud

2013-03-29 Thread Furkan KAMACI
Let's assume that I have two machines in a SolrCloud that work as part of the cloud. If I want to shut down one of them and combine its indexes into the other, how can I do that?

Re: Basic auth on SolrCloud /admin/* calls

2013-03-29 Thread Isaac Hebsh
Hi Tim, Are you running Solr 4.2? (In 4.0 and 4.1, the Collections API didn't return any failure message. See issue SOLR-4043.) As far as I know, you can't tell Solr to use authentication credentials when communicating with other nodes. It's a bigger issue. For example, if you want to protect the "/up