Many thanks, Shawn...
As you said - **For 204GB of data per server, I recommend at least 128GB
of total RAM,
preferably 256GB**. Therefore, if I have 204GB of data on a single
server/shard, then 256GB is preferred, with which searching will be fast and
never slow down. Is that right?
On Wed, Mar 25, 2015 at 9:5
>
> There's even a param onlyIfDown=true which will remove a
> replica only if it's already 'down'.
>
That will only work if the replica is in the DOWN state, correct? That is, if
the Solr JVM was killed and the replica stays ACTIVE, but its node is
not under /live_nodes, it won't get deleted? What
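For reference, a minimal SolrJ sketch of the DELETEREPLICA call with
onlyIfDown; the collection, shard, and replica names here are hypothetical:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class DeleteDownReplica {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "DELETEREPLICA");
    params.set("collection", "mycollection"); // hypothetical names
    params.set("shard", "shard1");
    params.set("replica", "core_node3");
    // refuses to delete unless the replica's state is already 'down'
    params.set("onlyIfDown", "true");
    QueryRequest req = new QueryRequest(params);
    req.setPath("/admin/collections");
    client.request(req);
    client.close();
  }
}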
Thanks for letting us know the resolution; the problem was bugging me.
Erick
On Wed, Mar 25, 2015 at 4:21 PM, Test Test wrote:
> Re,
> Finally, I think I found where this problem comes from. I wasn't extending
> the right class: instead of extending Tokenizer, I was extending TokenFilter.
> Erick, thanks f
Erick,
Thanks for your help. I was able to fix the problem. I'm working in
non-SolrCloud mode.
Best Regards,
Ale
- Original message -
From: "Erick Erickson"
To: solr-user@lucene.apache.org
Sent: Tuesday, March 24, 2015 10:14:22
Subject: [MASSMAIL]Re: Issues to create new core
Tell us all the st
The issue we had with Java 8 was with the DIH handler. We were using Rhino,
and with the new JavaScript engine in Java 8 we had several
regular-expression issues...
We are almost ready to go now, since we moved away from Rhino and now use
Java.
Bill
On Wed, Mar 25, 2015 at 2:14 AM, Daniel Collins
wrote:
In Solr 5 (or 4), is there an easy way to retrieve the list of words to
highlight?
Use case: allow an external application to highlight the matching words
of a matching document, rather than using the highlighted snippets
returned by Solr.
Thanks,
Damien
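Solr doesn't return the bare matched words directly, but the highlighter
response gets close; a minimal SolrJ sketch (the field name "content" and the
query are assumptions) that marks matches with tags an external application
could parse:

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightWords {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("quick brown fox"); // hypothetical query
    q.setHighlight(true);
    q.set("hl.fl", "content");        // hypothetical field
    q.set("hl.simple.pre", "<em>");   // markers an external app can parse
    q.set("hl.simple.post", "</em>");
    QueryResponse rsp = client.query(q);
    // docId -> field -> snippets; the <em>...</em> spans are the matched words
    Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
    System.out.println(hl);
    client.close();
  }
}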
Re,
Finally, I think I found where this problem comes from. I wasn't extending the
right class: instead of extending Tokenizer, I was extending TokenFilter.
Erick, thanks for your replies. Regards.
On Wednesday, March 25, 2015, at 11:55 PM, Test Test wrote:
Re,
I have tried to remove all the redundant jar
That should work. Check to be sure that you really are running Solr 5.0.
Was it an old version of trunk or the 5x branch from before last August, when
the terms query parser was added?
-- Jack Krupansky
On Tue, Mar 24, 2015 at 5:15 PM, Shamik Bandopadhyay
wrote:
> Hi,
>
> I'm trying to use Terms Qu
Thanks Erick for the helpful explanations.
thanks
sumit
From: Erick Erickson [erickerick...@gmail.com]
Sent: Monday, March 23, 2015 4:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Difference in indexing using config file vs client i.e SolrJ
1> Either
Re,
I have tried to remove all the redundant jar files. Then I relaunched it, but
it blocks right away on the same issue.
It's very strange.
Regards,
On Wednesday, March 25, 2015, at 11:31 PM, Erick Erickson wrote:
Wait, you didn't put, say, lucene-core-4.10.2.jar into your
contrib/tamin
Martin:
Perhaps this will help:
indexed=true, stored=true
The field can be searched. The raw input (not analyzed in any way) can be
shown to the user in the results list.
indexed=true, stored=false
The field can be searched. However, the field can't be returned in the
results list with the document.
ind
Wait, you didn't put, say, lucene-core-4.10.2.jar into your
contrib/tamingtext/dependency directory, did you? That means you have
Lucene (and solr and solrj and ...) in your class path twice since
they're _already_ in your classpath by default since you're running
Solr.
All your jars should be in y
Hello Chris - I don't know the token filter you mention, but I would like to
recommend Lucene's HyphenationCompoundWordTokenFilter. It works reasonably
well if you provide the hyphenation rules and a dictionary. It has some flaws,
such as decompounding to irrelevant subwords, overlapping subwords
Hi,
I have a field named GeoLocate with the location data type. For some lat/long
values it gives me the following error during the indexing process:
Can't parse point '139.9544301,35.4298081' because: Bad Y value 139.9544301
is not in boundary Rect(minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0)
Any idea wha
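The error suggests the coordinates are swapped: Solr's location type expects
"latitude,longitude", and 139.95 is out of range for a latitude (Y). A minimal
SolrJ sketch under that assumption (id and URL are hypothetical):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class GeoIndex {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc1"); // hypothetical id
    // latitude first, then longitude: 35.42... is the latitude (Y),
    // 139.95... the longitude (X)
    doc.addField("GeoLocate", "35.4298081,139.9544301");
    client.add(doc);
    client.commit();
    client.close();
  }
}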
I agree the terminology is possibly a little confusing.
Stored refers to values that are stored verbatim. You can retrieve them
verbatim. Analysis does not affect stored values.
Indexed values are tokenized/transformed and stored inverted. You can't
recover the literal analyzed version (at least,
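A small SolrJ sketch illustrating the distinction, assuming two hypothetical
fields: "title" (indexed and stored) and "body" (indexed but not stored):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class StoredVsIndexed {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    // "body" is searchable because it is indexed...
    SolrQuery q = new SolrQuery("body:solr");
    q.setFields("id", "title"); // ...but only stored fields can be returned
    for (SolrDocument d : client.query(q).getResults()) {
      System.out.println(d.getFieldValue("title")); // verbatim stored value
      System.out.println(d.getFieldValue("body"));  // null: indexed only
    }
    client.close();
  }
}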
Thanks a lot, Michael. See replies below.
> On 25.03.2015, at 21:41, Michael Della Bitta wrote:
>
> Two other things I noticed:
>
> 1. You probably don't want to store your copyFields. That's literally going
> to be the same information each time.
OK, got it. I have set the targets of the
Thanks a lot, Ahmet. I've just read up on this query field parameter and it
sounds good. Since the field contents are currently all identical, I can't
really test it yet.
Cheers,
Martin
> On 25.03.2015, at 21:27, Ahmet Arslan wrote:
>
> Hi Martin,
>
> fq means filter query. Maybe yo
Two other things I noticed:
1. You probably don't want to store your copyFields. That's literally going
to be the same information each time.
2. Your expectation "the pre-processed version of the text is added to the
index" may be incorrect. Anything done in sections
actually happens at query ti
Hi Martin,
fq means filter query. Maybe you want to use the qf (query fields) parameter of
edismax?
On Wednesday, March 25, 2015 9:23 PM, Martin Wunderlich
wrote:
Hi all,
I am wondering what the process is for applying Tokenizers and Filters (as
defined in the FieldType definition) to field c
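A minimal SolrJ sketch of qf with edismax (field names here are hypothetical);
note the contrast with fq, which only restricts the result set:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class EdismaxQf {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("search terms");
    q.set("defType", "edismax");
    q.set("qf", "title^3 body");  // search these fields, boosting title
    q.addFilterQuery("lang:en");  // fq: filters results, doesn't affect scoring
    System.out.println(client.query(q).getResults().getNumFound());
    client.close();
  }
}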
Thanks.
Does Solr ever clean up those states? I.e. does it ever remove "down"
replicas, or replicas belonging to non-live_nodes after some time? Or will
these remain in the cluster state forever (assuming they never come back
up)?
If they remain there, is there any penalty? E.g. Solr tries to sen
Hi all,
I am wondering what the process is for applying Tokenizers and Filters (as
defined in the FieldType definition) to field contents that result from
CopyFields. To be more specific, in my Solr instance, I would like to support
query expansion by two means: removing stop words and adding in
Re,
Sorry about the image. So, here are all my dependency jars, listed below:
- commons-cli-2.0-mahout.jar
- commons-compress-1.9.jar
- commons-io-2.4.jar
- commons-logging-1.2.jar
- httpclient-4.4.jar
- httpcore-4.4.jar
- httpmime-4.4.jar
- junit-4.10.jar
- log4j-1.2.17.jar
- lucene-analyzers-commo
Hi,
I'm using a three-level composite router in a SolrCloud environment,
primarily for multi-tenancy and field collapsing. The format is as follows:
*language!topic!url*
An example would be:
ENU!12345!www.testurl.com/enu/doc1
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs
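For anyone following along, a sketch of how such composite IDs are indexed and
queried with SolrJ; the collection name and the _route_ value reuse the
examples above, everything else is an assumption:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CompositeRouting {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    // three-level composite id: language!topic!url
    doc.addField("id", "ENU!12345!www.testurl.com/enu/doc1");
    client.add(doc);
    client.commit();
    // route a query to only the shards holding this language/topic prefix
    SolrQuery q = new SolrQuery("*:*");
    q.set("_route_", "ENU!12345!");
    System.out.println(client.query(q).getResults().getNumFound());
    client.close();
  }
}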
Comments inline:
On Wed, Mar 25, 2015 at 8:30 AM, Shai Erera wrote:
> Hi
>
> Is it possible for a replica to be DOWN, while the node it resides on is
> under /live_nodes? If so, what can lead to it, aside from someone unloading
> a core.
>
Yes, aside from someone unloading the index, this can h
Just to give a specific answer to the original question, I would say that
dozens of cores (collections) is certainly fine (assuming the total data
load and query rate is reasonable), maybe 50 or even 100. Low hundreds of
cores/collections MAY work, but isn't advisable. Thousands, if it works at
all
Yeah, this is a head scratcher. But it _has_ to be that way for things
like edismax to work where you mix-and-match fielded and un-fielded
terms. I.e. I can have a query like "q=field1:whatever some more
stuff&qf=field2,field3,field4" where I want "whatever" to be evaluated
only against field1, but
bq: It does NOT optimize multiple replicas or shards in parallel.
This behavior was changed in 4.10 though, see:
https://issues.apache.org/jira/browse/SOLR-6264
So with 5.0 Pavel is seeing the result of that JIRA I bet.
I have to agree with Shawn, the optimization step should proceed
invisibly
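For reference, a minimal SolrJ sketch of an explicit optimize; the maxSegments
argument lets you merge down to a target segment count rather than all the way
to one (the URL is an assumption):

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class OptimizeCore {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    // optimize(waitFlush, waitSearcher, maxSegments):
    // maxSegments=1 is a full optimize; a larger value rewrites less data
    client.optimize(true, true, 1);
    client.close();
  }
}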
Thanks for the quick response.
It's a bit confusing that an analyzer of "query" type configured to use
KeywordTokenizerFactory does not keep the query criteria as a single token.
I guess whitespace is the one special case, because it separates clauses in
a query and is handled before analysis runs.
Actually I am handling a query the
Hello, Chris Morley here, of Wayfair.com. I am working on the German
compound-splitter by Dawid Weiss.
I tried to "upgrade" the words.fst file that comes with the German
compound-splitter using Solr 3.5, but it doesn't work. Below is the
IndexNotFoundException that I get.
cmorley@Caracal01:
On 3/25/2015 9:08 AM, pavelhladik wrote:
> Our data are changing frequently so that's why so many deletedDocs.
> Optimized core takes around 50GB on disk, we are now almost on 100GB and I'm
> looking for the best solution for how to optimize this huge core without downtime. I
> know optimization working in
This is a _very_ common thing we all had to learn; what you're seeing
is the result of the _query parser_, not the analysis chain. Anything
like
proj_name_sort:term1 term2 gets split at the query-parser level;
attaching &debug=query to the URL should show, down in the "parsed
query" section, somethi
Matt:
Not really. There are a bunch of third-party log-analysis tools that
give much of this information (though of course not everything exposed
by JMX is in the log files).
Not quite sure whether things like Nagios, Zabbix, and the like have
this kind of stuff built in; it seems like a natural extensi
On Wed, Mar 25, 2015 at 2:40 PM, Shawn Heisey wrote:
> I think you will only need to change the ownership of the solr home and
> the location where the .war file is extracted, which by default is
> server/solr-webapp. The user must be able to *read* the program data,
> but should not need to writ
Hello,
solr.KeywordTokenizerFactory seems to split on whitespace, though according
to the Solr documentation it shouldn't do that.
For example, I have the following configuration for the fields "proj_name"
and "proj_name_sort":
..
..
There
On 3/25/2015 8:42 AM, Nitin Solanki wrote:
> Server configuration:
> 8 CPUs.
> 32 GB RAM
> O.S. - Linux
> are running. Java heap set to 4096 MB in Solr. While indexing,
> *Currently*, I have 1 shard with 2 replicas using SOLR CLOUD.
> Data Size:
> 102G solr/node1/solr/wikingram_shard1_r
That's a high number of deleted documents as a percentage of your
index! Or at least I find those numbers surprising. When segments are
merged in the background during normal indexing, quite a bit of weight
is given to segments that have a high percentage of deleted docs. I
usually see at most 10-2
Images don't come through on the mailing list, so we can't see your image.
Whether or not all the jars in the directory you're working on are
consistent is the least of your problems. Are the libs to be found in any
_other_ place specified on your classpath?
Best,
Erick
On Wed, Mar 25, 2015 at 12:36 AM,
Hi,
You're right. Those result sets are the same; only the document order is different.
Koji
On 2015/03/26 0:53, innoculou wrote:
If I do an initial search without any field sorting, and then do the exact
same query but also sort on one field, will I get the same result set in the
subsequent query b
You're still mixing master/slave with SolrCloud. Do _not_ reconfigure
the replication. If you want your core (we call them replicas in
SolrCloud) to appear on various nodes in your cluster, either create
the collection with the nodes specified (createNodeSet) or, once the
collection is created on a
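A sketch of creating a collection pinned to specific nodes via the Collections
API; all names and node addresses below are hypothetical:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateOnNodes {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "CREATE");
    params.set("name", "mycollection");
    params.set("numShards", "2");
    params.set("replicationFactor", "2");
    params.set("collection.configName", "myconfig");
    // replicas are placed only on these nodes
    params.set("createNodeSet", "host1:8983_solr,host2:8983_solr");
    QueryRequest req = new QueryRequest(params);
    req.setPath("/admin/collections");
    client.request(req);
    client.close();
  }
}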
If I do an initial search without any field sorting, and then do the exact
same query but also sort on one field, will I get the same result set in the
subsequent query, just sorted? In other words, does simply applying a sort
criterion affect the ranking of the full search, or does it just sort the
resul
Hi,
I am a .NET developer, but I need to use Solr, and specifically this good
plugin, "AutoPhrasingTokenFilter".
I searched everywhere and couldn't find useful information; can anyone
help me run it in Solr 5.0 or even previous versions? I am not able to
add it to my Solr; it is throwing the below er
I have loved working on Solr, so I thought I'd post an Information
Retrieval/Text Mining requirement that we have for our GE Data Mining
Research Labs @ Bangalore. Apologies if it is considered inappropriate here.
Here is the Job Description for those interested:
If Information Retrieval, T
Hi
Is it possible for a replica to be DOWN, while the node it resides on is
under /live_nodes? If so, what can lead to it, aside from someone unloading
a core?
I don't know if each SolrCore reports status to ZK independently, or it's
done by the Solr process as a whole.
Also, is it possible for
Hello,
I am familiar with the JMX points that Solr exposes to allow for monitoring of
statistics like QPS, numdocs, Average Query Time...
I am wondering if there is a way to configure Solr to automatically store the
value of these stats over time (for a given time interval), and then allow a
u
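There's no built-in time-series store, but a small JMX poller is easy to
write. A sketch, assuming Solr was started with remote JMX on port 18983; the
ObjectName below is a plausible but hypothetical example (the exact name
varies by core and handler, so list them first with queryNames):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrStatsPoller {
  public static void main(String[] args) throws Exception {
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
    try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
      MBeanServerConnection conn = jmxc.getMBeanServerConnection();
      // hypothetical MBean name; discover real ones via conn.queryNames(null, null)
      ObjectName handler = new ObjectName(
          "solr/collection1:type=standard,"
          + "id=org.apache.solr.handler.component.SearchHandler");
      while (true) {
        Object avg = conn.getAttribute(handler, "avgTimePerRequest");
        System.out.println(System.currentTimeMillis() + "," + avg);
        Thread.sleep(60_000); // one sample per minute; persist as needed
      }
    }
  }
}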
Hi,
I haven't found the answer yet; please help. We have a standalone Solr 5.0.0
with a few cores so far. One of those cores contains:
numDocs: 120M
deletedDocs: 110M
Our data change frequently; that's why there are so many deletedDocs.
An optimized core takes around 50GB on disk; we are now at almost 100GB a
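One lighter-weight option than a full optimize is a commit with
expungeDeletes, which asks the merge policy to merge away segments dominated
by deleted docs instead of rewriting the whole index. A minimal SolrJ sketch
(the core URL is an assumption):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class ExpungeDeletes {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    UpdateRequest req = new UpdateRequest();
    req.setAction(UpdateRequest.ACTION.COMMIT, true, true);
    // merges segments with many deletions without a full optimize
    req.setParam("expungeDeletes", "true");
    req.process(client);
    client.close();
  }
}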
Hi Shawn,
Sorry about all that.
Server configuration:
8 CPUs
32 GB RAM
O.S.: Linux
*Earlier*, I was using 8 shards without added replicas (the default
replicationFactor of 1) using SolrCloud. On the server, only Solr is running;
there are no other applications running. Java heap set to 4096 MB in Solr
Per - Wow, 1 trillion documents stored is pretty impressive. One
clarification: when you say that you have 2 replicas per collection on each
machine, what exactly does that mean? Do you mean that each collection is
sharded into 50 shards, divided evenly over all 25 machines (thus 2 shards
per mach
On Tue, Mar 24, 2015 at 4:00 PM, Tom Evans wrote:
> Hi all
>
> We're migrating to SOLR 5 (from 4.8), and our infrastructure guys
> would prefer we installed SOLR from an RPM rather than extracting the
> tarball where we need it. They are creating the RPM file themselves,
> and it installs an init.
Hello,
Can anyone please assist me? I am indexing on a single shard and it
is taking too much time to index the data. I am indexing around 49GB of
data on a single shard. What's wrong? Why is Solr taking so much time to
index the data?
Earlier I was indexing the same data on 8 shards. That time, it
In one of our production environments we use 32GB, 4-core, 3T RAID0
spinning disk Dell servers (do not remember the exact model). We have
about 25 collections with 2 replicas (shard instances) per collection on
each machine - 25 machines. Total of 25 coll * 2 replica/coll/machine *
25 machines =
Interesting nonetheless, Shawn :)
We use G1GC on our servers. We were on Java 7 (64-bit, RHEL6), but are
trying to migrate to Java 8 (which seems to cause more GC issues, so we
clearly need to tweak our settings); we will investigate 8u40 though.
On 25 March 2015 at 04:23, Shawn Heisey wrote:
> O
On Wed, 2015-03-25 at 03:46 +0100, Ian Rose wrote:
> Thus theoretically we could actually just use one single collection for
> all of our customers (adding a 'customer:<id>' type fq to all
> queries) but since we never need to query across customers it seemed
> more performant (as well as safer - less c
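For what it's worth, the shared-collection approach usually looks like this
in SolrJ (the field name and customer id are hypothetical); the filter query
is cached in the filterCache, so repeated queries for the same customer reuse
it:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TenantScopedQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("user query terms");
    // scope every query to one tenant; the fq result is cached and
    // reused across queries for the same customer
    q.addFilterQuery("customer:12345"); // hypothetical field and id
    System.out.println(client.query(q).getResults().getNumFound());
    client.close();
  }
}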
Thanks Erick,
I'm working on Solr 4.10.2, and all my dependency jars seem to be compatible
with this version.
I can't figure out which one causes this issue.
Thanks. Regards,
On Tuesday, March 24, 2015, at 11:45 PM, Erick Erickson wrote:
bq: 13 more
Caused by: java.lang.ClassCastException: c
I've tried (very simplistically) hitting a collection with a good variety
of searches and looking at the collection's heap memory and working out the
bytes / doc. I've seen results around 100 bytes / doc, and as low as 3
bytes / doc for collections with small docs. It's still a work-in-progress
- n