Many thanks, Shawn...
As you said - **For 204GB of data per server, I recommend at least 128GB
of total RAM,
preferably 256GB**. Therefore, if I have 204GB of data on a single
server/shard, then 256GB is preferred, with which searching will be fast and
never slow down. Is that right?
On Wed, Mar 25, 2015 at 9:5
>
> There's even a param onlyIfDown=true which will remove a
> replica only if it's already 'down'.
>
That will only work if the replica is in the DOWN state, correct? That is, if
the Solr JVM was killed and the replica stays ACTIVE, but its node is
not under /live_nodes, it won't get deleted? What
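For reference, a minimal SolrJ sketch of the DELETEREPLICA call with
onlyIfDown; the collection, shard, and replica names here are hypothetical:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class DeleteDownReplica {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "DELETEREPLICA");
    params.set("collection", "mycollection"); // hypothetical names
    params.set("shard", "shard1");
    params.set("replica", "core_node3");
    // refuses to delete unless the replica's state is already 'down'
    params.set("onlyIfDown", "true");
    QueryRequest req = new QueryRequest(params);
    req.setPath("/admin/collections");
    client.request(req);
    client.close();
  }
}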
Thanks for letting us know the resolution; the problem was bugging me.
Erick
On Wed, Mar 25, 2015 at 4:21 PM, Test Test wrote:
> Re,
> Finally, I think I found where this problem comes from. I wasn't extending
> the right class: instead of extending Tokenizer, I was extending TokenFilter.
> Erick, thanks f
Erick,
Thanks for your help. I was able to fix the problem. I'm working in
non-SolrCloud mode.
Best Regards,
Ale
- Original message -
From: "Erick Erickson"
To: solr-user@lucene.apache.org
Sent: Tuesday, March 24, 2015 10:14:22
Subject: [MASSMAIL]Re: Issues to create new core
Tell us all the st
The issue we had with Java 8 was with the DIH handler. We were using Rhino,
and with the new JavaScript engine in Java 8 we had several
regular-expression issues...
We are almost ready to go now, since we moved away from Rhino and now use
Java.
Bill
On Wed, Mar 25, 2015 at 2:14 AM, Daniel Collins
wrote:
In Solr 5 (or 4), is there an easy way to retrieve the list of words to
highlight?
Use case: allow an external application to highlight the matching words
of a matching document, rather than using the highlighted snippets
returned by Solr.
Thanks,
Damien
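Solr doesn't return the bare matched words directly, but the highlighter
response gets close; a minimal SolrJ sketch (the field name "content" and the
query are assumptions) that marks matches with tags an external application
could parse:

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightWords {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("quick brown fox"); // hypothetical query
    q.setHighlight(true);
    q.set("hl.fl", "content");        // hypothetical field
    q.set("hl.simple.pre", "<em>");   // markers an external app can parse
    q.set("hl.simple.post", "</em>");
    QueryResponse rsp = client.query(q);
    // docId -> field -> snippets; the <em>...</em> spans are the matched words
    Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
    System.out.println(hl);
    client.close();
  }
}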
Re,
Finally, I think I found where this problem comes from. I wasn't extending the
right class: instead of extending Tokenizer, I was extending TokenFilter.
Erick, thanks for your replies. Regards.
On Wednesday, March 25, 2015, at 11:55 PM, Test Test wrote:
Re,
I have tried to remove all the redundant jar
That should work. Check to be sure that you really are running Solr 5.0.
Was it an old version of trunk or the 5x branch from before last August, when
the terms query parser was added?
-- Jack Krupansky
On Tue, Mar 24, 2015 at 5:15 PM, Shamik Bandopadhyay
wrote:
> Hi,
>
> I'm trying to use Terms Qu
Thanks Erick for the helpful explanations.
thanks
sumit
From: Erick Erickson [erickerick...@gmail.com]
Sent: Monday, March 23, 2015 4:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Difference in indexing using config file vs client i.e SolrJ
1> Either
Re,
I have tried to remove all the redundant jar files. Then I relaunched it, but
it blocks right away on the same issue.
It's very strange.
Regards,
On Wednesday, March 25, 2015, at 11:31 PM, Erick Erickson wrote:
Wait, you didn't put, say, lucene-core-4.10.2.jar into your
contrib/tamin
Martin:
Perhaps this will help:
indexed=true, stored=true
The field can be searched. The raw input (not analyzed in any way) can be
shown to the user in the results list.
indexed=true, stored=false
The field can be searched. However, the field can't be returned in the
results list with the document.
ind
Wait, you didn't put, say, lucene-core-4.10.2.jar into your
contrib/tamingtext/dependency directory, did you? That means you have
Lucene (and solr and solrj and ...) in your class path twice since
they're _already_ in your classpath by default since you're running
Solr.
All your jars should be in y
Hello Chris - I don't know the token filter you mention, but I would like to
recommend Lucene's HyphenationCompoundWordTokenFilter. It works reasonably
well if you provide the hyphenation rules and a dictionary. It has some flaws,
such as decompounding to irrelevant subwords, overlapping subwords
Hi,
I have a field named GeoLocate with the location data type. For some lat/long
values it gives me the following error during the indexing process:
Can't parse point '139.9544301,35.4298081' because: Bad Y value 139.9544301
is not in boundary Rect(minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0)
Any idea wha
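The error suggests the coordinates are swapped: Solr's location type expects
"latitude,longitude", and 139.95 is out of range for a latitude (Y). A minimal
SolrJ sketch under that assumption (id and URL are hypothetical):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class GeoIndex {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc1"); // hypothetical id
    // latitude first, then longitude: 35.42... is the latitude (Y),
    // 139.95... the longitude (X)
    doc.addField("GeoLocate", "35.4298081,139.9544301");
    client.add(doc);
    client.commit();
    client.close();
  }
}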
I agree the terminology is possibly a little confusing.
Stored refers to values that are stored verbatim. You can retrieve them
verbatim. Analysis does not affect stored values.
Indexed values are tokenized/transformed and stored inverted. You can't
recover the literal analyzed version (at least,
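A small SolrJ sketch illustrating the distinction, assuming two hypothetical
fields: "title" (indexed and stored) and "body" (indexed but not stored):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class StoredVsIndexed {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    // "body" is searchable because it is indexed...
    SolrQuery q = new SolrQuery("body:solr");
    q.setFields("id", "title"); // ...but only stored fields can be returned
    for (SolrDocument d : client.query(q).getResults()) {
      System.out.println(d.getFieldValue("title")); // verbatim stored value
      System.out.println(d.getFieldValue("body"));  // null: indexed only
    }
    client.close();
  }
}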
Thanks a lot, Michael. See replies below.
> On 25.03.2015, at 21:41, Michael Della Bitta wrote:
>
> Two other things I noticed:
>
> 1. You probably don't want to store your copyFields. That's literally going
> to be the same information each time.
OK, got it. I have set the targets of the
Thanks a lot, Ahmet. I've just read up on this query field parameter and it
sounds good. Since the field contents are currently all identical, I can't
really test it yet.
Cheers,
Martin
> On 25.03.2015, at 21:27, Ahmet Arslan wrote:
>
> Hi Martin,
>
> fq means filter query. Maybe yo
Two other things I noticed:
1. You probably don't want to store your copyFields. That's literally going
to be the same information each time.
2. Your expectation "the pre-processed version of the text is added to the
index" may be incorrect. Anything done in sections
actually happens at query ti
Hi Martin,
fq means filter query. Maybe you want to use the qf (query fields) parameter of
edismax?
On Wednesday, March 25, 2015 9:23 PM, Martin Wunderlich
wrote:
Hi all,
I am wondering what the process is for applying Tokenizers and Filters (as
defined in the FieldType definition) to field c
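A minimal SolrJ sketch of qf with edismax (field names here are hypothetical);
note the contrast with fq, which only restricts the result set:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class EdismaxQf {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("search terms");
    q.set("defType", "edismax");
    q.set("qf", "title^3 body");  // search these fields, boosting title
    q.addFilterQuery("lang:en");  // fq: filters results, doesn't affect scoring
    System.out.println(client.query(q).getResults().getNumFound());
    client.close();
  }
}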
Thanks.
Does Solr ever clean up those states? I.e. does it ever remove "down"
replicas, or replicas belonging to non-live_nodes after some time? Or will
these remain in the cluster state forever (assuming they never come back
up)?
If they remain there, is there any penalty? E.g. Solr tries to sen
Hi all,
I am wondering what the process is for applying Tokenizers and Filters (as
defined in the FieldType definition) to field contents that result from
CopyFields. To be more specific, in my Solr instance, I would like to support
query expansion by two means: removing stop words and adding in
Re,
Sorry about the image. So, here are all my dependency jars, listed below:
- commons-cli-2.0-mahout.jar
- commons-compress-1.9.jar
- commons-io-2.4.jar
- commons-logging-1.2.jar
- httpclient-4.4.jar
- httpcore-4.4.jar
- httpmime-4.4.jar
- junit-4.10.jar
- log4j-1.2.17.jar
- lucene-analyzers-commo
Hi,
I'm using a three-level composite router in a SolrCloud environment,
primarily for multi-tenancy and field collapsing. The format is as follows:
*language!topic!url*
An example would be:
ENU!12345!www.testurl.com/enu/doc1
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs
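For anyone following along, a sketch of how such composite IDs are indexed and
queried with SolrJ; the collection name and the _route_ value reuse the
examples above, everything else is an assumption:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CompositeRouting {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    // three-level composite id: language!topic!url
    doc.addField("id", "ENU!12345!www.testurl.com/enu/doc1");
    client.add(doc);
    client.commit();
    // route a query to only the shards holding this language/topic prefix
    SolrQuery q = new SolrQuery("*:*");
    q.set("_route_", "ENU!12345!");
    System.out.println(client.query(q).getResults().getNumFound());
    client.close();
  }
}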
Comments inline:
On Wed, Mar 25, 2015 at 8:30 AM, Shai Erera wrote:
> Hi
>
> Is it possible for a replica to be DOWN, while the node it resides on is
> under /live_nodes? If so, what can lead to it, aside from someone unloading
> a core.
>
Yes, aside from someone unloading the index, this can h
Just to give a specific answer to the original question, I would say that
dozens of cores (collections) is certainly fine (assuming the total data
load and query rate is reasonable), maybe 50 or even 100. Low hundreds of
cores/collections MAY work, but isn't advisable. Thousands, if it works at
all
Yeah, this is a head scratcher. But it _has_ to be that way for things
like edismax to work where you mix-and-match fielded and un-fielded
terms. I.e. I can have a query like "q=field1:whatever some more
stuff&qf=field2,field3,field4" where I want "whatever" to be evaluated
only against field1, but
bq: It does NOT optimize multiple replicas or shards in parallel.
This behavior was changed in 4.10 though, see:
https://issues.apache.org/jira/browse/SOLR-6264
So with 5.0 Pavel is seeing the result of that JIRA I bet.
I have to agree with Shawn, the optimization step should proceed
invisibly
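For reference, a minimal SolrJ sketch of an explicit optimize; the maxSegments
argument lets you merge down to a target segment count rather than all the way
to one (the URL is an assumption):

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class OptimizeCore {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    // optimize(waitFlush, waitSearcher, maxSegments):
    // maxSegments=1 is a full optimize; a larger value rewrites less data
    client.optimize(true, true, 1);
    client.close();
  }
}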
Thanks for the quick response.
It's a bit confusing that an analyzer of "query" type configured to use
KeywordTokenizerFactory does not keep the query criteria as a single token.
I guess whitespace is the one special case, because it separates clauses in
a query and is handled before analysis runs.
Actually I am handling a query the
Hello, Chris Morley here, of Wayfair.com. I am working on the German
compound-splitter by Dawid Weiss.
I tried to "upgrade" the words.fst file that comes with the German
compound-splitter using Solr 3.5, but it doesn't work. Below is the
IndexNotFoundException that I get.
cmorley@Caracal01:
On 3/25/2015 9:08 AM, pavelhladik wrote:
> Our data are changing frequently so that's why so many deletedDocs.
> Optimized core takes around 50GB on disk, we are now almost on 100GB and I'm
> looking for the best solution for how to optimize this huge core without downtime. I
> know optimization working in
This is a _very_ common thing we all had to learn; what you're seeing
is the result of the _query parser_, not the analysis chain. Anything
like
proj_name_sort:term1 term2 gets split at the query-parser level;
attaching &debug=query to the URL should show, down in the "parsed
query" section, somethi
Matt:
Not really. There are a bunch of third-party log-analysis tools that
give much of this information (though of course not everything exposed
by JMX is in the log files).
Not quite sure whether things like Nagios, Zabbix, and the like have
this kind of stuff built in; it seems like a natural extensi
On Wed, Mar 25, 2015 at 2:40 PM, Shawn Heisey wrote:
> I think you will only need to change the ownership of the solr home and
> the location where the .war file is extracted, which by default is
> server/solr-webapp. The user must be able to *read* the program data,
> but should not need to writ
Hello,
solr.KeywordTokenizerFactory seems to split on whitespace, though according
to the Solr documentation it shouldn't do that.
For example, I have the following configuration for the fields "proj_name"
and "proj_name_sort":
..
..
There
On 3/25/2015 8:42 AM, Nitin Solanki wrote:
> Server configuration:
> 8 CPUs.
> 32 GB RAM
> O.S. - Linux
> are running. Java heap set to 4096 MB in Solr. While indexing,
> *Currently*, I have 1 shard with 2 replicas using SOLR CLOUD.
> Data Size:
> 102G solr/node1/solr/wikingram_shard1_r
That's a high number of deleted documents as a percentage of your
index! Or at least I find those numbers surprising. When segments are
merged in the background during normal indexing, quite a bit of weight
is given to segments that have a high percentage of deleted docs. I
usually see at most 10-2
Images don't come through on the mailing list, so we can't see your image.
Whether or not all the jars in the directory you're working on are
consistent is the least of your problems. Are the libs to be found in any
_other_ place specified on your classpath?
Best,
Erick
On Wed, Mar 25, 2015 at 12:36 AM,
Hi,
You're right. Those result sets are the same; only the document order is different.
Koji
On 2015/03/26 0:53, innoculou wrote:
If I do an initial search without any field sorting, and then do the exact
same query but also sort on one field, will I get the same result set in the
subsequent query b
You're still mixing master/slave with SolrCloud. Do _not_ reconfigure
the replication. If you want your core (we call them replicas in
SolrCloud) to appear on various nodes in your cluster, either create
the collection with the nodes specified (createNodeSet) or, once the
collection is created on a
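A sketch of creating a collection pinned to specific nodes via the Collections
API; all names and node addresses below are hypothetical:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateOnNodes {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "CREATE");
    params.set("name", "mycollection");
    params.set("numShards", "2");
    params.set("replicationFactor", "2");
    params.set("collection.configName", "myconfig");
    // replicas are placed only on these nodes
    params.set("createNodeSet", "host1:8983_solr,host2:8983_solr");
    QueryRequest req = new QueryRequest(params);
    req.setPath("/admin/collections");
    client.request(req);
    client.close();
  }
}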
If I do an initial search without any field sorting, and then do the exact
same query but also sort on one field, will I get the same result set in the
subsequent query, just sorted? In other words, does simply applying a sort
criterion affect the ranking of the full search, or does it just sort the
resul
Hi,
I am a .NET developer, but I need to use Solr, and specifically this good
plugin, "AutoPhrasingTokenFilter".
I searched everywhere and couldn't find useful information; can anyone
help me run it in Solr 5.0 or even previous versions? I am not able to
add it to my Solr; it is throwing the below er
I have loved working on Solr, so I thought I'd post an Information
Retrieval/Text Mining requirement that we have for our GE Data Mining
Research Labs @ Bangalore. Apologies if it is considered inappropriate here.
Here is the Job Description for those interested:
If Information Retrieval, T
Hi
Is it possible for a replica to be DOWN, while the node it resides on is
under /live_nodes? If so, what can lead to it, aside from someone unloading
a core?
I don't know if each SolrCore reports status to ZK independently, or it's
done by the Solr process as a whole.
Also, is it possible for
Hello,
I am familiar with the JMX points that Solr exposes to allow for monitoring of
statistics like QPS, numdocs, Average Query Time...
I am wondering if there is a way to configure Solr to automatically store the
value of these stats over time (for a given time interval), and then allow a
u
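There's no built-in time-series store, but a small JMX poller is easy to
write. A sketch, assuming Solr was started with remote JMX on port 18983; the
ObjectName below is a plausible but hypothetical example (the exact name
varies by core and handler, so list them first with queryNames):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrStatsPoller {
  public static void main(String[] args) throws Exception {
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
    try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
      MBeanServerConnection conn = jmxc.getMBeanServerConnection();
      // hypothetical MBean name; discover real ones via conn.queryNames(null, null)
      ObjectName handler = new ObjectName(
          "solr/collection1:type=standard,"
          + "id=org.apache.solr.handler.component.SearchHandler");
      while (true) {
        Object avg = conn.getAttribute(handler, "avgTimePerRequest");
        System.out.println(System.currentTimeMillis() + "," + avg);
        Thread.sleep(60_000); // one sample per minute; persist as needed
      }
    }
  }
}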
Hi,
I haven't found the answer yet; please help. We have a standalone Solr 5.0.0
with a few cores so far. One of those cores contains:
numDocs: 120M
deletedDocs: 110M
Our data change frequently; that's why there are so many deletedDocs.
An optimized core takes around 50GB on disk; we are now at almost 100GB a
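One lighter-weight option than a full optimize is a commit with
expungeDeletes, which asks the merge policy to merge away segments dominated
by deleted docs instead of rewriting the whole index. A minimal SolrJ sketch
(the core URL is an assumption):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class ExpungeDeletes {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    UpdateRequest req = new UpdateRequest();
    req.setAction(UpdateRequest.ACTION.COMMIT, true, true);
    // merges segments with many deletions without a full optimize
    req.setParam("expungeDeletes", "true");
    req.process(client);
    client.close();
  }
}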
Hi Shawn,
Sorry about all that.
Server configuration:
8 CPUs
32 GB RAM
O.S.: Linux
*Earlier*, I was using 8 shards without added replicas (the default
replicationFactor of 1) using SolrCloud. On the server, only Solr is running;
there are no other applications running. Java heap set to 4096 MB in Solr
Per - Wow, 1 trillion documents stored is pretty impressive. One
clarification: when you say that you have 2 replicas per collection on each
machine, what exactly does that mean? Do you mean that each collection is
sharded into 50 shards, divided evenly over all 25 machines (thus 2 shards
per mach
On Tue, Mar 24, 2015 at 4:00 PM, Tom Evans wrote:
> Hi all
>
> We're migrating to SOLR 5 (from 4.8), and our infrastructure guys
> would prefer we installed SOLR from an RPM rather than extracting the
> tarball where we need it. They are creating the RPM file themselves,
> and it installs an init.
Hello,
Can anyone please assist me? I am indexing on a single shard and it
is taking too much time to index the data. I am indexing around 49GB of
data on a single shard. What's wrong? Why is Solr taking so much time to
index the data?
Earlier I was indexing the same data on 8 shards. That time, it
In one of our production environments we use 32GB, 4-core, 3T RAID0
spinning disk Dell servers (do not remember the exact model). We have
about 25 collections with 2 replicas (shard instances) per collection on
each machine - 25 machines. Total of 25 coll * 2 replica/coll/machine *
25 machines =
Interesting nonetheless, Shawn :)
We use G1GC on our servers. We were on Java 7 (64-bit, RHEL6), but are
trying to migrate to Java 8 (which seems to cause more GC issues, so we
clearly need to tweak our settings); we will investigate 8u40 though.
On 25 March 2015 at 04:23, Shawn Heisey wrote:
> O
On Wed, 2015-03-25 at 03:46 +0100, Ian Rose wrote:
> Thus theoretically we could actually just use one single collection for
> all of our customers (adding a 'customer:<id>' type fq to all
> queries) but since we never need to query across customers it seemed
> more performant (as well as safer - less c
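For what it's worth, the shared-collection approach usually looks like this
in SolrJ (the field name and customer id are hypothetical); the filter query
is cached in the filterCache, so repeated queries for the same customer reuse
it:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TenantScopedQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("user query terms");
    // scope every query to one tenant; the fq result is cached and
    // reused across queries for the same customer
    q.addFilterQuery("customer:12345"); // hypothetical field and id
    System.out.println(client.query(q).getResults().getNumFound());
    client.close();
  }
}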
Thanks Erick,
I'm working on Solr 4.10.2, and all my dependency jars seem to be compatible
with this version.
I can't figure out which one causes this issue.
Thanks. Regards,
On Tuesday, March 24, 2015, at 11:45 PM, Erick Erickson wrote:
bq: 13 more
Caused by: java.lang.ClassCastException: c
I've tried (very simplistically) hitting a collection with a good variety
of searches and looking at the collection's heap memory and working out the
bytes / doc. I've seen results around 100 bytes / doc, and as low as 3
bytes / doc for collections with small docs. It's still a work-in-progress
- n