RE: Expected mime type application/octet-stream but got text/html

2015-12-17 Thread Markus Jelsma
Hi - looks like Solr did not start up correctly, got some errors and kept Jetty running. You should find information in that node's logs. M. -Original message- > From:Andrej van der Zee > Sent: Thursday 17th December 2015 10:32 > To: solr-user@lucene.apache.org > Subject: Expected mim

propagate Query.rewrite call to super.rewrite after 5.4 upgrade

2015-12-17 Thread Markus Jelsma
Hi, apologies for the cross-post. We have a class overriding SpanPositionRangeQuery. It is similar to a SpanFirst query, but it can adjust the boost value based on distance. After the 5.4 upgrade, the unit tests suddenly threw the following exception: Query class org.GrSpanFir

RE: Load-balancing Solr instances

2015-12-18 Thread Markus Jelsma
Hello - a simple load balancer will do just fine, as will more sophisticated tools such as Varnish, HAProxy or Nginx, which we use. A hardware load balancer would obviously also do the job. Markus -Original message- > From:Andrej van der Zee > Sent: Friday 18th December 2015 13:20 > To: sol
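
The simplest policy such a load balancer applies is round robin: each request goes to the next node in turn. A minimal sketch of that selection logic, assuming a fixed list of hypothetical Solr base URLs; real balancers such as HAProxy or Nginx add health checks and retries on top of this:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin picker over a fixed list of Solr base URLs.
class RoundRobin {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobin(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = List.copyOf(nodes);
    }

    // Thread-safe; floorMod keeps the index valid even after the counter overflows.
    String pick() {
        return nodes.get(Math.floorMod(counter.getAndIncrement(), nodes.size()));
    }

    public static void main(String[] args) {
        // Hypothetical node addresses.
        RoundRobin lb = new RoundRobin(List.of("http://solr1:8983/solr", "http://solr2:8983/solr"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.pick());
        }
    }
}
```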

RE: How to use DocValues with TextField

2016-01-05 Thread Markus Jelsma
Hello - indeed, this is not going to work. But since you are using the token filter as a preprocessor, you could easily use an update request processor to do the preprocessing for you. Check out the documentation; I think you can use the RegexReplaceProcessor. https://cwiki.apache.org/c
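
The preprocessing a regex-replace update processor does at index time amounts to a pattern replacement on the raw field value before it is stored, so the result can live in a plain string field with DocValues. A stdlib sketch of that transformation only, not Solr's processor itself; the pattern (collapse runs of non-alphanumerics to a space, then lowercase) is a hypothetical example:

```java
import java.util.Locale;
import java.util.regex.Pattern;

class RegexPreprocess {
    // Hypothetical pattern: any run of characters that is not a letter or digit.
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Normalize a raw value the way the token filter chain used to,
    // so the stored string is already in its final form.
    static String preprocess(String raw) {
        String replaced = NON_ALNUM.matcher(raw).replaceAll(" ").trim();
        return replaced.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(preprocess("  Foo-Bar, (Baz)! ")); // foo bar baz
    }
}
```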

RE: Content translation using Solr

2016-01-25 Thread Markus Jelsma
Hi - Solr doesn't have any translation on board. But with Tika you could easily make a Solr UpdateProcessor and use Tika's o.a.t.language.translate package. Markus -Original message- > From:Shamir, Maya > Sent: Monday 25th January 2016 13:45 > To: solr-user@lucene.apache.org > Subject

RE: Content translation using Solr

2016-01-25 Thread Markus Jelsma
ing Solr > > Its sounds interesting, do you have an example? I found the documentation of > this package but didn’t find any example > Thanks, > Maya > > -----Original Message- > From: Markus Jelsma [mailto:markus.jel...@openindex.io] > Sent: Monday, January 25,

RE: When does Solr plan to update its embedded Apache Tika version?

2016-02-02 Thread Markus Jelsma
Hi - there is no open issue on upgrading Tika to 1.11, but you can always open one yourself. Markus -Original message- > From:Giovanni Usai > Sent: Tuesday 2nd February 2016 14:43 > To: solr-user@lucene.apache.org > Subject: When does Solr plan to update its embedded Apache Tika versio

RE: filters to work with dates

2016-02-02 Thread Markus Jelsma
Hello - I would opt for a date field and a custom update processor that converts the string date to an actual Date object via DateUtils.parseDate(). I think this is a much simpler approach than a custom field or token filter. Markus -Original message- > From:Miguel Valencia
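
The core of such an update processor is the conversion itself: try a few known input formats and emit the UTC ISO-8601 form that Solr date fields accept. A sketch using java.time rather than DateUtils; the list of input formats is a hypothetical assumption about the source data:

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

class DateNormalizer {
    // Hypothetical input formats; adjust to whatever the source data contains.
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("dd-MM-yyyy"),
            DateTimeFormatter.ofPattern("yyyy/MM/dd"),
            DateTimeFormatter.ISO_LOCAL_DATE);

    // Convert a string date to a UTC ISO-8601 instant, e.g. 2016-02-02T00:00:00Z.
    static String toSolrDate(String raw) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                LocalDate date = LocalDate.parse(raw.trim(), format);
                return date.atStartOfDay().toInstant(ZoneOffset.UTC).toString();
            } catch (DateTimeParseException e) {
                // not this format, try the next one
            }
        }
        throw new IllegalArgumentException("Unparseable date: " + raw);
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("02-02-2016")); // 2016-02-02T00:00:00Z
    }
}
```

Failing loudly on unknown formats (rather than indexing nothing) makes bad source data visible at index time.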

Custom JSON facet functions

2016-02-09 Thread Markus Jelsma
Hi - I must be missing something, but is it possible to declare custom JSON facet functions in solrconfig.xml, just like we do with request handlers or search components? Thanks, Markus

RE: Custom JSON facet functions

2016-02-09 Thread Markus Jelsma
m JSON facet functions > > On Tue, Feb 9, 2016 at 7:10 AM, Markus Jelsma > wrote: > > Hi - i must have missing something but is it possible to declare custom > > JSON facet functions in solrconfig.xml? Just like we would do with request > > handlers or search components

Json faceting, aggregate numeric field by day?

2016-02-10 Thread Markus Jelsma
Hi - if we assume the following simple documents (timestamp, value): (2015-01-01T00:00:00Z, 2), (2015-01-01T00:00:00Z, 4), (2015-01-02T00:00:00Z, 3), (2015-01-02T00:00:00Z, 7). Can I get the average of the field 'value' per day, e.g. 3.0 and 5.0? Reading the documentation, I don't think I can, or I a
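
To pin down the expected result, here is the same aggregation done client-side: bucket documents by calendar day and average 'value' per bucket. It is what a per-day range facet with an avg(value) sub-facet would be expected to return; this sketch assumes UTC timestamps so the first ten characters are the day:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class DailyAverage {
    // One document: an ISO-8601 timestamp and the numeric 'value' field.
    static final class Doc {
        final String timestamp;
        final double value;

        Doc(String timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Bucket by the "yyyy-MM-dd" prefix and average 'value' within each bucket, ordered by day.
    static Map<String, Double> averageByDay(List<Doc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                d -> d.timestamp.substring(0, 10),
                TreeMap::new,
                Collectors.averagingDouble(d -> d.value)));
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("2015-01-01T00:00:00Z", 2), new Doc("2015-01-01T00:00:00Z", 4),
                new Doc("2015-01-02T00:00:00Z", 3), new Doc("2015-01-02T00:00:00Z", 7));
        System.out.println(averageByDay(docs)); // {2015-01-01=3.0, 2015-01-02=5.0}
    }
}
```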

RE: Json faceting, aggregate numeric field by day?

2016-02-10 Thread Markus Jelsma
found the tickets :) Markus -Original message- > From:Tom Evans > Sent: Wednesday 10th February 2016 12:26 > To: solr-user@lucene.apache.org > Subject: Re: Json faceting, aggregate numeric field by day? > > On Wed, Feb 10, 2016 at 10:21 AM, Markus Jelsma > wrote:

ExactStatsCache not very exact

2016-02-10 Thread Markus Jelsma
Hi - I've noticed ExactStatsCache is not very exact on consecutive calls; see the following explains for the number one result: 70.76961 = sum of: 70.76961 = max plus 0.65 times others of: 70.76961 = weight(title_nl:contactformulier in 210879) [], result of: 70.76961 = score(doc=21087

RE: ExactStatsCache not very exact

2016-02-10 Thread Markus Jelsma
Well, what do we have here. I just saw a different docCount in the same result set for the same field. These two are the explains for the top two documents in the same result set: 1: 70.77082 = sum of: 70.77082 = max plus 0.65 times others of: 70.77082 = weight(title_nl:contactformulier i

RE: Json faceting, aggregate numeric field by day?

2016-02-11 Thread Markus Jelsma
son faceting, aggregate numeric field by day? > > On Wed, Feb 10, 2016 at 5:21 AM, Markus Jelsma > wrote: > > Hi - if we assume the following simple documents: > > > > > > 2015-01-01T00:00:00Z > > 2 > > > > > > 2015-01-01T00:00:00Z >

RE: Json faceting, aggregate numeric field by day?

2016-02-11 Thread Markus Jelsma
lr-user@lucene.apache.org > Subject: Re: Json faceting, aggregate numeric field by day? > > On Thu, Feb 11, 2016 at 10:04 AM, Markus Jelsma > wrote: > > Thanks. But this yields an error in FacetModule: > > > > java.lang.Clas

RE: Json faceting, aggregate numeric field by day?

2016-02-11 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-8673 Thanks again! Markus -Original message- > From:Yonik Seeley > Sent: Thursday 11th February 2016 17:12 > To: solr-user@lucene.apache.org > Subject: Re: Json faceting, aggregate numeric field by day? > > On Thu, Feb 11, 2016 at

RE: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Markus Jelsma
Nutch has Solr 5 cloud support in trunk; I committed it earlier this month. https://issues.apache.org/jira/browse/NUTCH-2197 Markus -Original message- > From:Emir Arnautovic > Sent: Tuesday 16th February 2016 16:26 > To: solr-user@lucene.apache.org > Subject: Re: Which open-source craw

RE: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Markus Jelsma
-source crawler to use with SolrJ and Postgresql ? > > Markus, > Ticket I run into is for Nutch2 and NUTCH-2197 is for Nutch1. > > Haven't been using Nutch for a while so cannot recommend version. > > Thanks, > Emir > > On 16.02.2016 16:37, Markus Jelsma wrot

RE: Solr and Nutch integration

2016-02-16 Thread Markus Jelsma
Hello Tom - if I recall correctly, Nutch 2.x implements the old SolrServer client. It should just send an HTTP request to a specified node, which then forwards it to the destination shard. In Nutch, you should set up indexer-solr as an indexing plugin in the plugin.includes configuration directive and

Frequent connection reset in AbstractFullDistribZkTestBase

2016-02-22 Thread Markus Jelsma
q

RE: Frequent connection reset in AbstractFullDistribZkTestBase

2016-02-22 Thread Markus Jelsma
Hi - we have quite a few unit tests extending AbstractFullDistribZkTestBase. Since the upgrade to 5.4.1, we frequently see tests failing due to connection resets. Is there a known issue for this problem? Is there something else I can do? Thanks, Markus -Original message---

RE: /select changes between 4 and 5

2016-02-24 Thread Markus Jelsma
Re: POST in general still works for queries... I just verified it: This is not supposed to change, I hope? We rely on POST for some huge automated queries; instead of constantly increasing the URL length limit, we use POST. Regards, Markus -Original message- > From:Yonik Seeley > Sent:
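
Sending a query by POST moves the parameters from the URL into a form-encoded request body, so URL length limits no longer apply. A sketch using the JDK's java.net.http client (Java 11+) that only builds the request; the /select URL is a hypothetical example and nothing is sent here:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

class PostQuery {
    // Encode query parameters as an application/x-www-form-urlencoded body.
    static String formBody(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Build (but do not send) a POST request carrying the parameters in the body.
    static HttpRequest build(String selectUrl, Map<String, String> params) {
        return HttpRequest.newBuilder(URI.create(selectUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(params)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:8983/solr/collection1/select",
                Map.of("q", "title:(a OR b OR c)"));
        System.out.println(request.method() + " " + request.uri());
    }
}
```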

5.5.0 SOLR-8621 deprecation warnings without maxMergeDocs or mergeFactor

2016-02-24 Thread Markus Jelsma
Hi - I see lots of the following warnings on my development machine, for all cores: o.a.s.c.Config Beginning with Solr 5.5, is deprecated, configure it on the relevant instead. None of the cores has either parameter configured. Is this expected? Thanks, Markus

RE: /select changes between 4 and 5

2016-02-24 Thread Markus Jelsma
Great! Thanks! -Original message- > From:Yonik Seeley > Sent: Wednesday 24th February 2016 18:04 > To: solr-user@lucene.apache.org > Subject: Re: /select changes between 4 and 5 > > On Wed, Feb 24, 2016 at 11:21 AM, Markus Jelsma > wrote: > > Re: POST

Solr 5.5.0, connection resets in abstract distributed test persist

2016-02-24 Thread Markus Jelsma
Hi, we have quite a few unit tests that extend the abstract distributed test class (I don't have the FQCN at hand). On Solr 5.4.x we had a lot of issues with connection resets, which I assumed, judging from resolved tickets, had been fixed in 5.5.0. Did I miss something? Can someone point me to

RE: Solr regex documenation

2016-02-29 Thread Markus Jelsma
Hi - do you enclose the regex in slashes? Do you URL-encode the + sign? Markus -Original message- > From:Anil > Sent: Monday 29th February 2016 7:45 > To: solr-user@lucene.apache.org > Subject: Re: Solr regex documenation > > HI , > > i am using [a-z]+works. i could not see networks

RE: Solr regex documenation

2016-02-29 Thread Markus Jelsma
Hmm, is the field indexed? A field:/[a-z]%2Bwork/ works fine over here. Markus -Original message- > From:Anil > Sent: Monday 29th February 2016 13:24 > To: solr-user@lucene.apache.org > Subject: Re: Solr regex documenation > > Yes Markus. > > On 29 February 2016
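
The reason for %2B: in a query string an unencoded + is decoded as a space, so [a-z]+work silently becomes [a-z] work. A small sketch showing both the safe encoding and the failure mode; the field name is a hypothetical placeholder:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class RegexQueryEncoding {
    // Build the q parameter for a regex query on the given (hypothetical) field.
    static String encodeQuery(String field, String regex) {
        return "q=" + URLEncoder.encode(field + ":/" + regex + "/", StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Properly encoded: the + survives as %2B.
        System.out.println(encodeQuery("field", "[a-z]+work"));
        // Unencoded: the server-side decoder turns + into a space.
        System.out.println(URLDecoder.decode("[a-z]+work", StandardCharsets.UTF_8)); // [a-z] work
    }
}
```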

RE: Solr regex documenation

2016-02-29 Thread Markus Jelsma
an see the results. > > But when I search on net[a-z]+ , i could not see juniper networks. i have > looked all the documents in the results, could not find it. > > Thank you, > Anil > > On 29 February 2016 at 18:42, Markus Jelsma > wrote: > > > Hmm, is the fie

RE: Solr regex documenation

2016-03-01 Thread Markus Jelsma
: Solr regex documenation > > Regex is working Markus. i need to investigate this particular pattern. > Thanks for you responses. > > On 29 February 2016 at 19:16, Markus Jelsma > wrote: > > > Hmm, if you have some stemming algorithm on that field, [a-z]+works is > >

RE: Override Default Similarity and SolrCloud

2016-03-03 Thread Markus Jelsma
We store them in server/solr/lib/. -Original message- > From:Joshan Mahmud > Sent: Thursday 3rd March 2016 14:54 > To: solr-user@lucene.apache.org > Subject: Override Default Similarity and SolrCloud > > Hi group! > > I'm having an issue of deploying a custom jar in SolrCloud (v 5.3.0).

RE: Override Default Similarity and SolrCloud

2016-03-03 Thread Markus Jelsma
nd SolrCloud > > Thanks Markus - do you just SCP / copy them manually to your solr nodes and > not through Zookeeper (if you use that)? > > Josh > > On Thu, Mar 3, 2016 at 1:59 PM, Markus Jelsma > wrote: > > > We store them server/solr/lib/. > > > >

RE: Different scores depending on cloud node

2016-03-08 Thread Markus Jelsma
Hi - see inline. Markus -Original message- > From:Shawn Heisey > Sent: Tuesday 8th March 2016 15:11 > To: solr-user@lucene.apache.org > Subject: Re: Different scores depending on cloud node > > On 3/8/2016 6:56 AM, Robert Brown wrote: > > I have 2 shards, each with 1 replica. > > > > Whe

RE: Multiple custom Similarity implementations

2016-03-08 Thread Markus Jelsma
Hello, you cannot change similarities per request, and for good reasons this is likely never going to be supported. You need multiple cores, or multiple fields with different similarities defined in the same core. Markus -Original message- > From:Parvesh Garg > Sent: Tuesday 8th March 2

RE: Disable hyper-threading for better Solr performance?

2016-03-09 Thread Markus Jelsma
Hi - I can't remember having seen any threads on this topic in the past seven years. Can you perform a controlled test with a lot of concurrent users? I would suspect that nowadays HT boosts highly concurrent environments such as search engines. Markus -Original message- > From:Avn

RE: NoSuchFileException errors common on version 5.5.0

2016-03-11 Thread Markus Jelsma
Hm, it happens on one of our nodes quite frequently, but that was 5.4, maybe even 5.3. org.apache.solr.common.SolrException: Error handling 'status' action at org.apache.solr.handler.admin.CoreAdminOperation$4.call(CoreAdminOperation.java:192) at org.apache.solr.handler.admin.C

JSON facets, count a long or an integer in cloud and non-cloud modes

2016-03-22 Thread Markus Jelsma
Hello, using SolrJ I built a method that consumes output produced by JSON facets; it also checks the count before further processing the output: 49 This is the code reading the count value via SolrJ: QueryResponse response = sourceClient.query(query); NamedList json
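
A defensive way to consume a count that may arrive as an Integer in one mode and a Long in another is to read it through java.lang.Number instead of casting to a concrete type. A sketch in which a plain Map stands in for the SolrJ NamedList bucket (an assumption; the lookup by "count" is the same idea):

```java
import java.util.HashMap;
import java.util.Map;

class FacetCount {
    // Accept Integer, Long, or any other Number for the bucket's "count".
    static long readCount(Map<String, Object> bucket) {
        Object count = bucket.get("count");
        if (!(count instanceof Number)) {
            throw new IllegalStateException("missing or non-numeric count: " + count);
        }
        return ((Number) count).longValue();
    }

    public static void main(String[] args) {
        Map<String, Object> bucket = new HashMap<>();
        bucket.put("count", 49);   // e.g. an Integer in one mode
        System.out.println(readCount(bucket));
        bucket.put("count", 49L);  // e.g. a Long in another
        System.out.println(readCount(bucket));
    }
}
```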

RE: JSON facets, count a long or an integer in cloud and non-cloud modes

2016-03-22 Thread Markus Jelsma
in cloud and non-cloud > modes > > I have the same problem with a custom response writer. > > In production works but in my dev doesn't and are the same version 5.3.1 > > -- > Yago Riveiro > > On 22 Mar 2016 08:47 +, Markus Jelsma, wrote: > > Hello, >

RE: Deleted documents and expungeDeletes

2016-03-30 Thread Markus Jelsma
Hello - with TieredMergePolicy and default reclaimDeletesWeight of 2.0, and frequent updates, it is not uncommon to see a ratio of 25%. If you want deletes to be reclaimed more often, e.g. weight of 4.0, you will see very frequent merging of large segments, killing performance if you are on spin
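
To see why raising reclaimDeletesWeight triggers more merging: as far as I understand Lucene 5.x's TieredMergePolicy, a candidate merge's score (lower is better) is multiplied by the fraction of live bytes raised to reclaimDeletesWeight. This sketch isolates that single factor and ignores the skew and size terms of the real scoring:

```java
class ReclaimDeletesFactor {
    // nonDelRatio: fraction of the candidate segments' bytes that are still live.
    // A smaller factor makes the merge look cheaper, i.e. more likely to be chosen.
    static double deletesFactor(double nonDelRatio, double reclaimDeletesWeight) {
        return Math.pow(nonDelRatio, reclaimDeletesWeight);
    }

    public static void main(String[] args) {
        // Segments with 25% deleted documents (75% live).
        for (double weight : new double[] {2.0, 4.0}) {
            System.out.printf("weight %.1f -> factor %.4f%n", weight, deletesFactor(0.75, weight));
        }
    }
}
```

With 25% deletes, the factor drops from 0.5625 at weight 2.0 to about 0.3164 at weight 4.0, so segments with deletes win merge selection far more often, including large ones.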

RE: using tokens to influence boost and score rather than filtering

2016-04-05 Thread Markus Jelsma
Hello - I would certainly go for edismax's boost parameter, as it multiplies scores. You can always do a regular boost query via {!boost ..} but edismax makes it much easier. Markus -Original message- > From:John Blythe > Sent: Tuesday 5th April 2016 15:36 > To: solr-user > Subject:

Shard ranges seem incorrect

2016-04-12 Thread Markus Jelsma
Hi - I've just created a collection with 3 shards and 3 replicas on Solr 6.0.0, and we noticed something odd: the hash ranges don't make sense (full state.json below): shard1 Range: 8000-d554 shard2 Range: d555-2aa9 shard3 Range: 2aaa-7fff We've also noticed ranges not going from
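
The ranges look odd only because they are signed 32-bit integers printed in hex: the full space runs from 80000000 (Integer.MIN_VALUE) up to 7fffffff, so the middle range legitimately wraps past zero. A sketch modelled on the way Solr's DocRouter partitions the hash space (written from memory, assuming the default 16-bit rounding of boundaries). With n = 3 it yields 80000000-d554ffff, d5550000-2aa9ffff and 2aaa0000-7fffffff, consistent with the prefixes above:

```java
import java.util.ArrayList;
import java.util.List;

class HashRangePartitioner {
    // Split [Integer.MIN_VALUE, Integer.MAX_VALUE] into n contiguous ranges,
    // rounding each boundary to a 0xffff edge when ranges are large enough.
    static List<int[]> partition(int n) {
        final int bits = 16;
        final long mask = 0xffffL;
        final long min = Integer.MIN_VALUE;
        final long max = Integer.MAX_VALUE;
        long step = Math.max(1, (max - min) / n);
        boolean round = step >= (1L << bits) * 16;
        List<int[]> ranges = new ArrayList<>(n);
        long start = min;
        long end = min;
        long targetStart = min; // idealized boundary, avoids accumulating rounding error
        while (end < max) {
            long targetEnd = targetStart + step;
            end = targetEnd;
            if (round && (end & mask) != mask) {
                long increment = 1L << bits;
                long roundDown = (end | mask) - increment;
                long roundUp = (end | mask) + increment;
                end = (end - roundDown < roundUp - end && roundDown > start) ? roundDown : roundUp;
            }
            if (ranges.size() == n - 1) {
                end = max; // the last range always ends exactly at MAX_VALUE
            }
            ranges.add(new int[] {(int) start, (int) end});
            start = end + 1;
            targetStart = targetEnd + 1;
        }
        return ranges;
    }

    // Hex display of signed ints, as in state.json.
    static String format(int[] range) {
        return String.format("%08x-%08x", range[0], range[1]);
    }

    public static void main(String[] args) {
        for (int[] r : partition(3)) {
            System.out.println(format(r));
        }
    }
}
```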

CloudDescription sometimes null

2016-04-13 Thread Markus Jelsma
Hello - we use CloudDescriptor to get information about the collection. Very early after starting Solr, we obtain an instance: cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor(); In some strange cases, cloudDescriptor is null at some later point. Is it possible cloudDescriptor is

RE: CloudDescription sometimes null

2016-04-13 Thread Markus Jelsma
until the cloudDescriptor has been built so I'd always expect > getCoreDescriptor() to return null but _not_ > getCOreDescriptor().getCloudDescriptor. > > So I'm really puzzled (or reading the code wrong). > > Erick > > On Wed, Apr 13, 2016 at 5:11 AM, Markus Jelsma > wrote

RE: Shard ranges seem incorrect

2016-04-14 Thread Markus Jelsma
Hi - bumping this issue. Any thoughts to share? Thanks, M -Original message- > From:Markus Jelsma > Sent: Tuesday 12th April 2016 13:49 > To: solr-user > Subject: Shard ranges seem incorrect > > Hi - i've just created a 3 shard 3 replica collection on Solr 6.0.0 and we > noticed s

RE: Shard ranges seem incorrect

2016-04-15 Thread Markus Jelsma
Thanks both. I completely missed Shawn's response. -Original message- > From:Chris Hostetter > Sent: Thursday 14th April 2016 22:48 > To: solr-user@lucene.apache.org > Subject: RE: Shard ranges seem incorrect > > > : Hi - bumping this issue. Any thoughts to share? > > Shawn's resp

Set router.field in unit tests

2016-04-28 Thread Markus Jelsma
Hi - I'm working on a unit test that requires the cluster's router.field to be set to a field other than ID, but I can't find how to do that. How can I set router.field with AbstractFullDistribZkTestBase? Thanks! Markus

RE: Set router.field in unit tests

2016-04-29 Thread Markus Jelsma
Hi - any hints to share? Thanks! Markus -Original message- > From:Markus Jelsma > Sent: Thursday 28th April 2016 13:30 > To: solr-user > Subject: Set router.field in unit tests > > Hi - i'm working on a unit test that requires the cluster's router.field to > be set to a field diff

RE: Migrating from Solr 5.4 to Solr 6.0

2016-05-04 Thread Markus Jelsma
No, you don't need to reindex. M. -Original message- > From:Zheng Lin Edwin Yeo > Sent: Wednesday 4th May 2016 13:27 > To: solr-user@lucene.apache.org > Subject: Migrating from Solr 5.4 to Solr 6.0 > > Hi, > > Would like to find out, do we need to re-index our document when we migrate

Nodes appear twice in state.json

2016-05-04 Thread Markus Jelsma
Hi - we've just upgraded a development environment from 5.5 to Solr 6.0. After the upgrade, which went fine, we see two replicas appear twice in the cloud view (see below), both marked as leader. We've seen this happen before on some older 5.x versions. Is there a Jira issue I am missing? An unknow

RE: Nodes appear twice in state.json

2016-05-09 Thread Markus Jelsma
ug. Please open an issue. > > Please look up the core node name in core.properties for that particular > core and remove the other one from state.json manually. Probably best to do > a cluster restart to avoid surprises. This is certainly uncharted territory. > > On Wed, May 4, 201

RE: Programmatically find out if node is overseer

2015-07-21 Thread Markus Jelsma
Hello - this approach not only solves the problem but also allows me to run different processing threads on other nodes. Thanks! Markus -Original message- > From:Chris Hostetter > Sent: Saturday 18th July 2015 1:00 > To: solr-user > Subject: Re: Programmatically find out if node is ov

RE: References for Banana

2015-07-22 Thread Markus Jelsma
https://docs.lucidworks.com/display/SiLK/Dashboard+Configuration -Original message- > From:Vineeth Dasaraju > Sent: Wednesday 22nd July 2015 9:50 > To: solr-user@lucene.apache.org > Subject: References for Banana > > Hi, > > Could anyone please direct me towards any online resources

RE: Performance of facet contain search in 5.2.1

2015-07-22 Thread Markus Jelsma
Hello - why not index the facet field as n-grams? It blows up the index but is very fast! Markus -Original message- > From:Erick Erickson > Sent: Tuesday 21st July 2015 21:36 > To: solr-user@lucene.apache.org > Subject: Re: Performance of facet contain search in 5.2.1 > > "contains" has
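
The idea behind indexing facet values as n-grams: a "contains" lookup becomes an exact match on one of the indexed grams, which is fast, at the cost of many extra terms per value. A stdlib sketch of the grams generated for one value; the minGram/maxGram values are illustrative, and the ordering is not claimed to match Lucene's n-gram filter:

```java
import java.util.ArrayList;
import java.util.List;

class NGrams {
    // All character n-grams of length minGram..maxGram of a term.
    static List<String> ngrams(String term, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int len = minGram; len <= maxGram; len++) {
            for (int i = 0; i + len <= term.length(); i++) {
                out.add(term.substring(i, i + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("solr", 2, 3)); // [so, ol, lr, sol, olr]
    }
}
```

A value of length L produces roughly L grams per gram size, which is where the index blow-up comes from.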

RE: Optimizing Solr indexing over WAN

2015-07-22 Thread Markus Jelsma
Hello - depending on the size difference between source data and indexed data, you can gzip/bzip2 your source JSON/XML, transfer it over the WAN, and index it locally. This was the fastest method in every case we encountered. -Original message- > From:Reitzel, Charles > Sent: Wednesday 22n
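
The compress-ship-decompress step can be sketched with the JDK's GZIP streams: compress the batch at the source, move the bytes over the WAN, and decompress next to the Solr cluster before indexing locally. The transfer itself is left out, and the batch content is a made-up example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBatch {
    // Compress a batch of source JSON before shipping it over the WAN.
    static byte[] gzip(String json) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // complete only after the stream is closed
    }

    // Decompress on the receiving side, before indexing locally.
    static String gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String batch = "[" + String.join(",", Collections.nCopies(1000, "{\"id\":\"1\",\"text\":\"some text\"}")) + "]";
        byte[] packed = gzip(batch);
        System.out.println(batch.length() + " chars -> " + packed.length + " bytes gzipped");
    }
}
```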

RE: Performance of facet contain search in 5.2.1

2015-07-23 Thread Markus Jelsma
s is a X-Y problem. > > I think the user was trying to solve the infix autocomplete problem with > > faceting. > > > > We should get from him the initial problem to try to suggest a better > > solution. > > > > Cheers > > > > 2015-07-22 1

Per-document and per-query analysis

2015-07-23 Thread Markus Jelsma
Hello - the title says it all. When indexing a document, we need to run one or more additional filters depending on the value of a specific field. Likewise, we need to run that same filter over the already analyzed tokens when querying. This is not going to work at all if I extend TextField. An

RE: Per-document and per-query analysis

2015-07-24 Thread Markus Jelsma
cuss better the requirements. > > Cheers > > 2015-07-23 17:03 GMT+01:00 Markus Jelsma : > > > Hello - the title says it all. When indexing a document, we need to run > > one or more additional filters depending on the value of a specific field. > > Likewise, we ne

High CPU DistributedQueue and OverseerAutoReplicaFailoverThread

2015-08-05 Thread Markus Jelsma
Hello  - we have a single Solr 5.2.1 node that (for now) contains four single shard collections. Only two collections actually contain data and are queried. The machine has some unusual latency that led me to sample the CPU time with VisualVM. On that node we see that DistributedQueue$LatchWacht

RE: High CPU DistributedQueue and OverseerAutoReplicaFailoverThread

2015-08-12 Thread Markus Jelsma
Hi - anyone to share some hints on the topic? Regards, M. -Original message- > From:Markus Jelsma > Sent: Wednesday 5th August 2015 12:13 > To: solr-user > Subject: High CPU DistributedQueue and OverseerAutoReplicaFailoverThread > > Hello  - we have a single Solr 5.2.1 node that (for

RE: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Markus Jelsma
Well, if I remember correctly (I have no testing facility at hand), WordDelimiterFilter maintains payloads on emitted subterms. So if you use a KeywordTokenizer, input 'some text^PAYLOAD', and have a DelimitedPayloadFilter, the entire string gets a payload. You can then split that string up agai
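
The chain described above can be simulated to show the intended result: the whole input is one token, the payload after the delimiter is attached to it, and splitting the text afterwards gives every subterm that same payload. This only models the token stream; it assumes, as the message says, that the splitting filter carries payloads over to subterms:

```java
import java.util.ArrayList;
import java.util.List;

class PayloadSplit {
    // Returns {term, payload} pairs; payload is null when no delimiter is present.
    static List<String[]> split(String input, char delimiter) {
        int idx = input.indexOf(delimiter);
        String text = idx < 0 ? input : input.substring(0, idx);
        String payload = idx < 0 ? null : input.substring(idx + 1);
        List<String[]> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            out.add(new String[] {word, payload}); // every subterm inherits the payload
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] t : split("some text^PAYLOAD", '^')) {
            System.out.println(t[0] + " -> " + t[1]);
        }
    }
}
```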

RE: Custom merge logic in SolrCloud.

2015-09-01 Thread Markus Jelsma
Hello, I have had this issue as well. I patched QueryComponent, and some other files it uses, so that it is finally possible to extend QueryComponent. https://issues.apache.org/jira/browse/SOLR-7968 -Original message- > From:Mohan gupta > Sent: Tuesday 1st Sept

RE: Merging documents from a distributed search

2015-09-03 Thread Markus Jelsma
Hello - we're doing something similar and ended up overriding QueryComponent (https://issues.apache.org/jira/browse/SOLR-7968), which first needs protected members instead of private ones. We could use a RankQuery and its cool MergeStrategy, but we would also need RankQuery to provide an entry

RE: Merging documents from a distributed search

2015-09-03 Thread Markus Jelsma
Hello - another current topic is also covering this issue, you may want to check it out: http://lucene.472066.n3.nabble.com/Merging-documents-from-a-distributed-search-td4226802.html -Original message- > From:Markus Jelsma > Sent: Thursday 3rd September 2015 10:27 > To: solr-user@luc

RE: Merging documents from a distributed search

2015-09-03 Thread Markus Jelsma
It seems so indeed. Please look up the thread titled "Custom merge logic in SolrCloud." -Original message- > From:tedsolr > Sent: Thursday 3rd September 2015 21:28 > To: solr-user@lucene.apache.org > Subject: RE: Merging documents from a distributed search > > Markus, did you

Trouble making tests with BaseDistributedSearchTestCase

2015-09-04 Thread Markus Jelsma
Hello - I am trying to create some tests using BaseDistributedSearchTestCase, but two errors usually appear. Consider the following test: @Test @ShardsFixed(num = 3) public void test() throws Exception { del("*:*"); index(id, "1", "lang", "en", "text", "this is some text"); inde

RE: Trouble making tests with BaseDistributedSearchTestCase

2015-09-04 Thread Markus Jelsma
Strangely enough, the following code gives different errors: assertQ( req("q", "*:*", "debug", "true", "indent", "true"), "//result/doc[1]/str[@name='id'][.='1']", "//result/doc[2]/str[@name='id'][.='2']", "//result/doc[3]/str[@name='id'][.='3']",

RE: Trouble making tests with BaseDistributedSearchTestCase

2015-09-08 Thread Markus Jelsma
Thanks! I switched to AbstractFullDistribZkTestBase, and for some tests I circumvent the control core. I do sometimes get a recovery timeout when starting up the tests. I have set the timeout to 30 seconds, just like many other tests that extend AbstractFullDistribZkTestBase. Any thoughts o

RE: statsCache issue

2015-09-09 Thread Markus Jelsma
Hello - there are several issues with StatsCache in versions before 5.3; if it is loaded, it won't work reliably. It works properly for us on 5.3, although statistics may be a bit off if you are using BM25. You should upgrade to 5.3. Markus -Original message- > From:Jae Joo > Sent: Wednesday 9th Septem

RE: Detect term occurrences

2015-09-10 Thread Markus Jelsma
If you are only interested in the number of occurrences of an indexed term, the TermsComponent will give you that answer. Markus -Original message- > From:Francisco Andrés Fernández > Sent: Thursday 10th September 2015 15:58 > To: solr-user@lucene.apache.org > Subject: Detect term occurrenc
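
Note that the counts the TermsComponent reports are document frequencies: the number of documents containing the term, so a term repeated inside one document counts once. A stdlib sketch of that counting over a toy corpus, with naive whitespace tokenization standing in for real analysis:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class DocFreq {
    // For each term, count the documents that contain it at least once.
    static Map<String, Integer> docFreq(List<String> docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (String doc : docs) {
            Set<String> seen = new HashSet<>(List.of(doc.toLowerCase(Locale.ROOT).split("\\s+")));
            for (String term : seen) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    public static void main(String[] args) {
        System.out.println(docFreq(List.of("a b a", "b c"))); // {a=1, b=2, c=1}
    }
}
```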

RE: How to know index file in OS Cache

2015-09-24 Thread Markus Jelsma
Hello - as far as I remember, you don't. The OS caches blocks, not whole files. Markus -Original message- > From:Aman Tandon > Sent: Friday 25th September 2015 5:56 > To: solr-user@lucene.apache.org > Subject: How to know index file in OS Cache > > Hi, > > Is there

RE: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread Markus Jelsma
Hi - you need to use the CursorMark feature for larger sets: https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results M. -Original message- > From:Ajinkya Kale > Sent: Monday 28th September 2015 20:46 > To: solr-user@lucene.apache.org; java-u...@lucene.apache.org > Subj
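
Deep paging with start/rows makes every shard collect start+rows documents per request, while a cursor only asks for "the next rows after this sort position". An in-memory simulation of the client loop, stopping when the cursor stops advancing (the same termination rule as comparing cursorMark with nextCursorMark). Here the cursor is simply the last id of the page, an assumption for illustration; Solr's real cursorMark is an opaque token and requires a sort that includes the uniqueKey:

```java
import java.util.ArrayList;
import java.util.List;

class CursorPaging {
    // Page of ids strictly after the cursor (null means "from the start").
    static List<String> page(List<String> sortedIds, String cursor, int rows) {
        List<String> out = new ArrayList<>();
        for (String id : sortedIds) {
            if (cursor != null && id.compareTo(cursor) <= 0) {
                continue;
            }
            out.add(id);
            if (out.size() == rows) {
                break;
            }
        }
        return out;
    }

    // Walk the full result set until the cursor no longer advances.
    static List<String> fetchAll(List<String> sortedIds, int rows) {
        List<String> all = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> page = page(sortedIds, cursor, rows);
            String next = page.isEmpty() ? cursor : page.get(page.size() - 1);
            all.addAll(page);
            if (next == null || next.equals(cursor)) {
                break;
            }
            cursor = next;
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(List.of("a", "b", "c", "d", "e"), 2)); // [a, b, c, d, e]
    }
}
```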

Implementing AbstractFullDistribZkTestBase

2015-10-05 Thread Markus Jelsma
Hello, I have several tests extending AbstractFullDistribZkTestBase on Solr 5.3.0. Sometimes a test fails with either "There are still nodes recoverying - waited for 30 seconds" or "IOException occured when talking to server at: https://127.0.0.1:44474/collection1", so usually at least one

RE: Implementing AbstractFullDistribZkTestBase

2015-10-05 Thread Markus Jelsma
fore you > index to them. > > Shot in the dark > Erick > > On Mon, Oct 5, 2015 at 1:36 AM, Markus Jelsma > wrote: > > Hello, > > > > I have several implementations of AbstractFullDistribZkTestBase of Solr > > 5.3.0. Sometimes a test fails with either &

RE: Implementing AbstractFullDistribZkTestBase

2015-10-05 Thread Markus Jelsma
r@lucene.apache.org > Subject: Re: Implementing AbstractFullDistribZkTestBase > > If it's always when using https as in your examples, perhaps it's SOLR-5776. > > - mark > > On Mon, Oct 5, 2015 at 10:36 AM Markus Jelsma > wrote: > > > Hmmm, i tried that ju

RE: Implementing AbstractFullDistribZkTestBase

2015-10-06 Thread Markus Jelsma
TestBase > > Not sure what that means :) > > SOLR-5776 would not happen all the time, but too frequently. It also > wouldn't matter the power of CPU, cores or RAM :) > > Do you see fails without https is what you want to check. > > - mark > > On Mon, Oct 5, 20

RE: which one is faster synonym_edismax & edismax faster?

2015-10-08 Thread Markus Jelsma
Hi - if you run a CPU sampler or profiler you will probably see it doesn't matter. Markus -Original message- > From:Aman Tandon > Sent: Friday 9th October 2015 6:52 > To: solr-user@lucene.apache.org > Subject: which one is faster synonym_edismax & edismax faster? > > Hi, > > Curren

NPE in CloudSolrClient via AbstractFullDistribZkTestBase

2015-10-20 Thread Markus Jelsma
Hi - we have some code inside a unit test extending AbstractFullDistribZkTestBase. As part of the test, I am indexing thousands of documents to getCommonCloudSolrClient(). Somewhere down the line it trips over a document. I've debugged and inspected the bad document but cannot find anything wrong w

RE: NPE in CloudSolrClient via AbstractFullDistribZkTestBase

2015-10-23 Thread Markus Jelsma
Hi - anyone here to shed some light on the issue? Markus -Original message- > From:Markus Jelsma > Sent: Tuesday 20th October 2015 13:39 > To: solr-user > Subject: NPE in CloudSolrClient via AbstractFullDistribZkTestBase > > Hi - we have some code inside a unit test, extending >

RE: NPE in CloudSolrClient via AbstractFullDistribZkTestBase

2015-10-23 Thread Markus Jelsma
where - are there no errors earlier on in the log? > > Alan Woodward > www.flax.co.uk > > > On 23 Oct 2015, at 12:44, Markus Jelsma wrote: > > > Hi - anyone here to shed some light on the issue? > > > > Markus > > > > > > > >

RE: NPE in CloudSolrClient via AbstractFullDistribZkTestBase

2015-10-23 Thread Markus Jelsma
the documents in order to route things correctly (UpdateRequest.java:204). > > Alan Woodward > www.flax.co.uk > > > On 23 Oct 2015, at 13:53, Markus Jelsma wrote: > > > Ah yes, i think i overlooked that one. Here it is: > > > > > type="or

RE: NPE in CloudSolrClient via AbstractFullDistribZkTestBase

2015-10-23 Thread Markus Jelsma
Actually it would probably be worth improving the error > reporting here to throw NPE when the documents are added to the UpdateRequest > in the first place - do you want to open a JIRA? > > Alan Woodward > www.flax.co.uk > > > On 23 Oct 2015, at 17:00, Markus Jelsma

RE: Solr collection alias - how rank is affected

2015-10-27 Thread Markus Jelsma
Hello - regarding fairly random/smooth distribution, you will notice it for sure. A solution there is to use distributed collection statistics. On top of that you might want to rely on docCount, not maxDoc inside your similarity implementation, because docCount should be identical in both collec
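
Why local statistics skew ranking across collections: the idf part of the score depends on collection-level counts. Lucene's BM25Similarity uses docCount (documents with a value in the field) for its idf; this sketch evaluates that formula for the same term in two hypothetical collections with different sizes:

```java
class Bm25Idf {
    // BM25 idf: docFreq = documents containing the term,
    // docCount = documents that have a value for the field.
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        // Same term, same docFreq, but two collections of different size.
        System.out.printf("collection A: %.4f%n", idf(10, 1000));
        System.out.printf("collection B: %.4f%n", idf(10, 1200));
    }
}
```

The same matching document therefore scores differently depending on which collection it lives in; distributed statistics feed both collections the same numbers.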

SolrJ stalls/hangs on client.add(); and doesn't return

2015-10-29 Thread Markus Jelsma
Hello - we have some processes periodically sending documents to Solr 5.3.0 in local mode using ConcurrentUpdateSolrClient 5.3.0, with queueSize 10 and threadCount 4, chosen arbitrarily since we have no idea what is right. Usually it's a few thousand up to some tens of thousands of rather small docum

RE: Solr for Pictures

2015-10-29 Thread Markus Jelsma
Hi - Solr does integrate with Apache Tika, which happily accepts images and other media formats. I am not sure if EXIF is exposed though but you might want to try. Otherwise patch it up or use Tika in your own process that indexes data to Solr. https://cwiki.apache.org/confluence/display/solr

RE: Question on index time de-duplication

2015-10-30 Thread Markus Jelsma
Hello - keep in mind that both SignatureUpdateProcessorFactory and field collapsing do not work in distributed search unless you map identical signatures to identical shards. Markus -Original message- > From:Scott Stults > Sent: Friday 30th October 2015 11:58 > To: solr-user@lucene.apa
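
The point about mapping identical signatures to identical shards can be sketched as: compute the signature from the content that defines a duplicate, then route on the signature instead of on the document id. One way to get this in SolrCloud, assuming the compositeId router, would be to use the signature as the route prefix of the id. The shard choice below is a hypothetical stand-in for the router, not Solr's hashing:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SignatureRouting {
    // Hex MD5 over the fields that define a duplicate (here a single content string).
    static String signature(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical router: identical signatures always land on the same shard.
    static int shardFor(String signature, int numShards) {
        return Math.floorMod(signature.hashCode(), numShards);
    }

    public static void main(String[] args) {
        String sig = signature("same text");
        System.out.println(sig + " -> shard " + shardFor(sig, 3));
    }
}
```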

RE: SolrJ stalls/hangs on client.add(); and doesn't return

2015-10-30 Thread Markus Jelsma
ameters are a bit of magic. You can have up to the number of threads > you specify sending their entire packet to solr in parallel, and up to > queueSize > requests. Note this is the _request_, not the docs in the list if I'm > reading the code > correctly. > > Be

RE: Performance testing on SOLR cloud

2015-11-17 Thread Markus Jelsma
Hi - we use the Siege load testing program. It can take a seed list of URLs, taken from actual user input, and can generate load in parallel. It won't reuse common queries unless you prepare your seed list appropriately. If your setup achieves the goal your client anticipates, then you are fine. Sie

RE: EdgeNGramFilterFactory not working? Solr 5.3.1

2015-11-17 Thread Markus Jelsma
Hi - the usual suspect is: 'did you reindex?' Not seeing things change after modifying index-time analysis chains means you need to reindex. M. -Original message- > From:Daniel Valdivia > Sent: Wednesday 18th November 2015 0:17 > To: solr-user@lucene.apache.org > Subject: EdgeNGramF

RE: Boost non stemmed keywords (KStem filter)

2015-11-18 Thread Markus Jelsma
Hi - easiest approach is to use KeywordRepeatFilter and RemoveDuplicatesTokenFilter. This creates a slightly higher IDF for unstemmed words which might be just enough in your case. We found it not to be enough, so we also attach payloads to signify stemmed words amongst others. This allows you
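
The effect of KeywordRepeatFilter plus RemoveDuplicatesTokenFilter can be simulated at the token level: every token is emitted both unstemmed and stemmed at the same position, and the copy is dropped where stemming changed nothing. Unstemmed forms are rarer terms, hence the slightly higher IDF. The suffix-stripping "stemmer" is a hypothetical stand-in for KStem:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class KeywordRepeatSketch {
    // Hypothetical stand-in for a real stemmer: strips a trailing "s".
    static String stem(String token) {
        return token.length() > 3 && token.endsWith("s") ? token.substring(0, token.length() - 1) : token;
    }

    // Tokens per position: the original plus its stem, duplicates removed.
    static List<List<String>> analyze(List<String> tokens) {
        List<List<String>> positions = new ArrayList<>();
        for (String token : tokens) {
            LinkedHashSet<String> atPosition = new LinkedHashSet<>();
            atPosition.add(token);       // unstemmed copy
            atPosition.add(stem(token)); // stemmed copy, ignored if identical
            positions.add(new ArrayList<>(atPosition));
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(analyze(List.of("books", "on", "tables"))); // [[books, book], [on], [tables, table]]
    }
}
```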

RE: CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-18 Thread Markus Jelsma
Hi - since some 5.x version I sometimes see the too many searchers warning too. The cloud showing the warning has no autoCommit, and only a single process ever sends a commit, once every 10-15 minutes or so. The cores are quite small, commits finish quickly and new docs are quickly searchable. I

RE: Boost non stemmed keywords (KStem filter)

2015-11-19 Thread Markus Jelsma
Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > > 18. nov. 2015 kl. 22.24 skrev Markus Jelsma : > > > > Hi - easiest approach is to use KeywordRepeatFilter and > > RemoveDuplicatesTokenFilter. This creates a slightly higher IDF for > > u

RE: Spellcheck on first character

2015-11-27 Thread Markus Jelsma
Hi - this is default behaviour, see https://lucene.apache.org/core/4_1_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29 lucky for you it is configurable via Solr: http://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/spelling/DirectSolrSpellChecker.h

RE: Synonyms in Search Results and More Accurate Matches

2015-12-01 Thread Markus Jelsma
Hello - it looks like you have synonyms enabled at query time, which is fine, but also means TF*IDF stats are different for tbrush and toothbrush, causing this order to be the way it is. There is no solution available in Solr right now that would boost user-entered terms over expanded synonyms b

RE: Solr memory usage

2015-12-09 Thread Markus Jelsma
Steven - this fluctuation is normal, it is eating memory when documents are indexed or when searches are handled, this makes the meter go up. The garbage collector then frees the memory again. You can start to worry if there is a lot of activity but no fluctuation. M. -Original message---

RE: Solr Heap memory vs. OS memory

2015-12-09 Thread Markus Jelsma
Yes. This is still accurate, Lucene still relies on memory mapped files. And Solr usually doesn't require that much RAM, except if you have lots of massive cache entries. Markus -Original message- > From:Kelly, Frank > Sent: Wednesday 9th December 2015 16:19 > To: solr-user@lucene.apac

RE: similarity as a parameter

2015-12-15 Thread Markus Jelsma
Hello Dmitry - this is currently not possible. Quickest way is to reconfigure and reload the cores. Some similarities also require you to reindex, so it is a bad idea anyway. Markus -Original message- > From:Dmitry Kan > Sent: Tuesday 15th December 2015 16:02 > To: solr-user@lucene.ap

RE: similarity as a parameter

2015-12-15 Thread Markus Jelsma
Sweetspot does require reindexing but is that the only one? I have not investigated some exotic implementations, anyone to confirm sweetspot is the only one? In that case you could patch QueryComponent right, instead of having a custom component? M. -Original message- > From:Dmitry

RE: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Markus Jelsma
We have seen an increase between 4.8.1 and 4.10. -Original message- > From:Dmitry Kan > Sent: Tuesday 17th February 2015 11:06 > To: solr-user@lucene.apache.org > Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption > > Hi, > > We are currently comparing the RAM consumption of two

RE: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Markus Jelsma
On Tue, Feb 17, 2015 at 12:12 PM, Markus Jelsma > wrote: > > > We have seen an increase between 4.8.1 and 4.10. > > > > -Original message- > > > From:Dmitry Kan > > > Sent: Tuesday 17th February 2015 11:06 > > > To: solr-user@luce

Delimited payloads input issue

2015-02-27 Thread Markus Jelsma
Hi - we attempt to use payloads to identify different parts of extracted HTML pages and use the DelimitedPayloadTokenFilter to assign the correct payload to the tokens. However, we are having issues for some language analyzers and issues with some types of content for most regular analyzers. If
