BTW: it's a 4-shard SolrCloud cluster using ZooKeeper 3.3.5.
On Fri, Nov 22, 2013 at 11:07 AM, Mingfeng Yang wrote:
> Recently, I found out that I can't delete doc by id or overwrite a doc
> from/in my SOLR index which is based on SOLR 4.4.0 version.
Recently, I found out that I can't delete a doc by id or overwrite a doc
in my Solr index, which is based on Solr 4.4.0.
Say I have a doc http://pastebin.com/GqPP4Uw4 (I use pastebin here to make
it easier to view). And I tried to add a dynamic field "rank_ti"
to it, wanting to make
above) try the new parameter facet.threads with a
> > reasonable value (4 to 8 gave me a massive performance speedup when
> > working with large facets, i.e. nTerms >> 10^7).
> >
> > -Sascha
> >
> >
> > Mingfeng Yang wrote:
I have an index with 170M documents, and two of the fields for each doc are
"source" and "url". I want to know the top 500 most frequent urls from
the Video source.
So I did a facet with
"fq=source:Video&facet=true&facet.field=url&facet.limit=500", and the
matching documents number about 9 million.
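Sascha's suggestion above can be turned into a request by adding facet.threads; a sketch only (assumes a running Solr 4.x on localhost:8983, and the core path is illustrative):

```shell
# Request sketch: same facet query as above, plus facet.threads.
# The thread above reports values of 4-8 giving a big speedup on very
# large facet fields (nTerms >> 10^7).
curl 'http://localhost:8983/solr/select?q=*:*&fq=source:Video&facet=true&facet.field=url&facet.limit=500&facet.threads=4&wt=json'
```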
t;!geofilt sfield=author_geo"
> Clearly wrong. Try escaping the braces with URL percent escapes, etc.
>
> ~ David
>
>
> Mingfeng Yang wrote
> > My solr index has a field called "author_geo" which contains the author's
> > location, and when I am trying t
BTW: my schema.xml contains the following related lines.
On Mon, Aug 19, 2013 at 2:02 PM, Mingfeng Yang wrote:
> My solr index has a field called "author_geo" which contains the author's
> location, and when I am trying to get all docs whose author are within 10
> k
My solr index has a field called "author_geo" which contains the author's
location, and when I am trying to get all docs whose author are within 10
km of 35.0,35.0 using the following query.
curl '
http://localhost/solr/select?q=*:*&fq={!geofilt%20sfield=author_geo}&pt=35.0,35.0&d=10&wt=json&inden
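Following David's advice to percent-escape the braces, a minimal sketch of encoding the local-params filter before putting it in a raw URL (the final curl line is illustrative only):

```shell
# Percent-encode the characters that break a raw query-string URL:
# '{' -> %7B, '}' -> %7D, ' ' -> %20
fq='{!geofilt sfield=author_geo}'
enc=$(printf '%s' "$fq" | sed -e 's/{/%7B/g' -e 's/}/%7D/g' -e 's/ /%20/g')
echo "$enc"
# A request would then look like (hypothetical host/core):
# curl "http://localhost/solr/select?q=*:*&fq=${enc}&pt=35.0,35.0&d=10&wt=json"
```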
Figured it out: using author_geo:[* TO *] will do the trick.
On Thu, Aug 15, 2013 at 1:26 PM, Mingfeng Yang wrote:
> I have a schema with a geolocation field named "author_geo" defined as
>
> stored="true" />
>
> How can I list docs whose author_geo fields
I have a schema with a geolocation field named "author_geo" defined as
How can I list docs whose author_geo fields are not empty?
It seems the filter query "fq=author_geo:*" does not work for this field
the way it does for string, text, or float fields.
curl
'localhost/solr/select?q=*:*&rows=10&wt=json&inde
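For reference, the [* TO *] fix from the top of this thread, with the brackets percent-encoded for a raw URL (hypothetical host/core); only a request sketch:

```shell
# [* TO *] matches every document where the field has any value.
# In a raw URL: '[' -> %5B, ']' -> %5D, ' ' -> %20.
curl 'http://localhost/solr/select?q=*:*&fq=author_geo:%5B*%20TO%20*%5D&rows=10&wt=json'
```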
I am trying to upgrade Solr to version 4.4, and it looks like Solr can't load
the ShingleFilterFactory class.
417 [coreLoadExecutor-4-thread-1] ERROR org.apache.solr.core.CoreContainer
– Unable to create core: collection1
org.apache.solr.common.SolrException: Plugin init failure for [schema.xml]
fi
>
>types="at-under-alpha.txt"/>
>
>
>
> The file +at-under-alpha.txt+ would contain:
>
> @ => ALPHA
> _ => ALPHA
>
> The analysis results:
>
> Source: Hello @World_bar, r@end.
>Tokens: 1: Hello 2
We need to index and search lots of tweets, which can look like "@solr: solr is
great" or "@solr_lucene, good combination".
And we want to search with "@solr" or "@solr_lucene". How can we preserve
"@" and "_" in the index?
If using WhitespaceTokenizer followed by WordDelimiterFilter, @solr_lucene
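The fix quoted further up the thread (the types attribute plus the at-under-alpha.txt character map with "@ => ALPHA" and "_ => ALPHA") can be spelled out as a config sketch; the surrounding analyzer chain here is illustrative, only the types attribute is the actual fix:

```xml
<!-- analyzer sketch: map @ and _ to ALPHA so the word delimiter
     filter stops splitting on them -->
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        types="at-under-alpha.txt"/>
```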
How are dynamic fields in Solr implemented? Do they get saved into the same
Document as other regular fields in the Lucene index?
Ming-
(num,
DateTools.Resolution.SECOND));
Then you get dt as a string in the right format.
Ming-
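As a quick sanity check on the target format itself (separate from the Lucene DateTools call above): Solr's date fields expect ISO-8601 UTC strings, a shape GNU date can also produce. The epoch value here is just an illustration:

```shell
# Solr date fields want the form yyyy-MM-ddTHH:mm:ssZ (always UTC).
# 1371226800 is an arbitrary example timestamp.
dt=$(date -u -d @1371226800 +"%Y-%m-%dT%H:%M:%SZ")
echo "$dt"   # 2013-06-14T16:20:00Z
```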
On Fri, Jun 14, 2013 at 4:20 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:
> Use EmbeddedSolrServer rather than Lucene directly.
> On Jun 14, 2013 6:47 PM, "Ming
> On Jun 14, 2013 6:05 PM, "Mingfeng Yang" wrote:
>
> > Michael,
> >
> > That's what I thought as well. I would assume an optimization of the
> index
> > would rewrite all documents in the newer format then?
> >
> > Ming-
> >
> >
see if that
> > changes what you see.
> >
> >
> > Michael Della Bitta
> On Fri, Jun 14, 2013 at 4:15 PM, Mingfeng Yang wrote:
>
> > I have an index firs
I have an index first built with Solr 1.4 and later upgraded to Solr 3.6,
which has 150 million documents, and all docs have a date field which is not
blank (verified by a Solr query).
I am using the following code snippet to retrieve
import org.apache.lucene.index.IndexReader;
import org.apache.luce
> No, it is hard coded to split into two shards only. You can call it
> recursively on a sub shard to split into more pieces. Please note that some
> serious bugs were found in that command which will be fixed in the next
> (4.3.1) release of Solr.
>
>
> On Tu
From the Solr wiki, I saw this command (
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=&shard=shardId)
which splits one index into 2 shards. However, is there some way to split
into more shards?
Thanks,
Ming-
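Per the reply above, splitting into more pieces means calling SPLITSHARD again on a sub-shard; a request sketch (collection and host are illustrative):

```shell
# First split: shard1 -> shard1_0 and shard1_1
curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1'
# Then split one of the halves again: shard1_0 -> shard1_0_0 and shard1_0_1
curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1_0'
```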
I am trying to migrate 100M documents from a Solr index (v3.6) to a SolrCloud
index (v4.1, 4 shards) by using SolrEntityProcessor. My data-config.xml is
like
http://10.64.35.117:8995/solr/"; query="*:*" rows="2000" fl=
"author_class,authorlink,author_location_text,author_text,author,category,date,
We have a Solr instance running on a 4-CPU box.
Sometimes, we send a query to our Solr server and it takes up 100% of one
CPU and > 60% of memory. I assume that if we send another query request,
Solr should be able to use another idle CPU. However, that is not the
case. Using top, I only see on
Andre,
Thanks for the info! Unfortunately, my Solr is on version 3.6, and it looks
like those options are not available. :(
Ming-
On Mon, May 6, 2013 at 5:32 AM, Andre Bois-Crettez wrote:
> On 05/06/2013 06:03 AM, Michael Sokolov wrote:
>
>> On 5/5/13 7:48 PM, Mingfeng Yang wrote:
at 3:33 AM, Dmitry Kan wrote:
> Are you doing it once? Is your index sharded? If so, can you ask each shard
> individually?
> Another way would be to do it on Lucene level, i.e. read from the binary
> indices (API exists).
>
> Dmitry
>
>
> On Mon, May 6, 2013 at
Dear Solr Users,
Does anyone know the best way to iterate through each document in a
Solr index with a billion entries?
I tried using select?q=*:*&start=xx&rows=500 to get 500 docs each time
and then changing the start value, but it got very slow after getting through
about 10 million docs.
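For what it's worth, later Solr releases (4.7+) added cursorMark for exactly this deep-paging problem, since start=N gets linearly more expensive as N grows; a request sketch (host/core hypothetical, and the sort must include the uniqueKey field):

```shell
# Start a cursor with cursorMark=*; each response includes a nextCursorMark
# to send back on the next request, so per-page cost stays flat.
curl 'http://localhost:8983/solr/select?q=*:*&rows=500&sort=id+asc&cursorMark=*&wt=json'
# ...then repeat with cursorMark=<nextCursorMark from the previous response>
```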
Right. I am wondering if/how we can download a specific file from the
zookeeper, modify it and then upload to rewrite it. Anyone ?
Thanks,
Ming
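A sketch of what that looks like with the zkcli.sh script shipped in Solr's cloud-scripts directory (host, config name, and paths are illustrative):

```shell
# Download a config file from ZooKeeper, edit it locally, write it back.
./zkcli.sh -zkhost localhost:2181 -cmd getfile /configs/conf1/schema.xml /tmp/schema.xml
# ...edit /tmp/schema.xml...
./zkcli.sh -zkhost localhost:2181 -cmd putfile /configs/conf1/schema.xml /tmp/schema.xml
# Cores typically need a reload to pick up the changed config.
```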
On Fri, Apr 19, 2013 at 10:53 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:
> I would like to know the answer to this as well.
>
the url field a Disk based DocValue and shift the
> memory from Solr to the file system cache. But to run efficiently this is
> still going to take a lot of memory in the OS file cache.
>
>
>
>
> On Thu, Apr 18, 2013 at 12:00 PM, Mingfeng Yang wrote:
>
20G is allocated to Solr already.
Ming
On Wed, Apr 17, 2013 at 11:56 PM, Toke Eskildsen
wrote:
> On Wed, 2013-04-17 at 20:06 +0200, Mingfeng Yang wrote:
> > I am doing faceting on an index of 120M documents,
> > on the field of url[...]
>
> I would guess that you woul
Apr 17, 2013 at 12:06 PM, Mingfeng Yang wrote:
>
> > I am doing faceting on an index of 120M documents, on the field of url,
> > using the following two queries. Note that the only difference of the
> two
> > queries is that first one uses default facet.method, and t
I am doing faceting on an index of 120M documents, on the field of url,
using the following two queries. Note that the only difference of the two
queries is that the first one uses the default facet.method, and the second one
uses facet.method=enum. (Each document in the index contains a review we
extra
> WordDelimiterFilterFactory<http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory>.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Mingfeng Yang
> Sent: Thursday, April 1
Looks like it's due to the word delimiter filter. Anyone know if the
"protected" file supports regular expressions or not?
Ming
On Thu, Apr 11, 2013 at 4:58 PM, Jack Krupansky wrote:
> Try the whitespace tokenizer.
>
> -- Jack Krupansky
>
> -----Original Message
Dear Solr users and developers,
I am trying to index some documents, some of which are Twitter messages, and
we have a problem when indexing retweets.
Say a Twitter user named "jpc_108" posts a tweet, and then someone retweets
his msg, and now @jpc_108 becomes part of the tweet text body.
Seems like
gh I'd be very curious to see someone actually test that.
>
> Upayavira
>
> On Fri, Mar 8, 2013, at 09:51 PM, Mingfeng Yang wrote:
> > Generally speaking, which has better performance for Solr?
> > 1. updating some fields or adding new fields into a document.
> >
Generally speaking, which has better performance for Solr?
1. updating some fields or adding new fields into a document.
or
2. replacing the whole document.
As I understand it, updating fields needs to find the corresponding doc
first and then replace the field values, while replacing the whole doc
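For context, Solr 4.x exposes option 1 as "atomic updates"; a request sketch (the field name is illustrative, and it requires an updateLog plus all fields stored, because Solr re-reads the old doc and rewrites the whole document internally anyway):

```shell
# Atomic update: set one field on doc id 1 without resending the whole doc.
curl 'http://localhost:8983/solr/update?commit=true' \
     -H 'Content-Type: application/json' \
     -d '[{"id":"1","rank_ti":{"set":5}}]'
```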
Looks like pivot facet with SolrCloud does not work (I am using Solr 4.1).
The query below returns no pivot search result unless I add
"&shards=shard1".
http://localhost:8995/solr/collection1/select?q=*%3A*&facet=true&facet.mincount=1&facet.pivot=source_domain,author&rows=1&wt=json&facet.limit=5
I see the items under my SolrCloud data directory of "replica node" as
drwxr-xr-x 2 solr solr    42 Feb 22 18:19 index
drwxr-xr-x 2 solr solr 12288 Feb 23 01:00 index.20130222181947835
-rw-r--r-- 1 solr solr    78 Feb 22 18:25 index.properties
-rw-r--r-- 1 solr solr 209 Feb 22 18:25 replication
I cannot give an affirmative answer. But I am thinking that it could cause
problems, as the index formats in 3.3 and 4.1 are slightly
different.
Why don't you upgrade to 4.1? The only thing you need to do is
1. install solr 4.1
2.1 copy all related config files from 3.3
2.2 back up the in
How about passing -Dsolr.data.dir=/ur/data/dir on the command line to java
when you start the Solr service?
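That is, something like this when launching the example Jetty start jar (the data path is illustrative):

```shell
java -Dsolr.data.dir=/var/solr/data -jar start.jar
```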
On Thu, Feb 21, 2013 at 9:05 AM, chamara wrote:
> Yes, that is what I am doing now. I thought this solution is not elegant for
> a deployment. Is there any other way to do this from the Solr
Chris,
My config file did include the section of loading related plugin.
Ming
On Tue, Feb 19, 2013 at 10:42 AM, Chris Hostetter
wrote:
>
> : Found it by myself. It's here
> :
> http://mirrors.ibiblio.org/maven2/org/apache/solr/solr-dataimporthandler/4.1.0/
> :
> : Download and move the jar fi
Found it by myself. It's here
http://mirrors.ibiblio.org/maven2/org/apache/solr/solr-dataimporthandler/4.1.0/
Download and move the jar file to solr-webapp/webapp/WEB-INF/lib directory,
and the errors are all gone.
Ming
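An alternative to copying the jar into WEB-INF/lib is a <lib> directive in solrconfig.xml; a config sketch (the relative dir depends on your layout):

```xml
<!-- load the DataImportHandler jar from the distribution directory
     instead of copying it into solr-webapp/webapp/WEB-INF/lib -->
<lib dir="../../dist/" regex="solr-dataimporthandler-.*\.jar" />
```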
On Mon, Feb 18, 2013 at 11:52 AM, Mingfeng Yang wrote:
> When t
When trying to use SolrEntityProcessor to do a data import from another Solr
index (Solr 4.1),
I added the following in solrconfig.xml
data-config.xml
and create new file data-config.xml with
http://wolf:1Xnbdoq@myserver:8995/solr/"; query="*:*"
fl="id,md5_text,title,text
Shawn,
Awesome. Exactly something I am looking for.
Thanks!
Ming
On Thu, Feb 14, 2013 at 12:00 PM, Shawn Heisey wrote:
> On 2/14/2013 12:46 PM, Mingfeng Yang wrote:
>
>> I have a few Solr indexes, each with 20-200 millions documents, which were
>> indexed by querying m
I have a few Solr indexes, each with 20-200 million documents, which were
indexed by querying multiple PostgreSQL databases. If I rebuild the
index the same way, it would take a few months, because the PostgreSQL
queries are slow.
Now, I need to do the following changes to all indexes.
1. de
>
> Good luck!
>
> Regards, Per Steffensen
>
>
>
> On 1/26/13 6:56 AM, Mingfeng Yang wrote:
>
>> Hi Mark,
>>
>> When I did testing with SolrCloud, I found the following.
>>
>> 1. I started 4 shards on the same host on port 8983, 8973, 8963, an
Our application of Solr is somewhat atypical. We constantly feed Solr
with lots of documents grabbed from the internet, and NRT searching is not
required. A typical search will return millions of results, and query
response needs to be as fast as possible.
Since in a SolrCloud environment, indexing re
In your case, since there are no concurrent queries, adding replicas won't
help much with improving the response speed. However, breaking your index into
a few shards does help increase query performance. I recently broke an index
with 30 million documents (30G) into 4 shards, and the boost is pretty
impr
Before Solr 4.0, I secured Solr by enabling password protection in Jetty.
However, password protection will make SolrCloud not work.
We use EC2 now, and we need the web admin interface of Solr to be
accessible (with a password) from anywhere.
How do you protect your Solr server from unauthorized acces
chines as replicas of the
> cores you want to move - then once they are active, unload the cores on the
> old machine, stop the Solr instances and remove the stuff left on the
> filesystem.
>
> - Mark
>
> On Jan 25, 2013, at 7:42 PM, Mingfeng Yang wrote:
>
> > Right now
Right now I have an index with four shards on a single EC2 server, each
running on different ports. Now I'd like to migrate three shards
to independent servers.
What should I do to safely accomplish this process?
Can I just
1. shutdown all four solr instances.
2. copy three shards (indexes) to d
We are migrating our Solr index from a single index to multiple shards with
SolrCloud. I noticed that when I query SolrCloud (all shards or just one
of the shards), the response has a field maxScore, but a query of the single
index does not include this field.
In both cases, we are using Solr 4.0.