Hi Francisco,
>> I have many drug product leaflets, each corresponding to one product. On
>> the other hand, we have a medical dictionary with about 10^5 terms.
>> I want to detect all the occurrences of those terms in any leaflet
>> document.
Take a look at SolrTextTagger for this use case.
https://github.
Hi Naresh,
Couldn't you just model this as an OR query, since your requirement is
at least one match (but can be more than one), i.e.:
tags:T1 tags:T2 tags:T3
-sujit
On Mon, May 11, 2015 at 4:14 AM, Naresh Yadav wrote:
> Hi all,
>
> Also asked this here : http://stackoverflow.com/questions/3016
Hi Vijay,
I haven't tried this myself, but perhaps you could build the two phrases as
PhraseQueries and connect them up with a SpanQuery? Something like this
(using your original example).
PhraseQuery p1 = new PhraseQuery();
for (String word : "this is phrase 1".split(" ")) {
  p1.add(new Term("my
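Since a PhraseQuery can't be nested inside a SpanQuery directly, each phrase
can itself be built as an exact-order SpanNearQuery. An untested sketch,
assuming the pre-5.0 Lucene span API; the field name "myfield" and the outer
slop of 10 are made up:

// build a phrase as an exact-order span: slop 0, inOrder true
private SpanNearQuery phrase(String field, String text) {
  String[] words = text.split(" ");
  SpanQuery[] clauses = new SpanQuery[words.length];
  for (int i = 0; i < words.length; i++) {
    clauses[i] = new SpanTermQuery(new Term(field, words[i]));
  }
  return new SpanNearQuery(clauses, 0, true);
}

// connect the two phrases with an outer SpanNearQuery; the slop of 10 is
// the maximum number of positions allowed between the phrases
SpanQuery p1 = phrase("myfield", "this is phrase 1");
SpanQuery p2 = phrase("myfield", "this is phrase 2");
SpanNearQuery combined = new SpanNearQuery(new SpanQuery[] {p1, p2}, 10, true);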
-----Original Message-----
>
> From: sujitatgt...@gmail.com [mailto:sujitatgt...@gmail.com] On Behalf Of
> Sujit Pal
>
> Sent: Saturday, April 11, 2015 10:23 AM
>
> To: solr-user@lucene.apache.org; Ahmet Arslan
>
> Subject: Re: Enrich search results with external data
>
Hi Ha,
I am the author of the blog post you mention. To your question, I don't
know if the code will work without change (since the Lucene/Solr API has
evolved so much over the last few years), but a more "preferred" way using
Function Queries may be found in the slides for Timothy Potter's talk h
Hi Ludovic,
A bit late to the party, sorry, but here is a bit of a riff off Eric's
idea. Why not store the previous terms in a Bloom filter, and once you get
the terms from this week, check whether they are in the set. Once you
find the new terms, add them to the Bloom filter. Bloom filters are spa
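A quick sketch of what this could look like with Guava's BloomFilter (Guava
itself is an assumption here; previousTerms, thisWeeksTerms and handleNewTerm
are hypothetical):

// sized for ~1M terms with a 1% false positive rate
BloomFilter<String> seen = BloomFilter.create(
    Funnels.stringFunnel(StandardCharsets.UTF_8), 1000000, 0.01);
for (String term : previousTerms) {
  seen.put(term);
}
for (String term : thisWeeksTerms) {
  // mightContain() == false guarantees the term was never added, so it is
  // definitely new; true can be a false positive, so a few genuinely new
  // terms may be missed
  if (!seen.mightContain(term)) {
    handleNewTerm(term);
    seen.put(term);
  }
}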
Hi Trey,
In an application I built a few years ago, I had a component that rewrote
the input query into a Lucene BooleanQuery and we would set the
minimumNumberShouldMatch value for the query. It worked well, but lately we
are trying to move away from writing our own custom components since
maintaining
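For reference, a minimal sketch of that rewrite, assuming the pre-5.0 Lucene
API where BooleanQuery is still mutable (field and terms made up):

// match at least 2 of the 3 optional (SHOULD) clauses
BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("body", "solr")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("body", "search")), BooleanClause.Occur.SHOULD);
bq.setMinimumNumberShouldMatch(2);

In stock Solr, the dismax/edismax mm parameter gives similar behavior without
a custom component, which fits the move away from custom code.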
Hi Eugene,
In a system we built a couple of years ago, we had a corpus of English and
French mixed (and Spanish on the way but that was implemented by client
after we handed off). We had different fields for each language. So (title,
body) for English docs was (title_en, body_en), for French (title_
Hi Smitha,
Have you looked at facet queries? They allow you to attach Solr queries to
facets. The catch is that you will need to know all possible
combinations of language and binding (or make an initial query to find this
information).
https://wiki.apache.org/solr/SimpleFacetParameter
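For example, something like this (field names hypothetical):

facet.query=language:english AND binding:hardback
facet.query=language:english AND binding:paperback
facet.query=language:french AND binding:hardback
...

Each facet.query comes back with its own count in the facet_queries section
of the response.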
Have you looked at IndexSchema? That would offer you methods to query index
metadata using SolrJ.
http://lucene.apache.org/solr/4_7_2/solr-core/org/apache/solr/schema/IndexSchema.html
-sujit
On Tue, May 27, 2014 at 1:56 PM, T. Kuro Kurosaka wrote:
> I'd like to write Solr client code that wri
told me about, seems like difficult and
> time consuming for students like me as i will have to submit this in next
> 15 Days.
> Please suggest me something.
>
>
> On Tue, Mar 11, 2014 at 5:12 AM, Sujit Pal wrote:
>
> > Hi Sohan,
> >
> > You would be the best perso
an answer.
-sujit
On Sun, Mar 9, 2014 at 11:26 PM, Sohan Kalsariya
wrote:
> Thanks Sujit and all for your views about semantic search in solr.
> But How do i proceed towards, i mean how do i start off the things to get
> on track ?
>
>
>
> On Sat, Mar 8, 2014 at 10:50
Thanks for sharing this link Sohan, it's an interesting approach. Since you
have effectively defined what you mean by Semantic Search, there are a couple
of other approaches I know of to do something like this:
1) preprocess your documents looking for terms that co-occur in the same
document. The more su
Hi Furkan,
In the stock definition of the payload field:
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/collection1/conf/schema.xml?view=markup
the analyzer for the payloads field type is a WhitespaceTokenizerFactory
followed by a DelimitedPayloadTokenFilterFactory. So if you send it
In our case, it is because all our other applications are deployed on
Tomcat and ops is familiar with the deployment process. We also had
customizations that needed to go in, so we inserted our custom JAR into the
solr.war's WEB-INF/lib directory, so to ops the process of deploying Solr
was (almost
Hi Lisheng,
We did something similar in Solr using a custom handler (but I think you could
just build a custom QueryParser to do this), but you could do this in your
application as well, i.e., get the language and then rewrite your query to use
the language-specific fields. Come to think of it, th
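A rough SolrJ sketch of the application-side variant; detectLanguage() is a
hypothetical stand-in for whatever language detector you use:

String lang = detectLanguage(userQuery);  // e.g. "en" or "fr"
// point the query at the language-suffixed fields
String rewritten = String.format("title_%s:(%s) OR body_%s:(%s)",
    lang, userQuery, lang, userQuery);
QueryResponse rsp = server.query(new SolrQuery(rewritten));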
Hi ballusethuraman,
I am sure you have done this already, but just to be sure, did you reindex your
existing kilometer data after you changed the data type from string to long? If
not, then you should.
-sujit
On Mar 23, 2013, at 11:21 PM, ballusethuraman wrote:
> Hi, I am having a colum
You could also do this outside Solr, in your client. If your query is
surrounded by quotes, then strip away the quotes and make
q=text_exact_field:your_unquoted_query. It is probably better to do this
outside Solr in general, keeping the upgrade path in mind.
-sujit
On Feb 21, 2013, at 12:20 PM, Van Tasse
the /). Now it works perfect.
>
> Best regards, Bart
>
>
> On 11 Feb 2013, at 20:13, SUJIT PAL wrote:
>
>> Hi Bart,
>>
>> Like I said, I didn't actually hook my UIMA stuff into Solr, content and
>> queries are annotated before they reach Solr. Wh
Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima path.
>
> Is it needed to deploy the new jar (RoomAnnotator.jar)? If yes, which branch
> can I checkout? This is the Stable release I am running:
>
> Solr 4.1.0 1434440 - sarowe - 2013-01-
Hi Siva,
You will probably get a better reply if you head over to the nutch mailing list
[http://nutch.apache.org/mailing_lists.html] and ask there.
Nutch 2.1 may be what you are looking for (it stores pages in a NoSQL database).
Regards,
Sujit
On Feb 10, 2013, at 9:16 PM, SivaKarthik wrote:
> De
Hi Bart,
I did some work with UIMA but this was to annotate the data before it goes to
Lucene/Solr, i.e., not built as an UpdateRequestProcessor. I just looked through
the SolrUima wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe you
will have to set up your own aggregate analysis cha
Hi Christian,
Since customization is not a problem in your case, how about writing out the
userId and excluded document ids to the database when a document is excluded,
and then for each query from the user (possibly identified by a userid
parameter), look up the database by userid, construct a NOT filt
Hi,
We are using google translate to do something like what you (onlinespending)
want to do, so maybe it will help.
During indexing, we store the searchable fields from documents into fields
suffixed _en, _fr, _es, etc. So assuming we capture title and body from each
document, the fields are (t
Hi Srilatha,
One way to do this would be to make two calls: one to your sponsored list,
where you pick two at random, and a Solr call where you pick up all the search
results; then stitch them together in your client.
Sujit
On Oct 4, 2012, at 12:39 AM, srilatha wrote:
> For an E-commerce websi
Hi Alex,
I implemented something similar using the rules described in this page:
http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences
The idea is to normalize the British spelling form to the American form during
indexing and querying, using a tokenizer that takes in a wo
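A minimal sketch of such a normalizing filter, assuming the Lucene 4.x
analysis API and a pre-built British-to-American map (the mapping source is up
to you); it would go in both the index and query analyzer chains:

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class AmericanizeFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final Map<String,String> toAmerican;

  public AmericanizeFilter(TokenStream in, Map<String,String> toAmerican) {
    super(in);
    this.toAmerican = toAmerican;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // swap the British form for the American form if the map knows it
    String american = toAmerican.get(termAtt.toString());
    if (american != null) {
      termAtt.setEmpty().append(american);
    }
    return true;
  }
}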
Hi Samarendra,
This does look like a candidate for a custom query component if you want to do
this inside Solr. You can of course continue to do this at the client.
-sujit
On May 15, 2012, at 12:26 PM, Samarendra Pratap wrote:
> Hi,
> I need a suggestion for improving relevance of search resul
Hi Ian,
I believe you may be able to use a bunch of facet.query parameters, something
like this:
facet.query=yourfield:[NOW-1DAY TO NOW]
facet.query=yourfield:[NOW-2DAY TO NOW-1DAY]
...
and so on.
-sujit
On May 3, 2012, at 10:41 PM, Ian Holsman wrote:
> Hi.
>
> I would like to be able to do
Hi Hoss,
Thanks for the pointers, and sorry, it was a bug in my code (there was some
dead code which was alphabetizing the facet link text and also the parameters
themselves indirectly by reference).
I actually ended up building a servlet and a component to print out the
multi-valued parameters usin
static ThreadLocal variable, thereby making it available
> to your Solr component. It's kind of a hack but would work.
>
> Sent from my phone
>
> On Mar 17, 2012, at 6:53 PM, "SUJIT PAL" wrote:
>
>> Thanks Pravesh,
>>
>> Yes, converting the mypara
Thanks Pravesh,
Yes, converting the myparam to a single (comma-separated) field is probably the
best approach, but as I mentioned, it is probably a bit too late for this to
be practical in my case...
The myparam parameters are facet filter queries, and so far order did not
matter, since the
Hello,
I have a custom component which depends on the ordering of a multi-valued
parameter. Unfortunately it looks like the values do not come back in the same
order as they were put in the URL. Here is some code to explain the behavior:
URL: /solr/my_custom_handler?q=something&myparam=foo&mypa
Hi Thomas,
With Java (from within a custom handler in Solr) you can get a handle to the
IndexSchema from the request, like so:
IndexSchema schema = req.getSchema();
SchemaField sf = schema.getField(fieldName);
boolean isMultiValued = sf.multiValued();
From within SolrJ code, you can use SolrDoc
to remove such
> special characters during both index and query analyzing so a
> "Company®" and "Company" are equivalent.
>
> But your problem space may differ.
>
> Best
> Erick
>
> On Wed, Feb 1, 2012 at 6:55 PM, SUJIT PAL wrote:
>> Hi Tejind
Hi Tejinder,
I had this problem yesterday (believe it or not :-)), and the fix for us was to
make Tomcat UTF-8 compliant. In server.xml, there is a Connector tag; we
added the attribute URIEncoding="UTF-8" to it and restarted Tomcat. Not sure
what container you are using; if it's Tomcat this will solve it, els
Hi Devon,
Have you considered using a permuterm index? It's workable, but depending
on your requirements (the size of fields that you want to create the index
on), it may bloat your index. I've written about it here:
http://sujitpal.blogspot.com/2011/10/lucene-wildcard-query-and-permuterm.html
Anothe
Hi Eugene,
I proposed a solution for something similar, maybe it will help you.
http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html
-sujit
On Sat, 2011-11-05 at 16:43 -0400, Eugene Strokin wrote:
> Hello,
> I have a task which seems trivial, but I couldn't find any
Hi Alireza,
Would this work? Sort the results by age desc, then loop through the
results as long as age == age[0].
-sujit
On Tue, 2011-10-18 at 15:23 -0700, Otis Gospodnetic wrote:
> Hi,
>
> Are you just looking for:
>
> age:
>
> This will return all documents/records where age field is equal
> POST responses cannot be cached (see HTTP spec).
>
> POST requests do not include the arguments in the log, which makes your HTTP
> logs nearly useless for diagnosing problems.
>
> wunder
> Walter Underwood
>
> On Oct 14, 2011, at 9:20 AM, Sujit Pal wrote:
>
>
If you use the CommonsHttpSolrServer from your client (not sure about
the other types, this is the one I use), you can pass the method as an
argument to its query() method, something like this:
QueryResponse rsp = server.query(params, METHOD.POST);
HTH
Sujit
On Fri, 2011-10-14 at 13:29 +, Ro
Hi Mouli,
I was looking at the code here, not sure why you even need to do the
sort...
After you get the DocList, couldn't you do something like this?
List<Integer> topofferDocIds = new ArrayList<Integer>();
for (DocIterator it = ergebnis.iterator(); it.hasNext();) {
  topofferDocIds.add(it.next());
}
Collections
Sorry, hit send too soon. Personally, given the use case, I think I would
still prefer the two-query approach. It seems like way too much work to build
a handler (unless you want to learn how to do it) just to support this.
On Thu, 2011-09-22 at 12:31 -0700, Sujit Pal wrote:
> I have a few blog posts on t
I have a few blog posts on this...
http://sujitpal.blogspot.com/2011/04/custom-solr-search-components-2-dev.html
http://sujitpal.blogspot.com/2011/04/more-fun-with-solr-component.html
http://sujitpal.blogspot.com/2011/02/solr-custom-search-requesthandler.html
but it's quite simple, just look at
>
>
> On 21/09/2011 21:26, Sujit Pal wrote:
> > Hi MOuli,
> >
> > AFAIK (and I don't know that much about Solr), this feature does not
> > exist out of the box in Solr. One way to achieve this could be to
> > construct a DocSet with topoffer:true and i
Hi MOuli,
AFAIK (and I don't know that much about Solr), this feature does not
exist out of the box in Solr. One way to achieve this could be to
construct a DocSet with topoffer:true and intersect it with your result
DocSet, then select the first 5 from the intersection, randomly shuffle
them, subl
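A rough sketch of the intersection part, assuming the Solr 1.4-era DocSet API
inside a SearchComponent.process(ResponseBuilder rb):

SolrIndexSearcher searcher = rb.req.getSearcher();
DocSet topOffers = searcher.getDocSet(
    new TermQuery(new Term("topoffer", "true")));
DocSet intersection = topOffers.intersection(rb.getResults().docSet);

// take the first 5 doc ids off the intersection and shuffle them
List<Integer> promoted = new ArrayList<Integer>();
DocIterator it = intersection.iterator();
while (it.hasNext() && promoted.size() < 5) {
  promoted.add(it.next());
}
Collections.shuffle(promoted);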
Would it make sense to have a "Did you mean?" type of functionality for
which you use the EdgeNGram and Metaphone filters /if/ you don't get
appropriate results for the user query?
So when a user types "cannon" and the application notices that there are
no cannons for sale in the index (0 results wi
FWIW, we have some custom classes on top of solr as well. The way we do
it is using the following ant target:
...
Seems to work fine... it basically automates what you have described in your
second paragraph, but allows us to keep ou
Hi Ron,
There was a discussion about this some time back, which I implemented
(with great success btw) in my own code...basically you store both the
analyzed and non-analyzed versions (use string type) in the index, then
send in a query like this:
+name:clarke name_s:"clarke"^100
The name field
I have done this using a custom tokenfilter that (among other things)
detects hyphenated words and converts them to the 3 variations, using a
regex match on the incoming token:
(\w+)-(\w+)
that runs the following regex transform:
s/(\w+)-(\w+)/$1$2__$1 $2/
and then splits on "__" and passes the or
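A simplified sketch of the same idea as a queue-based TokenFilter (Lucene 4.x
attributes assumed; the class name is made up), emitting the variants at the
same position as the original hyphenated token:

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class HyphenVariantFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posAtt =
      addAttribute(PositionIncrementAttribute.class);
  private final Deque<String> pending = new ArrayDeque<String>();

  public HyphenVariantFilter(TokenStream in) {
    super(in);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!pending.isEmpty()) {
      // emit a queued variant, stacked at the same position as the original
      termAtt.setEmpty().append(pending.remove());
      posAtt.setPositionIncrement(0);
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    String term = termAtt.toString();
    int dash = term.indexOf('-');
    if (dash > 0 && dash < term.length() - 1) {
      // queue "word1word2", "word1" and "word2" behind "word1-word2"
      pending.add(term.substring(0, dash) + term.substring(dash + 1));
      pending.add(term.substring(0, dash));
      pending.add(term.substring(dash + 1));
    }
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending.clear();
  }
}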
Hi Sowmya,
I basically wrote an annotator and built a buffering tokenizer around it
so I could include it in a Lucene analyzer pipeline. I've blogged about
it, not sure if it's good form to include links to blog posts in public
forums, but here they are, apologies in advance if this is wrong (let m
This may or may not help you, we solved something similar based on
hyphenated words - essentially when we encountered a hyphenated word
(say word1-word2) we sent in an OR query with the word (word1-word2)
itself, a phrase "word1 word2"~3 and the word formed by removing the
hyphen (word1word2).
But
/solr-external-scoring/
On Thu, 2011-05-05 at 13:12 -0700, Ahmet Arslan wrote:
>
> --- On Thu, 5/5/11, Sujit Pal wrote:
>
> > From: Sujit Pal
> > Subject: Custom sorting based on external (database) data
> > To: "solr-user"
> > Date: Thursday, May 5, 20
Hi,
Sorry for the possible double post, I wrote this up but had the
incorrect sender address, so I am guessing that my previous one is going
to be rejected by the list moderation daemon.
I am trying to figure out options for the following problem. I am on
Solr 1.4.1 (Lucene 2.9.1).
I have search
>
> On Thu, Apr 7, 2011 at 7:39 PM, Sujit Pal
> wrote:
> Hi,
>
> I am developing a SearchComponent that needs to build some
> initial
> DocSets and then intersect with the result DocSet during each
> query (in
>
.
Would still appreciate knowing if there is a simpler way, or if I am
wildly off the mark.
Thanks
Sujit
On Thu, 2011-04-07 at 16:39 -0700, Sujit Pal wrote:
> Hi,
>
> I am developing a SearchComponent that needs to build some initial
> DocSets and then intersect with the result DocSet
Hi,
I am developing a SearchComponent that needs to build some initial
DocSets and then intersect with the result DocSet during each query (in
process()).
When the searcher is reopened, I need to regenerate the initial DocSets.
I am on Solr 1.4.1.
My question is, which method in SearchComponent
Hello,
I am denormalizing a map of keys to scores into a single Lucene document
by storing it as "key1|score1 key2|score2 ...". In Solr, I pull this in
using the following analyzer definition.
I have my own PayloadSimilarity which overrides scorePayload.
The index is
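For what it's worth, a sketch of the scorePayload override, assuming the
Lucene 2.9.x Similarity API and payloads encoded as floats (the class name is
made up):

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.search.DefaultSimilarity;

public class MyPayloadSimilarity extends DefaultSimilarity {
  @Override
  public float scorePayload(int docId, String fieldName, int start, int end,
      byte[] payload, int offset, int length) {
    if (payload == null) {
      return 1.0f;
    }
    // use the float the payload was encoded with as the score factor
    return PayloadHelper.decodeFloat(payload, offset);
  }
}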
here this
> not
> enough.
>
> Another requirement is, when the access permission is changed, we need to
> update
> the field - my understanding is we can not unless re-index the whole document
> again. Am I correct?
> thanks,
> canal
>
>
>
>
> _
How about assigning content types to documents in the index, and mapping
users to a set of content types they are allowed to access? That way you
will pass in fewer parameters in the fq.
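For example, with a hypothetical contenttype field, each user's filter stays
short:

fq=contenttype:(wiki OR blog OR news)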
-sujit
On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
> Morning,
>
> We use solr to index a range of cont
This could probably be done using a custom QParser plugin?
Define the pattern like this:
String queryTemplate = "title:%Q%^2.0 body:%Q%";
then replace the %Q% with the value of the q param, send it through
QueryParser.parse() and return the query.
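An untested sketch of such a plugin, assuming the Solr 1.4-era QParser API
(class name and template made up):

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class TemplateQParserPlugin extends QParserPlugin {
  private static final String TEMPLATE = "title:%Q%^2.0 body:%Q%";

  public void init(NamedList args) {}

  public QParser createParser(String qstr, SolrParams localParams,
      SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() throws ParseException {
        // substitute the user query into the template, then delegate to
        // the stock Lucene query parser
        String expanded = TEMPLATE.replace("%Q%", qstr);
        QueryParser qp = new QueryParser(Version.LUCENE_29, "title",
            getReq().getSchema().getQueryAnalyzer());
        return qp.parse(expanded);
      }
    };
  }
}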
-sujit
On Wed, 2011-03-02 at 11:28 -0800, mrw
Yes, check out the field type "payloads" in the schema.xml file. If you
set up one or more of your fields as type payloads (you would use the
DelimitedPayloadTokenFilterFactory during indexing in your analyzer
chain), you can then use the PayloadTermQuery to query it with; scoring
can be done with
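For example, something like this (assuming the Lucene 3.x payload query
classes; the field name is made up):

// score each match by the average of the payloads at matching positions
PayloadTermQuery q = new PayloadTermQuery(
    new Term("payloads", "key1"), new AveragePayloadFunction());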
Hi Derek,
The XML files you post to Solr need to be in the correct Solr-specific
XML format.
One way to "preserve" the original structure would be to "flatten" the
document into field names indicating the position of the text, for
example:
book_titleabbrev: Advancing Return on Investment Analys
If the dictionary is a Lucene index, wouldn't it be as simple as deleting
with a term query? Something like this:
// open read-write; IndexReader is abstract, so it can't be new'ed up
// (dictionaryDir is the Directory holding the dictionary index)
IndexReader sdreader = IndexReader.open(dictionaryDir, false);
sdreader.deleteDocuments(new Term("word", "sherri"));
...
sdreader.close(); // flushes the deletes
// (an optimize would have to go through an IndexWriter)
I am guessing your dictionary is built dynamically usi
Why not use the Keyword attribute (setKeyword(true)) when you see an email
address? If the keyword attribute is set, skip the token filters below it in
the chain. There is also a KeywordMarkerFilter which does this
(this is done in SnowballPorterStemFilterFactory, maybe also other
places, but this is one p
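A minimal sketch of such a marker filter, assuming Lucene 3.1+ attributes
(class name and regex are made up, and the pattern is deliberately loose):

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.KeywordAttribute;

public final class EmailKeywordMarkerFilter extends TokenFilter {
  // anything that looks like an email address
  private static final Pattern EMAIL = Pattern.compile("\\S+@\\S+\\.\\S+");
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final KeywordAttribute kwAtt = addAttribute(KeywordAttribute.class);

  public EmailKeywordMarkerFilter(TokenStream in) {
    super(in);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    if (EMAIL.matcher(termAtt).matches()) {
      kwAtt.setKeyword(true); // keyword-aware filters downstream skip this token
    }
    return true;
  }
}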
We are currently a Lucene shop; the way we do it is to have these results
come from a database table (where they are available in rank order). We
want to move to Solr, so what I plan on doing to replicate
this functionality is to write a custom request handler that will do the
database que
Another option (assuming the case where a user can be granted access to
a certain class of documents, and more than one user would be able to
access certain documents) would be to store the access filter (as an OR
query of content types) in an external cache (perhaps a database or an
external cache