Re: Detect term occurrences

2015-09-11 Thread Sujit Pal
Hi Francisco, >> I have many drug products leaflets, each corresponding to 1 product. In the other hand we have a medical dictionary with about 10^5 terms. I want to detect all the occurrences of those terms for any leaflet document. Take a look at SolrTextTagger for this use case. https://github.

Re: Solr query which return only those docs whose all tokens are from given list

2015-05-11 Thread Sujit Pal
Hi Naresh, Couldn't you just model this as an OR query since your requirement is at least one (but can be more than one), ie: tags:T1 tags:T2 tags:T3 -sujit On Mon, May 11, 2015 at 4:14 AM, Naresh Yadav wrote: > Hi all, > > Also asked this here : http://stackoverflow.com/questions/3016

Re: Proximity Search

2015-04-30 Thread Sujit Pal
Hi Vijay, I haven't tried this myself, but perhaps you could build the two phrases as PhraseQueries and connect them up with a SpanQuery? Something like this (using your original example). PhraseQuery p1 = new PhraseQuery(); for (String word : "this is phrase 1".split(" ")) { p1.add(new Term("my

Re: Enrich search results with external data

2015-04-17 Thread Sujit Pal
-----Original Message- > > From: sujitatgt...@gmail.com [mailto:sujitatgt...@gmail.com] On Behalf Of > Sujit Pal > > Sent: Saturday, April 11, 2015 10:23 AM > > To: solr-user@lucene.apache.org; Ahmet Arslan > > Subject: Re: Enrich search results with external data > &

Re: Enrich search results with external data

2015-04-11 Thread Sujit Pal
Hi Ha, I am the author of the blog post you mention. To your question, I don't know if the code will work without change (since the Lucene/Solr API has evolved so much over the last few years), but a more "preferred" way using Function Queries may be found in slides for Timothy Potter's talk h

Re: Get the new terms of fields since last update

2014-12-05 Thread Sujit Pal
Hi Ludovic, A bit late to the party, sorry, but here is a bit of a riff off Eric's idea. Why not store the previous terms in a Bloom filter and once you get the terms from this week, check to see if they are not in the set. Once you find the new terms, add them to the Bloom filter. Bloom filters are spa
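A minimal sketch of the Bloom filter idea above, in plain Java with no Solr dependency. The class name, bit-array size, hash count and hashing scheme are all assumptions made for illustration; they are not taken from the thread. A Bloom filter never reports a stored term as missing, but it can occasionally report an unseen term as present, so a small fraction of genuinely new terms may be skipped.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical helper: remembers previously seen terms in a Bloom filter. */
class TermBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TermBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // FNV-1a, used as the second base hash for double hashing.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 16777619;
        }
        return hash;
    }

    // i-th bit position derived from two base hashes (double hashing).
    private int position(byte[] bytes, int i) {
        return Math.floorMod(Arrays.hashCode(bytes) + i * fnv1a(bytes), numBits);
    }

    void add(String term) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(bytes, i));
        }
    }

    boolean mightContain(String term) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(bytes, i))) {
                return false; // definitely never added
            }
        }
        return true; // probably added (false positives are possible)
    }

    /** Returns the terms not seen before and records them in the filter. */
    List<String> filterNew(Iterable<String> terms) {
        List<String> fresh = new ArrayList<>();
        for (String term : terms) {
            if (!mightContain(term)) {
                fresh.add(term);
                add(term);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        TermBloomFilter seen = new TermBloomFilter(1 << 16, 3);
        System.out.println(seen.filterNew(Arrays.asList("aspirin", "ibuprofen")));
        System.out.println(seen.filterNew(Arrays.asList("aspirin", "ibuprofen")));
    }
}
```

Calling filterNew once per update cycle gives the "new since last time" list and updates the filter in the same pass.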

Re: What's the most efficient way to sort by "number of terms matched"?

2014-11-06 Thread Sujit Pal
Hi Trey, In an application I built a few years ago, I had a component that rewrote the input query into a Lucene BooleanQuery and we would set the minimumNumberShouldMatch value for the query. Worked well, but lately we are trying to move away from writing our own custom components since maintaining

Re: Implementing custom analyzer for multi-language stemming

2014-07-30 Thread Sujit Pal
Hi Eugene, In a system we built a couple of years ago, we had a corpus of English and French mixed (and Spanish on the way but that was implemented by the client after we handed off). We had different fields for each language. So (title, body) for English docs was (title_en, body_en), for French (title_

Re: Query on Facet

2014-07-30 Thread Sujit Pal
Hi Smitha, Have you looked at Facet queries? They allow you to attach Solr queries to facets. The problem with this is that you will need to know all possible combinations of language and binding (or make an initial query to find this information). https://wiki.apache.org/solr/SimpleFacetParameter

Re: Any Solrj API to obtain field list?

2014-05-27 Thread Sujit Pal
Have you looked at IndexSchema? That would offer you methods to query index metadata using SolrJ. http://lucene.apache.org/solr/4_7_2/solr-core/org/apache/solr/schema/IndexSchema.html -sujit On Tue, May 27, 2014 at 1:56 PM, T. Kuro Kurosaka wrote: > I'd like to write Solr client code that wri

Re: How to apply Semantic Search in Solr

2014-03-11 Thread Sujit Pal
told me about, seems like difficult and > time consuming for students like me as i will have to submit this in next > 15 Days. > Please suggest me something. > > > On Tue, Mar 11, 2014 at 5:12 AM, Sujit Pal wrote: > > > Hi Sohan, > > > > You would be the best perso

Re: How to apply Semantic Search in Solr

2014-03-10 Thread Sujit Pal
an answer. -sujit On Sun, Mar 9, 2014 at 11:26 PM, Sohan Kalsariya wrote: > Thanks Sujit and all for your views about semantic search in solr. > But How do i proceed towards, i mean how do i start off the things to get > on track ? > > > > On Sat, Mar 8, 2014 at 10:50

Re: How to apply Semantic Search in Solr

2014-03-08 Thread Sujit Pal
Thanks for sharing this link Sohan, it's an interesting approach. Since you have effectively defined what you mean by Semantic Search, there are a couple of other approaches I know of to do something like this: 1) preprocess your documents looking for terms that co-occur in the same document. The more su

Re: Multivalued true Error?

2013-11-26 Thread Sujit Pal
Hi Furkan, In the stock definition of the payload field: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/collection1/conf/schema.xml?view=markup the analyzer for payloads field type is a WhitespaceTokenizerFactory followed by a DelimitedPayloadTokenFilterFactory. So if you send it

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Sujit Pal
In our case, it is because all our other applications are deployed on Tomcat and ops is familiar with the deployment process. We also had customizations that needed to go in, so we inserted our custom JAR into the solr.war's WEB-INF/lib directory, so to ops the process of deploying Solr was (almost

Re: Solr language-dependent sort

2013-04-08 Thread SUJIT PAL
Hi Lisheng, We did something similar in Solr using a custom handler (but I think you could just build a custom QueryParser to do this), but you could do this in your application as well, ie, get the language and then rewrite your query to use the language specific fields. Come to think of it, th

Re: Solr Sorting is not working properly on long Fields

2013-03-24 Thread SUJIT PAL
Hi ballusethuraman, I am sure you have done this already, but just to be sure, did you reindex your existing kilometer data after you changed the data type from string to long? If not, then you should. -sujit On Mar 23, 2013, at 11:21 PM, ballusethuraman wrote: > Hi, I am having a colum

Re: Matching an exact word

2013-02-21 Thread SUJIT PAL
You could also do this outside Solr, in your client. If your query is surrounded by quotes, then strip away the quotes and make q=text_exact_field:your_unquoted_query. Probably better to do outside Solr in general keeping in mind the upgrade path. -sujit On Feb 21, 2013, at 12:20 PM, Van Tasse

Re: Can Solr analyze content and find dates and places

2013-02-11 Thread SUJIT PAL
the /). Now it works perfect. > > Best regards, Bart > > > On 11 Feb 2013, at 20:13, SUJIT PAL wrote: > >> Hi Bart, >> >> Like I said, I didn't actually hook my UIMA stuff into Solr, content and >> queries are annotated before they reach Solr. Wh

Re: Can Solr analyze content and find dates and places

2013-02-11 Thread SUJIT PAL
Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima path. > > Is it needed to deploy the new jar (RoomAnnotator.jar)? If yes, which branch > can I checkout? This is the Stable release I am running: > > Solr 4.1.0 1434440 - sarowe - 2013-01-

Re: Crawl Anywhere -

2013-02-10 Thread SUJIT PAL
Hi Siva, You will probably get a better reply if you head over to the nutch mailing list [http://nutch.apache.org/mailing_lists.html] and ask there. Nutch 2.1 may be what you are looking for (stores pages in NoSQL database). Regards, Sujit On Feb 10, 2013, at 9:16 PM, SivaKarthik wrote: > De

Re: Can Solr analyze content and find dates and places

2013-02-08 Thread SUJIT PAL
Hi Bart, I did some work with UIMA but this was to annotate the data before it goes to Lucene/Solr, ie not built as an UpdateRequestProcessor. I just looked through the SolrUima wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe you will have to set up your own aggregate analysis cha

Re: Per user document exclusions

2012-11-19 Thread SUJIT PAL
Hi Christian, Since customization is not a problem in your case, how about writing out the userId and excluded document ids to the database when it is excluded, and then for each query from the user (possibly identified by a userid parameter), lookup the database by userid, construct a NOT filt

Re: Query foreign language "synonyms" / words of equivalent meaning?

2012-10-10 Thread SUJIT PAL
Hi, We are using Google Translate to do something like what you (onlinespending) want to do, so maybe it will help. During indexing, we store the searchable fields from documents into fields named _en, _fr, _es, etc. So assuming we capture title and body from each document, the fields are (t

Re: How to make SOLR manipulate the results?

2012-10-04 Thread SUJIT PAL
Hi Srilatha, One way to do this would be by making two calls, one to your sponsored list where you pick two at random and a solr call where you pick all the search results and then stick them together in your client. Sujit On Oct 4, 2012, at 12:39 AM, srilatha wrote: > For an E-commerce websi

Re: Synonym file for American-British words

2012-08-07 Thread SUJIT PAL
Hi Alex, I implemented something similar using the rules described in this page: http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences The idea is to normalize the British spelling form to the American form during indexing and query using a tokenizer that takes in a wo

Re: First query to find meta data, second to search. How to group into one?

2012-05-15 Thread SUJIT PAL
Hi Samarendra, This does look like a candidate for a custom query component if you want to do this inside Solr. You can of course continue to do this at the client. -sujit On May 15, 2012, at 12:26 PM, Samarendra Pratap wrote: > Hi, > I need a suggestion for improving relevance of search resul

Re: Faceting on a date field multiple times

2012-05-04 Thread SUJIT PAL
Hi Ian, I believe you may be able to use a bunch of facet.query parameters, something like this: facet.query=yourfield:[NOW-1DAY TO NOW] facet.query=yourfield:[NOW-2DAY TO NOW-1DAY] ... and so on. -sujit On May 3, 2012, at 10:41 PM, Ian Holsman wrote: > Hi. > > I would like to be able to do
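A small sketch of how a client might generate the facet.query values described above, one per trailing day. The class and method names are hypothetical, and the resulting strings would still need to be URL-encoded and attached to the request by whatever client library is in use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds one facet.query value per trailing day. */
class DateFacetQueries {

    /** Newest bucket first: [NOW-1DAY TO NOW], [NOW-2DAY TO NOW-1DAY], ... */
    static List<String> dailyBuckets(String field, int days) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            String upper = (i == 0) ? "NOW" : "NOW-" + i + "DAY";
            String lower = "NOW-" + (i + 1) + "DAY";
            // The range keyword TO is written in upper case.
            queries.add(field + ":[" + lower + " TO " + upper + "]");
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String q : dailyBuckets("yourfield", 3)) {
            System.out.println("facet.query=" + q);
        }
    }
}
```

Note that adjacent buckets built this way share a boundary instant, since square brackets make both ends inclusive.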

Re: Any way to get reference to original request object from within Solr component?

2012-03-20 Thread SUJIT PAL
Hi Hoss, Thanks for the pointers, and sorry, it was a bug in my code (was some dead code which was alphabetizing the facet link text and also the parameters themselves indirectly by reference). I actually ended up building a servlet and a component to print out the multi-valued parameters usin

Re: Any way to get reference to original request object from within Solr component?

2012-03-18 Thread SUJIT PAL
static ThreadLocal variable, thereby making it available > to your Solr component. It's kind of a hack but would work. > > Sent from my phone > > On Mar 17, 2012, at 6:53 PM, "SUJIT PAL" wrote: > >> Thanks Pravesh, >> >> Yes, converting the mypara

Re: Any way to get reference to original request object from within Solr component?

2012-03-17 Thread SUJIT PAL
Thanks Pravesh, Yes, converting the myparam to a single (comma-separated) field is probably the best approach, but as I mentioned, this is probably a bit too late for this to be practical in my case... The myparam parameters are facet filter queries, and so far order did not matter, since the

Any way to get reference to original request object from within Solr component?

2012-03-16 Thread SUJIT PAL
Hello, I have a custom component which depends on the ordering of a multi-valued parameter. Unfortunately it looks like the values do not come back in the same order as they were put in the URL. Here is some code to explain the behavior: URL: /solr/my_custom_handler?q=something&myparam=foo&mypa

Re: How to check if a field is a multivalue field with java

2012-02-22 Thread SUJIT PAL
Hi Thomas, With Java (from within a custom handler in Solr) you can get a handle to the IndexSchema from the request, like so: IndexSchema schema = req.getSchema(); SchemaField sf = schema.getField(fieldname); boolean isMultiValued = sf.multiValued(); From within SolrJ code, you can use SolrDoc

Re: How to make search with special characters in keywords

2012-02-01 Thread SUJIT PAL
to remove such > special characters during both index and query analyzing so a > "Company®" and "Company" are equivalent. > > But your problem space may differ. > > Best > Erick > > On Wed, Feb 1, 2012 at 6:55 PM, SUJIT PAL wrote: >> Hi Tejind

Re: How to make search with special characters in keywords

2012-02-01 Thread SUJIT PAL
Hi Tejinder, I had this problem yesterday (believe it or not :-)), and the fix for us was to make Tomcat UTF-8 compliant. In server.xml, there is a <Connector> tag; we added the attribute URIEncoding="UTF-8" to it and restarted Tomcat. Not sure what container you are using, if it's Tomcat this will solve it, els

Re: Solr, SQL Server's LIKE

2011-12-29 Thread Sujit Pal
Hi Devon, Have you considered using a permuterm index? It's workable, but depending on your requirements (size of fields that you want to create the index on), it may bloat your index. I've written about it here: http://sujitpal.blogspot.com/2011/10/lucene-wildcard-query-and-permuterm.html Anothe

Re: Dynamic rating based on "Like" feature

2011-11-05 Thread Sujit Pal
Hi Eugene, I proposed a solution for something similar, maybe it will help you. http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html -sujit On Sat, 2011-11-05 at 16:43 -0400, Eugene Strokin wrote: > Hello, > I have a task which seems trivial, but I couldn't find any

Re: Find Documents with field = maxValue

2011-10-18 Thread Sujit Pal
Hi Alireza, Would this work? Sort the results by age desc, then loop through the results as long as age == age[0]. -sujit On Tue, 2011-10-18 at 15:23 -0700, Otis Gospodnetic wrote: > Hi, > > Are you just looking for: > > age: > > This will return all documents/records where age field is equal
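The "sort descending, then read while the value equals the first one" suggestion can be sketched on the client side as below. The Doc class and the in-memory list stand in for whatever result objects the client actually receives; they are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical client-side helper: keep only the documents with the largest age. */
class MaxValueDocs {

    static final class Doc {
        final String id;
        final int age;

        Doc(String id, int age) {
            this.id = id;
            this.age = age;
        }
    }

    static List<Doc> withMaxAge(List<Doc> docs) {
        // Sort a copy by age, largest first (the equivalent of sort=age desc).
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt((Doc d) -> d.age).reversed());

        // Walk the sorted list for as long as age equals the first (largest) age.
        List<Doc> top = new ArrayList<>();
        for (Doc d : sorted) {
            if (d.age != sorted.get(0).age) {
                break;
            }
            top.add(d);
        }
        return top;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("a", 30), new Doc("b", 45), new Doc("c", 45), new Doc("d", 12));
        for (Doc d : withMaxAge(docs)) {
            System.out.println(d.id + " " + d.age);
        }
    }
}
```

When the results already arrive sorted by age descending, the sort step is redundant and only the loop is needed.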

Re: SolrJ + Post

2011-10-14 Thread Sujit Pal
> POST responses cannot be cached (see HTTP spec). > > POST requests do not include the arguments in the log, which makes your HTTP > logs nearly useless for diagnosing problems. > > wunder > Walter Underwood > > On Oct 14, 2011, at 9:20 AM, Sujit Pal wrote: > >

Re: SolrJ + Post

2011-10-14 Thread Sujit Pal
If you use the CommonsHttpSolrServer from your client (not sure about the other types, this is the one I use), you can pass the method as an argument to its query() method, something like this: QueryResponse rsp = server.query(params, METHOD.POST); HTH Sujit On Fri, 2011-10-14 at 13:29 +, Ro

Re: Sort five random "Top Offers" to the top

2011-10-03 Thread Sujit Pal
Hi Mouli, I was looking at the code here, not sure why you even need to do the sort... After you get the DocList, couldn't you do something like this? List<Integer> topofferDocIds = new ArrayList<Integer>(); for (DocIterator it = ergebnis.iterator(); it.hasNext();) { topofferDocIds.add(it.next()); } Collections

Re: Sort five random "Top Offers" to the top

2011-09-22 Thread Sujit Pal
Sorry hit send too soon. Personally, given the use case, I think I would still prefer the two query approach. It seems way too much work to do a handler (unless you want to learn how to do it) to support this. On Thu, 2011-09-22 at 12:31 -0700, Sujit Pal wrote: > I have a few blog posts on t

Re: Sort five random "Top Offers" to the top

2011-09-22 Thread Sujit Pal
I have a few blog posts on this... http://sujitpal.blogspot.com/2011/04/custom-solr-search-components-2-dev.html http://sujitpal.blogspot.com/2011/04/more-fun-with-solr-component.html http://sujitpal.blogspot.com/2011/02/solr-custom-search-requesthandler.html but it's quite simple, just look at

Re: Sort five random "Top Offers" to the top

2011-09-22 Thread Sujit Pal
> > > On 21/09/2011 21:26, Sujit Pal wrote: > > Hi MOuli, > > > > AFAIK (and I don't know that much about Solr), this feature does not > > exist out of the box in Solr. One way to achieve this could be to > > construct a DocSet with topoffer:true and i

Re: Sort five random "Top Offers" to the top

2011-09-21 Thread Sujit Pal
Hi MOuli, AFAIK (and I don't know that much about Solr), this feature does not exist out of the box in Solr. One way to achieve this could be to construct a DocSet with topoffer:true and intersect it with your result DocSet, then select the first 5 off the intersection, randomly shuffle them, subl
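One possible reading of the intersect-and-shuffle idea, sketched with plain Java collections in place of Solr's DocSet and DocList. The class name, the use of Integer ids and the choice to shuffle before taking the first n are assumptions; the post is cut off before it finishes describing the steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical helper: move up to n randomly chosen top offers to the front. */
class TopOfferPromoter {

    static List<Integer> promote(List<Integer> resultIds, Set<Integer> topOfferIds,
                                 int n, Random rnd) {
        // Intersection of the result list with the top-offer set.
        List<Integer> offers = new ArrayList<>();
        for (Integer id : resultIds) {
            if (topOfferIds.contains(id)) {
                offers.add(id);
            }
        }
        // Random order, then keep at most n of them.
        Collections.shuffle(offers, rnd);
        List<Integer> promoted = offers.subList(0, Math.min(n, offers.size()));

        // Promoted ids first, then every other result in its original order.
        // LinkedHashSet keeps insertion order and ignores ids already present.
        Set<Integer> ordered = new LinkedHashSet<>(promoted);
        ordered.addAll(resultIds);
        return new ArrayList<>(ordered);
    }

    public static void main(String[] args) {
        List<Integer> results = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Set<Integer> topOffers = new HashSet<>(Arrays.asList(2, 4, 6, 8, 10, 99));
        System.out.println(promote(results, topOffers, 5, new Random()));
    }
}
```

Passing the Random in makes the behaviour repeatable in tests while staying random in production.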

Re: Too many results in dismax queries with one word

2011-08-21 Thread Sujit Pal
Would it make sense to have a "Did you mean?" type of functionality for which you use the EdgeNGram and Metaphone filters /if/ you don't get appropriate results for the user query? So when user types "cannon" and the application notices that there are no cannons for sale in the index (0 results wi

Re: Problems generating war distribution using ant

2011-08-16 Thread Sujit Pal
FWIW, we have some custom classes on top of solr as well. The way we do it is using the following ant target: ... Seems to work fine...basically automates what you have described in your second paragraph, but allows us to keep ou

Re: Exact matching on names?

2011-08-16 Thread Sujit Pal
Hi Ron, There was a discussion about this some time back, which I implemented (with great success btw) in my own code...basically you store both the analyzed and non-analyzed versions (use string type) in the index, then send in a query like this: +name:clarke name_s:"clarke"^100 The name field
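A tiny sketch of building the query string shown above: a required match on the analyzed field plus an optional, heavily boosted match on the untokenized copy. The class and method names are hypothetical, and the sketch assumes a single-word name with no query-syntax characters in it.

```java
/** Hypothetical helper for the "analyzed field plus boosted exact field" pattern. */
class ExactNameQuery {

    /** Assumes name is one plain word; escaping and multi-word input are not handled. */
    static String build(String name, int boost) {
        // Required clause on the analyzed field, optional boosted clause on the string field.
        return "+name:" + name + " name_s:\"" + name + "\"^" + boost;
    }

    public static void main(String[] args) {
        System.out.println(build("clarke", 100));
    }
}
```

Documents whose stored name is exactly the query text match both clauses and rise to the top, while analyzed-only matches still appear further down.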

Re: Strip special chars like "-"

2011-08-09 Thread Sujit Pal
I have done this using a custom tokenfilter that (among other things) detects hyphenated words and converts them to the 3 variations, using a regex match on the incoming token: (\w+)-(\w+) that runs the following regex transform: s/(\w+)-(\w+)/$1$2__$1 $2/ and then splits by "__" and passes the or
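The regex transform described above can be tried outside Lucene with a few lines of plain Java. This is only a sketch of the string manipulation, not of the token filter itself; the class name and the order in which the three variations are returned are assumptions, since the post is cut off before saying how they are emitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: expand word1-word2 into the original, joined and spaced forms. */
class HyphenVariants {
    private static final Pattern HYPHENATED = Pattern.compile("(\\w+)-(\\w+)");

    static List<String> expand(String token) {
        List<String> variants = new ArrayList<>();
        variants.add(token); // the original token is always kept
        if (HYPHENATED.matcher(token).matches()) {
            // Same transform as s/(\w+)-(\w+)/$1$2__$1 $2/, then split on "__".
            String transformed = HYPHENATED.matcher(token).replaceAll("$1$2__$1 $2");
            for (String variant : transformed.split("__")) {
                variants.add(variant);
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        System.out.println(expand("soccer-club"));
        System.out.println(expand("club"));
    }
}
```

Because \w also matches underscores, a token that itself contains "__" would be split in the wrong place; a real filter would need a separator that cannot occur in the input.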

Re: (Solr-UIMA) Doubt regarding integrating UIMA in to solr - Configuration.

2011-07-08 Thread Sujit Pal
Hi Sowmya, I basically wrote an annotator and built a buffering tokenizer around it so I could include it in a Lucene analyzer pipeline. I've blogged about it, not sure if it's good form to include links to blog posts in public forums, but here they are, apologies in advance if this is wrong (let m

Re: Results with and without whitspace(soccer club and soccerclub)

2011-05-20 Thread Sujit Pal
This may or may not help you, we solved something similar based on hyphenated words - essentially when we encountered a hyphenated word (say word1-word2) we sent in an OR query with the word (word1-word2) itself, a phrase "word1 word2"~3 and the word formed by removing the hyphen (word1word2). But
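A sketch of the query-time expansion described above, producing the three OR'ed alternatives as a query string. The class name is hypothetical, the sketch splits on the first hyphen only, and it leaves the hyphenated term unquoted, which is an assumption about how the original was written.

```java
/** Hypothetical helper: expand word1-word2 into an OR query of three alternatives. */
class HyphenQueryBuilder {

    static String build(String term) {
        int idx = term.indexOf('-');
        // Leave the term alone unless the hyphen sits between two non-empty parts.
        if (idx <= 0 || idx == term.length() - 1) {
            return term;
        }
        String w1 = term.substring(0, idx);
        String w2 = term.substring(idx + 1);
        // Original term, sloppy phrase with slop 3, and the hyphen-free concatenation.
        return term + " OR \"" + w1 + " " + w2 + "\"~3 OR " + w1 + w2;
    }

    public static void main(String[] args) {
        System.out.println(build("soccer-club"));
    }
}
```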

Re: Custom sorting based on external (database) data

2011-05-05 Thread Sujit Pal
/solr-external-scoring/ On Thu, 2011-05-05 at 13:12 -0700, Ahmet Arslan wrote: > > --- On Thu, 5/5/11, Sujit Pal wrote: > > > From: Sujit Pal > > Subject: Custom sorting based on external (database) data > > To: "solr-user" > > Date: Thursday, May 5, 20

Custom sorting based on external (database) data

2011-05-05 Thread Sujit Pal
Hi, Sorry for the possible double post, I wrote this up but had the incorrect sender address, so I am guessing that my previous one is going to be rejected by the list moderation daemon. I am trying to figure out options for the following problem. I am on Solr 1.4.1 (Lucene 2.9.1). I have search

Re: Hook to do stuff when searcher is reopened?

2011-04-07 Thread Sujit Pal
> > On Thu, Apr 7, 2011 at 7:39 PM, Sujit Pal > wrote: > Hi, > > I am developing a SearchComponent that needs to build some > initial > DocSets and then intersect with the result DocSet during each > query (in >

Re: Hook to do stuff when searcher is reopened?

2011-04-07 Thread Sujit Pal
. Would still appreciate knowing if there is a simpler way, or if I am wildly off the mark. Thanks Sujit On Thu, 2011-04-07 at 16:39 -0700, Sujit Pal wrote: > Hi, > > I am developing a SearchComponent that needs to build some initial > DocSets and then intersect with the result DocSet

Hook to do stuff when searcher is reopened?

2011-04-07 Thread Sujit Pal
Hi, I am developing a SearchComponent that needs to build some initial DocSets and then intersect with the result DocSet during each query (in process()). When the searcher is reopened, I need to regenerate the initial DocSets. I am on Solr 1.4.1. My question is, which method in SearchComponent

Any way to do payload queries in Luke?

2011-03-11 Thread Sujit Pal
Hello, I am denormalizing a map of into a single lucene document by storing it as "key1|score1 key2|score2 ...". In Solr, I pull this in using the following analyzer definition. I have my own PayloadSimilarity which overrides scorePayload. The index is

Re: Solr and Permissions

2011-03-11 Thread Sujit Pal
here this > not > enough. > > Another requirement is, when the access permission is changed, we need to > update > the field - my understanding is we can not unless re-index the whole document > again. Am I correct? > thanks, > canal > > > > > _

Re: Solr and Permissions

2011-03-10 Thread Sujit Pal
How about assigning content types to documents in the index, and mapping users to a set of content types they are allowed to access? That way you will pass in fewer parameters in the fq. -sujit On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote: > Morning, > > We use solr to index a range of cont

Re: Understanding multi-field queries with q and fq

2011-03-02 Thread Sujit Pal
This could probably be done using a custom QParser plugin? Define the pattern like this: String queryTemplate = "title:%Q%^2.0 body:%Q%"; then replace the %Q% with the value of the Q param, send it through QueryParser.parse() and return the query. -sujit On Wed, 2011-03-02 at 11:28 -0800, mrw
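The template substitution step can be sketched in isolation as below; the expanded string would then be handed to the query parser. The class name is hypothetical, and wrapping multi-word input in parentheses is an added assumption so that every word stays attached to the field it was substituted into.

```java
/** Hypothetical helper: expand a %Q% query template with the user's query text. */
class QueryTemplate {
    static final String TEMPLATE = "title:%Q%^2.0 body:%Q%";

    static String expand(String template, String userQuery) {
        String q = userQuery.trim();
        // Assumption: group multi-word input so each field clause covers all words.
        String value = (q.indexOf(' ') >= 0) ? "(" + q + ")" : q;
        // String.replace substitutes literally, so no regex escaping is involved.
        return template.replace("%Q%", value);
    }

    public static void main(String[] args) {
        System.out.println(expand(TEMPLATE, "solr"));
        System.out.println(expand(TEMPLATE, "apache solr"));
    }
}
```

User input containing query-syntax characters would still need escaping before substitution; that is left out of this sketch.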

Re: Solr Payloads retrieval

2011-02-28 Thread Sujit Pal
Yes, check out the field type "payloads" in the schema.xml file. If you set up one or more of your fields as type payloads (you would use the DelimitedPayloadTokenFilterFactory during indexing in your analyzer chain), you can then use the PayloadTermQuery to query it with, scoring can be done with

Re: loading XML docbook files into solr

2011-02-26 Thread Sujit Pal
Hi Derek, The XML files you post to Solr need to be in the correct Solr specific XML format. One way to "preserve" the original structure would be to "flatten" the document into field names indicating the position of the text, for example: book_titleabbrev: Advancing Return on Investment Analys

Re: manually editing spellcheck dictionary

2011-02-25 Thread Sujit Pal
If the dictionary is a Lucene index, wouldn't it be as simple as delete using a term query? Something like this: IndexReader sdreader = new IndexReader(); sdreader.delete(new Term("word", "sherri")); ... sdreader.optimize(); sdreader.close(); I am guessing your dictionary is built dynamically usi

Re: Keeping Tokens Whole by Force?

2011-02-14 Thread Sujit Pal
Why not use the Keyword attribute (setKeyword(true)) when you see an email? If the keyword attribute is set, skip the tokenfilters in the chains below it. There is also a KeywordMarkerFilter which does this (this is done in SnowballPorterStemFilterFactory, maybe also other places, but this is one p

Re: boosting results by a query?

2011-02-11 Thread Sujit Pal
We are currently a Lucene shop, the way we do it (currently) is to have these results come from a database table (where it is available in rank order). We want to move to Solr, so what I plan on doing to replicate this functionality is to write a custom request handler that will do the database que

Re: Architecture decisions with Solr

2011-02-09 Thread Sujit Pal
Another option (assuming the case where a user can be granted access to a certain class of documents, and more than one user would be able to access certain documents) would be to store the access filter (as an OR query of content types) in an external cache (perhaps a database or an eternal cache