Re: Same field could be part of Query and filter

2009-08-06 Thread Ganesh
Any idea on this? Basically I require a Filter of Filters. I want a single field to be part of both the Filter and the Query. Filter: us...@domain.com, us...@domain.com us...@domain.com Query: us...@domain.com OR User1 My requirement: a group admin of 3 Users could view only 3 members' data and he should also per

Re: Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Hi Phil, Well, kind of... but... Then, why, when I do the search in Luke, do I get the results I cited: ==> succeeds .yyy ==> fails (no results) I guess that I've been assuming that the search in Luke is "correct" and I've been using that to "test my understanding", but maybe that'

Re: Language Detection for Analysis?

2009-08-06 Thread Otis Gospodnetic
Bradford, If I may: Have a look at http://www.sematext.com/products/language-identifier/index.html And/or http://www.sematext.com/products/multilingual-indexer/index.html Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP,

Re: Why does this search succeed with web app, but not Luke?

2009-08-06 Thread Phil Whelan
Hi Jim, > As I said, based on the terms in Luke, I would have expected a web app query > on: > > path:file-1-2 > > to succeed, and a query on: > > path:file-1-2.dat > to fail. > > But, instead both of those succeed when I do a web query. This query will also pass through the same (hopefully) Ana

Re: Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Phil, I need to be more precise... The files that I have are at, say: C:\dir1\dir2\ so, for example, I have C:\dir1\dir2\file-1-1.dat C:\dir1\dir2\file-1-2.dat C:\dir1\dir2\file-1-3.dat C:\dir1\dir2\file-1-4.dat C:\dir1\dir2\file-1-5.dat After indexing, and, using Luke, I look at the "path" f

Re: Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Phil, Both my indexer and the webapp are basically from the Lucene demos, the indexer starting with the IndexFiles.java demo code, so I think they're both using the StandardAnalyzer. What appears in Luke, when I select "path" is just the filename part, without the extension, i.e., the "" p

Re: Why does this search succeed with web app, but not Luke?

2009-08-06 Thread Phil Whelan
Hi Jim, Are you using the same Analyzer for indexing and searching? .yyy will be seen as a HOSTNAME by StandardAnalyzer, which will keep it as one term, whereas another indexer might split this into 2 terms. This should not matter either way as long as you are using the same Analyzer for both ind

Re: Efficient optimization of large indexes?

2009-08-06 Thread Nigel
On Wed, Aug 5, 2009 at 3:50 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Wed, Aug 5, 2009 at 12:08 PM, Nigel wrote: > > We periodically optimize large indexes (100 - 200gb) by calling > > IndexWriter.optimize(). It takes a heck of a long time, and I'm > wondering > > if a more

Re: Language Detection for Analysis?

2009-08-06 Thread Shai Erera
Thanks Robert for the explanation. I thought that you meant something different, like doing stemming in some sophisticated manner by somehow detecting the language. Doing these normalizations makes sense of course, especially if the letters look similar. Thanks again, Shai On Thu, Aug 6, 2009 at

Re: StandardFilter not handling dots as exptected ?

2009-08-06 Thread Paul Taylor
Paul Taylor wrote: Shai Erera wrote: No actually I think that's how the ACRONYM rule is defined: R.E.S. is detected as ACRONYM, and therefore is converted to RES. R.E.S is not detected as ACRONYM and therefore remains as R.E.S Hence the mismatch. Hi looking at your suggestion of https://issues.

Re: Language Detection for Analysis?

2009-08-06 Thread Robert Muir
Shai, I mean doing language-agnostic things that apply to all of these since they are based on the same writing system, like normalizing all yeh characters (arabic yeh, farsi yeh, alef maksura) to the same form, removing harakat, the kinds of things in ArabicNormalizationFilter and PersianNormaliza
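A minimal sketch of the script-level normalization described above, written in plain Java rather than as a Lucene TokenFilter. The class and method names are hypothetical, and the character set handled here (Farsi yeh and alef maksura folded to Arabic yeh, harakat U+064B to U+0652 removed) is only an assumed subset of what ArabicNormalizationFilter and the Persian patch actually do:

```java
// Hypothetical illustration only; not the Lucene filter implementation.
class ScriptNormalizationSketch {

    /** Folds yeh variants to Arabic yeh and drops harakat (short-vowel marks). */
    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Harakat: fathatan (U+064B) through sukun (U+0652) are removed.
            if (c >= '\u064B' && c <= '\u0652') {
                continue;
            }
            // Farsi yeh (U+06CC) and alef maksura (U+0649) become Arabic yeh (U+064A).
            if (c == '\u06CC' || c == '\u0649') {
                c = '\u064A';
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Because the same folding is applied regardless of whether the text is Arabic, Farsi, or Urdu, no language detection step is needed for this part of the analysis.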

Re: Language Detection for Analysis?

2009-08-06 Thread Shai Erera
Robert - can you elaborate on what you mean by "just treat it at the script level"? On Thu, Aug 6, 2009 at 10:55 PM, Robert Muir wrote: > Bradford, there is an arabic analyzer in trunk. for farsi there is > currently a patch available: > http://issues.apache.org/jira/browse/LUCENE-1628 > > one o

Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Hi, In my indexer app (based on the IndexFiles.java demo), I am adding the "path" field: doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.ANALYZED)); Per Luke, the full path (e.g., "c:\\.yyy") gets parsed, and one of the terms (again, per Luke) is "", i.e.,

Re: Language Detection for Analysis?

2009-08-06 Thread Robert Muir
Bradford, there is an arabic analyzer in trunk. for farsi there is currently a patch available: http://issues.apache.org/jira/browse/LUCENE-1628 one option is not to detect languages at all. it could be hard for short queries due to the languages you mentioned borrowing from each other. but you do

Language Detection for Analysis?

2009-08-06 Thread Bradford Stephens
Hey there, We're trying to add foreign language support into our new search engine -- languages like Arabic, Farsi, and Urdu (that don't work with standard analyzers). But our data source doesn't tell us which languages we're actually collecting -- we just get blocks of text. Has anyone here worke

RE: Analysis Question

2009-08-06 Thread Christopher Condit
Hi Anshum- > You might want to look at writing a custom analyzer or something and > add a > document boost (while indexing) for documents containing those terms. Do you know how to access the document from an analyzer? It seems to only have access to the field... Thanks, -Chris ---

Re: StandardFilter not handling dots as exptected ?

2009-08-06 Thread Paul Taylor
Shai Erera wrote: No actually I think that's how the ACRONYM rule is defined: R.E.S. is detected as ACRONYM, and therefore is converted to RES. R.E.S is not detected as ACRONYM and therefore remains as R.E.S Hence the mismatch. Hi looking at your suggestion of https://issues.apache.org/jira/brow
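The mismatch described above can be illustrated with a small plain-Java sketch. The regular expression is only an approximation of the ACRONYM rule as it is described in this thread (every letter, including the last, followed by a dot); it is not Lucene's actual grammar, and the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not the StandardTokenizer/StandardFilter code.
class AcronymSketch {

    // Assumed shape of an acronym: two or more "letter + dot" pairs, trailing dot included.
    private static final Pattern ACRONYM = Pattern.compile("(?:\\p{L}\\.){2,}");

    /** Strips the dots from a token only when the whole token looks like an acronym. */
    static String normalize(String token) {
        return ACRONYM.matcher(token).matches() ? token.replace(".", "") : token;
    }
}
```

Under this approximation, the indexed token `R.E.S.` is reduced to `RES`, while the query token `R.E.S` (no trailing dot) is left unchanged, so the two terms do not match.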

Re: StandardFilter not handling dots as exptected ?

2009-08-06 Thread Paul Taylor
Ian Lea wrote: See https://issues.apache.org/jira/browse/LUCENE-1068 which appears to be talking about the same sort of thing, and StandardAnalyzer.setReplaceInvalidAcronym(b). Quite how you deal with this in your own analyzer is left as an exercise ... Yes I think you are right, though don

Re: StandardFilter not handling dots as exptected ?

2009-08-06 Thread Paul Taylor
Shai Erera wrote: I see you index R.E.S. and search for R.E.S (note the dot that's missing in the query at the end). Can you try to query w/ the dot? Yes, if you search with the dot it works (I mentioned this in the first email), so it appears that when the field is being indexed it's not removing the

Re: StandardFilter not handling dots as exptected ?

2009-08-06 Thread Shai Erera
I see you index R.E.S. and search for R.E.S (note the dot that's missing in the query at the end). Can you try to query w/ the dot? On Thu, Aug 6, 2009 at 5:45 PM, Paul Taylor wrote: > Erick Erickson wrote: > >> I don't see anything obvious in the code. >> >> Are you using the same analyzer at qu

Re: StandardFilter not handling dots as exptected ?

2009-08-06 Thread Ian Lea
See https://issues.apache.org/jira/browse/LUCENE-1068 which appears to be talking about the same sort of thing, and StandardAnalyzer.setReplaceInvalidAcronym(b). Quite how you deal with this in your own analyzer is left as an exercise ... -- Ian. On Thu, Aug 6, 2009 at 3:45 PM, Paul Taylor wr

Re: StandardFilter not handling dots as exptected ?

2009-08-06 Thread Paul Taylor
Erick Erickson wrote: I don't see anything obvious in the code. Are you using the same analyzer at query time as at index time? Yes, I do. I have created a test case now that fails: import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.store.RAMDirectory; import org.apache.lucene.

Re: StandardFilter not handling dots as exptected ?

2009-08-06 Thread Erick Erickson
I don't see anything obvious in the code. Are you using the same analyzer at query time as at index time? I'd also get a copy of Luke and examine your index to see what is actually getting put in it, and query.toString might help. Best Erick On Thu, Aug 6, 2009 at 10:03 AM, Paul Taylor wrote: >

StandardFilter not handling dots as exptected ?

2009-08-06 Thread Paul Taylor
Hi, I want the query "R.E.S" to match "R.E.S". I use StandardFilter in my analyzer below, and the description says: 'Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.' So I thought that R.E.S. would b

RE: MatchAllDocsQuery concurrency issue

2009-08-06 Thread Carl Austin
Thanks for the quick responses, Mike. I have changed our readers to read-only, and that works nicely. I preferred that to patching the Lucene we have been using and testing with for some time. Thanks again for the help. Carl -Original Message- From: Michael McCandless [mailto:luc...@mike

Re: MatchAllDocsQuery concurrency issue

2009-08-06 Thread Peter Keegan
Or you could try this patch: LUCENE-1316 Peter On Thu, Aug 6, 2009 at 8:51 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Opening your IndexReader with readOnly=true should also fix it, I think. > > Mike > > On Thu, Aug 6, 200

Re: MatchAllDocsQuery concurrency issue

2009-08-06 Thread Michael McCandless
Opening your IndexReader with readOnly=true should also fix it, I think. Mike On Thu, Aug 6, 2009 at 8:41 AM, Carl Austin wrote: > Thanks Mike, > > Running this with the 2.9 build does resolve the issue it would seem. > > Unfortunately I can't move to 2.9, especially as it isn't in release yet. I

RE: MatchAllDocsQuery concurrency issue

2009-08-06 Thread Carl Austin
Thanks Mike, Running this with the 2.9 build does resolve the issue it would seem. Unfortunately I can't move to 2.9, especially as it isn't in release yet. Is there a work-around for 2.4 known that will allow me to get around this issue as I notice the patches change some underlying classes su

Re: MatchAllDocsQuery concurrency issue

2009-08-06 Thread Michael McCandless
Most likely you're hitting this issue: https://issues.apache.org/jira/browse/LUCENE-1316 Which is fixed in 2.9. Can you try running with 2.9 to confirm? Mike On Thu, Aug 6, 2009 at 8:19 AM, Carl Austin wrote: > Hi, > > I have been seeing an issue running MatchAllDocsQueries concurrently. >

MatchAllDocsQuery concurrency issue

2009-08-06 Thread Carl Austin
Hi, I have been seeing an issue running MatchAllDocsQueries concurrently. Running one against a test index is very fast (70 ms). Running two concurrently can take 5-25 seconds on the same test index! This issue doesn't occur with any other type of query I have used. Because of this, I have put tog

Re: Paging in a Lucene search

2009-08-06 Thread Shai Erera
If you pass reader.maxDoc(), it will create a heap (array) of size reader.maxDoc() and is not recommended. Instead, if you display the first page of results, you should pass 10 (assuming you display 10 results). You can call TopFieldDocs.totalHits to get the total number of matching results. Then
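The advice above can be sketched without any Lucene dependency. The snippet below is a hypothetical plain-Java illustration, not Lucene code: it keeps a heap bounded by the number of hits needed to reach the requested page, instead of one sized to the whole index, while still counting every match (the role TopFieldDocs.totalHits plays in Lucene). Using `(pageIndex + 1) * pageSize` as the bound for later pages is an assumption, since the original message is cut off before it covers them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; names and types are not part of Lucene.
class PagingSketch {

    /** One page of results plus the total number of matches. */
    static final class Page {
        final List<Integer> hits;  // scores on the requested page, best first
        final int totalHits;       // count of all matching documents

        Page(List<Integer> hits, int totalHits) {
            this.hits = hits;
            this.totalHits = totalHits;
        }
    }

    /** Collects only the top (pageIndex + 1) * pageSize scores, then slices out the page. */
    static Page page(int[] scores, int pageIndex, int pageSize) {
        int needed = (pageIndex + 1) * pageSize;              // heap bound, not maxDoc
        PriorityQueue<Integer> heap = new PriorityQueue<>();  // min-heap: weakest score on top
        int total = 0;
        for (int score : scores) {
            total++;
            heap.offer(score);
            if (heap.size() > needed) {
                heap.poll();                                  // evict the weakest hit
            }
        }
        List<Integer> top = new ArrayList<>(heap);
        Collections.sort(top, Collections.reverseOrder());    // best first
        int from = Math.min(pageIndex * pageSize, top.size());
        return new Page(new ArrayList<>(top.subList(from, top.size())), total);
    }
}
```

For example, `PagingSketch.page(scores, 0, 10)` keeps at most 10 entries in memory while `totalHits` still reports how many documents matched, which is enough to render page links.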

Paging in a Lucene search

2009-08-06 Thread Savvas-Andreas Moysidis
Hello, I'd like to ask if anybody has any thoughts on the best strategy to use when implementing a paging scenario in a Lucene search. In order to implement my paging list before the view is rendered I need to know the total number of documents this particular search would return but I still need

Re: Analysis Question

2009-08-06 Thread Anshum
Hi Christopher, You might want to look at writing a custom analyzer or something and add a document boost (while indexing) for documents containing those terms. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinctio

Same field could be part of Query and filter

2009-08-06 Thread Ganesh
Hello all, I have a field, UserID, for every record. The results will be filtered for every User based on this field. We have a group admin feature where an admin could view all records of a set of Users. I could manage this by filtering on the set of IDs. I have a doubt about the below sce