RE: Multiline Regex with Lucene

2009-07-28 Thread ba3
Hi Steve, In case of span queries, the span first query can specify the start of the span, is it possible to specify the term [not the position] indicating the end of the span ? -- Regards Ba3 Steven A Rowe wrote: > > Hi ba3, > > Check out the list of "Direct Known Subclasses" from the SpanQ

position available

2009-07-28 Thread Ted Dunning
I don't want to spam this mailing list (much). But I would like to pass the word that I have a search engineer position open. Follow up directly to me for all the exciting details. -- Ted Dunning, CTO DeepDyve tdunn...@deepdyve.com (858) 414-0013

Re: deadlock in indexing

2009-07-28 Thread Chengdu Huang
Thanks Uwe! I was looking at the javadoc of IndexWriter in Lucene 2.4.0 and didn't find anything about thread safety. The wiki page does have that though. Thanks. Chengdu On Tue, Jul 28, 2009 at 2:20 PM, Uwe Schindler wrote: >> By "IndexWriter is threadsafe" do you mean that I can have two >> t

RE: deadlock in indexing

2009-07-28 Thread Uwe Schindler
> By "IndexWriter is threadsafe" do you mean that I can have two > threads, one calls IndexWriter.addDocument(), the other calls > IndexWriter.deleteDocuments() & IndexWriter.optimize(), without any > synchronization? Exactly. > For the deadlock, I also think it has something to do with mergin

Re: deadlock in indexing

2009-07-28 Thread Chengdu Huang
Hi Simon, By "IndexWriter is threadsafe" do you mean that I can have two threads, one calls IndexWriter.addDocument(), the other calls IndexWriter.deleteDocuments() & IndexWriter.optimize(), without any synchronization? For the deadlock, I also think it has something to do with merging. Below

Re: THIS WEEK: PNW Hadoop, HBase / Apache Cloud Stack Users' Meeting, Wed Jul 29th, Seattle

2009-07-28 Thread Bradford Stephens
Hey everyone, SLIGHT change of plans. A few people have asked me to move to a place with Air Conditioning, since the temperature's in the 90's this week. So, here we go: Big Time Brewing Company 4133 University Way NE Seattle, WA 98105 Call me at 904-415-3009 if you have any questions. On Mon

Re: THIS WEEK: PNW Hadoop, HBase / Apache Cloud Stack Users' Meeting, Wed Jul 29th, Seattle

2009-07-28 Thread Bradford Stephens
On Mon, Jul 27, 2009 at 12:16 PM, Bradford Stephens wrote: > Hello again! > > Yes, I know some of us are still recovering from OSCON. It's time for > another delicious meetup to chat about Hadoop, HBase, Solr, Lucene, > and more! > > UW is quite a pain for us to access until August, so we're changi

Re: deadlock in indexing

2009-07-28 Thread Chengdu Huang
Thanks, Mike. Can you elaborate a bit more on why this would cause deadlock? Thanks. Chengdu On Tue, Jul 28, 2009 at 10:48 AM, Michael McCandless wrote: > This can in fact result in deadlock; you should sync on your own Object > instead. > > Mike > > On Tue, Jul 28, 2009 at 12:27 AM, Chengdu >

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Matthew, Ok, thanks for the clarifications. When I have some quiet time, I'll try to re-do the tests I did earlier and post back if any questions. Thanks again, Jim Matthew Hall wrote: > Oh.. no. > > If you specifically include a fieldname: blah in your clause, you don't > need a Mult

Re: deadlock in indexing

2009-07-28 Thread Michael McCandless
This can in fact result in deadlock; you should sync on your own Object instead. Mike On Tue, Jul 28, 2009 at 12:27 AM, Chengdu Huang wrote: > Hi, > > I have an application in which documents are added upon receiving a > user request and a background thread is needed to remove old > documents.  I
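The advice above to "sync on your own Object" can be illustrated with a plain-Java sketch. Everything in it is an assumption made for illustration: the class `DocumentStore`, its methods, and the list standing in for an index are hypothetical, and no Lucene class is involved. The only point shown is that the lock is a private object, so code outside the class has no way to synchronize on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for application code that adds and removes documents.
class DocumentStore {
    // Private lock object: only this class can synchronize on it.
    private final Object lock = new Object();
    private final List<String> docs = new ArrayList<>();

    // Called from the request-handling thread.
    void add(String doc) {
        synchronized (lock) {
            docs.add(doc);
        }
    }

    // Called from the background cleanup thread; returns how many were removed.
    int removeOld(String prefix) {
        synchronized (lock) {
            int before = docs.size();
            docs.removeIf(d -> d.startsWith(prefix));
            return before - docs.size();
        }
    }

    int size() {
        synchronized (lock) {
            return docs.size();
        }
    }
}
```

Both threads would share one `DocumentStore` instance and call `add` and `removeOld` on it; neither needs to lock on any object owned by a library.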

How to can I to customize the Similarity?

2009-07-28 Thread Fabrício Raphael
Hi! How can I customize the calculation of the similarity? I wrote my own index processor using the classes IndexWriter and IndexReader, because my calculation of the index is very different from the Vector Model. Then, based on my own index, I need to calculate the similarity using the Lucen

Re: New to Lucene - some questions about demo

2009-07-28 Thread Matthew Hall
Oh.. no. If you specifically include a fieldname: blah in your clause, you don't need a MultiFieldQueryParser. The purpose of the MFQP is to turn queries like this "blah" automatically into this "field1: blah" AND "field2: blah" AND "field3: blah" (Or OR if you set it up properly) When you

RE: Multiline Regex with Lucene

2009-07-28 Thread Steven A Rowe
Hi ba3, Check out the list of "Direct Known Subclasses" from the SpanQuery javadocs to see what's available: http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/spans/SpanQuery.html SpanRegexQuery may be what you're looking for: http://lucene.apache.org/java/2_4_1/api/org/apache/l

RE: Multiline Regex with Lucene

2009-07-28 Thread Uwe Schindler
Yes, SpanRegexQuery - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: ba3 [mailto:sbadhrin...@gmail.com] > Sent: Tuesday, July 28, 2009 6:53 PM > To: java-user@lucene.apache.org > Subject: Re: Multiline Re

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Matthew, I'll keep your comments in mind, but I'm still confused about something. I currently haven't changed much in the demo, other than adding that doc.add for "summary". With JUST that doc.add, having done my reading, I kind of expected NOT to be able to search on the "summary" at all, but

Re: Multiline Regex with Lucene

2009-07-28 Thread ba3
Hi, Thanks for the pointers. I will try the span queries. But can a span query support a regexp as a term? Also, here are more details on the problem: find a search string inside a block of statements. The block starts with a string and ends with a character.

Re: A question about the relevancy

2009-07-28 Thread Erick Erickson
Please start a new thread when you are asking a question on a different topic. See: http://people.apache.org/~hossman/#threadhijack When you open a new topic, please elaborate a bit more on what you're trying to accomplish. "An XML search engine" doe

Re: Multiline Regex with Lucene

2009-07-28 Thread Erick Erickson
I doubt you're thinking in terms of tokens. Your inputstream is broken up into tokens (think of them as words, depending upon the analyzer) and regex searchers are confined to those *tokens*. So the concept of a multi-line regex in a search is kind of ...odd... You could possibly index your input

RE: Multiline Regex with Lucene

2009-07-28 Thread Steven A Rowe
Hi ba3, StandardAnalyzer breaks text into individual terms, removes most punctuation, downcases, removes stopwords, etc. Your example text becomes the following sequence of terms: 1. hello 2. world 3. searched (this,is,to,be are all in the default stopword set)
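The kind of processing described above can be approximated in plain Java. This is only a toy sketch and not Lucene's StandardAnalyzer: the class `ToyAnalyzer`, the sample sentence, and the four-word stopword set are assumptions chosen to mirror the terms listed in the message. The real analyzer has its own tokenization rules and a longer default stopword list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified imitation of analysis: lowercase, split, drop stopwords.
class ToyAnalyzer {
    // Assumed subset of stopwords for illustration only.
    private static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("this", "is", "to", "be"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Lowercase first, then split on any run of non-alphanumeric characters,
        // which discards punctuation and whitespace (including line breaks).
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

With the hypothetical input `"Hello World! This is to be searched."`, `ToyAnalyzer.analyze` returns the three terms `hello`, `world`, `searched`. Line breaks in the input vanish along with the punctuation, which is why a regex applied to individual terms cannot see line structure.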

Re: New to Lucene - some questions about demo

2009-07-28 Thread Matthew Hall
You can choose to do either. Having items in multiple fields allows you to apply field-specific boosts, thus making matches in certain fields more important than others. But if that's not something that you care about, the second technique is useful in that it vastly simplifies your index str

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Hi Matthew and Ian, Thanks, I'll try that, but, in the meantime, I've been doing some reading (Lucene in Action), and on pg. 159, section 5.3, it discusses "Querying on multiple fields". I was just about to try what's described in that section, i.e., using MultiFieldQueryParser.parse(), o

Re: New to Lucene - some questions about demo

2009-07-28 Thread Matthew Hall
Yeah, Ian has it nailed on the head here. Can't believe I missed it in the initial writeup. Matt Ian Lea wrote: Jim Glancing at SearchFiles.java I can see Analyzer analyzer = new StandardAnalyzer(); ... QueryParser parser = new QueryParser(field, analyzer); ... Query query = parser.parse(li

Re: New to Lucene - some questions about demo

2009-07-28 Thread Ian Lea
Jim Glancing at SearchFiles.java I can see Analyzer analyzer = new StandardAnalyzer(); ... QueryParser parser = new QueryParser(field, analyzer); ... Query query = parser.parse(line); so any query term you enter will be run through StandardAnalyzer which will, amongst other things, convert it t

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Ian and Matthew, I've tried "foofoo", "summary:foofoo", "FooFoo", and "summary:FooFoo". No results returned for any of those :(. Also, Matthew, I bounced Tomcat after running IndexFiles, so I don't think that's the problem either :(... I looked at the SearchFiles.java code, and it looks like

Re: New to Lucene - some questions about demo

2009-07-28 Thread Ian Lea
Hi Field.Index.NOT_ANALYZED means it will be stored as is i.e. "FooFoo" in your example, and if you search for "foofoo" it won't match. A search for "FooFoo" would, assuming that your search terms are not being lowercased. -- Ian. On Tue, Jul 28, 2009 at 1:56 PM, Ohaya wrote: > Hi, > > I'm
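The mismatch described above comes down to exact, case-sensitive term comparison. The plain-Java sketch below models it with a set of terms. The class `ExactTermIndex` and its methods are hypothetical and do not use any Lucene API; they only imitate a field whose value is kept verbatim as a single term, and a query side that may or may not lowercase its input.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of a field whose value is indexed verbatim as one term.
class ExactTermIndex {
    private final Set<String> terms = new HashSet<>();

    // The value is kept exactly as given, with no lowercasing or tokenizing.
    void addVerbatim(String value) {
        terms.add(value);
    }

    // Lookup is an exact, case-sensitive comparison against the stored terms.
    boolean matches(String queryTerm) {
        return terms.contains(queryTerm);
    }

    // What a lowercasing query-side analyzer would do to a term before lookup.
    static String lowercased(String queryTerm) {
        return queryTerm.toLowerCase(Locale.ROOT);
    }
}
```

After `addVerbatim("FooFoo")`, `matches("FooFoo")` is true and `matches("foofoo")` is false. A query for "FooFoo" that passes through `lowercased` first also fails to match, which is the situation where the search terms are being lowercased but the indexed value was not.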

Re: New to Lucene - some questions about demo

2009-07-28 Thread Matthew Hall
Oh, also check to see which Analyzer the demo webapp/indexer is using. It's entirely possible the analyzer that has been chosen isn't lowercasing input, which could also cause you issues. I'd be willing to bet your issue lies in one of these two problems I've mentioned ^^ Matt Matthew Hall

Re: New to Lucene - some questions about demo

2009-07-28 Thread Matthew Hall
Restart tomcat. When the indexes are read in at initialization time they are a snapshot of what the indexes contained at that moment. Unless the demo specifically either closes its IndexReader and creates a new one, or calls IndexReader.reopen periodically (Which I don't remember it doing) y

New to Lucene - some questions about demo

2009-07-28 Thread Ohaya
Hi, I'm just starting to work with Lucene, and I guess that I learn best by working with code, so I've started with the demos in the Lucene distribution. I got the IndexFiles.java and IndexHTML.java working, and also the luceneweb.war is deployed to Tomcat. I used IndexFiles.java to index

Re: A question about the relevancy

2009-07-28 Thread henok sahilu
Hello there, is there anyone who can tell me how to set up an XML search engine? Please suggest an open source one written in Java. Thanks, henok --- On Thu, 7/23/09, Erick Erickson wrote: From: Erick Erickson Subject: Re: A question about the relevancy To: java-user@lucene.apache.org Date: Thursday, J

Re: Doc IDs via IndexReader?

2009-07-28 Thread henok sahilu
Hey, I had the same problem. Then I used the TopDocs class. It will give the first n top documents, and you can play around. Cheers --- On Wed, 7/22/09, Anuj Bhatt wrote: From: Anuj Bhatt Subject: Doc IDs via IndexReader? To: java-user@lucene.apache.org Date: Wednesday, July 22, 2009, 7:58 PM Hi, I

Re: Generating Query for Multiple Clauses in a Single Field

2009-07-28 Thread AHMET ARSLAN
> generate a query like the following: > title:(+chemistry +"national curriculum") I didn't understand what exactly you are asking but the query string is already well-formatted. You can pass this string directly to the parse method of QueryParser. The following four examples yield the same Qu

Generating Query for Multiple Clauses in a Single Field

2009-07-28 Thread blazingwolf7
Hi, I am currently creating a search engine and will need to generate a query like the following: title:(+chemistry +"national curriculum") It's mentioned that it can be done using the QueryParser, but unfortunately I can't find any reference on how to use it. Can anyone help me with this? Thanks

Re: deadlock in indexing

2009-07-28 Thread Simon Willnauer
I cannot help you figure out your exact problem, but you can use the same IndexWriter instance without synchronization. IndexWriter is threadsafe, so your synchronized block seems obsolete. I could imagine that there is a background merge going on while you try to access the critical section ( yo