Re: A way to download URLs and index better?

2010-01-15 Thread Ahmet Arslan
> Hi everyone, please help me with this question: I need to download some web pages from a list of URLs (about 200 links) and then index them with Lucene. This list is not fixed, because it depends on the definition of my process. Currently, in my web application, I wrote a class for downloading,

A way to download URLs and index better?

2010-01-15 Thread Phan The Dai
Hi everyone, please help me with this question: I need to download some web pages from a list of URLs (about 200 links) and then index them with Lucene. This list is not fixed, because it depends on the definition of my process. Currently, in my web application, I wrote a class for downloading, but it downloads t
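
The message is cut off, so this is only a guess at the problem: if the slow part is fetching the pages one after another, a common approach is to download them concurrently on a small thread pool and then hand the text to Lucene. A minimal sketch using only java.util.concurrent; ParallelFetcher, fetchAll and the fake fetch function are hypothetical names, and a real version would plug in an HTTP client and the indexing step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: fetch many URLs on a fixed-size thread pool instead of one at a time.
class ParallelFetcher {

    // Fetches every URL with the supplied fetch function, using nThreads workers.
    // Results are returned in the same order as the input list.
    static List<String> fetchAll(List<String> urls, Function<String, String> fetch, int nThreads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> fetch.apply(url))); // runs as a Callable<String>
            }
            List<String> pages = new ArrayList<>();
            for (Future<String> f : futures) {
                pages.add(f.get()); // waits for this page; rethrows a failed download
            }
            return pages;
        } finally {
            pool.shutdown(); // no new tasks; worker threads exit once idle
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in fetcher so the sketch runs offline; swap in a real HTTP client.
        Function<String, String> fakeFetch = url -> "<html>" + url + "</html>";
        List<String> pages = fetchAll(List.of("http://a.example", "http://b.example"), fakeFetch, 4);
        pages.forEach(System.out::println);
    }
}
```

The pool size is a value to tune, not a recommendation. Lucene's IndexWriter is thread-safe, so the worker threads could also add their documents to one shared writer instead of returning the page text.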

Re: Problem: Indexing and searching repeating groups of fields

2010-01-15 Thread Erick Erickson
Well, a variant on the easy solution might work. What would happen if you indexed the un-split pairs in the same field, i.e. "java:5", "c:3", "php:2", all indexed as *single* tokens in the *same* field? But I think you should look at Digy's suggestion again. 6-10 fields are absolutely no problem at all.
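
To see why indexing the un-split pairs helps, here is a toy inverted index in plain Java (not Lucene's API; PairTokens, index and and are made-up names). With separate skill and level tokens, a query for java with level 5 also matches a document where java is 3 and c is 5; with "java:5" as one token the pairing survives.

```java
import java.util.*;

// Toy inverted index (not Lucene) contrasting two ways of indexing repeating (skill, level) groups.
class PairTokens {

    // Builds token -> set of doc ids containing that token.
    static Map<String, Set<Integer>> index(List<List<String>> docs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id)) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // Returns the docs that contain every given token (a pure AND query).
    static Set<Integer> and(Map<String, Set<Integer>> postings, String... tokens) {
        Set<Integer> result = null;
        for (String t : tokens) {
            Set<Integer> docs = postings.getOrDefault(t, Set.of());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        // Doc 0: java=5, c=3.  Doc 1: java=3, c=5.
        // Split approach: skill and level are separate tokens, so the pairing is lost.
        Map<String, Set<Integer>> split = index(List.of(
                List.of("skill:java", "skill:c", "level:5", "level:3"),
                List.of("skill:java", "skill:c", "level:3", "level:5")));
        System.out.println(and(split, "skill:java", "level:5")); // [0, 1]: doc 1 is a false match

        // Un-split approach: each pair is a single token in the same field.
        Map<String, Set<Integer>> pairs = index(List.of(
                List.of("java:5", "c:3"),
                List.of("java:3", "c:5")));
        System.out.println(and(pairs, "java:5")); // [0]
    }
}
```

The trade-off is that the level becomes part of the term, so a query like "java, level 3 or higher" has to enumerate java:3, java:4, java:5 and so on.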

Re: Problem: Indexing and searching repeating groups of fields

2010-01-15 Thread TJ Kolev
Found public int getPositionIncrementGap(String fieldName) on Analyzer. Sweet! Should've read more before emailing. tjk :) On Fri, Jan 15, 2010 at 10:19 AM, TJ Kolev wrote: > Hi! > I don't think the easy solution will work for me, because I'll have more than two fields in a group - perhaps
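
In Lucene 3.0, Analyzer.getPositionIncrementGap(String fieldName) returns 0 by default; an Analyzer subclass can override it so that each additional value of a field added several times to a Document starts further along in position. The toy model below (plain Java, not Lucene code; the class name, the gap of 100 and the slop check are illustrative assumptions) shows why that keeps a span/proximity match from pairing a term in one group with a term in the next.

```java
import java.util.*;

// Toy model (not Lucene code) of how a position increment gap keeps
// proximity matches from crossing between values of a multi-valued field.
class PositionGapDemo {

    // Assigns positions to the whitespace-separated tokens of each field value.
    // Consecutive tokens are 1 apart; 'gap' extra positions are added between values.
    static Map<String, List<Integer>> positions(List<String> values, int gap) {
        Map<String, List<Integer>> pos = new HashMap<>();
        int next = 0;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) next += gap;
            for (String token : values.get(v).split("\\s+")) {
                pos.computeIfAbsent(token, k -> new ArrayList<>()).add(next++);
            }
        }
        return pos;
    }

    // True if some occurrences of a and b have at most 'slop' positions between them
    // (an unordered two-term "near" check).
    static boolean near(Map<String, List<Integer>> pos, String a, String b, int slop) {
        for (int pa : pos.getOrDefault(a, List.of())) {
            for (int pb : pos.getOrDefault(b, List.of())) {
                if (Math.abs(pa - pb) - 1 <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> groups = List.of("java 3", "c 5"); // two groups stored in one field
        // Without a gap, "java" (pos 0) and "5" (pos 3) are close enough to match.
        System.out.println(near(positions(groups, 0), "java", "5", 2));   // true
        // With a large gap, the same query no longer spans the two groups.
        System.out.println(near(positions(groups, 100), "java", "5", 2)); // false
    }
}
```

The gap only needs to be larger than any slop you intend to query with; terms inside one group keep their normal spacing.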

Re: Problem: Indexing and searching repeating groups of fields

2010-01-15 Thread TJ Kolev
Hi! I don't think the easy solution will work for me, because I'll have more than two fields in a group - perhaps 6-10. However, using span queries looks very promising. I'll investigate that. I see setPositionIncrement() only on the Token object. Is there a way to set this when adding a field

Re: Finding frequency of regex query match in a field

2010-01-15 Thread Altimatic
I think so. I'll try to write up a quick demo app to see if it will work for what I require. Thanks for the prompt reply. Simon Willnauer wrote: > One way to do it is to use the RegexTermEnum and iterate through your terms manually, like the following pseudo code: > te = RegexTermEnum

Finding frequency of regex query match in a field

2010-01-15 Thread Altimatic
Hi all, I have an application that has to count how often a specific regular expression matches in a particular field, for each document in an index directory. For example, let's say I have 2 documents in the directory and each document has 3 fields, "table", "column" and "data".

Re: Finding frequency of regex query match in a field

2010-01-15 Thread Simon Willnauer
One way to do it is to use the RegexTermEnum and iterate through your terms manually, like the following pseudo code:

    te = RegexTermEnum(reader, Term("^T.*"), regexpCapabilities)
    while( te.next() ):
        t = te.term()
        td = reader.termDocs(t)
        while(td.next()):
            freqOfTermInCurrentDoc = td.freq(
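
The same idea can be sketched without a Lucene index: walk the terms of one field, keep those the regex matches, and add each term's per-document frequency to a running total. The map-based index stands in for IndexReader/TermDocs and the names are hypothetical. Two hedged notes on the real Lucene 3.0 version: RegexTermEnum ships in the contrib regex module, and like other FilteredTermEnum subclasses it is already positioned on the first matching term after construction, so the outer loop is usually written as do { ... } while (te.next()) rather than starting with te.next(). Also, the regex is tested against whole indexed terms, so with an analyzer that lowercases, "^T.*" will match nothing.

```java
import java.util.*;
import java.util.regex.Pattern;

// Toy inverted index (not Lucene) mirroring the RegexTermEnum + termDocs loop:
// walk every term of one field, keep those matching the regex, and add each
// term's in-document frequency to a per-document total.
class RegexFreq {

    // Builds term -> (docId -> frequency of that term in that doc) for a single field.
    static Map<String, Map<Integer, Integer>> index(List<String> fieldValues) {
        Map<String, Map<Integer, Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < fieldValues.size(); doc++) {
            for (String term : fieldValues.get(doc).split("\\s+")) {
                postings.computeIfAbsent(term, k -> new TreeMap<>()).merge(doc, 1, Integer::sum);
            }
        }
        return postings;
    }

    // Returns docId -> total occurrences of terms that match the regex.
    static Map<Integer, Integer> countMatches(Map<String, Map<Integer, Integer>> postings, String regex) {
        Pattern p = Pattern.compile(regex);
        Map<Integer, Integer> perDoc = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> e : postings.entrySet()) { // like te.term()
            if (!p.matcher(e.getKey()).matches()) continue;
            for (Map.Entry<Integer, Integer> d : e.getValue().entrySet()) {      // like td.doc(), td.freq()
                perDoc.merge(d.getKey(), d.getValue(), Integer::sum);
            }
        }
        return perDoc;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> postings = index(List.of(
                "Tom Tim bob Tom",   // doc 0
                "alice Ted"));       // doc 1
        System.out.println(countMatches(postings, "^T.*")); // {0=3, 1=1}
    }
}
```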

Re: Finding frequency of regex query match in a field

2010-01-15 Thread Altimatic
I forgot to mention that I am using Lucene 3.0.0.

Re: RangeFilter

2010-01-15 Thread Ian Lea
Indexing code looks OK at a glance. What does the search code look like? Should be easy enough to pass a disk based Directory to your write method to get an index you can look at/play with in Luke. -- Ian. On Thu, Jan 14, 2010 at 6:54 PM, AlexElba wrote: > > Did you completely re-index? > Ye

Re: Supported way to get segment from IndexWriter?

2010-01-15 Thread Michael McCandless
On Thu, Jan 14, 2010 at 7:57 PM, Chris Hostetter wrote: > > : Since SegmentInfos is now public, you could use SegmentInfos.read to > : read the current segments_N file, and then call its .size() method? > : > : But, this will only count as of the last commit... which is probably > : not sufficient