Hi everyone, please help me with this question:
I need to download some webpages from a list of URLs (about 200 links) and
then index them with Lucene.
This list is not fixed, because it depends on the definition of my process.
Currently, in my web application, I wrote a class for downloading, but it
download t
Well, a variant on the easy solution might. What would
happen if you indexed the un-split pairs in the same
field? I.e.
"java:5", "c:3", "php:2" all indexed as *single* tokens
in the *same* field?
But I think you should look at Digy's suggestion again.
6-10 fields is absolutely no problem at all.
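To make the single-token idea concrete, here is a minimal sketch in plain Python. It is not Lucene code: the tiny inverted index, the function names `pair_tokens` and `build_index`, and the sample documents are all hypothetical, and it assumes each document carries (name, value) pairs like the ones quoted above.

```python
from collections import defaultdict

def pair_tokens(pairs):
    """Join each (name, value) pair into one un-split token such as 'java:5'."""
    return [f"{name}:{value}" for name, value in pairs]

def build_index(docs):
    """Map every pair token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, pairs in docs.items():
        for token in pair_tokens(pairs):
            index[token].add(doc_id)
    return index

# Hypothetical sample data: both documents contain "java" and the value 5,
# but only document 1 has them in the same pair.
docs = {
    1: [("java", 5), ("c", 3), ("php", 2)],
    2: [("java", 3), ("c", 5)],
}
index = build_index(docs)
print(sorted(index["java:5"]))  # [1]
```

Because the pair is never split, a lookup for "java:5" cannot match a document where "java" and "5" belong to different pairs. The trade-off is that only exact pair lookups are possible with this layout.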
Found public int getPositionIncrementGap(String fieldName) on Analyzer.
Sweet! Should've read more before emailing.
tjk :)
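As a rough illustration of why a position increment gap helps with span-style proximity matching, here is a small plain-Python model. It does not call Lucene; `assign_positions` and `within` are hypothetical helpers, the distance check is a simplification of how span slop is measured, and the position bookkeeping is an assumption rather than Lucene's actual implementation.

```python
def assign_positions(values, gap):
    """Give each token a position, adding `gap` extra positions between values."""
    positions = []
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # simplified stand-in for getPositionIncrementGap(fieldName)
        for token in value.split():
            positions.append((token, pos))
            pos += 1
    return positions

def within(positions, a, b, slop):
    """True if tokens a and b occur at most `slop` positions apart."""
    pos_a = [p for t, p in positions if t == a]
    pos_b = [p for t, p in positions if t == b]
    return any(abs(x - y) <= slop for x in pos_a for y in pos_b)

# Two values added to the same field, one per group (hypothetical data).
values = ["java 5", "c 3"]

no_gap = assign_positions(values, gap=0)
big_gap = assign_positions(values, gap=100)

print(within(no_gap, "5", "c", slop=1))      # True: matches across groups
print(within(big_gap, "5", "c", slop=1))     # False: the gap separates groups
print(within(big_gap, "java", "5", slop=1))  # True: same group still matches
```

In this model, any gap larger than the slop used at query time keeps a proximity match from straddling two values of the same field.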
On Fri, Jan 15, 2010 at 10:19 AM, TJ Kolev wrote:
> Hi!
>
> I don't think the easy solution will work for me, because I'll have more
> than two fields in a group - perhaps
Hi!
I don't think the easy solution will work for me, because I'll have more
than two fields in a group - perhaps 6 - 10.
However using span queries looks very promising. I'll investigate that.
I see setPositionIncrement() only on the Token object. Is there a way to set
this when adding a field
I think so. I'll try to write up a quick demo app to see if it will work for
what I require.
Thanks for the prompt reply.
Simon Willnauer wrote:
>
> One way to do it is to use the RegexTermEnum and iterate through your
> terms manually.
>
> like the following pseudo code:
> te = RegexTermEnum
Hi All,
I have an application that has to count how often a specific
regular expression matches a particular field for each document in an
indexed directory.
For example:
Let's say I have 2 documents in the directory and each document has 3 fields,
"table", "column" and "data".
One way to do it is to use the RegexTermEnum and iterate through your
terms manually.
like the following pseudo code:
te = RegexTermEnum(reader, Term("^T.*"), regexpCapabilities)
while( te.next() ):
    t = te.term()
    td = reader.termDocs(t)
    while(td.next()):
        freqOfTermInCurrentDoc = td.freq(
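The loop structure above can be tried without an index using the plain-Python sketch below. It is a stand-in, not Lucene code: the `postings` dictionary (term to per-document frequency) replaces the term enumeration and `termDocs`, and its contents, along with the name `regex_match_frequency`, are hypothetical.

```python
import re
from collections import Counter

def regex_match_frequency(postings, pattern):
    """Sum, per document, the frequencies of all terms that match `pattern`."""
    regex = re.compile(pattern)
    totals = Counter()
    for term, doc_freqs in postings.items():        # outer loop: matching terms
        if regex.match(term):
            for doc_id, freq in doc_freqs.items():  # inner loop: docs for a term
                totals[doc_id] += freq
    return totals

# Hypothetical postings for one field: term -> {doc id: term frequency}.
postings = {
    "Table1": {0: 2, 1: 1},
    "Temp": {1: 3},
    "column": {0: 4},
}
totals = regex_match_frequency(postings, r"^T.*")
print(dict(totals))  # {0: 2, 1: 4}
```

The per-document total is the sum over every matching term, so a document containing several distinct terms that match the expression is counted once per occurrence of each.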
I forgot to mention that I am using Lucene 3.0.0.
--
View this message in context:
http://old.nabble.com/Finding-frequency-of-regex-query-match-in-a-field-tp27175303p27175915.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Indexing code looks OK at a glance. What does the search code look
like? Should be easy enough to pass a disk based Directory to your
write method to get an index you can look at/play with in Luke.
--
Ian.
On Thu, Jan 14, 2010 at 6:54 PM, AlexElba wrote:
>
> Did you completely re-index?
> Ye
On Thu, Jan 14, 2010 at 7:57 PM, Chris Hostetter wrote:
>
> : Since SegmentInfos is now public, you could use SegmentInfos.read to
> : read the current segments_N file, and then call its .size() method?
> :
> : But, this will only count as of the last commit... which is probably
> : not sufficient