[ https://issues.apache.org/jira/browse/NUTCH-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated NUTCH-732:
------------------------------------
Attachment: sub.patch
It turns out this was caused by the way the list of applicable collections is
built, and by how that field is added to the indexing backend. First, the code
adds a leading space, creating collection names like ' nutch' instead of
'nutch'. Then, instead of tokenizing the field, it passes the value as is, so
the leading space is kept and a query for the plain collection name never
matches.
I changed the collection name appending logic and made the field tokenized.
I'll commit the patch shortly.
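
For illustration only, here is a minimal plain-Java sketch of the failure mode
described above. It is not the content of sub.patch and does not use the Nutch
or Lucene APIs; the class and method names (SubcollectionFieldSketch,
joinWithLeadingSpace, joinClean, untokenizedTerms, tokenizedTerms) are
hypothetical, and whitespace splitting merely stands in for an analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: models the indexed terms of a field as a list of strings.
public class SubcollectionFieldSketch {

    // Buggy variant: a space is written before every name, including the first,
    // so a single collection "nutch" becomes " nutch".
    static String joinWithLeadingSpace(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            sb.append(' ').append(name);
        }
        return sb.toString();
    }

    // Fixed variant: the separator appears only between names.
    static String joinClean(List<String> names) {
        return String.join(" ", names);
    }

    // An untokenized field keeps the whole value as one term, spaces included.
    static List<String> untokenizedTerms(String value) {
        return Collections.singletonList(value);
    }

    // A tokenized field is split into terms; whitespace splitting is an
    // assumption standing in for whatever analyzer the index really uses.
    static List<String> tokenizedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("nutch", "sub1");

        // Before: the only term is " nutch sub1", so "sub1" cannot match.
        List<String> before = untokenizedTerms(joinWithLeadingSpace(names));
        System.out.println("before matches sub1: " + before.contains("sub1"));

        // After: the terms are "nutch" and "sub1", so the query matches.
        List<String> after = tokenizedTerms(joinClean(names));
        System.out.println("after matches sub1: " + after.contains("sub1"));
    }
}
```

In this model a term query only matches when the query string equals a stored
term exactly, which is why the stray space alone is enough to hide every
document from a subcollection:sub1 query.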
> Subcollection plugin not working on Nutch-1.0
> ---------------------------------------------
>
> Key: NUTCH-732
> URL: https://issues.apache.org/jira/browse/NUTCH-732
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 1.0.0
> Environment: Mac OS X 10.5 intel
> Reporter: Filipe Antunes
> Priority: Critical
> Attachments: sub.patch
>
>
> I am trying to get subcollections working with Nutch 1.0.
> I configured subcollections.xml, then added the plugin to nutch-site.xml.
> When indexing finished, I opened Lucene Luke to check whether the index was
> working properly.
> The subcollection field is populated as it should be, but searching for any
> subcollection on the search tab of Luke returns no results.
> If I search on the url field, I can see that every record has a
> subcollection associated with it, yet I can't search using the subcollection
> field.
> Search examples in Luke:
> subcollection:sub1 -> no results
> url:sub1 -> results with field subcollection populated -> sub1
> I get the same results using:
> ./bin/nutch org.apache.nutch.searcher.NutchBean "subcollection:sub1 sub"
> If I use "explain", the subcollection field is there with the correct word.
> It makes no sense, so I believe it's a bug.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.