Re: Help with delimited text

Mark Wiltshire Thu, 07 Apr 2011 04:22:03 -0700

Thanks Ian, you're a star :-)

Mark

On 7 Apr 2011, at 11:18, Ian Lea wrote:

Mark - I've uploaded some code to http://pastebin.com/mqSVcWUi that
indexes and searches file system paths. It demonstrates what I've
been trying to suggest and may help you get your search up and
running.

--
Ian.

On Thu, Apr 7, 2011 at 8:18 AM, Mark Wiltshire
<m...@redalertconsultants.co.uk> wrote:

Hi, thanks Ian for your help on this, it's driving me nuts :-)

The StandardAnalyzer is only used on the search query term that is passed in.

But in this case I am just adding a filter to the search.

The actual category may be

/Top/Books/Accountancy/10_Compliance/International

And the filter will allow me to filter the search by subject area, i.e. there is a drop-down on the search page showing

Books
Online
Software

I translate this to a base category to then filter on, i.e. Books = /Top/Books

It performs the filter on the search, so only those living in /Top/Books/* categories (with *) are returned.
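
The filter semantics described above can be sketched in plain Java, independent of Lucene. This is only an illustration of the intended matching rule, not the actual search code: the class name, method names, and the two extra sample categories are made up for the example. It also shows why matching on "base path plus delimiter" is safer than a bare /Top/Books* prefix, which would also match a sibling such as /Top/Bookshelves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CategoryPrefixFilter {

    // True when the category is the base itself or lives somewhere beneath it.
    public static boolean isUnder(String base, String category) {
        return category.equals(base) || category.startsWith(base + "/");
    }

    // Keeps only the categories that fall under the given base path.
    public static List<String> filter(String base, List<String> categories) {
        List<String> kept = new ArrayList<>();
        for (String c : categories) {
            if (isUnder(base, c)) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> categories = Arrays.asList(
            "/Top/Books/Accountancy/10_Compliance/International",
            "/Top/Bookshelves/Oak",   // hypothetical sibling that a bare prefix would match
            "/Top/Software/Tax");     // hypothetical category outside the base
        System.out.println(filter("/Top/Books", categories));
    }
}
```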

I then want to use the actual category in the index to correctly display the item.

Hope that makes sense.

Many thanks

Mark

On 6 Apr 2011, at 12:05, Ian Lea wrote:

Query subjectFilterQuery = new TermQuery(new Term("category_path","/Top/Books*"));

Try losing the asterisk. Presumably the indexed term is "/Top/Books".

You don't appear to be using the StandardAnalyzer you create in your code sample but if you did searches wouldn't work since "/Top/Books" would certainly be lower cased. You've got /TOP/CD in there as well - don't know where that came from. Unless you want case-sensitive searching I'd downcase these paths in advance, and replace / with some character that will be ignored by any analyzers you might be using.

--

Ian.

On Wed, Apr 6, 2011 at 11:33 AM, Mark Wiltshire

<m...@redalertconsultants.co.uk> wrote:

Thanks Ian,

I have managed to do that and through Luke I get my expected results.

Here is now my Index Code.

StringTokenizer st = buildSubjectArea(dbConnection, oid);
int tokenCount = 0;
while (st.hasMoreTokens()) {
    tokenCount++;
    String categoryPath = st.nextToken();
    if (categoryPath.length() != 0) {
        // doc.add(Field.Text("category", category));
        // Index each path as a single, untokenized term.
        doc.add(new Field("category_path", categoryPath, Field.Store.YES, Field.Index.NOT_ANALYZED));
    }
}

Now using Luke with KeywordAnalyzer, if I enter

category_path:/Top/My Prods*

I get my expected results back.

But I cannot get this working in my search code.

I am using this field to filter the results, i.e. if I want Top Books I want to filter by /Top/Books*, if I want Top CDs I want to filter by /Top/CD*

The filter string is generated from a mapping file which gives me a category path, e.g. /Top/Books

Searcher searcher = new IndexSearcher(FSDirectory.open(new File(index)));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
...
Query subjectFilterQuery = new TermQuery(new Term("category_path","/Top/Books*"));
QueryWrapperFilter filter = new QueryWrapperFilter(subjectFilterQuery);
TopDocs searchResult = searcher.search(query,filter,MAX_SEARCH_RESULTS_SIZE);

If I debug the subjectFilterQuery and write out subjectFilterQuery.toString() I see

"subjectFilterQuery.toString()" category_path:/TOP/CD*

So it looks like the query is constructed correctly?

But this does not bring back any results.

You say I need to be consistent in index and query; am I missing something?

Many thanks

Mark

On 6 Apr 2011, at 10:06, Ian Lea wrote:

You can add multiple values for a field to a single document.

Document doc = new Document();
String[] paths = whatever.split(",");
for (String p : paths) {
    doc.add(new Field("path", p, whatever ...));
}

For searching, assuming you only want to be able to wildcard on path

delimiters, you could index

/Top/My Prods/Book Prods/Text Books

/Top/My Prods/Book Prods

/Top/My Prods

/Top

which would let you search on any of them.
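
As a rough illustration of the expansion listed above, here is a plain-Java sketch that produces a path plus each of its ancestors, so that every one of them could be added as a separate value of the field. The class and method names are hypothetical, and it assumes '/' is the only delimiter.

```java
import java.util.ArrayList;
import java.util.List;

public class PathAncestors {

    // Returns the path itself followed by each ancestor, longest first.
    public static List<String> expand(String path) {
        List<String> out = new ArrayList<>();
        String current = path;
        while (!current.isEmpty()) {
            out.add(current);
            int slash = current.lastIndexOf('/');
            if (slash <= 0) {
                break; // reached the top-level segment
            }
            current = current.substring(0, slash);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String p : expand("/Top/My Prods/Book Prods/Text Books")) {
            System.out.println(p);
        }
    }
}
```

Each returned string would then be indexed as its own untokenized value, so an exact-match query on /Top/My Prods finds the document without needing a wildcard.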

You'll want to pick or build an analyzer that behaves as you want wrt case matching and not splitting on the /. Sometimes it can be easier to replace a character, e.g. / with _. I think there is a Lucene class that can do that, maybe MappingCharFilter, if you don't want to do it yourself. You will of course need to be consistent and do the same processing at index and search time.
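
A minimal sketch of doing that replacement yourself, in plain Java. The helper name is made up, and the choice of '_' as the replacement is just the example from above; whether a given analyzer leaves '_' alone is an assumption you would need to check. The point is that one function is applied to both the value being indexed and the value being searched for.

```java
import java.util.Locale;

public class PathNormalizer {

    // Lower-cases the path and swaps the '/' delimiter for '_'.
    // Note that spaces are left as they are.
    public static String normalize(String path) {
        return path.trim().toLowerCase(Locale.ROOT).replace('/', '_');
    }

    public static void main(String[] args) {
        String indexed = normalize("/Top/Books"); // applied when building the document
        String queried = normalize("/TOP/Books"); // applied when building the filter
        System.out.println(indexed + " matches " + queried + ": " + indexed.equals(queried));
    }
}
```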

--

Ian.

On Wed, Apr 6, 2011 at 7:55 AM, Mark Wiltshire

<m...@redalertconsultants.co.uk> wrote:

To add more information

I am then wanting to search this field using part or all of the path

using wildcards

i.e.

Search category_path with /Top/My Prods*

Hi java-users

I need some help.

I am indexing categories into a single field category_path

Which may contain items such as

/Top/Books,/Top/My Prods/Book Prods/Text Books,

/Maths/Books/TextBooks

i.e. category paths delimited by ,

I want to store this field so that the Analyzer tokenizes the document only on ',' characters and not on the '/' characters

I am adding the field to the index using the following, where categoryPath is a String containing the list of the items above:

doc.add(new Field("category_path",categoryPath,Field.Store.YES,Field.Index.ANALYZED));

I think I need to split the string myself, but how do I pass this to Lucene? Do I have to set up different fields?
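
For the splitting step on its own, a minimal plain-Java sketch might look like this. The class and method names are hypothetical; it assumes ',' never appears inside a category name, and it drops empty pieces such as the one left by a trailing comma.

```java
import java.util.ArrayList;
import java.util.List;

public class CategorySplitter {

    // Splits a comma-delimited list of category paths, leaving '/' untouched.
    public static List<String> split(String delimited) {
        List<String> paths = new ArrayList<>();
        for (String piece : delimited.split(",")) {
            String p = piece.trim();
            if (!p.isEmpty()) {
                paths.add(p);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String raw = "/Top/Books,/Top/My Prods/Book Prods/Text Books,/Maths/Books/TextBooks";
        for (String p : split(raw)) {
            System.out.println(p);
        }
    }
}
```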

I need to keep the full path in the index, as I want to use this when

redirecting users, when clicking on the results.

Any help would be great.

Many thanks

Regards

Mark

---------------------------------------------------------------------

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org

For additional commands, e-mail: java-user-h...@lucene.apache.org


Regards

Mark
