Unfortunately, we don't have a designated field for product identifiers,
and the product identifiers are from various manufacturers. So it is
hard to normalize product keys, as we can't distinguish them from other
parts of the document.
Examples are xbox 360 (which might be searched as xbox360)
My suggestion wasn't to store/index the triplets, just a normalized
version of the product key. So if you had
id: CRXUSB2.0-16GB
desc: some 16GB USB thing
you'd index, in your searchable words field, "CRXUSB2.016GB some 16GB USB thing"
And then at search time you'd take "CRX USB2.0-16G" and nor
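
A minimal sketch of the normalization described above, applied the same way at index time and at search time. The class and method names are hypothetical (this is plain Java, not a Lucene API), and upper-casing is an added assumption so that "crx" and "CRX" compare equal.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Lucene. */
final class ProductKeyNormalizer {
    private ProductKeyNormalizer() {}

    /**
     * Removes all whitespace and hyphens from a product key.
     * Upper-casing is an assumption made here, not something
     * stated in the thread.
     */
    static String normalize(String key) {
        return key.replaceAll("[\\s-]+", "").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Index time: the id from the example document.
        System.out.println(normalize("CRXUSB2.0-16GB"));
        // Search time: the user input from the example.
        System.out.println(normalize("CRX USB2.0-16G"));
    }
}
```

Dots are kept on purpose, so "2.0" stays intact, as in the indexed form shown above.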
Hi,
> Has somebody ever tried something like this? Is there a way to do this without
> increasing the index to about 15 times (1+2+3+4+5) its original size?
The index will not have 15 times the size, as it is an inverted index and only
indexes the unique parts of your tokens. In most cases it will ha
java-user@lucene.apache.org
> Subject: Re: Indexing product keys with and without spaces in them
>
> Hi Aditya,
>
> Thank you for your suggestion!
> Unfortunately, this is not possible, as there is no common format for all product
> keys. The products are not ours nor are they al
Hi Ian,
Thank you for your reply.
Unfortunately this will be hard, as we have no way of knowing at which
position the user might enter spaces, so we cannot expand the product
keys at indexing time.
The other way round (triplets without spaces or hyphens) might work,
however we have no real
Hi Aditya,
Thank you for your suggestion!
Unfortunately, this is not possible, as there is no common format for
all product keys. The products are not ours nor are they all from the
same manufacturer, so we don't have any influence on what the product
keys look like.
Regards,
Christoph
On 03
Hi Christoph
In my opinion, you should not normalize or otherwise modify the
product keys. They should be unique and used as they are. Instead of
spaces, only "-" should have been used, but since the products are already
on the market, that cannot be helped.
In your UI, you could provide multiple
When indexing you could normalise them down to a standard format
without spaces or hyphens, but searching is much harder if you really
can't identify possible product ids within user queries. Make
triplets without spaces or hyphens? "CRX USB-2.0 16GB" ==>
CRXUSB2.016GB but also "some random words
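
The triplet idea can be sketched roughly as follows: split the query on whitespace, drop hyphens, and join every run of up to three adjacent tokens into one candidate term. This is plain Java with hypothetical names, not a Lucene API; in a real index this job would be done inside the analyzer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Lucene. */
final class KeyShingles {
    private KeyShingles() {}

    /**
     * Returns every run of 1..maxSize adjacent tokens, joined
     * without spaces or hyphens, in order of starting position.
     */
    static List<String> shingles(String query, int maxSize) {
        // Split on whitespace and strip hyphens inside each token.
        List<String> tokens = new ArrayList<>();
        for (String raw : query.trim().split("\\s+")) {
            String token = raw.replace("-", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        // Grow a joined string token by token from each start position.
        List<String> result = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder joined = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= tokens.size(); len++) {
                joined.append(tokens.get(start + len - 1));
                result.add(joined.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Candidates include CRXUSB2.016GB, the joined form of all three tokens.
        System.out.println(shingles("CRX USB-2.0 16GB", 3));
    }
}
```

Ordinary words in the query produce joined candidates as well, so the number of candidate terms grows with query length and with the maximum run size.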
Hello,
We use Lucene as the search engine in an online shop. The products in this
shop often contain product keys like CRXUSB2.0-16GB.
We would like our customers to be able to find products by entering
their key. The problem is that product keys sometimes contain spaces or
dashes and customers so