Re: Max Field Length

2022-09-23 Thread Michael Sokolov
ooh
On Fri, Sep 23, 2022 at 11:02 AM Adrien Grand wrote:
> We have a TruncateTokenFilter in lucene/analysis/common. :)
>
> On Fri, Sep 23, 2022 at 4:39 PM Michael Sokolov wrote:
> > I wonder if it would make sense to provide a TruncationFilter in
> > addition to the LengthFilter. That way lo

Re: Max Field Length

2022-09-23 Thread Adrien Grand
We have a TruncateTokenFilter in lucene/analysis/common. :)
On Fri, Sep 23, 2022 at 4:39 PM Michael Sokolov wrote:
> I wonder if it would make sense to provide a TruncationFilter in
> addition to the LengthFilter. That way long tokens in source text
> could be better supported, albeit with some

Re: Max Field Length

2022-09-23 Thread Michael Sokolov
I wonder if it would make sense to provide a TruncationFilter in addition to the LengthFilter. That way long tokens in source text could be better supported, albeit with some confusion if they share the same very long prefix...
On Fri, Sep 23, 2022 at 9:56 AM Scott Guthery wrote:
> Thanks much,
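The prefix confusion mentioned above can be made concrete with a small Python sketch. It is a conceptual stand-in, not Lucene's TruncateTokenFilter: the helper names `truncate` and `collisions` and the `prefix_len` parameter are hypothetical, and the DNA-like strings are invented sample data.

```python
from collections import defaultdict

def truncate(token, prefix_len):
    """Keep at most prefix_len leading characters of the token."""
    return token[:prefix_len]

def collisions(tokens, prefix_len):
    """Group distinct tokens that become identical after truncation."""
    groups = defaultdict(set)
    for token in tokens:
        groups[truncate(token, prefix_len)].add(token)
    # Only report prefixes shared by more than one distinct token.
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}

# Two invented sequences that differ only after the first 20 characters.
seq_a = "ACGT" * 5 + "AAAA"
seq_b = "ACGT" * 5 + "TTTT"
clashes = collisions([seq_a, seq_b, "patent"], prefix_len=20)
```

With `prefix_len=20` both sequences index as the same term, so a search for one would also match the other; a longer prefix trades index size for fewer such clashes.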

Re: Max Field Length

2022-09-23 Thread Scott Guthery
Thanks much, Adrien. I hadn't realized that the size limit was on one token in the text as opposed to being a limit on the length of the entire text field. I'm loading patents, so I suspect that the very long word is a DNA sequence. Thanks also for your guidance with regard to setting maximums.
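To illustrate the per-token versus whole-field distinction, here is a minimal Python sketch (not Lucene code) that reports which tokens exceed a per-token byte limit. The 32766-byte value is an assumption, believed to match Lucene's IndexWriter.MAX_TERM_LENGTH; the whitespace split merely stands in for a real analyzer, and `oversized_tokens` is a hypothetical helper name.

```python
# Assumption: per-term limit in UTF-8 bytes (believed to match
# Lucene's IndexWriter.MAX_TERM_LENGTH; adjust if your version differs).
MAX_TERM_BYTES = 32766

def oversized_tokens(text, limit=MAX_TERM_BYTES):
    """Return (position, byte_length) for each whitespace token over the limit."""
    result = []
    for position, token in enumerate(text.split()):
        size = len(token.encode("utf-8"))
        if size > limit:
            result.append((position, size))
    return result

# A very long field is fine as long as every individual token is short.
long_field = "claim " * 20000
# A short field with one huge token (invented DNA-like data) is not.
dna_like = "sequence " + "ACGT" * 10000 + " listing"
```

Running a check like this over the source documents before indexing is one way to locate the offending token rather than guessing at it.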

Re: Max Field Length

2022-09-23 Thread Adrien Grand
Hi Scott, There is no way to lift this limit. The assumption is that a user would never type a 32kB keyword in a search bar, so indexing such long keywords is wasteful. Some tokenizers like StandardTokenizer can be configured to limit the length of the tokens that they produce; there is also a Len
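The idea of filtering tokens by length can be sketched in a few lines of Python. This is a conceptual stand-in and not the Lucene API: `drop_long_tokens` and its `max_len` parameter are hypothetical names, the default of 255 is only an illustrative value, and lengths are counted in characters here rather than bytes.

```python
def drop_long_tokens(tokens, max_len=255):
    """Keep only tokens whose character length is at most max_len."""
    return [t for t in tokens if len(t) <= max_len]

# Invented sample stream: three ordinary words and one oversized token.
sample_tokens = ["a", "patent", "G" * 40000, "claim"]
kept = drop_long_tokens(sample_tokens)
```

Dropping the token means it cannot be searched at all, which is consistent with the assumption in the message above that nobody would type such a keyword into a search bar.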

Max Field Length

2022-09-22 Thread Scott Guthery
Lucene 9.3 seems to have a (post-Analyzer) maximum field length of 32767. Is there a way of increasing this without resorting to the source code? Thanks for any guidance. Cheers, Scott

Re: NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Paul Taylor
On Tue, Jan 12, 2010 at 7:53 AM, Paul Taylor wrote:
> Lucene in Action says you can possibly use NOT_ANALYSED_NO_NORMS when indexing fields that aren't tokenized, but later says norms are used to boost fields with fewer terms or a single term, so matches based

Re: NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Paul Taylor
Erick Erickson wrote:
> Are you saying that you index the *same* field differently in different documents? Or do you index the field in question in the same way in all documents?
Same way in all documents
> I ask because I'm having a hard time following the logic here. A field that is NOT analyzed

Re: NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Erick Erickson
Are you saying that you index the *same* field differently in different documents? Or do you index the field in question in the same way in all documents? I ask because I'm having a hard time following the logic here. A field that is NOT analyzed is an all-or-none match, i.e. looking for "paul" in
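The all-or-none behaviour of a field that is not analyzed can be shown with a toy Python model. It is not Lucene code: `index_terms` and `matches` are hypothetical helpers, the lowercase-and-split step only stands in for an analyzer, and the field value "paul taylor" is an invented example.

```python
def index_terms(value, analyzed):
    """Return the set of terms a toy index would hold for one field value."""
    if analyzed:
        return set(value.lower().split())  # stand-in for an analyzer
    return {value}                         # whole value kept as a single term

def matches(query_term, value, analyzed):
    """True if the query term equals one of the indexed terms."""
    return query_term in index_terms(value, analyzed)

analyzed_hit = matches("paul", "paul taylor", analyzed=True)
unanalyzed_hit = matches("paul", "paul taylor", analyzed=False)
exact_hit = matches("paul taylor", "paul taylor", analyzed=False)
```

In this model the unanalyzed field always holds exactly one term per value, so only a query for the complete value matches it.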

NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Paul Taylor
Lucene in Action says you can possibly use NOT_ANALYSED_NO_NORMS when indexing fields that aren't tokenized, but later says norms are used to boost fields with fewer terms or a single term, so matches based on these single term fields would miss out on this boost. Is there a way to use NOT_ANALYSED_NO_NORM

Re: Max Field Length

2005-05-06 Thread Bill Tschumy
On May 6, 2005, at 4:42 PM, Ernesto De Santis wrote:
> Hi
> Is there a max length for a Field value?
> I have problems indexing large body files.
> The bottom isn't indexed.
> Bye,
> Ernesto.
> --
> Ernesto De Santis - Colaborativa.net
> Córdoba 1147 Piso 6 Oficinas 3 y 4 (S2000AWO) Rosario, SF, Argentina.
After you

Re: Max Field Length

2005-05-06 Thread Luke Shannon
2005 5:42 PM
Subject: Max Field Length
> Hi
>
> Is there a max length for a Field value?
> I have problems indexing large body files.
> The bottom isn't indexed.
>
> Bye,
> Ernesto.
>
> --
> Ernesto De Santis - Colaborativa.net
> Cór

Max Field Length

2005-05-06 Thread Ernesto De Santis
Hi
Is there a max length for a Field value? I have problems indexing large body files. The bottom isn't indexed.
Bye, Ernesto.
--
Ernesto De Santis - Colaborativa.net
Córdoba 1147 Piso 6 Oficinas 3 y 4 (S2000AWO) Rosario, SF, Argentina.