Hello,
Doesn't Lucene have a Tokenizer/Analyzer for Brown Corpus?
There don't seem to be such tokenizers/analyzers in Lucene.
As I didn't want to reinvent the wheel, I googled and got
a list of snippets that include "the quick brown fox..." :)
Koji
---
Thanks!
>Solr...falls back to wrapping with UninvertingReader
That's why my Solr unit tests stayed green ;)
>But in general, you should really enable DocValues for fields you want to sort
>on
I will!
-Original Message-
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Monday
(I'm using Lucene 4.9.0)
I've been doing some perf testing of MemoryIndex, and have found that it is
much slower when a BooleanQuery contains a non-required clause, compared to
when it just contains required clauses.
Most of the time is spent in BooleanScorer, which as far as I can tell is an
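For reference, a minimal sketch (assuming Lucene 4.9.x) of the kind of MemoryIndex comparison described above; the field name and terms are made up for illustration:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;

public class MemoryIndexPerfSketch {
    public static void main(String[] args) {
        MemoryIndex index = new MemoryIndex();
        index.addField("body", "the quick brown fox", new StandardAnalyzer(Version.LUCENE_4_9));

        // All-required clauses: scored via the conjunction path.
        BooleanQuery required = new BooleanQuery();
        required.add(new TermQuery(new Term("body", "quick")), BooleanClause.Occur.MUST);
        required.add(new TermQuery(new Term("body", "fox")), BooleanClause.Occur.MUST);

        // Adding a non-required (SHOULD) clause routes scoring through BooleanScorer,
        // which is where the reported time is being spent.
        BooleanQuery mixed = new BooleanQuery();
        mixed.add(new TermQuery(new Term("body", "quick")), BooleanClause.Occur.MUST);
        mixed.add(new TermQuery(new Term("body", "lazy")), BooleanClause.Occur.SHOULD);

        System.out.println(index.search(required));
        System.out.println(index.search(mixed));
    }
}
```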
I have solved a similar task of taking payload into account for fuzzy queries
On 12 February 2015 at 02:58:10 GMT+06:00, Sheng wrote:
>fellas,
>
>I am wondering if it is possible to wrap payload query with
>customscorequery, so that one can tweak the search score with both
>payload
>similarity and a c
Hi,
Solr uses DocValues and falls back to wrapping with UninvertingReader if the user
has not indexed them (with negative startup performance and memory effects).
But in general, you should really enable DocValues for fields you want to sort
on.
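A minimal sketch of what "enable DocValues for fields you want to sort on" looks like, assuming Lucene 5.x; the field name "name" is made up for illustration:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

public class DocValuesSortSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (String name : new String[] {"charlie", "alpha", "bravo"}) {
                Document doc = new Document();
                // Indexed/stored copy for retrieval...
                doc.add(new StringField("name", name, Field.Store.YES));
                // ...and a doc-values copy, which is what sorting reads.
                doc.add(new SortedDocValuesField("name", new BytesRef(name)));
                writer.addDocument(doc);
            }
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Sort sort = new Sort(new SortField("name", SortField.Type.STRING));
            // Sorting now reads doc values directly: no FieldCache, no UninvertingReader.
            System.out.println(searcher.search(new MatchAllDocsQuery(), 10, sort).totalHits);
        }
    }
}
```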
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-2
Hi,
The existence of the write.lock file has nothing to do with actual locking. The
lock file is just a placeholder file (0 bytes) on which the lock during
indexing is applied (the actual locking is done on this file using fcntl).
When indexing finishes, the lock is removed from this file, bu
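A small sketch of this behavior, assuming Lucene 5.x with the default native lock factory; the index path is made up:

```java
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;

public class WriteLockSketch {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(Paths.get("/tmp/lucene-index"));
        // The first writer acquires an OS-level lock on the write.lock file.
        IndexWriter first = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
        try {
            // A second writer on the same directory fails, even though write.lock
            // is just a 0-byte placeholder: the real lock is held by the OS.
            new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
        } catch (LockObtainFailedException expected) {
            System.out.println("lock is held: " + expected.getMessage());
        }
        first.close(); // releases the lock; the write.lock file may remain on disk
    }
}
```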
Hi,
Also, Tokenizer no longer has a Reader in its constructor. Tokenizers are constructed
without any Reader. To consume the TokenStream one has to set the reader using
setReader(Reader). Because of that, createComponents no longer needs to take a
Reader either.
The tokenStream method takes a String or Read
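A minimal sketch of the Lucene 5 pattern just described: createComponents takes only the field name, and the framework sets the Reader later. The class name is made up:

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class KeywordAnalyzer5Sketch extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Lucene 5: the Tokenizer is built without a Reader; setReader(...)
        // is called internally when the stream is consumed.
        Tokenizer source = new KeywordTokenizer();
        return new TokenStreamComponents(source);
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new KeywordAnalyzer5Sketch();
             TokenStream ts = analyzer.tokenStream("field", "hello world")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                // KeywordTokenizer emits the whole input as a single token
                System.out.println(term.toString());
            }
            ts.end();
        }
    }
}
```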
I use TermsQuery for creating a join query. The list of terms can be quite
large, e.g. millions of entries.
When this is the case, the IntroSorter sorting the terms becomes a performance
bottleneck.
Could I use another strategy or algorithm for building those joins on large
sets of terms?
an
Thanks for pointer. How does/did this change make its way into Solr?
-Original Message-
From: András Péteri [mailto:apet...@b2international.com]
Sent: Monday, 23 February 2015 14:13
To: java-user@lucene.apache.org
Subject: Re: Lucene 4.x -> 5 : IllegalStateException while sortin
That's why locking didn't work correctly back then.
On Mon, Feb 23, 2015 at 8:18 AM, Just Spam wrote:
> Any reason?
> I remember in 3.6 the lock was removed/deleted?
>
>
> 2015-02-23 14:13 GMT+01:00 Robert Muir :
>
>> It should not be deleted. Just don't mess with it.
>>
>> On Mon, Feb 23, 2015 at
Any reason?
I remember in 3.6 the lock was removed/deleted?
2015-02-23 14:13 GMT+01:00 Robert Muir :
> It should not be deleted. Just don't mess with it.
>
> On Mon, Feb 23, 2015 at 7:57 AM, Just Spam wrote:
> > Hello,
> >
> > i am trying to index a file (Lucene 4.10.3) – in my opinion in the
>
Hi Clemens,
I think this part of the release notes [1] applies to your case:
* FieldCache is gone (moved to a dedicated UninvertingReader in the misc
module). This means when you intend to sort on a field, you should index
that field using doc values, which is much faster and less heap consuming
It should not be deleted. Just don't mess with it.
On Mon, Feb 23, 2015 at 7:57 AM, Just Spam wrote:
> Hello,
>
> i am trying to index a file (Lucene 4.10.3) – in my opinion in the correct
> way – will say:
>
> get the IndexWriter, Index the Doc and add them, prepare commit, commit and
> finally{
Hello,
I am trying to index a file (Lucene 4.10.3) – in my opinion in the correct
way – that is to say:
get the IndexWriter, index the doc and add it, prepare commit, commit and
finally{ close }.
My writer is generated like so:
private IndexWriter getDataIndexWriter() throws CorruptIndexExcept
After upgrading to Lucene 5, one of my unit tests which tests sorting fails with:
unexpected docvalues type NONE for field 'providertestfield' (expected=SORTED).
Use UninvertingReader or index with docvalues
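A minimal sketch of the UninvertingReader workaround named in the error message, assuming Lucene 5.x with the misc module on the classpath; the index path is made up, and 'providertestfield' is taken from the error above:

```java
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.uninverting.UninvertingReader;

public class UninvertingSortSketch {
    public static void main(String[] args) throws Exception {
        // Tell the wrapper which fields to uninvert, and to what doc-values type.
        Map<String, UninvertingReader.Type> mapping = new HashMap<>();
        mapping.put("providertestfield", UninvertingReader.Type.SORTED);

        DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("/tmp/lucene-index")));
        // The wrapped reader synthesizes doc values at search time (slower and
        // more heap-hungry than indexing real doc values, but unblocks sorting).
        DirectoryReader wrapped = UninvertingReader.wrap(reader, mapping);
        IndexSearcher searcher = new IndexSearcher(wrapped);
        System.out.println(searcher.getIndexReader().numDocs());
    }
}
```

Indexing the field with a SortedDocValuesField, as the release notes recommend, is the better long-term fix.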
What am I missing?
Got this one sorted out. I was still referencing the 4.x lucene-analyzers.jar
which required the reader ;)
Sorry for the noise!
-Original Message-
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
Sent: Monday, 23 February 2015 12:42
To: java-user@lucene.apache.org
Subject:
My custom Analyzer had the following (Lucene 4) impl of createComponents:
protected TokenStreamComponents createComponents( final String fieldName,
                                                  final Reader reader )
{
    Tokenizer source = new KeywordTokenizer( reader );
    TokenStream