But for this, you need a skillfully designed:
- set of fields
- multiplexing analyzer
- query expansion
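A rough sketch of how those three pieces could fit together, in plain Python rather than Lucene's API. The field-naming scheme (`body_en`, `body_de`), the `LANGUAGES` tuple and all three helper functions are made up for illustration; the only point is that each language gets its own field, an analyzer can be chosen from the field name, and a query is expanded across all of those fields.

```python
# Illustrative only: plain Python, not Lucene API. All names are hypothetical.
LANGUAGES = ("en", "de", "fr")


def localized_fields(base_field, text_by_language):
    """Map a base field to one field per language, e.g. body -> body_de."""
    return {
        f"{base_field}_{lang}": text
        for lang, text in text_by_language.items()
        if lang in LANGUAGES
    }


def language_of(field_name):
    """Read the language from the field suffix; a per-field
    ("multiplexing") analyzer could dispatch on this value."""
    return field_name.rsplit("_", 1)[-1]


def expand_query(base_field, term, languages=LANGUAGES):
    """Expand a one-term query into an OR over every language field."""
    return " OR ".join(f"{base_field}_{lang}:{term}" for lang in languages)


doc = localized_fields("body", {"en": "the house", "de": "das Haus"})
query = expand_query("body", "haus")
```

In a real setup the expanded query would also have to be analyzed per field, which is where most of the design effort goes.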
In one of my projects we do not split languages into separate fields, and it's a pain...
I keep running into issues one way or another.
- the "die" example that Otis mentioned is a good one: stop-
> 1) Docs in different languages -- every document is one language
> 2) Each document has fields in different languages
We mainly have 1)-models
Clemens
> -Original Message-
> From: Shai Erera [mailto:ser...@gmail.com]
> Sent: Tuesday, 18 January 2011 20:28
> To: java-user@luce
I think we should be using Lucene with the Snowball jars, which means one
index for all languages (of course, the size of the index is always a
concern).
Hope this helps.
-vinaya
On Tuesday 18 January 2011 11:23 PM, Clemens Wyss wrote:
What is the "best practice" to support multiple languages, i
Where do you get your Lucene/Solr downloads from?
[X] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[X] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [X] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a
> d
[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
On Jan 18, 2011, at 2:24 PM, Glen Newton wrote:
> Where do you get your Lucene/Solr downloads from?
>
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via a
Mark your calendars today! The largest worldwide conference dedicated to Lucene
and Solr will take place in the San Francisco/Bay Area May 25-26.
The 2011 conference will build on the success of last year's Lucene Revolution
in Boston. Sponsored by Lucid Imagination with additional su
Sincerely,
Sithu D Sudarsan
Grant Ingersoll wrote:
> Where do you get your Lucene/Solr downloads from?
>
> [x] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [x] I/we build them from
[] ASF Mirrors (linked in our release announcements or via the Lucene website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a downstream
project)
--
Grant Ingersoll wrote:
> Where do you get your Lucene/Solr downloads from?
>
> [x] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [x] I/we build them from source via an SVN/Git checkout.
>
>
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a
> downst
[X] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a
downstream project)
2011
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a downstream
On Tue, Jan 18, 2011 at 3:04 PM, Grant Ingersoll wrote:
>
> Where do you get your Lucene/Solr downloads from?
>
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
>
> [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from sourc
> [] ASF Mirrors (linked in our release announcements or via
> the Lucene website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy,
> Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally
> or via a downst
On 18.01.2011, at 22:04, Grant Ingersoll wrote:
> As devs of Lucene/Solr, due to the way ASF mirrors, etc. work, we really
> don't have a good sense of how people get Lucene and Solr for use in their
> application. Because of this, there has been some talk of dropping Maven
> support for Luc
>
> Where do you get your Lucene/Solr downloads from?
>
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [X] I/we build them from source via an SVN/Git checkout.
>
--
> Where do you get your Lucene/Solr downloads from?
>
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your
Where do you get your Lucene/Solr downloads from?
[] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors
[x] ASF Mirrors (linked in our release announcements or via the Lucene website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
On Tue, Jan 18, 2011 at 1:24 PM, Glen Newton wrote:
> Where do you get your Lucene/Solr
On Tue, 18 Jan 2011 22:04:01 +0100, Grant Ingersoll
wrote:
[] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in y
> Where do you get your Lucene/Solr downloads from?
>
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
--ewh
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional com
> [x] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [x] I/we build them from source via an SVN/Git checkout.
> Where do you get your Lucene/Solr downloads from?
>
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in you
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [X] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a
> downst
Where do you get your Lucene/Solr downloads from?
[x] ASF Mirrors (linked in our release announcements or via the Lucene website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
-Glen Newton
--
[] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a
downstream project)
---
[X] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a
downstream project)
And here's mine:
On Jan 18, 2011, at 4:04 PM, Grant Ingersoll wrote:
>
> Where do you get your Lucene/Solr downloads from?
>
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
>
> [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [x] I/we bui
Where do you get your Lucene/Solr downloads from?
[] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirro
As devs of Lucene/Solr, due to the way ASF mirrors, etc. work, we really don't
have a good sense of how people get Lucene and Solr for use in their
applications. Because of this, there has been some talk of dropping Maven
support for Lucene artifacts (or at least making them external). Before we
Hi
There are two types of multi-language docs:
1) Docs in different languages -- every document is one language
2) Each document has fields in different languages
I've dealt with both, and there are different solutions to each. Which of
them is yours?
Shai
On Tue, Jan 18, 2011 at 7:53 PM, Cleme
Hi Clemens,
If you will be searching individual languages, go with language-specific
indices. Wunder likes to give an example of "die" in German vs. English. :)
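To make the "die" example concrete, here is a small sketch in plain Python (not Lucene code). The stop word sets are tiny hand-picked samples and `analyze` is a hypothetical helper; the point is just that the same token is a stop word in one language and a content word in the other, so one shared stop list cannot serve both.

```python
# Illustrative only: tiny hand-picked stop word sets, not Lucene's real lists.
STOP_WORDS = {
    "en": {"the", "a", "of"},
    "de": {"der", "die", "das"},
}


def analyze(text, lang):
    """Lowercase, split on whitespace, drop that language's stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS[lang]]


english = analyze("The heroes die", "en")  # "die" is a verb here and is kept
german = analyze("Die Helden", "de")       # "die" is an article here and is dropped
```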
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Orig
What is the "best practice" to support multiple languages, i.e.
Lucene-Documents that have multiple language content/fields?
Should
a) each language be indexed in a separate index/directory or should
b) the Documents (in a single directory) hold the diverse localized fields?
We most often will b
Hi Shai,
What I really wanted to do was reduce the frq file size.
Oddly (when tokenizing 3 separate fields), more terms are produced with
the WhitespaceTokenizer than with the CJK analyzer, yet the CJK frq
file size is much larger ... examples below:
with WhitespaceTokenizer:
89M _0.tis
Hi Ian & Umesh.
This is what I was looking for.
Thanks a lot.
Regards,
Lahiru
If I understand correctly, you compare the size of the .frq when
WhitespaceTokenizer is used, vs the CJK ones?
I'd bet this is because WhitespaceTokenizer creates far fewer terms than the
CJK one. Whitespace tokenizes the text by separating on whitespace, while
CJK does sort of N-Gram tokenization,
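
A minimal sketch of that difference, in plain Python rather than the Lucene analyzers themselves. `bigram_tokens` is a hypothetical stand-in that emits overlapping character bigrams, which is only roughly what the CJK analyzer does; the sample text is made up.

```python
# Illustrative only: approximates the two tokenization styles, not Lucene code.
def whitespace_tokens(text):
    """One token per whitespace-separated run."""
    return text.split()


def bigram_tokens(text):
    """Overlapping character bigrams within each whitespace-separated run."""
    tokens = []
    for run in text.split():
        if len(run) < 2:
            tokens.append(run)  # a single character stays as it is
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


text = "全文検索 エンジン"
ws = whitespace_tokens(text)
bi = bigram_tokens(text)
```

Every extra token is another posting to record, which is one plausible reason for the .frq file growing under bigram tokenization.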
Hi Lahiru,
Comments are inline:
On Tue, Jan 18, 2011 at 5:42 PM, Lahiru Samarakoon wrote:
> Dear All,
>
> I have two documents. The analyzed and the tokenized contents are
> mentioned
> below.
>
> *Document 1 :*
>
> *when*, null_1, *my*, null_1, money,
>
> fund, amount, payment, creditcard
See what Searcher.explain() says for each hit. I don't think that word
order will matter with the query you give. There are several factors
in scoring - see org.apache.lucene.search.Similarity or google "lucene scoring".
Or have a play with Luke: invaluable for investigating things with
lucene and will tell you
Hi,
We're trying to create a large index via solr for trends and notice
that we have a large '.frq' file after doing the following:
make all text fields index="true", stored="false",
omitTermFreqAndPositions="true" omitNorms="true" termPositions="false"
termOffsets="false" termVectors="false"
W
Dear All,
I have two documents. The analyzed and the tokenized contents are mentioned
below.
*Document 1 :*
*when*, null_1, *my*, null_1, money,
fund, amount, payment, creditcard, credit,
card, *bank, account*, debit, deduct,
*charge*, null_1, my, mobile, usage,
*service*, connection
*