Re: luke and chinese text

2011-12-22 Thread Andrzej Bialecki
the Settings menu a font that supports Unicode characters; the default platform font often doesn't support them, which results in '?' or other strange characters.

[ANN] Luke 3.5.0 released

2011-12-28 Thread Andrzej Bialecki
s and a happy New Year to you all! :) -- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info

Re: Tamper resistant index

2012-01-09 Thread Andrzej Bialecki
ed approach first, because it's easy to implement, and then see if it's good enough.

Re: delete entries from posting list Lucene 4.0

2012-03-19 Thread Andrzej Bialecki

Re: delete entries from posting list Lucene 4.0

2012-03-29 Thread Andrzej Bialecki
se the doc enumeration and calculate the total number of term occurrences in all documents (e.g. in RIDFTermPruningPolicy.initPositionsTerm(..)), and use this value in the formula in place of termPositions.freq().
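
A minimal Python sketch of the difference between the two counts, not the Lucene pruning code itself. It assumes the textbook residual IDF definition (observed IDF minus the IDF that a Poisson model predicts from the collection-wide frequency); the function names and the toy postings are hypothetical.

```python
import math

def collection_frequency(postings):
    """Sum within-document frequencies over every document containing the term."""
    return sum(freq for _doc_id, freq in postings)

def residual_idf(doc_freq, coll_freq, num_docs):
    """Textbook RIDF (an assumption here): observed IDF minus Poisson-predicted IDF."""
    observed = -math.log2(doc_freq / num_docs)
    expected = -math.log2(1.0 - math.exp(-coll_freq / num_docs))
    return observed - expected

# Hypothetical postings for one term: (doc id, within-document frequency).
postings = [(0, 3), (4, 1), (7, 6)]

# One full pass over the postings gives the collection-wide count ...
cf = collection_frequency(postings)

# ... which is then used in the formula instead of any single document's frequency.
ridf = residual_idf(doc_freq=len(postings), coll_freq=cf, num_docs=100)
```

The point of the sketch is only that the frequency fed into the formula must be summed over all documents first; plugging in a per-document value such as 3 or 6 would give a different, document-dependent result.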

Re: delete entries from posting list Lucene 4.0

2012-04-02 Thread Andrzej Bialecki
On 29/03/2012 11:14, Andrzej Bialecki wrote: The problem in our implementation is that we use a within-document term frequency (the number of occurrences of t in the current document) and not a collection-wide term frequency... so, it looks to me that the fix would be to first fully traverse

Re: Re-indexing a particular field only without re-indexing the entire enclosing document in the index

2012-04-26 Thread Andrzej Bialecki
age as far as I know. LUCENE-3837, to be specific. But as you said, it's still early and there is no code yet to speak of...

Re: lucene algorithm ?

2012-04-26 Thread Andrzej Bialecki
ct is lower than the current lowest score.

Re: Index pruning

2012-06-13 Thread Andrzej Bialecki
of field:term pairs. -- Best regards, Andrzej Bialecki http://www.sigram.com, blog http://www.sigram.com/blog ___.,___,___,___,_._. __<>< [___||.__|__/|__||\/|: Information Retrieval, System Integration ___|||__||..\|..||..|: Co

[ANNOUNCE] Luke 4.0.0-ALPHA released

2012-07-17 Thread Andrzej Bialecki
g.Stires) * Issue 19: Custom directory implementation must be inherited from FSDirectory (mitja.lenic) * Issue 21: luke tarball needs to extract to a "luke" directory (bevan.koopman, Photodeus) * Issue 27: Cannot add or edit documents using StandardAnalyzer (dean.thrasher) Thanks to

Re: Lucene 4.0 .FDT

2012-07-19 Thread Andrzej Bialecki
ds. The question is whether the space savings would be worth the complication?

Re: Problem with TermVector offsets and positions not being preserved

2012-07-27 Thread Andrzej Bialecki
at shows a term vector correctly shows positions and offsets if available (or blanks if not available).

Re: Getting terms from unstored fields, doc-wise

2012-07-27 Thread Andrzej Bialecki
re it, either using stored fields or in an external system.

Lucene 4 architecture - paper available

2012-10-09 Thread Andrzej Bialecki
e you enjoy the reading. :)

Re: [ANNOUNCE] Wiki editing change

2013-03-25 Thread Andrzej Bialecki
AndrzejBialecki to the ContributorsGroup. Thanks!

Re: How to avoid sharing docStore files?

2010-05-12 Thread Andrzej Bialecki
ks. However, even this new tool will make a copy of the original index, so you will need twice as much space. But in this case perhaps you could put the original index on a network FS, and split it into the target partition - the

Re: Access indexed terms

2010-05-14 Thread Andrzej Bialecki
ke/DocReconstructor.java If you really need this kind of access in your application, then add your documents with term vectors with offsets and positions. Even then, depending on the Analyzer you used, the process is lossy - some input data that was discarded by the Analyzer is simply no longer available

Re: Access indexed terms

2010-05-14 Thread Andrzej Bialecki
Is there an alternative way to do that? Yes, see the discussion here: https://issues.apache.org/jira/browse/LUCENE-2393

Re: Question about Field.setOmitTermFreqAndPositions(true)

2010-05-31 Thread Andrzej Bialecki
On 2010-05-31 10:54, Uwe Schindler wrote: > No. See also LUCENE-2048 (nice round number ;) ).

Re: Document Order in IndexWriter.addIndexes

2010-06-29 Thread Andrzej Bialecki
it doesn't rely on this behavior. You have been warned :)

Re: Adding a new field to existing Index

2010-06-30 Thread Andrzej Bialecki
tent can be recovered. See the "Reconstruct & Edit" functionality in Luke (http://www.getopt.org/luke).

Re: Document Order in IndexWriter.addIndexes

2010-06-30 Thread Andrzej Bialecki
recorded in the output index.

Re: Adding a new field to existing Index

2010-07-07 Thread Andrzej Bialecki
On 2010-07-07 14:49, Naveen Kumar wrote: Hi Andrzej Bialecki When you suggested - "There are some other low-level ways to do this, but the easiest is to use a FilterIndexReader, especially since you just want to add a stored field - implement a subclass of FilterIndexR

Re: Search one index but use IDF from another?

2011-03-10 Thread Andrzej Bialecki
->DF map with values obtained from the full index, and then you use this map to calculate IDF.
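
A rough Python sketch of that idea, under stated assumptions: the document frequencies are collected once from the full index into a plain dictionary, and IDF is then computed from that dictionary rather than from the index being searched. The helper names are hypothetical, and the formula 1 + ln(N / (df + 1)) is used only as a familiar classic-style example, not as a statement of what any particular Similarity does.

```python
import math

def build_df_map(full_index_docs):
    """Map each term to the number of documents in the full index that contain it."""
    df = {}
    for terms in full_index_docs:
        for term in set(terms):  # count each document once per term
            df[term] = df.get(term, 0) + 1
    return df

def idf(term, df_map, num_docs):
    """Classic-style IDF, 1 + ln(N / (df + 1)); the exact formula is an assumption."""
    return 1.0 + math.log(num_docs / (df_map.get(term, 0) + 1))

# Toy "full index": one list of terms per document.
full_index_docs = [["lucene", "index"], ["lucene", "query"], ["solr", "index"], ["luke"]]
df_map = build_df_map(full_index_docs)

# Weights used while searching the smaller index come from the full-index map.
weights = {t: idf(t, df_map, len(full_index_docs)) for t in ("lucene", "luke", "nutch")}
```

Terms missing from the map fall back to a document frequency of zero, so they receive the highest weight; whether that is desirable depends on the application.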

[ANN] Luke 3.1.0 released

2011-04-29 Thread Andrzej Bialecki
ributing bug reports, patches and comments. Happy Luke-ing!

Re: Changing Boosting that was set at indexing time

2011-06-16 Thread Andrzej Bialecki
ectly using IndexReader.setNorm(...) but you need to remember that this method uses raw byte values, that is, the result of encoding a floating-point value with Similarity.encodeNormValue(..).

Re: Coloring search results based on score?

2011-06-16 Thread Andrzej Bialecki
more details: http://people.ischool.berkeley.edu/~hearst/research/tilebars.html

[ANN] Luke 3.3.0 released.

2011-07-06 Thread Andrzej Bialecki
Hi all, Luke 3.3.0 has been released and is available for download here: http://code.google.com/p/luke/ Apart from the updated Lucene libraries there were no changes in functionality.

[ANN] Luke 3.4.0 release

2011-10-03 Thread Andrzej Bialecki
APIs. * Rearranged "field flags" so that they are more logical and cover index options added in 3.4.0. E.g. omitNorms is represented as "with Norms" and marked by "N", IndexOptions are expanded to "Idfp" to mark indexed fields with docs, freqs and

Re: [ANN] Luke 3.4.0 release

2011-10-03 Thread Andrzej Bialecki
some lesson to learn from this situation... I committed a fix, and the updated release is marked as 3.4.0_1. Sorry!

Re: Bet you didn't know Lucene can...

2011-10-31 Thread Andrzej Bialecki
find a ranked list of documents that have the smallest bit-level distance in their hashes from the query hash. The solution is described in SOLR-1918 - Bit-wise scoring field type.
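
As a brute-force illustration of ranking by bit-level distance (not the SOLR-1918 implementation, which is not shown in this message), the Python sketch below treats each hash as an integer and orders documents by Hamming distance from the query hash. The function names and the 8-bit toy hashes are hypothetical.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two non-negative integer hashes differ."""
    return bin(a ^ b).count("1")

def rank_by_hash(query_hash, doc_hashes, top_n=10):
    """Return (doc_id, distance) pairs, smallest bit-level distance first."""
    scored = [(doc_id, hamming_distance(query_hash, h)) for doc_id, h in doc_hashes.items()]
    # Sort by distance, breaking ties by document id for a stable order.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_n]

# Toy 8-bit hashes; real similarity-preserving hashes would be much wider.
doc_hashes = {"s1": 0b10110010, "s2": 0b10110011, "s3": 0b01001101}
ranked = rank_by_hash(0b10110000, doc_hashes)
```

A linear scan like this only shows the scoring rule; doing the same thing inside the search engine is what lets it scale to a large index.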

Re: Bet you didn't know Lucene can...

2011-10-31 Thread Andrzej Bialecki
On 31/10/2011 21:42, Petite Abeille wrote: On Oct 31, 2011, at 9:32 PM, Andrzej Bialecki wrote: similarity-preserving hash function was calculated on each sentence, and the hash was added as a field. The property of the hash was that similar documents (sentences) would produce a similar

Re: Case Sensitivity

2008-08-28 Thread Andrzej Bialecki
ts only omitNorms. So the flags are set now like this: isIndexed = true; isTokenized = true; omitNorms = true; The end result of processing such a field is (I believe) conceptually equivalent to adding as many Fields as there are tokens, each with omitNorms=true.

Re: Case Sensitivity

2008-08-28 Thread Andrzej Bialecki
Otis Gospodnetic wrote: So in other words, it *is* possible to have the field both tokenized and its norms omitted? Yes. Probably this is an unintended side-effect of adding setOmitNorms, but I think it's useful and IMHO we should keep it.

Re: boost freshness instead of sorting

2008-08-28 Thread Andrzej Bialecki
or each "1".) We are discussing the same thing in the "Case Sensitivity" thread - it's possible to have a tokenized field and omit its norms.

Re: Case Sensitivity

2008-08-28 Thread Andrzej Bialecki

Re: Pre-filtering for expensive query

2008-08-30 Thread Andrzej Bialecki
one during scoring and not afterwards. FilteredQuery internally makes use of skipTo(), which should help to limit the number of evaluated docs.

Re: Pre-filtering for expensive query

2008-09-04 Thread Andrzej Bialecki
Grant Ingersoll wrote: On Aug 30, 2008, at 3:14 PM, Andrzej Bialecki wrote: I think you can use a FilteredQuery in a BooleanClause. This may be faster than the filtering code in the Searcher, because the evaluation is done during scoring and not afterwards. FilteredQuery internally makes

Re: Sorting posting lists before intersection

2008-09-17 Thread Andrzej Bialecki
: ConjunctionScorer, lines 85-103 - pay attention to the comments there; it's not strictly a sort by frequency, but rather by the sampled "sparseness".

Re: Case Sensitivity

2008-09-19 Thread Andrzej Bialecki
vide static methods on Fieldable that test the validity of flag combinations with a particular version of Lucene?

Re: Sorting posting lists before intersection

2008-10-13 Thread Andrzej Bialecki
n it won't cause any IO, otherwise it needs to read this info from the .ti file.

Luke is coming .. not there yet.

2008-10-30 Thread Andrzej Bialecki
index to a new format, incompatible with earlier versions of Lucene (including the 2.4 release).

Re: Luke is coming .. not there yet.

2008-10-30 Thread Andrzej Bialecki
Andrzej Bialecki wrote: 1) Luke 2.4 release. This has the advantage of being an official stable [...] 2) Luke 2.9-dev snapshot. This has the advantage that you get the [...] Of course I meant Lucene 2.4 and Lucene 2.9-dev ... sorry for the confusion.

Re: Luke is coming .. not there yet.

2008-10-30 Thread Andrzej Bialecki
ss someone else does it, it's simply not going to happen. All code in Luke except for the Thinlet class is under the Apache License, so feel free to start coding :)

Re: Luke is coming .. not there yet.

2008-10-30 Thread Andrzej Bialecki
- we can include this in the proposals for the next summer.

Re: Read all the data from an index

2008-10-31 Thread Andrzej Bialecki
ntent of these deleted documents, first call IndexReader.undeleteAll().

[ANN] Luke 0.9 released

2008-11-13 Thread Andrzej Bialecki
, although I tested all functionality to make sure that there is no data loss. HOWEVER, if you work with precious data, it's always a good idea to use the "Read-only" option. As usual, bug reports or suggestions for improvements, or even better patches, are welcome!

Re: [ANN] Luke 0.9 released

2008-11-14 Thread Andrzej Bialecki
but in practice Luke directly accesses the underlying Directory in many other places ... I forgot about the use of IndexFileDeleter - and indeed passing the read-only flag here can solve this, because then I can always use KeepAllDeletionPolicy when opening read-

Re: [ot] a reverse lucene

2008-11-23 Thread Andrzej Bialecki
. (with a score etc) I can see the case for this would be a news-article and several people writing queries to get alerted if it matched a certain condition. http://www.seas.upenn.edu/~svilen/publications/subscribe.pdf

[ANN] Luke 0.9.1 - bugfix release

2008-11-23 Thread Andrzej Bialecki
ll commits" option was specified. Reported by Mark Harwood. o Empty index with no fields was reported as invalid. Discovered by Andrew Zhang and Michael McCandless (LUCENE-1454). Thank you!

Re: StandardAnalyzer vs KeywordAnalyzer in Luke

2008-12-02 Thread Andrzej Bialecki
n turn _require_ the presence of a common-grams.utf8 resource on the classpath. To summarize: unless you want to get your hands dirty with Luke internals it can't be done.

Re: Document.getBinaryValue returning null after upgrading to 2.4 for the data which was indexed using 2.3.1

2008-12-16 Thread Andrzej Bialecki
g 2.4, the search worked fine using 2.4. Any ideas why this is happening? No idea - but perhaps this is somehow related: https://issues.apache.org/jira/browse/LUCENE-1452

Re: Document.getBinaryValue returning null after upgrading to 2.4 for the data which was indexed using 2.3.1

2008-12-16 Thread Andrzej Bialecki
versions of Lucene involved.

Re: Determining index term count

2009-01-07 Thread Andrzej Bialecki
formation this way would be messy - it's better to propose that this information should be added to the API.

Re: Luke site is down?

2009-03-04 Thread Andrzej Bialecki
Hi all, I apologize for the inconvenience - the site went down without any prior notice from the ISP. I'm investigating the issue ...

Re: IndexSearcher

2009-03-08 Thread Andrzej Bialecki
liat oren wrote: Ok, thanks. I will have to edit the code of Luke in order to add another analyzer, right? No - if your analyzer is already on the classpath, then it's enough to type in the fully qualified class name in the drop down box (it's editable).

Re: IndexSearcher

2009-03-09 Thread Andrzej Bialecki
the classpath when you start Luke.

Re: boosting query

2009-03-19 Thread Andrzej Bialecki
plement an arbitrary re-sorting of top-N results, according to your rules of preference (business rules, or heuristics). This way you can avoid overfitting or endless tweaking, and still get the ranking that makes sense to your users.

[ANN] Luke 0.9.2 release

2009-03-19 Thread Andrzej Bialecki
ounts per field in Overview - contributed by Mark Harwood. o Improved the Analysis plugin to show all token information, and highlight whenever a token is selected from the list. * Bug fixes: o (None)

Re: [ANN] Luke 0.9.2 release

2009-03-20 Thread Andrzej Bialecki
Andrzej Bialecki wrote: (sorry for cross-posting) Hi all, I'm happy to announce a new release of Luke, the Lucene Index Toolbox. As usual, you can obtain it from here: http://www.getopt.org/luke If you tried to access this url during the last couple of hours the site was down. It s

Re: Index Partitioning

2009-03-22 Thread Andrzej Bialecki
. * repeat the cycle as many times as needed A more elegant version of this algorithm can be implemented using FilterIndexReader.

Re: Help to determine why an optimized index is proportionaly too big.

2009-04-10 Thread Andrzej Bialecki
space. (Actually: does CheckIndex warn about unused files in the index directory so people can clean them up? i'm not sure) It doesn't. But Luke has a function to do this.

Re: Index in text format

2009-04-24 Thread Andrzej Bialecki
(http://www.getopt.org/luke) can export all stored fields from all documents into an XML file.

Re: Lucene Index Encryption

2009-05-10 Thread Andrzej Bialecki
rg/jira/browse/LUCENE-532

[ANN] Luke + Hadoop, alpha version

2009-07-10 Thread Andrzej Bialecki
welcome - please keep in mind that this is an early preview. Also, various UI glitches are probably related to the Thinlet toolkit - again, one day I may re-write Luke using something else, but for now I don't have the strength to do it.

Re: Weird discrepancy with term counts vs. terms (off by 1)

2009-08-02 Thread Andrzej Bialecki
atch to Andrzej, the author of Luke. Thank you Phil for spotting this bug - this fix will be included in the next release of Luke.

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread Andrzej Bialecki
nized" version of the field. At this point any potential mismatch in query terms vs. analyzed tokens in the field should become apparent.

Lucene Search Performance Analysis Workshop

2009-08-26 Thread Andrzej Bialecki
rsday, September 3rd 2009 11:00-11:30AM PDT / 14:00-14:30 EDT Follow this link to sign up: http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP About: Lucene Performance Workshop: Understanding Lucene Search Performance with Andrzej Bialecki Experi

Re: Lucene gobbling file descriptors

2009-08-27 Thread Andrzej Bialecki
e-for-Lucene

[ANN] Luke 0.9.9 release

2009-09-29 Thread Andrzej Bialecki
Chris Pimlott and others. Enjoy! :)

Re: [ANN] Luke 0.9.9 release

2009-10-01 Thread Andrzej Bialecki
Andrzej Bialecki wrote: Hi all, I'm happy to announce the new release of Luke - the Lucene Index Toolbox. There's a bug in this version in that it doesn't show TermVectors for a field. I'll fix it in a few days - I'm waiting for other potential bugs to show up. So i

Re: Question about how to speed up custom scoring

2009-10-08 Thread Andrzej Bialecki
while) that if the terms you load are indexed that'll help. But this is mostly a guess. Just to clarify: IndexReader.document(doc) and .document(doc, selector) load _only_ stored fields, they don't interact at all with the terms-related part of Lucene.

Re: [ANN] Luke 0.9.9 release

2009-10-23 Thread Andrzej Bialecki
yourself ;) Keyboard shortcuts are hardcoded somewhere deep in Thinlet, but likely they could be made configurable. You can find an EPS version of the Lucene logo here: http://lucene.apache.org/images/logo.eps

Re: Split single string into several fields?

2009-10-28 Thread Andrzej Bialecki
d you can even create other fields in the document (or split this token stream into several fields).

[ANN] Luke 0.9.9.1 release

2009-11-20 Thread Andrzej Bialecki
ability to edit per-commit user data Map Bug fixes - * Term frequency vectors were not displayed for selected field. Enjoy!

Re: document with different index time boost returns same score

2009-12-18 Thread Andrzej Bialecki
that this encoding causes (and what input values effectively come out the same, once encoded).
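
To make the loss of precision concrete, here is a small Python sketch of a one-byte float encoding with 3 mantissa bits and 5 exponent bits. It is modelled on the kind of small-float encoding used for norms in Lucene releases of that period, but the constants and function names here are assumptions for illustration, not a copy of the library code.

```python
import struct

ZERO_EXP_OFFSET = (63 - 15) << 3  # assumed offset for a zero exponent of 15

def float_to_byte315(f):
    """Encode a float into one byte: 3 mantissa bits, 5 exponent bits."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw float32 bits
    small = bits >> (24 - 3)                             # keep sign, exponent, top 3 mantissa bits
    if small <= ZERO_EXP_OFFSET:
        return 0 if bits <= 0 else 1                     # underflow: zero or smallest value
    if small >= ZERO_EXP_OFFSET + 0x100:
        return 255                                       # overflow: largest value
    return small - ZERO_EXP_OFFSET

def byte315_to_float(b):
    """Decode a byte produced by float_to_byte315 back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]

# Several distinct boosts collapse onto the same byte, hence the same score.
encoded = {boost: float_to_byte315(boost) for boost in (1.0, 1.1, 1.2, 1.25)}
decoded = {boost: byte315_to_float(code) for boost, code in encoded.items()}
```

With only three mantissa bits, values between 1.0 and 1.25 cannot be told apart after encoding, which is one way two documents with different index-time boosts can end up with identical scores.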

[ANN] Luke 1.0.0 for Lucene 3.0

2009-12-26 Thread Andrzej Bialecki
tween Lucene 2.9.1 and 3.0. Your feedback is welcome - please use the Google Issue tracker to report issues. Merry Christmas!

Re: [ANN] Luke 1.0.0 for Lucene 3.0

2009-12-26 Thread Andrzej Bialecki
existent fall back to the zero-arg ctor. > > I'll open an issue. Indeed - thanks!

Re: Field creation with TokenStream and stored value

2010-01-13 Thread Andrzej Bialecki
t your own Fieldable, and return what you want from its methods. You can also use the Field constructor that takes the stored value, and then use Field.setTokenStream(TokenStream) - it doesn't override the stored value.

Re: Do deleted documents affect scores?

2010-02-11 Thread Andrzej Bialecki
merge segments).

Re: SpanQueries in Luke

2010-03-04 Thread Andrzej Bialecki
this parser out of the box. I expect to make a release within a few days. Watch the commits on the Google code project ...

Re: SpanQueries in Luke

2010-03-04 Thread Andrzej Bialecki

Re: SpanQueries in Luke

2010-03-05 Thread Andrzej Bialecki
textarea. I'll commit the current mostly-working state today, you can take a look - you've written some cool Luke plugins before .. ;)

Re: SpanQueries in Luke

2010-03-05 Thread Andrzej Bialecki
ally one could store such information in IndexCommit.getUserData(). The lack of standardized metadata is an issue, of course - we could start experimenting with this in Luke, to see whether we can squeeze a subset of Solr schema there.

[ANN] Luke - The Lucene Index Toolbox - 1.0.1 release

2010-04-01 Thread Andrzej Bialecki
lyzer plugin (and analyzers) don't work. * Issue 4 : Compress flag no longer available. * Issue 14 : Error while using custom similarity. Enjoy!

Re: How to rename fields in an index

2007-08-22 Thread Andrzej Bialecki
trouble ;)

Re: How to rename fields in an index

2007-08-22 Thread Andrzej Bialecki
luable enough to do ... Alternatively, we could just take this code and add it to IndexReader.renameField(String old, String new) ... ;)

Re: How to rename fields in an index

2007-08-23 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote: Dear Andrzej Bialecki Can we change the field name in *.fnm directly by hand? Yes, but you need to be consistent about it, i.e. change it the same way for every segment that the index consists of. Also, fnm files are binary files, so you need to know the format

Re: Sorted Index

2007-10-27 Thread Andrzej Bialecki
l large-ish bins, and apply arbitrary sorting methods within each bin. Studies show that if you pick the right bin size, users will rarely look into the second and the following bins, so the task is reduced to the sorting of the first bin, e.g. 100 top scoring docs.
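
The binning idea can be sketched in a few lines of Python. Because the message is truncated, how the bins are formed is an assumption here: the hits are sorted by score and cut into fixed-size slices, and each slice is then re-sorted by some other criterion. The function name, the dictionary fields and the "newest first" rule are all hypothetical.

```python
def rerank_in_bins(hits, bin_size, secondary_key):
    """Keep score order between bins; re-sort hits inside each bin by secondary_key."""
    by_score = sorted(hits, key=lambda h: h["score"], reverse=True)
    result = []
    for start in range(0, len(by_score), bin_size):
        bin_hits = by_score[start:start + bin_size]
        result.extend(sorted(bin_hits, key=secondary_key))
    return result

hits = [
    {"id": "a", "score": 9.1, "date": 2005},
    {"id": "b", "score": 8.7, "date": 2008},
    {"id": "c", "score": 8.5, "date": 2007},
    {"id": "d", "score": 2.0, "date": 2009},
]

# Newest first inside each bin of two hits; relevance still decides the bin.
reranked = rerank_in_bins(hits, bin_size=2, secondary_key=lambda h: -h["date"])
order = [h["id"] for h in reranked]
```

In practice only the first bin (say, the top 100 hits) usually needs this treatment, so the extra sort stays cheap.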

Re: Document boost, is it working?

2007-10-31 Thread Andrzej Bialecki
a function of the current index format - perhaps in the future Lucene will be able to store these values separately using another type of storage. So far there was no pressing need to do this).

Re: Wikia search goes live today

2008-01-08 Thread Andrzej Bialecki
rrect? (I'm not involved in Wikia development). There are some ways to go about it even in the pure Lucene-land, so that the updates are fast without reindexing the main content. Hint: ParallelReader.

Re: Wikia search goes live today

2008-01-08 Thread Andrzej Bialecki
Ryan McKinley wrote: Andrzej Bialecki wrote: Lukas Vlcek wrote: So starring will be accommodated only during indexing phase. Does it mean it will be pretty static value not a dynamically changing variable... correct? In other words if I add my stars to some document it won't affec

Re: Bucketing (was Re: Wikia search goes live today)

2008-01-09 Thread Andrzej Bialecki
simple to implement, yet produces useful results difficult to obtain through the usual means (similarity, boosting, even function query).

Re: Lucene + Hadoop

2008-01-16 Thread Andrzej Bialecki
local filesystem first... Yes - see org.apache.nutch.indexer.FsDirectory. However, you will not like the performance, it's much slower than using the index locally.

Re: SV: Integrating dynamic data into Lucene search/ranking

2008-01-17 Thread Andrzej Bialecki
ynced to the on-disk index), and start using the new IndexSearcher. And again, start accumulating new docs in the RAMDirectory, etc, etc ...

FYI: parallel corpus in 22 languages

2008-01-24 Thread Andrzej Bialecki
://wt.jrc.it/lt/Acquis/

Re: Performance guarantees and index format

2008-01-31 Thread Andrzej Bialecki
g more relevant.

Re: appending field to an existing index

2008-02-04 Thread Andrzej Bialecki
emented (yet?).

Re: Performance guarantees and index format

2008-02-04 Thread Andrzej Bialecki
lass applicable to various scenarios.

[ANN] Luke 0.8 released

2008-02-04 Thread Andrzej Bialecki
m an index. Instead this column now reads "Norms" and shows the fieldNorm value of a field. Have fun!
