Re: Is Lucene a "document oriented database"?

2010-05-31 Thread Lukáš Vlček
Heh, the first link is broken; it should be: http://lucene-eurocon.org/slides/From-Publisher-ToPlatform-the-Guardian_Stephen-Dunn.pdf Check for other conference slides here: http://lucene-eurocon.org/agenda.html On Tue, Jun 1, 2010 at 7:25 AM, Lukáš Vlček wrote: > There were nice presenta

Re: Is Lucene a "document oriented database"?

2010-05-31 Thread Lukáš Vlček
There were nice presentations from The Guardian folks at EuroCon this year about how they made their content available to the public using Solr (and they refer to noSQL model [not only SQL]). http://lucene-eurocon.org/slides/From-Publisher-ToPlatform-the-Guardian_Stephen-Dunn.pdf http://lucene-eur

Re: Using JSON for index input and search output

2010-05-31 Thread Otis Gospodnetic
VL, Solr (not Lucene, but you can embed Solr) has JsonUpdateRequestHandler, which lets you send docs to Solr for indexing in JSON (instead of the usual XML): http://search-lucene.com/c/Solr:/src/java/org/apache/solr/handler/JsonUpdateRequestHandler.java And you can get Solr to respond with JSON

Re: Is Lucene a "document oriented database"?

2010-05-31 Thread Otis Gospodnetic
I think those doc-oriented DBs tend to be distributed, with replication built-in and such, but yes, in some way the schemaless DB with docs and fields (whether they are pumped in as JSON or XML or Java objects) feels the same. I saw something from Grant about 2 months ago how Lucene is "nosql-i

Re: Grouping or de-duping

2010-05-31 Thread Otis Gospodnetic
Pasa, Maybe Field Collapsing (Solr) can help? See SOLR-236 in JIRA http://search-lucene.com/?q=field+collapsing&fc_project=Lucene&fc_project=Solr Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message --

Re: Solr tutorial

2010-05-31 Thread N Hira
I don't know of a single tutorial that puts it all together, but the "rich documents" feature implemented in Solr-284 would be where I would start: https://issues.apache.org/jira/browse/SOLR-284 Look here if you're using Solr 1.4 -- it should address your needs: http://wiki.apache.org/solr/Extra

Solr tutorial

2010-05-31 Thread sv
Hi, I am kind of struggling to set up Solr to search PDF files. I am following documents from lucidimagination and the wiki. Can someone please point me to a good Solr tutorial with step-by-step instructions for searching/indexing PDF documents, highlighting and snippeting? Thanks in advance, Deepak

Re: Lucene Newbie Questions

2010-05-31 Thread N Hira
From a legal/technical perspective, you can either embed Solr or you can use it as a WebApp. I generally suggest that it be used as a separate WebApp, but that depends. I would suggest the following criteria: 1. Fitness to use cases 2. Effort to develop/adapt 3. Ease of deployment 4. Eff

Re: Lucene Newbie Questions

2010-05-31 Thread Shashi Kant
Based on your description, I would recommend Solr. It provides several features such as spelling suggestion, faceting etc. OOTB. http://lucene.apache.org/solr/features.html should answer all your questions. On Mon, May 31, 2010 at 7:54 PM, Frank A wrote: > Thanks a bunch. > > Since I'm already

Re: Lucene Newbie Questions

2010-05-31 Thread Frank A
Thanks a bunch. Since I'm already inside a Java-based web application it would seem like both SOLR and Lucene would be plausible. I'm curious what other factors I should know about in determining if SOLR or Lucene is right for me. Can SOLR be used within a web application (as a library) or is it o

Re: Lucene Newbie Questions

2010-05-31 Thread N Hira
Frank -- Lucene can definitely do this stuff. This review of the Query Syntax might offer you some insight: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html Specifically, you can look up "Fuzzy Searches" and "Synonyms". There are a couple of key ways to handle synonyms, so you might

Re: Lucene Newbie Questions

2010-05-31 Thread Shashi Kant
You are certainly in the right place - Apache Solr (a search server built using Lucene) provides what you are looking for out of the box. On Mon, May 31, 2010 at 7:20 PM, Frank A wrote: > Hello all, > I'm considering Lucene for a specific application and am trying to ensure > that it is the righ

Lucene Newbie Questions

2010-05-31 Thread Frank A
Hello all, I'm considering Lucene for a specific application and am trying to ensure that it is the right tool for what I'm trying to accomplish. At a high level I have a list of restaurants in a database and a list of tags related to the restaurant (e.g. Italian, Formal, Expensive, etc). Each re

Grouping or de-duping

2010-05-31 Thread Паша Минченков
Sorry for the similar questions. I need to remove duplicates from search results for a given field (or group by it). The documents are not ordered by this field. I do not care which of the duplicates ends up in the search results. I tried to use DuplicateFilter and PerParentLimitedQuery, but they didn't help.
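One possible fallback for this kind of requirement, when the choice among duplicates does not matter, is to de-duplicate after the search: walk the ranked hits and keep the first one seen for each field value. The sketch below is plain Java and does not use Lucene. The `Hit` class and its `key` field are hypothetical stand-ins for a hit and the stored value of the de-duplication field, and it assumes the hits have already been fetched in rank order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstHitPerKey {

    /** Hypothetical stand-in for a search hit; this is not a Lucene class. */
    public static final class Hit {
        public final int docId;
        public final String key;   // value of the field to de-duplicate on
        public final float score;

        public Hit(int docId, String key, float score) {
            this.docId = docId;
            this.key = key;
            this.score = score;
        }
    }

    /** Keeps the first hit for each key, preserving the incoming rank order. */
    public static List<Hit> dedupe(List<Hit> rankedHits) {
        // LinkedHashMap remembers insertion order, so the result stays ranked.
        Map<String, Hit> firstByKey = new LinkedHashMap<String, Hit>();
        for (Hit hit : rankedHits) {
            if (!firstByKey.containsKey(hit.key)) {
                firstByKey.put(hit.key, hit);
            }
        }
        return new ArrayList<Hit>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<Hit> ranked = new ArrayList<Hit>();
        ranked.add(new Hit(7, "A", 2.5f));
        ranked.add(new Hit(3, "B", 2.0f));
        ranked.add(new Hit(9, "A", 1.5f)); // duplicate key, dropped
        for (Hit hit : dedupe(ranked)) {
            System.out.println(hit.docId + " " + hit.key);
        }
    }
}
```

The trade-off is that the search has to over-fetch: if many of the top hits share a key, fewer unique results remain than were requested, so the caller may need to ask for more hits than it intends to show.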

Re: Question about Field.setOmitTermFreqAndPositions(true)

2010-05-31 Thread Michael McCandless
TermVectors are not used for searching; they just store each doc, inverted. They allow you to retrieve all terms (and optionally their positions/offsets) for a given document. But this entails a seek, per-document, so it's fairly costly. Highlighters use term vectors because they are a good way

Is Lucene a "document oriented database"?

2010-05-31 Thread Shashi Kant
There seems to be considerable buzz on the internets about document oriented dbs such as MongoDB, CouchDB etc. I am at a loss as to what the principal differences are between Lucene and the "DODBs". I could very well use Lucene as any of the above (schema-free, Document oriented) and perform similar que

PerParentLimitedQuery and index updating

2010-05-31 Thread Паша Минченков
Hi, It seems that PerParentLimitedQuery analyzes the old data from before the update. Here's an example. If I remove the document updates, everything works. Thanks. public void testPerParent() throws IOException { dir = new RAMDirectory(); Analyzer analyzer = new StandardAnalyz

Re: phrase query highlighter spans matching

2010-05-31 Thread Koji Sekiguchi
(10/05/19 13:58), Li Li wrote: hi all, I read lucene in action 2nd Ed. It says SimpleSpanFragmenter will "make fragments that always include the spans matching each document". And also a SpanScorer existed for this use. But I can't find any class named SpanScorer in lucene 3.0.1. And the res

Re: Question about Field.setOmitTermFreqAndPositions(true)

2010-05-31 Thread Li Li
What about TermVector? It says in "Lucene in Action": Term vectors are something of a mix between an indexed field and a stored field. They are similar to a stored field because you can quickly retrieve all term vector fields for a given document: term vectors are keyed first by document ID. But t

vector model usage

2010-05-31 Thread Dionisis Koumouras
Hi all, I'm new to Lucene but have used it successfully for a few simple tasks. I am experimenting with the vector space representation of documents and have managed to store and retrieve TermFreqVector objects. The question is whether it is possible to directly add vector space representations of

Re: Question about Field.setOmitTermFreqAndPositions(true)

2010-05-31 Thread Andrzej Bialecki
On 2010-05-31 10:54, Uwe Schindler wrote: > No. See also LUCENE-2048 (nice round number ;) ). -- Best regards, Andrzej Bialecki Information Retrieval, Semantic Web, Embedded Unix, Sys

RE: Question about Field.setOmitTermFreqAndPositions(true)

2010-05-31 Thread Uwe Schindler
No. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Li Li [mailto:fancye...@gmail.com] > Sent: Monday, May 31, 2010 10:48 AM > To: java-user@lucene.apache.org > Subject: Question about Field.setOmitTermF

Question about Field.setOmitTermFreqAndPositions(true)

2010-05-31 Thread Li Li
I read in "Lucene in Action" that to save space, we can omit term frequency and position information. But as far as I know, Lucene's default scoring model is VSM, which needs tf(term, doc) to calculate the score. If no tf is saved, will the relevance score be correct?
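To make the concern in this question concrete, here is a rough sketch; it is not Lucene's actual code. It assumes the tf and idf factors documented for Lucene's DefaultSimilarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))), ignores norms, boosts, coord and query normalization, and assumes that an omitted frequency behaves as if every matching document contained the term exactly once. The class and method names are made up for illustration.

```java
public class TfOmissionSketch {

    /** tf factor as documented for DefaultSimilarity: sqrt(freq). */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** idf factor as documented for DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /**
     * Simplified single-term score: tf * idf^2.
     * With omitTf set, the real frequency is unknown, so it is assumed to be 1.
     */
    static double score(int freq, int docFreq, int numDocs, boolean omitTf) {
        int effectiveFreq = omitTf ? 1 : freq;
        double idfValue = idf(docFreq, numDocs);
        return tf(effectiveFreq) * idfValue * idfValue;
    }

    public static void main(String[] args) {
        int numDocs = 100;
        int docFreq = 9;
        // Same term, one document mentions it 4 times, the other once.
        System.out.println("with tf:    " + score(4, docFreq, numDocs, false)
                + " vs " + score(1, docFreq, numDocs, false));
        System.out.println("tf omitted: " + score(4, docFreq, numDocs, true)
                + " vs " + score(1, docFreq, numDocs, true));
    }
}
```

Under these assumptions, a document that contains the term four times scores twice as high as one that contains it once when frequencies are kept, while the two documents tie when frequencies are omitted, so the ranking then rests on the remaining factors only.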

Re: DuplicateFilter question

2010-05-31 Thread Паша Минченков
Thanks. I do not mind whether it is the first or the last document. What matters most is that the filtered documents contain no duplicates for a given field (in fact, I need to group the filtered results by the specified field). I'm trying to use PerParentLimitedQuery and NestedDocumentQuery. ---

Re: DuplicateFilter question

2010-05-31 Thread Mark Harwood
The DuplicateFilter passed to the searcher does not have visibility of the text query and is therefore evaluated independently from all other criteria. Sounds like the behaviour you want is to get the last duplicate that also matches your criteria, which seems like something fairly common to need

Re: DuplicateFilter question

2010-05-31 Thread Паша Минченков
df (DuplicateFilter) is the second parameter in the searcher.search method. >> ScoreDoc[] hits = searcher.search(q, df, 1000).scoreDocs; This variant doesn't return any hits either: ScoreDoc[] hits = searcher.search(new FilteredQuery(tq, df), new QueryWrapperFilter(new TermQuery(new Term("text", "now"))), 1000).s