Heh, the first link is broken; it should be http://lucene-eurocon.org/slides/From-Publisher-ToPlatform-the-Guardian_Stephen-Dunn.pdf
Check for other conference slides here:
http://lucene-eurocon.org/agenda.html
On Tue, Jun 1, 2010 at 7:25 AM, Lukáš Vlček wrote:
There were nice presentations from The Guardian folks at EuroCon this year
about how they made their content available to the public using Solr (they
also refer to the NoSQL [not only SQL] model).
http://lucene-eurocon.org/slides/From-Publisher-ToPlatform-the-Guardian_Stephen-Dunn.pdf
http://lucene-eur
VL,
Solr (not Lucene, but you can embed Solr) has JsonUpdateRequestHandler, which
lets you send docs to Solr for indexing in JSON (instead of the usual XML):
http://search-lucene.com/c/Solr:/src/java/org/apache/solr/handler/JsonUpdateRequestHandler.java
And you can get Solr to respond with JSON
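A minimal sketch of both directions in plain Java. It assumes a Solr 1.4-style setup where JsonUpdateRequestHandler is registered at /update/json in solrconfig.xml and Solr runs at http://localhost:8983/solr. Both are assumptions, so check your own config. It only builds the request strings. The first is a JSON "add" body to POST with Content-Type application/json. The second is a select URL with wt=json, which asks Solr to respond in JSON. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class SolrJsonRequests {
    // Assumed base URL of a local Solr instance (hypothetical).
    static final String BASE = "http://localhost:8983/solr";

    // Builds a JSON "add" command for one document; all values are sent as strings.
    static String addCommand(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{\"add\": {\"doc\": {");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(", ");
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\": \"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append("}}}").toString();
    }

    // Minimal JSON string escaping: backslashes and double quotes only (no control chars).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // URL to POST the JSON body to, assuming the /update/json mapping exists.
    static String updateUrl() {
        return BASE + "/update/json?commit=true";
    }

    // Select URL; wt=json selects the JSON response writer.
    static String selectUrl(String query) {
        try {
            return BASE + "/select?q=" + URLEncoder.encode(query, "UTF-8") + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "1");
        doc.put("title", "From Publisher to Platform");
        System.out.println(addCommand(doc));
        System.out.println(selectUrl("title:platform"));
    }
}
```

Any HTTP client can send these. SolrJ is the other option if you would rather not hand-build requests.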
I think those doc-oriented DBs tend to be distributed, with replication
built in and such, but yes, in some ways the schemaless DB with docs and fields
(whether they are pumped in as JSON or XML or Java objects) feels the same. I
saw something from Grant about 2 months ago about how Lucene is "nosql-i
Pasa,
Maybe Field Collapsing (Solr) can help? See SOLR-236 in JIRA
http://search-lucene.com/?q=field+collapsing&fc_project=Lucene&fc_project=Solr
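Field collapsing (SOLR-236) happens inside Solr. The sketch below only illustrates the idea in plain Java. It assumes the hits already arrive sorted by descending score and that each hit carries the value of the grouping field. The Hit class is hypothetical. The sketch keeps the best-ranked hit for each field value, which is essentially "remove duplicates for a given field".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldCollapse {
    // Hypothetical search hit: document id, value of the grouping field, score (display only).
    static final class Hit {
        final String id;
        final String key;
        final float score;
        Hit(String id, String key, float score) { this.id = id; this.key = key; this.score = score; }
    }

    // Keeps the first (best-ranked) hit per key; input must already be sorted by descending score.
    static List<Hit> collapse(List<Hit> rankedHits) {
        // LinkedHashMap preserves insertion order, so the ranking survives.
        Map<String, Hit> firstPerKey = new LinkedHashMap<String, Hit>();
        for (Hit h : rankedHits) {
            if (!firstPerKey.containsKey(h.key)) {
                firstPerKey.put(h.key, h);
            }
        }
        return new ArrayList<Hit>(firstPerKey.values());
    }
}
```

Doing this client-side over one page of hits breaks paging and counts. That is why a server-side feature like field collapsing is preferable.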
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
I don't know of a single tutorial that puts it all together, but the "rich
documents" feature implemented in SOLR-284 would be where I would start:
https://issues.apache.org/jira/browse/SOLR-284
Look here if you're using Solr 1.4 -- it should address your needs:
http://wiki.apache.org/solr/Extra
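For Solr 1.4, a minimal sketch of the request URL for posting a PDF. It assumes the example solrconfig.xml mapping of ExtractingRequestHandler (Solr Cell) at /update/extract. It also assumes that literal.id is used to set the unique key and commit=true to make the document searchable right away. Verify both against your config and the wiki page above. The PDF bytes go in the request body. Solr Cell uses Apache Tika on the server side to extract the text. The class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ExtractUrl {
    // Builds an extract URL; assumes the /update/extract mapping from the Solr 1.4 example config.
    static String forDocument(String solrBase, String docId, boolean commit) {
        try {
            // literal.id supplies the unique key, since a PDF carries no Solr id of its own.
            String url = solrBase + "/update/extract?literal.id=" + URLEncoder.encode(docId, "UTF-8");
            return commit ? url + "&commit=true" : url;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Highlighting and snippets are then ordinary query-time features (hl=true on the select request), provided the extracted text lands in a stored field.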
Hi,
I am kind of struggling to set up Solr to search PDF files. I am following
documents from Lucid Imagination and the wiki. Can someone please point me to a
good Solr tutorial with step-by-step instructions for indexing/searching
PDF documents, highlighting, and snippets?
Thanks in advance,
Deepak
From a legal/technical perspective, you can either embed Solr or you can use
it as a WebApp. I generally suggest that it be used as a separate WebApp, but
that depends. I would suggest the following criteria:
1. Fitness to use cases
2. Effort to develop/adapt
3. Ease of deployment
4. Eff
Based on your description, I would recommend Solr. It provides several
features, such as spelling suggestions and faceting, out of the box (OOTB).
http://lucene.apache.org/solr/features.html
should answer all your questions.
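Faceting is one of the features mentioned above. For a query, it returns how many matching documents carry each value of a field. Take a restaurant/tag setup like the one Frank describes. This plain-Java sketch shows what that computation amounts to. A hypothetical in-memory list stands in for an index. The sketch keeps the restaurants that have every requested tag, then counts all tags over those matches. Solr computes this itself from facet parameters on the request. The Restaurant class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class TagFacets {
    // Hypothetical restaurant record with a multi-valued "tags" field.
    static final class Restaurant {
        final String name;
        final Set<String> tags;
        Restaurant(String name, String... tags) {
            this.name = name;
            this.tags = new HashSet<String>(Arrays.asList(tags));
        }
    }

    // Filters to restaurants having all required tags, then counts each tag over the matches.
    static Map<String, Integer> facetCounts(List<Restaurant> all, Set<String> required) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Restaurant r : all) {
            if (!r.tags.containsAll(required)) continue;
            for (String tag : r.tags) {
                Integer c = counts.get(tag);
                counts.put(tag, c == null ? 1 : c + 1);
            }
        }
        return counts;
    }
}
```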
On Mon, May 31, 2010 at 7:54 PM, Frank A wrote:
Thanks a bunch.
Since I'm already inside a Java-based web application, it would seem like
both Solr and Lucene would be plausible. I'm curious what other factors I
should know about in determining whether Solr or Lucene is right for me.
Can Solr be used within a web application (as a library) or is it o
Frank --
Lucene can definitely do this stuff. This review of the Query Syntax might
offer you some insight:
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
Specifically, you can look up "Fuzzy Searches" and "Synonyms". There are a
couple of key ways to handle synonyms, so you might
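The "Fuzzy Searches" section of that page covers the term~ syntax. Examples are roam~, or roam~0.8 with an explicit minimum similarity (the default is 0.5). Below is a plain-Java sketch of the idea. It assumes the Lucene 2.x-era measure with no required prefix: similarity = 1 - editDistance / length of the shorter term. A term matches when its similarity exceeds the minimum. This approximates FuzzyQuery's behaviour rather than reproducing its code. The class and method names are hypothetical.

```java
final class FuzzyMatch {
    // Levenshtein edit distance (insert, delete, substitute), using two rolling DP rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;  // prev now holds row i
        }
        return prev[b.length()];
    }

    // Similarity relative to the shorter term, in the spirit of Lucene 2.x FuzzyQuery (prefix 0).
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) return a.equals(b) ? 1f : 0f;
        return 1f - (float) editDistance(a, b) / shorter;
    }

    // "queryTerm~minSimilarity" accepts indexTerm when similarity is strictly above the minimum.
    static boolean matches(String queryTerm, String indexTerm, float minSimilarity) {
        return similarity(queryTerm, indexTerm) > minSimilarity;
    }
}
```

For example, roam vs. foam is one edit over four characters (similarity 0.75), so roam~ would accept foam.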
You are certainly in the right place - Apache Solr (a search server
built using Lucene) provides what you are looking for out of the box.
On Mon, May 31, 2010 at 7:20 PM, Frank A wrote:
Hello all,
I'm considering Lucene for a specific application and am trying to ensure
that it is the right tool for what I'm trying to accomplish.
At a high level, I have a list of restaurants in a database and a list of
tags related to each restaurant (e.g. Italian, Formal, Expensive, etc.). Each
re
Sorry for asking similar questions. I need to remove duplicates from search
results based on a given field (i.e. group by it). The documents are not
ordered by this field. I don't care which of the duplicates ends up in the
search results. I tried to use DuplicateFilter and PerParentLimitedQuery, but
they didn't help.
TermVectors are not used for searching; they just store each doc, inverted.
They allow you to retrieve all terms (and optionally their
positions/offsets) for a given document. But this entails a seek
per document, so it's fairly costly.
Highlighters use term vectors because they are a good way
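To make "each doc, inverted" concrete: for one field of one document, a term vector is essentially a sorted map from each term to its positions within that document. This plain-Java sketch builds that structure from raw text. Lowercasing and splitting on whitespace are stand-in assumptions for a real Analyzer. None of this is Lucene's API. In Lucene 3.0 the same information comes back from IndexReader.getTermFreqVector(...) as a TermFreqVector, or as a TermPositionVector when positions were stored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class TermVectorSketch {
    // Inverts one document's field: term -> token positions, with terms in sorted order.
    static SortedMap<String, List<Integer>> invert(String fieldText) {
        SortedMap<String, List<Integer>> vector = new TreeMap<String, List<Integer>>();
        // Naive tokenization (assumption): lowercase and split on whitespace.
        String[] tokens = fieldText.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].isEmpty()) continue;
            List<Integer> positions = vector.get(tokens[pos]);
            if (positions == null) {
                positions = new ArrayList<Integer>();
                vector.put(tokens[pos], positions);
            }
            positions.add(pos);
        }
        return vector;
    }

    // Within-document term frequency is the number of recorded positions.
    static int freq(SortedMap<String, List<Integer>> vector, String term) {
        List<Integer> positions = vector.get(term);
        return positions == null ? 0 : positions.size();
    }
}
```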
There seems to be considerable buzz on the internets about document-oriented
DBs such as MongoDB, CouchDB, etc. I am at a loss as to what the principal
differences are between Lucene and the "DODBs". I could very well use Lucene
as any of the above (schema-free, document-oriented)
and perform similar que
Hi,
It seems that PerParentLimitedQuery analyzes the old data from before the
update. Here's an example. If I remove the document updates, everything
works. Thanks.
public void testPerParent()
throws IOException {
dir = new RAMDirectory();
Analyzer analyzer = new StandardAnalyz
(10/05/19 13:58), Li Li wrote:
hi all,
I read Lucene in Action, 2nd Ed. It says SimpleSpanFragmenter will
"make fragments that always include the spans matching each document".
It also mentions a SpanScorer for this purpose. But I can't find any class
named SpanScorer in Lucene 3.0.1. And the res
What about TermVector? Lucene in Action says:
Term vectors are something of a mix between an indexed field and
a stored field. They are similar to a stored field because you can
quickly retrieve all term vector fields for a
given document: term vectors are keyed first by document ID. But t
Hi all,
I'm new to Lucene but have used it successfully for a few simple tasks.
I am experimenting with the vector space representation of documents and
have managed to store and retrieve TermFreqVector objects.
The question is whether it is possible to directly add vector space
representations of
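After the terms and counts have been read out of TermFreqVector objects, comparing two documents in the vector space model comes down to a cosine computation. In Lucene 3.0, getTerms() and getTermFrequencies() return parallel arrays. Here is a minimal sketch that uses raw term frequencies as weights. That is an assumption, and tf-idf weighting is a common alternative. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class CosineSketch {
    // Builds a term -> frequency map from parallel arrays (as returned by a TermFreqVector).
    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        Map<String, Integer> m = new HashMap<String, Integer>();
        for (int i = 0; i < terms.length; i++) m.put(terms[i], freqs[i]);
        return m;
    }

    // Cosine of the angle between two sparse term-frequency vectors (0 if either is empty).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        for (int f : b.values()) normB += (double) f * f;
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```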
On 2010-05-31 10:54, Uwe Schindler wrote:
> No.
See also LUCENE-2048 (nice round number ;) ).
--
Best regards,
Andrzej Bialecki <><
No.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Li Li [mailto:fancye...@gmail.com]
> Sent: Monday, May 31, 2010 10:48 AM
> To: java-user@lucene.apache.org
> Subject: Question about Field.setOmitTermF
I read in Lucene in Action that, to save space, we can omit term frequency
and position information. But as far as I know, Lucene's default
scoring model is the VSM, which needs tf(term, doc) to calculate the score. If
no tf is stored, will the relevance score be correct?
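Here is a worked illustration of why the answer is "no" for the tf part. Lucene's DefaultSimilarity computes tf(freq) = sqrt(freq). When a field is indexed with term frequencies omitted, every matching document is seen with freq = 1. As a result, the tf factor is identical for all of them. The sketch compares only the tf factor and leaves out idf, norms and coord. The frequencies in main are made-up numbers.

```java
final class OmitTfSketch {
    // DefaultSimilarity-style tf factor: square root of the within-document frequency.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // With term frequencies omitted, a matching document is reported with freq = 1.
    static float tfWhenOmitted(int actualFreq) {
        return tf(actualFreq > 0 ? 1 : 0);
    }

    public static void main(String[] args) {
        int[] freqs = {1, 4, 9};  // hypothetical occurrences of one term in three documents
        for (int f : freqs) {
            System.out.println("freq=" + f + " tf=" + tf(f) + " tf(omitted)=" + tfWhenOmitted(f));
        }
    }
}
```

Under omission, a document mentioning the term nine times gets the same tf factor (1.0) as one mentioning it once, instead of 3.0 vs. 1.0.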
-
Thanks. I don't mind whether it is the first or the last document. What
matters most is that the filtered documents contain no duplicates for a
given field (in fact, I need to group the filtered results by the
specified field). I'm trying to use PerParentLimitedQuery and
NestedDocumentQuery.
---
The DuplicateFilter passed to the searcher does not have visibility of the text
query and is therefore evaluated independently of all other criteria.
It sounds like the behaviour you want is to get the last duplicate that also
matches your criteria, which seems like something fairly common to need
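The following plain-Java simulation illustrates the behaviour described above, using hypothetical data. The filter picks one document per key across the whole index without looking at the query. Here it keeps the last one, matching the "last duplicate" wording. Suppose the query only matches an earlier duplicate. Filtering and querying independently then returns nothing for that key. Deduplicating only among the query's matches keeps it. None of the names are Lucene API. A substring check stands in for a TermQuery.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DedupOrder {
    // Hypothetical indexed document: doc id (index order), duplicate key, text.
    static final class Doc {
        final int id;
        final String key;
        final String text;
        Doc(int id, String key, String text) { this.id = id; this.key = key; this.text = text; }
    }

    // Index-wide filter that ignores the query: the last doc id per key wins.
    static Set<Integer> indexWideKeepLast(List<Doc> index) {
        Map<String, Integer> lastPerKey = new LinkedHashMap<String, Integer>();
        for (Doc d : index) lastPerKey.put(d.key, d.id);
        return new HashSet<Integer>(lastPerKey.values());
    }

    // Query and filter evaluated independently, then intersected.
    static List<Integer> filterThenQuery(List<Doc> index, String word) {
        Set<Integer> allowed = indexWideKeepLast(index);
        List<Integer> hits = new ArrayList<Integer>();
        for (Doc d : index) {
            if (d.text.contains(word) && allowed.contains(d.id)) hits.add(d.id);
        }
        return hits;
    }

    // Deduplicate only among matching docs: the last matching doc per key wins.
    static List<Integer> queryThenDedup(List<Doc> index, String word) {
        Map<String, Integer> lastMatchPerKey = new LinkedHashMap<String, Integer>();
        for (Doc d : index) {
            if (d.text.contains(word)) lastMatchPerKey.put(d.key, d.id);
        }
        return new ArrayList<Integer>(lastMatchPerKey.values());
    }
}
```

With docs 0 (key a, "now here"), 1 (key a, "later") and 2 (key b, "now there"), a search for "now" gives only doc 2 in the first variant. The second variant gives docs 0 and 2.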
df (DuplicateFilter) is the second parameter in the searcher.search method.
>> ScoreDoc[] hits = searcher.search(q, df, 1000).scoreDocs;
This variant doesn't return any hits either:
ScoreDoc[] hits = searcher.search(new FilteredQuery(tq, df), new
QueryWrapperFilter(new TermQuery(new Term("text", "now"))),
1000).s