Hi all,
I'm organising another open source search social evening (OSSSE?) in
London on Monday the 15th of June, this time with the able assistance
of René Kriegler.
The plan is to get together and chat about search technology, from
Lucene to Solr, Hadoop, Mahout, Ferret and the like. We are plann
Thanks Ian!
You were right, the problem was simply the InputStreamReader.
(Sorry this reply is a bit off-topic now, I didn't have time to work
on this specific problem earlier).
Laura
On Apr 27, 2009, at 3:24 PM, Ian Lea wrote:
Hi
The problem may well lie with the reading of the queries
On 28 May 2009, at 12:22, Gaurav Kumar wrote:
Hi everyone,
I am doing a project using Lucene where I need to index HTML files.
I am using Tika to parse HTML files. But I need to index files according
to their tags, which means that every text present in different HTML tag (like
) should be s
Indexing/Storing are at the developer's discretion. You may choose to store or
not store a field as per your requirement.
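
To illustrate the store-or-not choice, here is a toy sketch in plain Python. It is not Lucene's API: the ToyIndex class and its method names are made up for this example. It only shows the idea that an unstored field can still be searched but its original text cannot be read back.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory index; illustrates indexed vs. stored only."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, token) -> set of doc ids
        self.stored = {}                  # doc id -> {field: original text}

    def add(self, doc_id, field, text, store):
        # Every field is indexed (made searchable) token by token.
        for token in text.lower().split():
            self.postings[(field, token)].add(doc_id)
        # Only fields flagged store=True keep their original text.
        if store:
            self.stored.setdefault(doc_id, {})[field] = text

    def search(self, field, token):
        return sorted(self.postings.get((field, token.lower()), set()))

    def get_stored(self, doc_id, field):
        # Returns None for fields that were indexed but not stored.
        return self.stored.get(doc_id, {}).get(field)
```

With this sketch, a body field added with store=False is still found by search(), while get_stored() returns None for it.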
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw
On Thu, Ma
You will need to develop a parser and an indexer.
But remember that in the current implementation the content is not stored in the
Lucene index:
it is indexed, but not stored.
Best Regards
Alexander Aristov
2009/5/28 Gaurav Kumar
> Hi everyone,
>
> I am doing a project using Lucene where I need to index HTML
Kumar,
you'll have to build your own documents after parsing the HTML
yourself (e.g. with NekoHTML to a DOM).
As for the weights of tokens, in addition to IDF, you can set them
per field, i.e. when you add a field to the document.
paul
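
The approach paul describes (parse the HTML yourself, then build one field per tag) can be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's standard html.parser instead of NekoHTML or Tika, a plain dict stands in for a Lucene document, and the names TagFieldCollector and html_to_fields are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagFieldCollector(HTMLParser):
    """Collects text per enclosing tag; hypothetical helper, not a Lucene or Tika API."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)            # tags that become separate fields
        self.stack = []                  # currently open tracked tags
        self.fields = defaultdict(list)  # tag name -> list of text chunks

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        # Text is assigned to the innermost tracked tag that is still open.
        if text and self.stack:
            self.fields[self.stack[-1]].append(text)


def html_to_fields(html, tags=("h1", "h2", "p")):
    """Return {tag: joined text}, one entry per tag that contained text."""
    collector = TagFieldCollector(tags)
    collector.feed(html)
    collector.close()
    return {tag: " ".join(parts) for tag, parts in collector.fields.items()}
```

Each key of the returned dict would then become one field when the document is built, which is also the point where a per-field weight could be applied.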
On 28 May 2009, at 12:22, Gaurav Kumar wrote:
Hi everyone,
I am doing a project using Lucene where I need to index HTML files. I am
using Tika to parse HTML files. But I need to index files according to their
tags, which means that every text present in different HTML tag (like
) should be stored in different fields. Can I do that? If yes, how
No, I am not indexing the HTML tags here. I just want to highlight the
searched word in the HTML or XML file (the file from which the index
was created). Can't I trace that? Is there any function to trace the
positions of the term stored in the Lucene index to find where it actually
is in the file? Can o
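
For the highlighting question, one possible sketch is to re-scan the extracted text for the matched term and wrap each hit. This is plain Python and does not read positions from the Lucene index; the function names and the `<b>` markers are assumptions for illustration only.

```python
import re
from html import escape


def find_term_offsets(text, term):
    """Return (start, end) character offsets of whole-word, case-insensitive hits."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]


def highlight(text, term, pre="<b>", post="</b>"):
    """Escape the text for HTML output and wrap every hit in pre/post markers."""
    out = []
    last = 0
    for start, end in find_term_offsets(text, term):
        out.append(escape(text[last:start]))               # text before the hit
        out.append(pre + escape(text[start:end]) + post)   # the hit itself
        last = end
    out.append(escape(text[last:]))                         # trailing text
    return "".join(out)
```

The offsets here are relative to the extracted text, so they only line up with the original file if the same text is used for both indexing and display.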
As far as I know, you extract the text out of HTML pages; I don't think you want to
index the tags as well, right?
So what gets indexed by Lucene is just the text, and what you get as a search
result is what you've indexed.
I'm repeating myself: once you have the search result, it's up to you to do
what you wan
Is this possible with Lucene, or has anybody tried such a thing?
On 28/05/2009, Ritu choudhary wrote:
> Well, friend, let me explain the whole thing to you then:
>
> I created a Lucene index out of some .xml and .html files and I also
> checked this index through Luke and it's pretty alright till here
Well, friend, let me explain the whole thing to you then:
I created a Lucene index out of some .xml and .html files and I also
checked this index through Luke and it's pretty alright till here. I
searched the terms and can find them too, but how do I use this result?
I want to open the document, the