Hi All,
I've been trying to index some non-English text [Indian languages] encoded as
Unicode UTF-8. For these languages we don't have any stemmers, tokenizers, etc.
To keep searching simple, I'd like to be able to do exact word
searches/matches as a first step. I'd like to know which will be the
simpl
Semantic search is a nice addition to full-text search. Recently there has been
a lot of development in applying semantics to optimize search engines, including
the much-hyped launch of Wolfram Alpha.
Semantic search will be one of the main themes at this year's Semantic
Technology Conference, June 14-18,
For all the docs, and in fact, I think it might be the document frequency.
Basically I need to be able to do a query and get a list of terms with how
many documents in the result set contain that term. I'm not so worried about
how often the term appears in each document.
Thanks
Rob
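A minimal sketch of the counting described above, written in plain Java with no Lucene calls: for every term, count how many documents in a result set contain it at least once, ignoring how often it occurs within each document. The class name ResultSetTermCounts and the representation of a hit as a list of its terms are assumptions made for illustration; in a real index the terms would come from the matched documents.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
class ResultSetTermCounts {

    // For each term, count the documents in the result set that contain it.
    static Map<String, Integer> documentFrequencies(List<List<String>> resultDocs) {
        Map<String, Integer> df = new TreeMap<>();
        for (List<String> docTerms : resultDocs) {
            // The set collapses repeats, so a term counts once per document.
            for (String term : new HashSet<>(docTerms)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    public static void main(String[] args) {
        // Each inner list stands for the terms of one matching document.
        List<List<String>> hits = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene"),
                Arrays.asList("lucene", "query"),
                Arrays.asList("index", "xml"));
        System.out.println(documentFrequencies(hits));
    }
}
```

The per-document set is what separates document frequency from raw term frequency: "lucene" appears twice in the first hit but still adds only one to its count.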
On Thu, May 21
crack...@comcast.net schrieb:
once you get comfortable with vtd-xml, few people will ever get back to DOM and SAX...
maybe you want to consider to contribute a vtd-xml based parsing
implementation to Lucene ;-)
Thanks
Michael
- Original Message -
From: "Sithu D. Sudarsan"
To:
once you get comfortable with vtd-xml, few people will ever get back to DOM and
SAX...
- Original Message -
From: "Sithu D. Sudarsan"
To: java-user@lucene.apache.org
Sent: Friday, May 22, 2009 6:39:33 AM GMT -08:00 US/Canada Pacific
Subject: RE: Parsing large xml files
Thanks every
We had a similar problem where we had to parse 1 GB XML files. Better to
transform to an array format like JSON and write a custom search API using Lucene.
On Thu, May 21, 2009 at 8:12 PM, Sudarsan, Sithu D. <
sithu.sudar...@fda.hhs.gov> wrote:
>
> Hi,
>
> While trying to parse xml documents of about 50MB siz
Thanks a lot. But now I'm going to work (I'm a waiter). When I come back I'll
immediately do that.
Thanks again. You are so kind.
2009/5/22 Matthew Hall
> humor me.
>
> Open up your indexing software package.
>
> Step 1: In all places where you reference your index, replace whatever the
> heck you have
humor me.
Open up your indexing software package.
Step 1: In all places where you reference your index, replace whatever
the heck you have there with the following EXACT STRING:
/home/marco/testIndex
Do not leave off the leading slash.
After you have made these changes to the indexing softw
I don't know how to solve the problem. I've tried every reasonable
thing. Maybe the last thing is to try indexing with something other than
FSDirectory. I have to peruse the API documentation.
But what if it is a bug in Lucene?
2009/5/22 Matthew Hall
> because that's the default index write
because that's the default index write behavior.
It will create any directory that you ask it to.
Matt
Marco Lazzara wrote:
OK. I understand what you mean, but it doesn't work.
I understand one thing. For example, when I try to open an index in the
following location: "RDFIndexLucene/" but
OK. I understand what you mean, but it doesn't work.
I understand one thing. For example, when I try to open an index in the
following location: "RDFIndexLucene/" but the folder doesn't exist, *Lucene
creates an empty folder named "RDFIndexLucene"* in my home folder... WHY???
MARCO LAZZARA
2009/
home/marco/RdfIndexLucene and media/disk/users/fratelli/RDFIndexLucene are
relative paths. Use
/media/disk/users/fratelli/RDFIndexLucene etc. instead.
DIGY
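To see the difference in practice, here is a small sketch using only java.io.File. It shows that a path without a leading slash is resolved against the JVM's current working directory, which would explain a folder turning up somewhere unexpected. The class name IndexPathCheck is made up for illustration, and the absolute-path check assumes a Unix-like system.

```java
import java.io.File;

// Hypothetical helper for checking how the JVM interprets an index path.
class IndexPathCheck {

    // True only if the path is absolute on the current platform.
    static boolean isAbsolutePath(String path) {
        return new File(path).isAbsolute();
    }

    // The location the JVM will actually use; relative paths are
    // resolved against the current working directory (user.dir).
    static String resolved(String path) {
        return new File(path).getAbsolutePath();
    }

    public static void main(String[] args) {
        String[] candidates = {
            "home/marco/RdfIndexLucene",                 // no leading slash: relative
            "/media/disk/users/fratelli/RDFIndexLucene"  // leading slash: absolute on Unix-like systems
        };
        for (String p : candidates) {
            System.out.println(p + " absolute=" + isAbsolutePath(p)
                    + " resolves to " + resolved(p));
        }
    }
}
```

Printing the resolved path just before opening the index is a cheap way to confirm that the indexer and the searcher point at the same directory.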
-Original Message-
From: Marco Lazzara [mailto:marco.lazz...@gmail.com]
Sent: Friday, May 22, 2009 12:48 AM
To: java-user@lucene.apa
For writing indexes?
Well I guess it depends on what you want.. but I personally use this:
(2.3.2 API)
File INDEX_DIR = new File("/data/searchtool/thisismyindexdirectory");
Analyzer analyzer = new WhateverConcreteAnalyzerYouWant();
// true = create a new index, overwriting any existing one
IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);
Your best bet w
I was talking with my teacher.
Is it correct to use FSDirectory? Could you please look again at the code
I've posted here?
Should I choose a different way of indexing?
Marco Lazzara
2009/5/22 Ian Lea
> OK. I'd still like to see some evidence, but never mind.
>
> Next suggestion is the old
Yeah, there's a setting on Windows that allows you to use up to .. erm
3G I think it was. The limitation there is due to the silly Windows
file system. I don't remember offhand exactly what that setting was,
but I'm 100% certain that it's there.
If you do a google search for jvm maximum me
Hi Matt,
We use a 32-bit JVM. Though it is supposed to address up to 4GB, any
assignment above 2GB on Windows XP fails. The machine has two quad-core
processors.
On Linux we're able to use 4GB though!
If there is any setting that will let us use 4GB do let me know.
Thanks,
Sithu D Sudarsan
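One way to check what a given JVM actually granted is to ask the runtime itself. The sketch below relies only on Runtime.maxMemory() from the standard library; the class name HeapReport is made up for illustration, and the sun.arch.data.model property is HotSpot-specific, so it may be absent on other JVMs.

```java
// Hypothetical diagnostic: prints the heap ceiling the running JVM reports.
class HeapReport {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
        // HotSpot-specific property; prints "null" on JVMs that do not set it.
        System.out.println("Data model (bits): "
                + System.getProperty("sun.arch.data.model"));
    }
}
```

Running it with different -Xmx values (for example `java -Xmx1500m HeapReport`) shows where the JVM stops accepting the requested size on a particular machine and operating system.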
-O
Thanks everyone for your useful suggestions/links.
Lucene uses DOM and we tried with SAX.
XML Pull & vtd-xml as well as Piccolo seem good.
However, for now, we've broken the file into smaller chunks and parse
those instead.
When we get some time, we'd like to refactor using the suggested parsers.
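For reference, a pull-style parse can also be sketched with StAX (javax.xml.stream), which ships with the JDK. It is a stand-in for the XML Pull and vtd-xml libraries mentioned in this thread, not their API. The parser walks the document one event at a time, so memory use does not grow with file size. The class name StreamingElementCounter and the sample document are assumptions for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: counts elements without building a DOM tree.
class StreamingElementCounter {

    static int countElements(Reader xml, String elementName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTDs are not needed here, so turn DTD support off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        int count = 0;
        try {
            // Pull one event at a time; only the current event is held in memory.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String sample = "<docs><doc id=\"1\"/><doc id=\"2\"/><other/></docs>";
        System.out.println(countElements(new StringReader(sample), "doc"));
    }
}
```

In an indexing job, the body of the loop is where each record would be turned into a document and handed to the index writer, which avoids having to split the file beforehand.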
Er
2G... should not be a maximum for any JVM that I know of.
Assuming you are running a 32-bit JVM you are actually able to address a
bit under 4G of memory; I've always used around 3.6G when trying to max
out a 32-bit JVM. Technically speaking it should be able to address 4G
under a 32 bit or,
Hello,
I think if you analyze text correctly, then your highlighting will work too.
Your problem is you need an analyzer that analyzes text correctly, then I
think everything will work!
Here's a short intro with some links:
You can get code that applies these algorithms here:
http://site.icu-proje
On May 22, 2009, at 12:28 AM, Dmitri Bichko wrote:
Hi,
I may be missing something obvious, but how do I get the payloads for
the specific token positions that were matched by a query?
See SpanQuery.getPayloadSpans() and its SpanQuery derivatives.
For example, if I have a phrase query li
OK. I'd still like to see some evidence, but never mind.
Next suggestion is the old standby - cut the code down to the absolute
minimum to demonstrate the problem and post it here. I know you've
already posted some code, but maybe not all of it, and definitely not
cut down to the absolute minimu
crack...@comcast.net schrieb:
http://vtd-xml.sf.net
- Original Message -
From: "Sithu D. Sudarsan"
To: java-user@lucene.apache.org
Sent: Thursday, May 21, 2009 7:42:59 AM GMT -08:00 US/Canada Pacific
Subject: Parsing large xml files
Hi,
While trying to parse xml documents of