Re: highlighting the result within a file

2009-07-04 Thread Jaison Sunny
Hi Ritu, this is Jaison. I am new to this kind of search. Can you help me out? I want somebody who can guide me in Lucene. Thanks in advance.

Re: Storing a serialized object ?

2009-07-04 Thread Amin Mohammed-Coleman
Hi, I think you might want to look at Hibernate Search. You can use projections, which basically store instance fields in the index. It does not store the object in serialised form in the index; it holds a reference (id) to the persistent entity. Cheers, Amin On Sat, Jul 4, 2009 at 2:39 AM, Er

Re: Storing a serialized object ?

2009-07-04 Thread Simon Willnauer
Hi there, On Fri, Jul 3, 2009 at 9:32 PM, MilleBii wrote: > I want to store in the index a data structure and load it back at search time. > Is it safe to serialize the Java object, store it, and load it back later? It won't be particularly fast or efficient, but it is gonna work. > Presumably

RE: Storing a serialized object ?

2009-07-04 Thread Uwe Schindler
You can easily add a serialized object to a document as a stored field: just serialize the object to a byte[] array and store that in the index, e.g.: ByteArrayOutputStream serData=new ByteArrayOutputStream(); ObjectOutputStream out=new ObjectOutputStream(serData); try { out.writeObject(d
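Uwe's snippet is cut off mid-statement. The round trip it starts (object to byte[] and back) can be sketched with the JDK alone. `SerializedFieldCodec` and its method names are hypothetical, and the step of putting the bytes into a Lucene stored field is left out on purpose, since the exact Field constructor for binary values depends on the Lucene version in use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper: turns a Serializable object into a byte[] suitable for a
// stored (not indexed) field, and restores it again at search time.
final class SerializedFieldCodec {

    private SerializedFieldCodec() {}

    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream serData = new ByteArrayOutputStream();
        // try-with-resources flushes and closes the ObjectOutputStream before we read the bytes
        try (ObjectOutputStream out = new ObjectOutputStream(serData)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return serData.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // The serialized class must be on the classpath when the field is read back
            throw new IllegalStateException(e);
        }
    }
}
```

The caller casts the result of `fromBytes` back to its own type; as Simon notes in the thread, this is the simplest route but not a compact or fast one.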

RE: Storing a serialized object ?

2009-07-04 Thread Uwe Schindler
> That is one way, or you do it base64 encoded in a text field if you don't care about space at all. :) Lucene also has binary fields for storing. Searching on such fields does not make sense, so it's OK not to be able to index them (how would that work?). I have this use case, too. Sometimes it is
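The "base64 encoded in a text field" option mentioned above can be sketched with `java.util.Base64` (Java 8 and later). `Base64FieldCodec` is a hypothetical name; the resulting string would go into an ordinary stored text field instead of a binary one.

```java
import java.util.Base64;

// Hypothetical helper for the "base64 in a text field" option:
// arbitrary bytes become plain ASCII text that any stored text field can hold.
final class Base64FieldCodec {

    private Base64FieldCodec() {}

    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

Base64 turns every 3 input bytes into 4 characters, so the stored value is roughly a third larger than the raw bytes, which is why it only makes sense "if you don't care about space".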

Re: Storing a serialized object ?

2009-07-04 Thread Simon Willnauer
On Sat, Jul 4, 2009 at 10:15 AM, Uwe Schindler wrote: >> That is one way, or you do it base64 encoded in a text field if you don't care about space at all. :) Just for clarification: one way, Java Object Serialization, is not efficient at all. It takes a lot of space and performance is crap. Other wa

Re: Storing a serialized object ?

2009-07-04 Thread MilleBii
Well, during the indexing phase (I'm actually running Nutch), I'm also extracting data about my pages, including some text fragments. So I'd like to store the resulting objects in the Lucene index and reload them at search time for further manipulation. I was wondering which way is the simplest. 2009/7

Re: Storing a serialized object ?

2009-07-04 Thread MilleBii
Right, I'm not indexing such fields; they are actually a kind of document property of my own. 2009/7/4 Uwe Schindler > > That is one way, or you do it base64 encoded in a text field if you don't care about space at all. :) > Lucene also has binary fields for storing. Searching on such fields do

RE: Storing a serialized object ?

2009-07-04 Thread Uwe Schindler
Then see my other mail about Java serialization. It works (but not very fast), but it is the simplest way to do it. I do not use the serialized fields during searching; I store them only for some special maintenance tasks on the indexed documents. So it's the same use case. For this use case

Re: Storing a serialized object ?

2009-07-04 Thread MilleBii
OK, thanks for the tip on Java object serialization performance. Most of what I have to store/retrieve is straightforward, so I can do it by hand. What pushed me toward object serialization is that I want to store/retrieve text fragments of undefined content. 2009/7/4 Simon Willnauer > On Sat, Jul 4,
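Doing it "by hand" for a list of text fragments of arbitrary content could look like the sketch below: a count, then each fragment as a length-prefixed UTF-8 byte run. `FragmentCodec` and this layout are hypothetical choices, not something proposed in the thread. A length prefix is used instead of `DataOutputStream.writeUTF`, because `writeUTF` cannot encode strings longer than 65535 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written encoding: [count][len1][utf8 bytes 1][len2][utf8 bytes 2]...
final class FragmentCodec {

    private FragmentCodec() {}

    static byte[] encode(List<String> fragments) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(fragments.size());
            for (String fragment : fragments) {
                byte[] utf8 = fragment.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length); // length prefix, no 64 KB limit unlike writeUTF
                out.write(utf8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static List<String> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            List<String> fragments = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                byte[] utf8 = new byte[in.readInt()];
                in.readFully(utf8);
                fragments.add(new String(utf8, StandardCharsets.UTF_8));
            }
            return fragments;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Unlike Java serialization, the byte layout carries no class metadata, so it stays small, but the reading code has to mirror the writing code exactly.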

Re: Storing a serialized object ?

2009-07-04 Thread MilleBii
OK, thanks, guys. I see the different options more clearly now. 2009/7/4 Uwe Schindler > Then see my other mail about Java Serialization. It works (but not so fast), but is the simplest way to do it. I do not use the serialized fields during searching, I store them only for usage in some sp

Re: search for percent char with lucene

2009-07-04 Thread shbn
Hi, I used the StandardAnalyzer. I changed to WhitespaceAnalyzer, so now I get results when I search for '1%', for example, but if I type only '%' I still get results. /*** doc = new Document(); nameField = new Field("name",strN,Field.Store.YES,Field.Index.TOKENIZED,Field.TermVector.WITH_POSITI
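Why switching analyzers changes whether '1%' can match comes down to which tokens end up in the index. The sketch below uses two simplified stand-ins, not Lucene classes: `whitespaceTokens` mimics WhitespaceAnalyzer's split-on-whitespace behavior, and `alphanumericTokens` is an assumed rough model of an analyzer that discards punctuation such as '%'. It is not an exact reproduction of StandardAnalyzer's grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified tokenizers for illustration only (NOT Lucene's analyzers).
final class TokenDemo {

    private TokenDemo() {}

    // Split on runs of whitespace; punctuation such as '%' stays attached to its token
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Split on anything that is not a letter or digit; '%' disappears entirely
    static List<String> alphanumericTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }
}
```

With the first strategy "rate 1% off" yields the tokens rate, 1%, off; with the second it yields rate, 1, off. A term query only matches whole tokens, so the same analyzer has to be applied at index time and at query time for '1%' to line up.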

Re: How to use RegexTermEnum

2009-07-04 Thread Shayak Sen
I might be skirting the issue here, but wouldn't it be easier and faster to remove the sid before you add it to the index? Cheers, Shayak On Sat, Jul 4, 2009 at 3:03 AM, Erick Erickson wrote: > WARNING: I haven't actually tried using RegexTermEnum in a long time, but... I *think* that th

Re: How to use RegexTermEnum

2009-07-04 Thread Raf
It works, thanks. I thought I had to call next() to know IF there was a term, as you normally do with hasNext()/next() on iterators, but I was wrong. So, in order to know if there is a match, I have to check whether rte.term() is null, correct? Then I can use next() to look for additional matches.
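The positioning contract Raf describes (the enum already sits on the first match, so check `term()` for null before calling `next()`) can be shown with a small in-memory mimic. `SimpleRegexTermEnum` is hypothetical and works on a plain sorted list of strings; it is not Lucene's RegexTermEnum, which reads its terms from an index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical in-memory enum mimicking the contract discussed in the thread:
// after construction, term() already holds the first match, or null if none.
final class SimpleRegexTermEnum {
    private final List<String> terms;   // all terms, assumed sorted
    private final Pattern pattern;
    private int pos = -1;
    private String current;

    SimpleRegexTermEnum(List<String> sortedTerms, String regex) {
        this.terms = sortedTerms;
        this.pattern = Pattern.compile(regex);
        next(); // position on the first match up front
    }

    String term() {
        return current;
    }

    // Advance to the next matching term; returns false (and term() becomes null) at the end
    boolean next() {
        while (++pos < terms.size()) {
            String candidate = terms.get(pos);
            if (pattern.matcher(candidate).matches()) {
                current = candidate;
                return true;
            }
        }
        current = null;
        return false;
    }

    // Check term() BEFORE calling next(); calling next() first would skip the first match
    static List<String> allMatches(SimpleRegexTermEnum rte) {
        List<String> matches = new ArrayList<>();
        if (rte.term() == null) {
            return matches; // no match at all
        }
        do {
            matches.add(rte.term());
        } while (rte.next());
        return matches;
    }
}
```

The do/while shape is the key difference from the iterator idiom: the first element is consumed before the first `next()` call, not after it.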

Re: How to use RegexTermEnum

2009-07-04 Thread Raf
Yes, I thought about this solution too, but the problem is that the "sid" part can differ from domain to domain. So, sometimes we have sid=..., other times we have s= and so on. If we decide to solve the problem by removing the sid from the URL in the index, when we discover a new "patter

Boolean retrieval

2009-07-04 Thread Lukas Michelbacher
This is about an experiment comparing plain Boolean retrieval with vector-space-based retrieval. I would like to disable all of Lucene's scoring mechanisms and just run a true Boolean query that returns exactly the documents that match a query specified in Boolean syntax (OR, AND, NOT). No scori

Re: Boolean retrieval

2009-07-04 Thread Mark Harwood
Check out BooleanFilter in contrib/queries. It can be wrapped in a ConstantScoreQuery. On 4 Jul 2009, at 17:37, Lukas Michelbacher wrote: This is about an experiment comparing plain Boolean retrieval with vector-space-based retrieval. I would like to disable all of Lucene's scoring mechani

Re: Boolean retrieval

2009-07-04 Thread Paul Elschot
It is also possible to use the HitCollector API and simply ignore the score values. Regards, Paul Elschot On Saturday 04 July 2009 21:14:41 Mark Harwood wrote: > Check out BooleanFilter in contrib/queries. It can be wrapped in a ConstantScoreQuery. > On 4 Jul 2009, at 17:37, Lukas
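What both suggestions aim for, an unranked set of exactly the matching documents, is at heart set algebra over postings lists: AND is intersection, OR is union, NOT is the complement against all documents. The sketch below is a conceptual illustration over a hypothetical in-memory index (`BooleanIndex`), not Lucene's API and not how BooleanFilter is implemented internally.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Conceptual sketch of unranked Boolean retrieval: each term maps to the set of
// doc ids that contain it, and queries are evaluated with plain set operations.
final class BooleanIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> allDocs = new TreeSet<>();

    void add(int docId, String... terms) {
        allDocs.add(docId);
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Postings for a single term (read-only view; empty if the term is unknown)
    Set<Integer> term(String t) {
        return Collections.unmodifiableSet(postings.getOrDefault(t, Collections.emptySet()));
    }

    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b); // intersection
        return result;
    }

    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b); // union
        return result;
    }

    Set<Integer> not(Set<Integer> a) {
        Set<Integer> result = new TreeSet<>(allDocs);
        result.removeAll(a); // complement relative to every indexed doc
        return result;
    }
}
```

For example, `lucene AND NOT search` is `and(term("lucene"), not(term("search")))`. Every document in the result is simply "in"; there is no score to order by, which is the behavior the experiment needs.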

Need help regarding Lucene index/query

2009-07-04 Thread mitu2009
I want to have a "citystate" field in a Lucene index that will store various city/state values like "Chicago, IL", "Boston, MA", "San Diego, CA". How do I store these values (should they be tokenized or non-tokenized?) in Lucene, and how do I generate a query (should it be a PhraseQuery or TermQuery or somet
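The tokenized vs. non-tokenized choice can be illustrated without Lucene. In the hypothetical `CityStateIndex` below, the untokenized side keeps the whole value as one term, so only an identical string (same case, same punctuation) matches; the tokenized side uses an assumed simple analyzer stand-in (lowercase, split on non-alphanumerics) so single words match. This is a plain-Java sketch of the trade-off, not Lucene's indexing code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical comparison of the two indexing choices for a "citystate" value.
final class CityStateIndex {
    private final Map<String, Set<Integer>> untokenized = new HashMap<>();
    private final Map<String, Set<Integer>> tokenized = new HashMap<>();

    void add(int docId, String cityState) {
        // Untokenized: the full value is a single term, kept exactly as written
        untokenized.computeIfAbsent(cityState, k -> new HashSet<>()).add(docId);
        // Tokenized stand-in: "San Diego, CA" -> "san", "diego", "ca"
        for (String token : cityState.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tokenized.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
            }
        }
    }

    // Exact lookup on the untokenized side (like a term query on the full value)
    Set<Integer> exact(String value) {
        return untokenized.getOrDefault(value, new HashSet<>());
    }

    // Single-word lookup on the tokenized side
    Set<Integer> token(String word) {
        return tokenized.getOrDefault(word.toLowerCase(Locale.ROOT), new HashSet<>());
    }
}
```

If queries always carry the full value exactly as stored, the untokenized form with an exact term lookup is the simpler fit; if users type partial or differently cased input, the tokenized form is needed, and a phrase query keeps the words in order. Indexing the value both ways in two separate fields is one option when both kinds of lookup are needed.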