The ~25 GB represents about 100 million events an avg of about 250 bytes each. the indexed and searchable values are normal things: small bits of text (8-10 bytes usually); longs; ints; etc...
Also this 25GB is a per-day size, which is why expanding the values in it to ascii is problematic from a storage perspective. I've never used lucene before, but looking at the javadocs, I was hoping that I'd be able to implement some IndexOutput that would store the relevant offsets and then be able to search custom Fieldables in my document. (I'm writing this on the subway right now, so I don't have the docs in front of me, but I think that's what I was thinking last night) ./paul Sent from my Verizon Wireless BlackBerry -----Original Message----- From: Michael McCandless <luc...@mikemccandless.com> Date: Fri, 30 Jan 2009 05:45:17 To: <java-user@lucene.apache.org> Subject: Re: indexing binary files? You can also create a Lucene field using a Reader, if the String is really too large to materialize at once. Such fields cannot be stored though. But, if the String really is so large, I would worry about the end user's experience (normally you want a Document to be a rather bite- sized piece of content so users browsing through search results won't see a single monolithic result covering tons and tons of content). Mike Ganesh wrote: > Use your parser to get the string out of the binary file and index > them using Lucene. > > Store the string as it is, if it is small otherwise store the path > and its offset position. The content could be later retrieved. > > Regards > Ganesh > > > ----- Original Message ----- From: "Paul Feuer" <paul...@gmail.com> > To: <java-user@lucene.apache.org> > Sent: Friday, January 30, 2009 10:00 AM > Subject: Re: indexing binary files? > > >> we have parsers for these files. >> >> to index them, do the string representations need to be stored (aside >> from sitting in the index file)? or can the reader simply provide the >> string in order to record the location of the record in the binary >> file? >> >> if i need to convert the binary file into text fields, the files will >> get VERY large. >> >> the binary data are well-formed events, so queries would be like >> "where ACCOUNT = 'Microsoft'" >> >> ./paul >> >> >> On Thu, Jan 29, 2009 at 11:00 PM, Anshum <ansh...@gmail.com> wrote: >>> Hi Paul, >>> Lucene is a 'text only' saerch lib. i.e. as long as you feed in >>> anything as >>> a string, you'd be able to use lucene else I don't think there's a >>> way. >>> How do you even intend to search in those binary files? as in... >>> what would >>> be the keyword/phrase? asking out of curiosity! >>> >>> -- >>> Anshum Gupta >>> Naukri Labs! >>> http://ai-cafe.blogspot.com >>> >>> The facts expressed here belong to everybody, the opinions to me. >>> The >>> distinction is yours to draw............ >>> >>> >>> On Fri, Jan 30, 2009 at 9:13 AM, Paul Feuer <paul...@gmail.com> >>> wrote: >>> >>>> Hi - >>>> >>>> I've looked on the FAQ, the Java Docs, and searched a little in >>>> google, but haven't been able to figure out if Lucene can index >>>> binary >>>> files. >>>> >>>> Our binary files can get up into the 20-30 gigabyte range. >>>> >>>> If it is possible, anyone have any pointers to what interfaces I >>>> should >>>> look at? >>>> >>>> Thanks, >>>> >>>> ./paul >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>> >>>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org > > Send instant messages to your online friends http://in.messenger.yahoo.com > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org