Yes, I think it is.  The only catch will be those log timestamps: how 
fine-grained you really need them to be, and, if you make them very fine, what 
happens when you run range queries on them, since every distinct timestamp is 
another unique term the range query has to enumerate.  If you have a pile of 
log files lying around, it should be pretty easy to get them indexed.  You 
don't even have to write a client for searching the resulting index; just 
point something like Luke at it, or even Solr.
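
For example, here is a rough, untested sketch against the Lucene 2.4 API (the 
"timestamp" field name and the index path are just placeholders): index the 
timestamp at second resolution with DateTools, then search it with the query 
parser's range syntax.

import java.util.Date;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LogTimestampSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.getDirectory("/tmp/log-index"); // placeholder path
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);

        // SECOND resolution yields terms like "20081110185120"; the coarser
        // the resolution, the fewer unique terms a range query must enumerate.
        Document doc = new Document();
        String ts = DateTools.dateToString(new Date(), DateTools.Resolution.SECOND);
        doc.add(new Field("timestamp", ts, Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
        writer.close();

        // Range search with the query parser's [lower TO upper] syntax;
        // the bounds must use the same DateTools format as the indexed terms.
        IndexSearcher searcher = new IndexSearcher(dir);
        QueryParser parser = new QueryParser("timestamp", new WhitespaceAnalyzer());
        Query q = parser.parse("timestamp:[20081110000000 TO 20081111000000]");
        TopDocs hits = searcher.search(q, 10);
        System.out.println("hits: " + hits.totalHits);
        searcher.close();
    }
}

Dropping to MINUTE or HOUR resolution shrinks the term dictionary further; 
that's the knob to turn if range queries start hitting too many unique terms.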


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




________________________________
From: Jeff Capone <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, November 10, 2008 6:51:20 PM
Subject: Feasibility question

Has anyone deployed Lucene to index log files?  I have seen some articles 
about how RackSpace used Lucene and Hadoop for log processing, but I have 
not seen any details on the implementation.  

To get my required analytics, I think I would need to treat each line of 
the Apache log files as a document, and I thought I would treat each field as 
a keyword to minimize processing. 
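
Roughly, I was picturing something like the sketch below (hypothetical; the 
common-log-format regex and the field names are just my first guess, written 
against the Lucene 2.4 API):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class LogLineToDoc {
    // Apache common log format: host ident authuser [date] "request" status bytes
    private static final Pattern LOG_LINE = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    // One log line -> one Document; every field is indexed as a single
    // unanalyzed keyword term, so no tokenization work is done per field.
    public static Document toDocument(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.lookingAt()) {
            return null; // skip lines that don't look like access-log entries
        }
        Document doc = new Document();
        doc.add(new Field("host",    m.group(1), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("date",    m.group(4), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("request", m.group(5), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("status",  m.group(6), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("bytes",   m.group(7), Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }
}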

Assuming you have clusters operating on independent datasets (so I would 
guess it scales linearly) and you want to process terabytes of logs per day, 
is such a solution even feasible?

Thank you,

Jeff Capone


