On 09/01/2012 16:27, Mike C wrote:
Hi,

I'm investigating storing syslog data using Lucene (via Solr or
Elasticsearch, undecided at present). The syslogs belong to systems
under the scope of the PCI DSS (Payment Card Industry Data Security
Standard), and one of the requirements is to ensure logs aren't tampered with. I'm looking
for advice on how to accomplish this.

Looking through the Lucene documentation, I believe there isn't any
built-in functionality to secure index data through digital
signatures or HMACs. Is this the case, or have I overlooked something?
I see there is a lucenetransform project
(http://code.google.com/p/lucenetransform/) that offers encryption,
but not digital signatures. I'm not concerned about hiding the
contents of the data; I just need to ensure it hasn't been tampered
with. At present I use Splunk, which signs and verifies blocks of
indexed data. Unfortunately its pricing model doesn't scale well,
so I'm looking for a Lucene-based solution.

I suppose I could add a digital signature programmatically to each
Lucene Document (i.e. each syslog entry), though it seems like a lot of overhead.
lucenetransform's approach does suggest that I could provide a
digital-signature version of Directory (and IndexInput/IndexOutput),
but before I go down that rabbit hole, I decided to check in here.
Any advice or suggestions appreciated.

This is an interesting and important problem.

I assume that the signature(s) should be created as part of the regular indexing process. In a sense they would also depend on, and provide a way to verify, the authenticity of the application that created the index (because that application has to know how to create valid signatures). You would obviously need a counterpart application that can verify such signatures.

Per-document sigs do add some overhead, but if you can keep them small (128 bits?) then you can still use stored fields (or DocValues in trunk, which offer a more efficient, compact representation). Still, if you need non-repudiation for certain sequences of events, then you need to sign such sequences too; in Lucene terms these would probably be segments or Directory files.
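To make the per-document idea concrete, here is a minimal sketch using only the JDK, assuming an HMAC-SHA256 key shared by the indexer and the verifier and kept outside the index. A document is modelled as a plain map of field names to values (the field names and the `DocSigner` class are hypothetical); the 256-bit tag is truncated to the 128 bits suggested above, and the resulting bytes are what you would put in a stored field. Note that an HMAC proves integrity only to holders of the key; if verifiers must not be able to forge tags, a public-key scheme via `java.security.Signature` would be the swap.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: signs and verifies one log document (field name -> value).
final class DocSigner {
    private static final int SIG_BYTES = 16; // keep 128 bits of the 256-bit HMAC output
    private final SecretKeySpec key;

    DocSigner(byte[] keyBytes) {
        this.key = new SecretKeySpec(keyBytes, "HmacSHA256");
    }

    // Canonical form: fields sorted by name, each written as "len:name" then "len:value",
    // so two different documents can never encode to the same byte string.
    private static byte[] canonical(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
            sb.append(e.getKey().length()).append(':').append(e.getKey());
            sb.append(e.getValue().length()).append(':').append(e.getValue());
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Indexer side: compute the truncated tag to store alongside the document.
    byte[] sign(Map<String, String> fields) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            return Arrays.copyOf(mac.doFinal(canonical(fields)), SIG_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Verifier side: recompute and compare without early exit on the first mismatch.
    boolean verify(Map<String, String> fields, byte[] storedSig) {
        return MessageDigest.isEqual(sign(fields), storedSig);
    }

    public static void main(String[] args) {
        DocSigner signer = new DocSigner("demo-key-not-for-production".getBytes(StandardCharsets.UTF_8));
        Map<String, String> doc = new TreeMap<>();
        doc.put("host", "web01");
        doc.put("message", "Accepted password for root");
        byte[] sig = signer.sign(doc);
        System.out.println(sig.length);              // 16
        System.out.println(signer.verify(doc, sig)); // true
        doc.put("message", "Accepted password for alice");
        System.out.println(signer.verify(doc, sig)); // false
    }
}
```

Canonicalizing before hashing matters because the verifier reads the fields back from the index and must rebuild exactly the same bytes the indexer signed.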

So the "transformation" approach can work well for creating global (per-segment and per-file) signatures: instead of encrypting, you would pass all data written to the Directory through an HMAC algorithm, and on stream close simply write the signature to a separate file in the Directory. This is easy to implement as a Directory wrapper. The only complication is that you would have to handle changes caused by segment merges yourself, i.e. you would have to do something with the sig files that correspond to obsolete segments (discard them?).
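The wrapper idea can be sketched without Lucene itself: below, a `java.io.FilterOutputStream` stands in for the wrapping IndexOutput, copying every byte to the real file and into an HMAC, then writing a sidecar `<name>.sig` file on close. This is a sketch under assumptions: the class name, the `.sig` naming convention and the shared HMAC-SHA256 key are all hypothetical, and the Lucene-specific wiring (wrapping `Directory.createOutput` and the IndexOutput methods of your Lucene version) is not shown.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical stand-in for a signing IndexOutput: data goes to the file and into the HMAC.
final class HmacOutputStream extends FilterOutputStream {
    private final Mac mac;
    private final Path sigFile;
    private boolean closed;

    HmacOutputStream(Path file, byte[] key) throws IOException {
        super(Files.newOutputStream(file));
        this.sigFile = sigPath(file);
        this.mac = newMac(key);
    }

    static Path sigPath(Path file) {
        return file.resolveSibling(file.getFileName() + ".sig");
    }

    static Mac newMac(byte[] key) {
        try {
            Mac m = Mac.getInstance("HmacSHA256");
            m.init(new SecretKeySpec(key, "HmacSHA256"));
            return m;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override public void write(int b) throws IOException {
        out.write(b);
        mac.update((byte) b);
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // bypass the per-byte default so each byte is hashed once
        mac.update(b, off, len);
    }

    @Override public void close() throws IOException {
        if (closed) return;
        closed = true;
        super.close();                       // flush and close the data file first
        Files.write(sigFile, mac.doFinal()); // then persist the signature next to it
    }

    // Verifier side: recompute the file's HMAC and compare it with the sidecar.
    static boolean verify(Path file, byte[] key) throws IOException {
        byte[] expected = newMac(key).doFinal(Files.readAllBytes(file));
        return MessageDigest.isEqual(expected, Files.readAllBytes(sigPath(file)));
    }

    public static void main(String[] args) throws IOException {
        byte[] key = "demo-key-not-for-production".getBytes(StandardCharsets.UTF_8);
        Path seg = Files.createTempDirectory("sigdemo").resolve("_0.fdt"); // example file name
        try (OutputStream os = new HmacOutputStream(seg, key)) {
            os.write("syslog line 1\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(verify(seg, key)); // true
        Files.write(seg, "tampered\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(seg, key)); // false
    }
}
```

This also shows where the merge complication bites: nothing here knows when Lucene deletes `_0.fdt` after a merge, so the wrapper's file-deletion path would also have to remove (or archive) the matching `.sig` file.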

In Lucene trunk you can use the Codec API to do essentially the same thing, only this time you can interpret the data more easily. For example, if some aspects of the data (postings, payloads, term dictionary) are less important for the signature than e.g. stored fields, then you can skip them. When a batch of documents (corresponding to a Lucene segment) is finished, you would write the signatures to additional files, and this time the sig files would be known to belong to that segment. So you would get some help from Lucene during segment merging: you could handle merging of data (create additional sigs for every merge? or recompute the sig for the new segment?), and old sigs would be deleted whenever old segments are deleted due to merging.
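The "recompute the sig for the new segment" option can be illustrated with a small in-memory model rather than the Codec API itself (whose trunk interfaces are not reproduced here). In this sketch a segment is just the list of its stored-field records plus one HMAC over them; `SignedSegment`, `flush` and `merge` are hypothetical names. The key point is that a merge first verifies every source segment and only then signs the merged result, so the new signature never vouches for data that had already been altered.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical model of a segment: stored-field records in doc order plus a segment signature.
final class SignedSegment {
    final List<String> storedDocs;
    final byte[] sig;

    private SignedSegment(List<String> storedDocs, byte[] sig) {
        this.storedDocs = storedDocs;
        this.sig = sig;
    }

    // HMAC over stored fields only; postings, term dictionary etc. are deliberately skipped.
    static byte[] hmac(byte[] key, List<String> docs) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            for (String d : docs) {
                byte[] b = d.getBytes(StandardCharsets.UTF_8);
                mac.update(ByteBuffer.allocate(4).putInt(b.length).array()); // length prefix keeps doc boundaries unambiguous
                mac.update(b);
            }
            return mac.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Called when a batch of documents is written out as a new segment.
    static SignedSegment flush(byte[] key, List<String> docs) {
        List<String> copy = new ArrayList<>(docs);
        return new SignedSegment(copy, hmac(key, copy));
    }

    boolean verify(byte[] key) {
        return MessageDigest.isEqual(hmac(key, storedDocs), sig);
    }

    // Merge: refuse if any source fails verification, otherwise sign the merged segment afresh.
    static SignedSegment merge(byte[] key, List<SignedSegment> sources) {
        List<String> merged = new ArrayList<>();
        for (SignedSegment s : sources) {
            if (!s.verify(key)) {
                throw new SecurityException("source segment failed verification; refusing to merge");
            }
            merged.addAll(s.storedDocs);
        }
        return flush(key, merged);
    }

    public static void main(String[] args) {
        byte[] key = "demo-key-not-for-production".getBytes(StandardCharsets.UTF_8);
        SignedSegment a = flush(key, Arrays.asList("msg=login ok", "msg=logout"));
        SignedSegment b = flush(key, Arrays.asList("msg=sudo su"));
        SignedSegment m = merge(key, Arrays.asList(a, b));
        System.out.println(m.storedDocs.size() + " " + m.verify(key)); // 3 true
        b.storedDocs.set(0, "msg=nothing to see"); // simulate tampering with a source segment
        try {
            merge(key, Arrays.asList(a, b));
        } catch (SecurityException e) {
            System.out.println("merge refused");
        }
    }
}
```

The trade-off versus "create additional sigs for every merge" is auditability: recomputing keeps one signature per live segment, but once the old sigs are deleted you can no longer show that the merged data matches what was originally signed at flush time.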

I'd give the Directory-based approach a shot first, because it's easy to implement, and then see if it's good enough.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

