> Lucene needs to be able to ask a RAF opened for writing what it's > current "position" is during indexing, which it then stores away, and > later during searching it needs to ask a RAF opened for reading to > seek back to that position so it can read bytes from there. Would the > encryption APIs allow this?
And this would be the problem, as depending on your encryption, the most encryptions need a "stream" and cannot restart somewhere! E.g. Filesystem encryptions encrypt block-wise. What could be done in the directory implementation is to also encrypt chunks and on each read/write access (write: re-read the chunk, decrypt, change requested bytes, encrypt, store chunk; read: read the chunk, decrypt, return bytes - as you see writing is very costly. So a caching needs to be implemented in the Directory impl, but calling flush should write the encrypted chunk). Another possibility is to copy the decrypted files into a RAMDirectory. E.g. maybe use some algorithm like in this new recently added Split-Directory. > If this is possible then couldn't one make a Directory impl that hides > all encryption/decryption "under the hood"? > > (Presumably performance will suffer perhaps substantially since every > search will need to decrypt on the fly...). > > Mike > > On Mon, May 4, 2009 at 6:29 PM, <peter_lena...@ibi.com> wrote: > > > > I hope to make this a discussion rather than a request for a feature. > > > > In the database world, secure data is always encrypted in the database. > > Since I am interested in storing data from a database in the index, at > > times I want to encrypt the index when the file is one disk. > > > > Currently data stored in the Lucene Index is easily accessible to any > > program that wants to access it. You cannot store sensitive data in the > > index without the fear that it will be readable by all the people that > > have access to the system. > > > > There are two other posts in the mailing list that ask a question about > > Lucene Index Encryption. In both cases, I think that the conservation > > was dropped or the feature put off. > > > > Basically, I am asking for comments on the topic. I might consider > > coding the feature, but I would only do it if I am sure that the feature > > would be useful and accepted back into the core codebase of Lucene. > > > > The Sun javax.crypto package is available in the JDK 1.4 so using that > > package could be possible way of providing an encrypted file. > > > > The other option is Bouncy Castle, which is now being used in the PDFBox > > and Tika projects. > > > > In any case, because the normal Lucene Index implementation would not > > use an encrypted index, all references to Security classes should load > > dynamically with the "Class.forName()" method if they were not part of a > > standard JRE, to guarantee no additional requirements are placed on > > people currently using the Lucene libraries. > > > > Then there is the issue of what to use as the Encryption Key, and how to > > allow access to the Index files from the various programs that may need > > to get to the data. The Encryption Key needs to external from any > > program that accesses the Index, because with Java, if the key were > > stored in the code, it would be easily found with a simple decompile of > > the Java class. > > > > I don't have answers to the questions, but basically I am requesting > > comments on the topic. > > > > I imagine that if I put Encryption and Decryption at the I/O level, > > immediately before a segment was written or immediately after a segment > > was read, that I would minimize the overall impact of the Lucene > > Library. > > > > Another area to address is Remote Searching. The Remote Interface would > > need extensions that allow for Encrypted Remote files as well as > > Encrypted communication between the machines. > > > > However, I am not sure of these assumptions. I don't know how many > > places the segments are read and written. I really do not know how to do > > this currently, but would be willing to give it a try it there was > > enough interest shown in the topic. > > > > > > Peter > > > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org