@Gidon Copied the gist to a google doc for commenting: https://docs.google.com/document/d/11qz84ajysvVo5ZAV9mXKOeh6ay4-xgkBrubggCP5220/edit#
@Micah > it would be preferable to put them in an internal namespace to ensure adequate unit testing is in place. Agreed; my intent was only to indicate that implementation details should be private. We can even have *_internal.h header files for those to make them available for unit testing as we do with some other classes. I've added a note of this caveat to the doc. > It seems like KmsClient and maybe PropertiesDrivenCryptoFactory should still be implemented > with internal synchronization to guarantee thread-safety? If these are exclusively accessed from the main thread (where they are used to assemble FileDecryptionProperties) then there is still no need to synchronize them. @Tham > As I see, the returned FileDecryptionProperties object from > PropertiesDrivenCryptoFactory is not mutable, > since it has a DecryptionKeyRetriever object. If the DecryptionKeyRetriever accesses the caches directly, then it will indeed need to be synchronized. Instead, we can ensure that cache access happens on the main thread before worker tasks are launched. After we assemble this ahead-of-time set of keys it will not change during the course of a read, so the DecryptionKeyRetriever can safely access it without mutexes. I've added a comment to the doc On Fri, Nov 13, 2020 at 3:09 AM Gidon Gershinsky <gg5...@gmail.com> wrote: > Hi all, > > Glad to see the parquet-cpp progress on this! Can I suggest creating a > googledoc for the technical discussion? The current md doc format seems to > be harder for pinpointed comments. I got a few, but they are too minor for > sending to the two mailing lists. > > Cheers, Gidon > > > On Fri, Nov 13, 2020 at 9:17 AM Tham Ha <t...@emotiv.com> wrote: > >> Hi Ben, >> >> Can you reconsider this point: >> > Properties (including FileDecryptionProperties) are immutable after >> construction and thus can be safely shared between threads with no further >> synchronization. >> >> As I see, the returned FileDecryptionProperties object from >> PropertiesDrivenCryptoFactory >> is not mutable, since it has a DecryptionKeyRetriever object (i.e. >> FileKeyUnwrapper >> object in this pull) that helps get the key from key metadata. >> From the current implementation of FileKeyUnwrapper with the cache, I >> think synchronization is necessary, unless you have another idea. >> >> Cheers, Tham >> >> On Fri, Nov 13, 2020 at 1:28 PM Micah Kornfield <emkornfi...@gmail.com> >> wrote: >> >>> I skimmed through and this seems like a clean design (I would have to >>> reread the PR to do a comparison. A few thoughts of the top of my head: >>> >>> >>> > - Multiple internal classes are left public in header files, where it >>> > would be >>> > preferred that public classes be kept to a minimum. >>> >>> I think some of these classes are non-trivial. If that is the case it >>> would be preferable to put them in an internal namespace to ensure >>> adequate unit testing is in place. >>> >>> It seems like KmsClient and maybe PropertiesDrivenCryptoFactory should >>> still be implemented with internal synchronization to guarantee >>> thread-safety? >>> >>> >>> >>> On Wed, Nov 11, 2020 at 6:19 PM Benjamin Kietzman <bengil...@gmail.com> >>> wrote: >>> >>> > In the context of https://issues.apache.org/jira/browse/ARROW-9318 / >>> > https://github.com/apache/arrow/pull/8023 which port the parquet-mr >>> > design to c++: there's some question whether that design is consistent >>> > with the style and conventions of the c++ implementation of parquet. >>> > >>> > Here is a Gist with a sketched alternative design adapted to c++ >>> > https://gist.github.com/bkietz/f701d32add6f5a2aeed89ce36a443d43 >>> > >>> > Specific concerns in brief: >>> > - Multiple internal classes are left public in header files, where it >>> > would be >>> > preferred that public classes be kept to a minimum. >>> > - Several levels of explicit synchronization with mutexes are used to >>> > guarantee >>> > that each class is completely thread safe, but this is unnecessary >>> > overhead >>> > given the pattern of use of `parquet-cpp`. >>> > >>> >> >> >> -- >> Tham Ha Thi >> C++ Developer >> *EMOTIV* www.emotiv.com >> Neurotech for the Global Community >> >> LEGAL NOTICE >> >> This message (including all attachments) contains confidential >> information intended for a specific individual and purpose, and is >> protected by law. Any confidentiality or privilege is not waived or lost >> because this email has been sent to you by mistake. If you have received it >> in error, please let us know by reply email, delete it from your system and >> destroy any copies. >> >> This email is also subject to copyright. Any disclosure, copying, or >> distribution of this message, or the taking of any action based on it, is >> strictly prohibited. >> >> Emails may be interfered with, may contain computer viruses or other >> defects and may not be successfully replicated on other systems. We give no >> warranties in relation to these matters. If you have any doubts about the >> authenticity of an email purportedly sent by us, please contact us >> immediately. >> >>