On Tuesday, when I'm back at work, I will read all of the above and can
coordinate on starting a design doc.

On Sun, Sep 6, 2020, at 5:03 AM, Gidon Gershinsky wrote:
> Cool, thank you. This would solve the problem at hand.
> I agree it'd be good to kick off the PyArrow API discussion in parallel
> with the PR8023 review.
> Maybe you and Itamar could prep a Google Doc draft for the community to
> review and comment on.
> 
> Cheers, Gidon
> 
> 
> On Fri, Sep 4, 2020 at 6:08 PM Roee Shlomo <roe...@gmail.com> wrote:
> 
> > Sounds good. In the suggestion above, the builders for
> > FileEncryptionProperties/FileDecryptionProperties should not be exposed, so
> > only key tools would create them. This is just one option, of course.
> >
> > On 2020/09/03 20:44:26, Antoine Pitrou <anto...@python.org> wrote:
> > >
> > > For outsiders, it would be useful to spell out what those two API levels
> > > are, and to what usage each corresponds.
> > > Is Parquet encryption used only with Spark?  While Spark
> > > interoperability is important, Parquet files are more ubiquitous than
> > > that.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > On 03/09/2020 at 22:31, Gidon Gershinsky wrote:
> > > > Why would the low-level API be exposed directly? This would break the
> > > > interop between the two analytics ecosystems down the road.
> > > > Again, let me suggest leveraging the high level interface, based on the
> > > > PropertiesDrivenCryptoFactory.
> > > > It should address your technical requirements; if it doesn't, we can
> > > > discuss the gaps.
> > > > All questions are welcome.
> > > >
> > > > Cheers, Gidon
> > > >
> > > >
> > > > On Thu, Sep 3, 2020 at 10:11 PM Roee Shlomo <roe...@gmail.com> wrote:
> > > >
> > > >> Hi Itamar,
> > > >>
> > > >> I implemented some python wrappers for the low level API and would be
> > > >> happy to collaborate on that. The reason I didn't push this forward yet
> > > >> is what Gidon mentioned. The API to expose to python users needs to be
> > > >> finalized first, and it must include the key tools API for interop with
> > > >> Spark.
> > > >>
> > > >> Perhaps it would be good to kick off a discussion on what the pyarrow
> > > >> API for PME should look like (in parallel to reviewing the arrow-cpp
> > > >> implementation of key-tools, to ensure that wrapping it would be a
> > > >> reasonable effort).
> > > >>
> > > >> One possible approach is to expose both the low-level API and key-tools
> > > >> separately. A user would create and initialize a
> > > >> PropertiesDrivenCryptoFactory and use it to create the
> > > >> FileEncryptionProperties/FileDecryptionProperties to pass to the
> > > >> lower-level API. In pandas this would translate to something like:
> > > >> ```
> > > >> factory = PropertiesDrivenCryptoFactory(...)
> > > >> df.to_parquet(path, engine="pyarrow",
> > > >>               encryption=factory.getFileEncryptionProperties(...))
> > > >> df = pd.read_parquet(path, engine="pyarrow",
> > > >>                      decryption=factory.getFileDecryptionProperties(...))
> > > >> ```
> > > >> This should also work with reading datasets, since decryption uses a
> > > >> KeyRetriever, but I'm not sure what will need to be done once datasets
> > > >> support write.
> > > >>
> > > >> On 2020/09/03 14:11:51, "Itamar Turner-Trauring" <ita...@pythonspeed.com>
> > > >> wrote:
> > > >>> Hi,
> > > >>>
> > > >>> I'm looking into implementing this, and it seems like there are two
> > > >>> parts: packaging, but also wrapping the APIs in Python. Is the latter
> > > >>> part accurate? If so, are there any examples of similar existing
> > > >>> wrapped APIs, or should I just come up with something on my own?
> > > >>>
> > > >>> Context:
> > > >>> https://github.com/apache/arrow/pull/4826
> > > >>> https://issues.apache.org/jira/browse/ARROW-8040
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> —Itamar
> > > >>
> > > >
> > >
> >
> 
