Re: Dataflow and mounting large data sets

2023-01-30 Thread Israel Herraiz via user
And could the dataset not be accessed from Cloud Storage? Does it need to be specifically NFS? On Thu, 26 Jan 2023 at 18:16, Chad Dombrova wrote: > Hi all, we have large data sets which we would like to mount over NFS within Dataflow. As far as I know, this is not possible. Has anything changed? …

Re: Dataflow and mounting large data sets

2023-01-30 Thread Chad Dombrova
Hi Israel, thanks for responding. > And could the dataset not be accessed from Cloud Storage? Does it need to be specifically NFS? No, unfortunately it can't be accessed from Cloud Storage. Our data resides on high-performance Isilon [1] servers using a POSIX filesystem, and NFS is the tried-and-true …

Re: Dataflow and mounting large data sets

2023-01-30 Thread Robert Bradshaw via user
If it's your input/output data, presumably you could implement a FileSystem (https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/FileSystem.html) for NFS. (I don't know what all that would entail...) On Mon, Jan 30, 2023 at 9:04 AM Chad Dombrova wrote: > Hi Israel, thanks for responding. …
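Robert's suggestion amounts to subclassing Beam's FileSystem abstraction so that nfs:// URLs become first-class in pipelines. As a rough sketch of the shape involved (a standalone class, deliberately not importing Beam; the method names mirror the FileSystem interface linked above, while the class name, mount root, and URL-to-path mapping are hypothetical), it might start like this:

```python
import os
import posixpath

# Hypothetical sketch: the method names below mirror Beam's FileSystem
# abstraction (apache_beam.io.filesystem.FileSystem in the Python SDK).
# A real implementation would subclass that class and implement the full
# interface (match, create, open, copy, rename, delete, ...); this
# standalone version only shows the overall shape, and assumes the NFS
# export is already visible at a local mount point.
class NfsFileSystem:
    MOUNT_ROOT = "/mnt/nfs"  # assumed mount point, hypothetical

    @classmethod
    def scheme(cls):
        # URLs handled by this filesystem look like nfs://server/share/path
        return "nfs"

    def join(self, basepath, *paths):
        return posixpath.join(basepath, *paths)

    def split(self, path):
        return posixpath.split(path)

    def _to_local(self, url):
        # nfs://server/share/f.txt -> /mnt/nfs/share/f.txt (hypothetical mapping)
        _, _, rest = url.partition("nfs://")
        _, _, relpath = rest.partition("/")
        return os.path.join(self.MOUNT_ROOT, relpath)

    def exists(self, url):
        return os.path.exists(self._to_local(url))
```

Note that this sidesteps the harder problem discussed below: something still has to get the NFS export mounted (or otherwise reachable) on each worker before a class like this can resolve paths.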

Re: Dataflow and mounting large data sets

2023-01-30 Thread Chad Dombrova
Hi Robert, I know very little about the FileSystem classes, but I don’t think it’s possible for a process running in Docker to create an NFS mount without running in privileged [1] mode, which cannot be done with Dataflow. The other ways of gaining access to a mount are: A. the node running Docker …
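The privileged-mode requirement Chad describes is checkable from inside a container: mounting a filesystem needs the CAP_SYS_ADMIN capability (number 21 on Linux), and the kernel exposes the effective capability set of a process as a hex bitmask on the "CapEff:" line of /proc/self/status. A small helper to test for that bit (the helper and the example masks are mine, not from the thread; the second mask is Docker's well-known default capability set, which omits CAP_SYS_ADMIN):

```python
# Mounting a filesystem inside a container requires CAP_SYS_ADMIN
# (capability number 21 on Linux). The effective capability set of the
# current process appears as a hex bitmask on the "CapEff:" line of
# /proc/self/status; this helper (illustrative, not from the thread)
# checks whether that bit is set.
CAP_SYS_ADMIN = 21

def has_cap_sys_admin(status_text):
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << CAP_SYS_ADMIN))
    return False

# Example masks: a fully privileged container typically reports an
# all-ones mask, while Docker's default (unprivileged) set does not
# include CAP_SYS_ADMIN.
privileged_status = "CapEff:\t0000003fffffffff"
default_status = "CapEff:\t00000000a80425fb"
```

On a real worker one would pass `open("/proc/self/status").read()` to the helper; if it returns False, a `mount -t nfs ...` inside the container will fail with EPERM.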

Re: Dataflow and mounting large data sets

2023-01-30 Thread Robert Bradshaw via user
Different idea: is it possible to serve this data via another protocol (e.g. sftp) rather than requiring a mount? On Mon, Jan 30, 2023 at 9:26 AM Chad Dombrova wrote: > Hi Robert, I know very little about the FileSystem classes … …
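Robert's idea is to have workers pull files over a network protocol instead of relying on a kernel-level mount. Python's standard library has no SFTP client (in practice that would come from a third-party library such as paramiko), so as a stand-in this sketch uses HTTP from the standard library to illustrate the same access pattern; the file name and served directory are invented for the example:

```python
# Sketch of "serve the data over a protocol rather than a mount".
# SFTP itself needs a third-party client in Python, so HTTP from the
# standard library stands in here: the point is only that a worker
# fetches bytes over the network instead of reading a mounted path.
import functools
import http.server
import os
import tempfile
import threading
import urllib.request

# Stand-in for the data server: a directory with one file in it.
serve_dir = tempfile.mkdtemp()
with open(os.path.join(serve_dir, "shot_0001.txt"), "w") as f:
    f.write("frame data")

handler = functools.partial(
    http.server.SimpleHTTPRequestHandler, directory=serve_dir
)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Stand-in for a worker: fetch the file over the protocol.
port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/shot_0001.txt") as resp:
    data = resp.read().decode()

server.shutdown()
```

The trade-off versus a mount is that every consumer must be protocol-aware, which is exactly the friction Chad raises next for existing POSIX-based tooling.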

Re: Dataflow and mounting large data sets

2023-01-30 Thread Valentyn Tymofieiev via user
Beam SDK docker containers on Dataflow VMs are currently launched in privileged mode. On Mon, Jan 30, 2023 at 9:52 AM Robert Bradshaw via user <user@beam.apache.org> wrote: > Different idea: is it possible to serve this data via another protocol (e.g. sftp) rather than requiring a mount? …

Re: Dataflow and mounting large data sets

2023-01-30 Thread Chad Dombrova
Hi Valentyn, > Beam SDK docker containers on Dataflow VMs are currently launched in privileged mode. Does this only apply to stock SDK containers? I'm asking because we use a custom SDK container that we build. We've tried various ways of running mount from within our custom Beam container …

Re: Dataflow and mounting large data sets

2023-01-30 Thread Robert Bradshaw via user
I'm also not sure it's part of the contract that the containerization technology we use will always have these capabilities. On Mon, Jan 30, 2023 at 10:53 AM Chad Dombrova wrote: > Hi Valentyn, does this only apply to stock SDK containers? …

Re: Dataflow and mounting large data sets

2023-01-30 Thread Valentyn Tymofieiev via user
It applies to custom containers as well. You can find the container manifest in the GCE VM metadata, and it should have an entry for privileged mode. The reason for this was to enable GPU accelerator support, but I agree with Robert that it is not part of any contract, so in theory this could change.