Re: [Discuss] Datasource v2 support for Kerberos

2018-10-02 Thread Steve Loughran
On 2 Oct 2018, at 04:44, tigerquoll <tigerqu...@outlook.com> wrote: Hi Steve, I think that passing a Kerberos keytab around is one of those bad ideas that is entirely appropriate to re-question every single time you come across it. It has been used already in Spark when interacting with

Re: [Discuss] Datasource v2 support for Kerberos

2018-10-01 Thread tigerquoll
Hi Steve, I think that passing a Kerberos keytab around is one of those bad ideas that is entirely appropriate to re-question every single time you come across it. It has been used already in Spark when interacting with Kerberos systems that do not support delegation tokens. Any such system will e

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-27 Thread Steve Loughran
> On 25 Sep 2018, at 07:52, tigerquoll wrote: > > To give some Kerberos specific examples, The spark-submit args: > --conf spark.yarn.keytab=path_to_keytab --conf > spark.yarn.principal=princi...@realm.com > > are currently not passed through to the data sources. > > > I'm not sure why th

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-25 Thread Ryan Blue
I agree with Wenchen that we'd remove the prefix when passing to a source, so you could use the same "spark.yarn.keytab" option in both places. But I think the problem is that "spark.yarn.keytab" still needs to be set, and it clearly isn't in a shared namespace for catalog options. So I think we wo

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-24 Thread tigerquoll
To give some Kerberos specific examples, the spark-submit args: --conf spark.yarn.keytab=path_to_keytab --conf spark.yarn.principal=princi...@realm.com are currently not passed through to the data sources.
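
In full, the submit-time invocation being discussed would look something like the sketch below. The keytab path, principal, class, and jar names are all placeholders for illustration; the thread itself elides the real values.

```shell
# Legacy YARN-scoped Kerberos options, set at submit time.
# These reach Spark's YARN integration but, per the thread,
# are NOT forwarded to DataSource v2 implementations.
spark-submit \
  --conf spark.yarn.keytab=/path/to/app.keytab \
  --conf spark.yarn.principal=app/h...@example.com \
  --class com.example.MyJob \
  my-job.jar
```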

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-24 Thread Wenchen Fan
> All of the Kerberos options already exist in their own legacy locations though - changing their location could break a lot of systems. We can define the prefix for shared options, and we can strip the prefix when passing these options to the data source. Will this work for your case? On Tue, Se
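
The strip-the-prefix idea Wenchen describes can be sketched as follows. This is an illustrative stand-alone version, not Spark's actual implementation; the `spark.datasource.<keyPrefix>.` naming is an assumption about how the shared namespace would be spelled.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of prefix-based option forwarding: session configs that carry a
// per-source prefix are handed to the data source with the prefix removed,
// so the source sees plain option names like "keytab".
public class ConfigForwarding {
    public static Map<String, String> extractSourceOptions(
            Map<String, String> sessionConf, String keyPrefix) {
        String fullPrefix = "spark.datasource." + keyPrefix + ".";
        Map<String, String> options = new HashMap<>();
        for (Map.Entry<String, String> e : sessionConf.entrySet()) {
            if (e.getKey().startsWith(fullPrefix)) {
                options.put(e.getKey().substring(fullPrefix.length()),
                            e.getValue());
            }
        }
        return options;
    }
}
```

Under this scheme a session config `spark.datasource.kafka.keytab=/etc/krb/app.keytab` would reach a source whose prefix is `kafka` as the option `keytab`, which is how the same option name could work in both places.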

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-24 Thread tigerquoll
I like the shared namespace option better than the white listing option for any newly defined configuration information. All of the Kerberos options already exist in their own legacy locations though - changing their location could break a lot of systems. Perhaps we can use the shared namespace

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-24 Thread Ryan Blue
Dale, what do you think about the option that I suggested? I think that's different from the ones that you just listed. Basically, the idea is to have a "shared" set of options that are passed to all sources. This would not be a whitelist; it would be a namespace that ends up passed in everywhere.

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-23 Thread tigerquoll
I believe the current spark config system is unfortunate in the way it has grown - you have no way of telling which sub-systems use which configuration options without direct and detailed reading of the code. Isolating config items for datasources into separate namespaces (rather than using a w

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-19 Thread Ryan Blue
I’m not a huge fan of special cases for configuration values like this. Is there something that we can do to pass a set of values to all sources (and catalogs for #21306)? I would prefer adding a special prefix for options that are passed to all sources, like this: spark.sql.catalog.shared.shared
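
Ryan's proposed shared prefix could look like the following at submit time. The option names after the prefix are hypothetical (the archive truncates his concrete example); the point is that everything under the `shared` namespace would be forwarded to every source and catalog.

```shell
# Hedged sketch of the proposed shared namespace: options under the
# "shared" prefix would be passed to all sources (names illustrative).
spark-submit \
  --conf spark.sql.catalog.shared.kerberos.keytab=/path/to/app.keytab \
  --conf spark.sql.catalog.shared.kerberos.principal=a...@example.com \
  ...
```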

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-16 Thread Wenchen Fan
I'm +1 for this proposal: "Extend SessionConfigSupport to support passing specific white-listed configuration values" One goal of data source v2 API is to not depend on any high-level APIs like SparkSession, SQLConf, etc. If users do want to access these high-level APIs, there is a workaround: cal
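
For context, the DSv2 `SessionConfigSupport` mix-in being extended here is a small interface: a source declares a key prefix, and matching session configs are forwarded to it. The sketch below uses a local stand-in interface mirroring its shape so the example is self-contained; `MySource` and its prefix are hypothetical.

```java
// Local stand-in mirroring the shape of Spark's DSv2 SessionConfigSupport
// interface: a source exposes a key prefix, and session configs under
// "spark.datasource.<keyPrefix>." are forwarded to it as options.
interface SessionConfigSupport {
    String keyPrefix();
}

// Hypothetical source opting in. With keyPrefix() == "mysource", a session
// config like spark.datasource.mysource.kerberos.principal would reach the
// source as the option "kerberos.principal".
class MySource implements SessionConfigSupport {
    @Override
    public String keyPrefix() {
        return "mysource";
    }
}
```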