I'm afraid you misunderstand the purpose of hive-site.xml. It configures access to the Hive metastore. You can read more here: http://www.hadoopmaterial.com/2013/11/metastore.html.
So the MySQL DB in hive-site.xml would be used to store hive-specific data such as schema info, partition info, etc. Now, for what you want to do, you can search the user list -- I know there have been posts about Postgres but you can do the same with MySQL. The idea is to create an object holding a connection pool (so each of your executors would have its own instance), or alternately, to open a connection within mapPartitions (so you don't end up with a ton of connections). But the write to a DB is largely a manual process -- open a connection, create a statement, sync the data. If your data is small enough you probably could just collect on the driver and write...though that would certainly be slower than writing in parallel from each executor. On Wed, May 20, 2015 at 5:48 PM, roni <roni.epi...@gmail.com> wrote: > Hi , > I am trying to setup the hive metastore and mysql DB connection. > I have a spark cluster and I ran some programs and I have data stored in > some hive tables. > Now I want to store this data into Mysql so that it is available for > further processing. > > I setup the hive-site.xml file. > > <?xml version="1.0"?> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> > > > <configuration> > > <property> > > <name>hive.semantic.analyzer.factory.impl</name> > > <value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value> > > </property> > > > <property> > > <name>hive.metastore.sasl.enabled</name> > > <value>false</value> > > </property> > > > <property> > > <name>hive.server2.authentication</name> > > <value>NONE</value> > > </property> > > > <property> > > <name>hive.server2.enable.doAs</name> > > <value>true</value> > > </property> > > > <property> > > <name>hive.warehouse.subdir.inherit.perms</name> > > <value>true</value> > > </property> > > > <property> > > <name>hive.metastore.schema.verification</name> > > <value>false</value> > > </property> > > > <property> > > <name>javax.jdo.option.ConnectionURL</name> > > <value>jdbc:mysql://<*ip address* > >:3306/metastore_db?createDatabaseIfNotExist=true</value> > > <description>metadata is stored in a MySQL server</description> > > </property> > > > <property> > > <name>javax.jdo.option.ConnectionDriverName</name> > > <value>com.mysql.jdbc.Driver</value> > > <description>MySQL JDBC driver class</description> > > </property> > > > <property> > > <name>javax.jdo.option.ConnectionUserName</name> > > <value>root</value> > > </property> > > > <property> > > <name>javax.jdo.option.ConnectionPassword</name> > > <value></value> > > </property> > > <property> > > <name>hive.metastore.warehouse.dir</name> > > <value>/user/${user.name}/hive-warehouse</value> > > <description>location of default database for > the warehouse</description> > > </property> > > > </configuration> > -------------- > My mysql server is on a separate server than where my spark server is . If > I use mySQLWorkbench , I use a SSH connection with a certificate file to > connect . > How do I specify all that information from spark to the DB ? > I want to store the data generated by my spark program into mysql. > Thanks > _R >