I'm afraid you misunderstand the purpose of hive-site.xml. It configures
access to the Hive metastore. You can read more here:
http://www.hadoopmaterial.com/2013/11/metastore.html.

So the MySQL DB in hive-site.xml would be used to store Hive-specific
metadata such as table schemas, partition info, etc., not the table data
itself (that lives under hive.metastore.warehouse.dir).

Now, for what you want to do, you can search the user list. I know there
have been posts about Postgres, and you can do the same with MySQL. The
idea is to create an object holding a connection pool (so each of your
executors would have its own instance), or alternatively, to open a
connection within mapPartitions (so you open one connection per partition
rather than one per record). Either way, writing to a DB is largely a
manual process: open a connection, create a statement, write the data. If
your data is small enough, you could probably just collect it on the driver
and write it from there, though that would certainly be slower than writing
in parallel from each executor.
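A minimal sketch of the per-partition approach, in Python. In PySpark, `RDD.foreachPartition(f)` calls `f` once per partition with an iterator over that partition's rows; it is used here instead of `mapPartitions` because the write is a side effect. Assumptions: Python's built-in `sqlite3` stands in for MySQL so the sketch runs without a server, and `make_partition_writer`, `run_demo`, the `results` table and its `(name, value)` columns are all hypothetical. With a real MySQL driver, `connect` would open a MySQL connection instead, and the placeholder style may differ (for example `%s` rather than sqlite3's `?`).

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (name, value)


def make_partition_writer(connect: Callable[[], sqlite3.Connection],
                          table: str, batch_size: int = 500):
    """Build a function meant to be passed to rdd.foreachPartition(...).

    `connect` is called inside the returned function, i.e. on the executor,
    so no live connection object has to be serialized from the driver.
    """
    # `table` is interpolated directly, so it must be a trusted constant.
    sql = f"INSERT INTO {table} (name, value) VALUES (?, ?)"

    def write_partition(rows: Iterable[Row]) -> int:
        conn = connect()  # one connection per partition, not per row
        written = 0
        batch: List[Row] = []
        try:
            cur = conn.cursor()  # the "statement"
            for row in rows:
                batch.append(row)
                if len(batch) >= batch_size:
                    cur.executemany(sql, batch)  # send a full batch
                    written += len(batch)
                    batch.clear()
            if batch:  # flush the remainder
                cur.executemany(sql, batch)
                written += len(batch)
            conn.commit()  # one transaction per partition
            # foreachPartition ignores this return value; it is handy for testing.
            return written
        except Exception:
            conn.rollback()  # undo this partition's uncommitted inserts
            raise
        finally:
            conn.close()

    return write_partition


def run_demo(partitions: List[List[Row]], batch_size: int = 500):
    """Write each list as if it were one partition; return counts and table rows."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    with closing(sqlite3.connect(path)) as setup:
        setup.execute("CREATE TABLE results (name TEXT, value INTEGER)")
        setup.commit()
    writer = make_partition_writer(lambda: sqlite3.connect(path), "results", batch_size)
    # Stand-in for rdd.foreachPartition(writer): one call per partition.
    written = [writer(iter(part)) for part in partitions]
    with closing(sqlite3.connect(path)) as check:
        rows = check.execute("SELECT name, value FROM results ORDER BY name").fetchall()
    return written, rows


if __name__ == "__main__":
    print(run_demo([[("a", 1), ("b", 2)], [("c", 3)]]))
```

In a real job this would be roughly `rdd.foreachPartition(make_partition_writer(connect, "results"))`, where `connect` opens the MySQL connection on the executor. For the small-data option, the same writer could be called once on the driver as `writer(iter(rdd.collect()))`.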

On Wed, May 20, 2015 at 5:48 PM, roni <roni.epi...@gmail.com> wrote:

> Hi,
> I am trying to set up the Hive metastore and a MySQL DB connection.
> I have a Spark cluster, I ran some programs, and I have data stored in
> some Hive tables.
> Now I want to store this data in MySQL so that it is available for
> further processing.
>
> I set up the hive-site.xml file:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>   <property>
>     <name>hive.semantic.analyzer.factory.impl</name>
>     <value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
>   </property>
>
>   <property>
>     <name>hive.metastore.sasl.enabled</name>
>     <value>false</value>
>   </property>
>
>   <property>
>     <name>hive.server2.authentication</name>
>     <value>NONE</value>
>   </property>
>
>   <property>
>     <name>hive.server2.enable.doAs</name>
>     <value>true</value>
>   </property>
>
>   <property>
>     <name>hive.warehouse.subdir.inherit.perms</name>
>     <value>true</value>
>   </property>
>
>   <property>
>     <name>hive.metastore.schema.verification</name>
>     <value>false</value>
>   </property>
>
>   <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:mysql://<ip address>:3306/metastore_db?createDatabaseIfNotExist=true</value>
>     <description>metadata is stored in a MySQL server</description>
>   </property>
>
>   <property>
>     <name>javax.jdo.option.ConnectionDriverName</name>
>     <value>com.mysql.jdbc.Driver</value>
>     <description>MySQL JDBC driver class</description>
>   </property>
>
>   <property>
>     <name>javax.jdo.option.ConnectionUserName</name>
>     <value>root</value>
>   </property>
>
>   <property>
>     <name>javax.jdo.option.ConnectionPassword</name>
>     <value></value>
>   </property>
>
>   <property>
>     <name>hive.metastore.warehouse.dir</name>
>     <value>/user/${user.name}/hive-warehouse</value>
>     <description>location of default database for the warehouse</description>
>   </property>
> </configuration>
>  --------------
> My MySQL server is on a different machine from my Spark cluster. When I
> use MySQL Workbench, I connect over SSH with a certificate file.
> How do I specify all that information when connecting from Spark to the DB?
> I want to store the data generated by my Spark program in MySQL.
> Thanks
> _R
>
