Re: Can spark sql read existing tables created in hive

Cheng Lian Mon, 30 Mar 2015 05:44:21 -0700

The "mysql" command line client doesn't use JDBC to talk to the MySQL server, so this doesn't verify anything.
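
To exercise the JDBC path instead, a small standalone check can help. The following is a minimal sketch, not taken from this thread: the class name JdbcCheck and its probe method are hypothetical, and the URL, user and password are placeholders for the values in hive-site.xml. It asks java.sql.DriverManager for a connection and reports the outcome. It only tests the classpath of a plain JVM, so success here does not prove that the Spark driver inside a YARN container can see the same jar.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper (not from this thread): tries to open a JDBC connection
// through DriverManager, the same entry point that appears in the stack trace.
class JdbcCheck {

    // Returns "ok" if a connection could be opened, otherwise the
    // SQLException message describing why it could not.
    static String probe(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return "ok";
        } catch (SQLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: JdbcCheck <jdbc-url> <user> <password>");
            return;
        }
        System.out.println(probe(args[0], args[1], args[2]));
    }
}
```

Run it with the connector jar on the classpath, for example: java -cp mysql-connector-java-5.1.34.jar:. JdbcCheck jdbc:mysql://hostname:3306/HDB hiveuser pass. Without a matching driver on the classpath, DriverManager reports that no suitable driver was found for the URL.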

I think this Hive metastore installation guide from Cloudera may be helpful. Although this document is for CDH4, the general steps are the same, and should help you to figure out the relationships here.


Cheng

On 3/30/15 3:33 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:

I am able to connect to the MySQL Hive metastore from the client cluster machine.

-sh-4.1$ mysql --user=hiveuser --password=pass --host=hostname.vip.company.com

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9417286
Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use eBayHDB;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------------------+
| Tables_in_HDB             |
+---------------------------+


Regards,
Deepak

On Sat, Mar 28, 2015 at 12:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:


    Yes, I am using yarn-cluster and I did add it via --files. I get a
    "No suitable driver found" error.

    Please share a spark-submit command that shows how to pass the MySQL
    jar containing the driver class used to connect to the Hive MySQL
    metastore.

    Even after including it through

     --driver-class-path
    /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
    OR (AND)
     --jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar

    I keep getting "No suitable driver found for"


    Command
    ========

    ./bin/spark-submit -v --master yarn-cluster \
      --driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar \
      --jars /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar \
      --files $SPARK_HOME/conf/hive-site.xml --num-executors 1 \
      --driver-memory 4g --driver-java-options "-XX:MaxPermSize=2G" \
      --executor-memory 2g --executor-cores 1 --queue hdmi-express \
      --class com.ebay.ep.poc.spark.reporting.SparkApp \
      spark_reporting-1.0-SNAPSHOT.jar startDate=2015-02-16 \
      endDate=2015-02-16 \
      input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro \
      subcommand=successevents2 \
      output=/user/dvasthimal/epdatasets/successdetail2

    Logs
    ====

    Caused by: java.sql.SQLException: No suitable driver found for jdbc:mysql://hostname:3306/HDB
        at java.sql.DriverManager.getConnection(DriverManager.java:596)
        at java.sql.DriverManager.getConnection(DriverManager.java:187)
        at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
        at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
        ... 68 more
    ...
    ...

    15/03/27 23:56:08 INFO yarn.Client: Uploading resource file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar -> hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar

    ...

    ...




    -sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep Driver
        61 Fri Oct 17 08:05:36 GMT-07:00 2014 META-INF/services/java.sql.Driver
      3396 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/fabric/jdbc/FabricMySQLDriver.class
       692 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/jdbc/Driver.class
      1562 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
     17817 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver.class
       690 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringReplicationDriver.class
       731 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/ReplicationDriver.class
       336 Fri Oct 17 08:05:24 GMT-07:00 2014 org/gjt/mm/mysql/Driver.class
    You have new mail in /var/spool/mail/dvasthimal
    -sh-4.1$ cat conf/hive-site.xml | grep Driver
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    -sh-4.1$

--Deepak



    On Sat, Mar 28, 2015 at 1:06 AM, Michael Armbrust
    <mich...@databricks.com> wrote:

        Are you running on yarn?

         - If you are running in yarn-client mode, set HADOOP_CONF_DIR
        to /etc/hive/conf/ (or the directory where your hive-site.xml
        is located).
         - If you are running in yarn-cluster mode, the easiest thing
        to do is to add --files=/etc/hive/conf/hive-site.xml (or the
        path for your hive-site.xml) to your spark-submit script.

        On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏)
        <deepuj...@gmail.com> wrote:

            I can recreate the tables, but what about the data? This
            looks like an obvious feature that Spark SQL must have.
            People will want to transform tons of data stored in HDFS
            through Hive from Spark SQL.

            The Spark programming guide suggests it's possible:


            Spark SQL also supports reading and writing data stored in
            Apache Hive <http://hive.apache.org/>. ... Configuration
            of Hive is done by placing your hive-site.xml file in
            conf/.
            https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables

            For some reason it's not working.


            On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda
            <ar...@sigmoidanalytics.com> wrote:

                It seems Spark SQL accesses some more columns apart from
                those created by Hive.

                You can always recreate the tables by executing the
                table creation scripts, but it would be good to avoid
                recreation.

                On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏)
                <deepuj...@gmail.com> wrote:

                    I did copy hive-conf.xml from the Hive installation
                    into spark-home/conf. It does have all the metastore
                    connection details: host, username, password,
                    driver and others.



                    Snippet
                    ======


                    <configuration>

                    <property>
                    <name>javax.jdo.option.ConnectionURL</name>
                    <value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
                    </property>

                    <property>
                    <name>javax.jdo.option.ConnectionDriverName</name>
                    <value>com.mysql.jdbc.Driver</value>
                    <description>Driver class name for a JDBC
                    metastore</description>
                    </property>

                    <property>
                    <name>javax.jdo.option.ConnectionUserName</name>
                    <value>hiveuser</value>
                    <description>username to use against metastore
                    database</description>
                    </property>

                    <property>
                    <name>javax.jdo.option.ConnectionPassword</name>
                    <value>some-password</value>
                    <description>password to use against metastore
                    database</description>
                    </property>

                    <property>
                    <name>hive.metastore.local</name>
                    <value>false</value>
                    <description>controls whether to connect to remove
                    metastore server or open a new metastore server in
                    Hive Client JVM</description>
                    </property>

                    <property>
                    <name>hive.metastore.warehouse.dir</name>
                    <value>/user/hive/warehouse</value>
                    <description>location of default database for the
                    warehouse</description>
                    </property>

                    ......



                    When I attempt to read a Hive table, it does not
                    work: dw_bid does not exist.

                    I am sure there is a way to read tables stored in
                    HDFS (Hive) from Spark SQL. Otherwise how would
                    anyone do analytics since the source tables are
                    always either persisted directly on HDFS or
                    through Hive.


                    On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda
                    <ar...@sigmoidanalytics.com> wrote:

                        Hive and Spark SQL both use HDFS and the Hive
                        metastore internally, so the only thing you want
                        to change is the processing engine. You can try
                        copying your hive-site.xml to
                        %SPARK_HOME%/conf/hive-site.xml. (Ensure that
                        the hive-site.xml captures the metastore
                        connection details.)

                        It's a hack; I haven't tried it. I have played
                        around with the metastore and it should work.

                        On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏)
                        <deepuj...@gmail.com> wrote:

                            I have a few tables that were created in
                            Hive. I want to transform the data stored in
                            these Hive tables using Spark SQL. Is this
                            even possible?

                            So far I have seen that I can create new
                            tables using the Spark SQL dialect. However,
                            when I run show tables or desc hive_table,
                            it says table not found.

                            I am now wondering whether this support is
                            present in Spark SQL or not.

--Deepak

--

                        Sigmoid Analytics
                        <http://htmlsig.com/www.sigmoidanalytics.com>

                        *Arush Kharbanda* || Technical Teamlead

                        ar...@sigmoidanalytics.com
                        <mailto:ar...@sigmoidanalytics.com> ||
                        www.sigmoidanalytics.com
                        <http://www.sigmoidanalytics.com/>
