I have raised a JIRA to track this issue and to see whether it requires a
fix in Spark: https://issues.apache.org/jira/browse/SPARK-6622
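In the meantime, one workaround worth trying (an untested sketch, not a
confirmed fix): register the driver class explicitly in the driver JVM
before the HiveContext is created. java.sql.DriverManager can fail to see
JDBC drivers that are only visible to Spark's application classloader,
which would match the "No suitable driver found" symptom below. This
assumes the connector jar is already shipped with the job, e.g. via --jars:

// Untested workaround sketch: force registration of the MySQL JDBC driver
// in the driver JVM before any metastore access happens.
Class.forName("com.mysql.jdbc.Driver")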
On Tue, Mar 31, 2015 at 9:31 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

> Hello Lian,
> This blog talks about how to install the Hive metastore. The one thing I
> took from it was the mysql-connector-java jar that needs to be used; it
> suggests 5.1.35 (mysql-connector-java-5.1.35-bin.jar).
>
> When I use that:
>
> ./bin/spark-submit -v --master yarn-cluster --driver-class-path
> /apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
> --jars
> /apache/hadoop/lib/hadoop-lzo-0.6.0.jar,/home/dvasthimal/spark1.3/mysql-connector-java-5.1.35-bin.jar,/home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,$SPARK_HOME/conf/hive-site.xml
> --num-executors 1 --driver-memory 4g --driver-java-options
> "-XX:MaxPermSize=2G" --executor-memory 2g --executor-cores 1 --queue
> hdmi-express --class com.ebay.ep.poc.spark.reporting.SparkApp
> spark_reporting-1.0-SNAPSHOT.jar startDate=2015-02-16 endDate=2015-02-16
> input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro
> subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2
>
> I still get the same error:
>
> org.datanucleus.exceptions.NucleusDataStoreException: Unable to open a
> test connection to the given database. JDBC url =
> jdbc:mysql://hostname.vip.company.com:3306/HDB, username = hiveuser.
> Terminating connection pool (set lazyInit to true if you expect to start
> your database after your app). Original Exception: ------
>
> java.sql.SQLException: No suitable driver found for
> jdbc:mysql://hostname.vip.company.com:3306/HDB
>   at java.sql.DriverManager.getConnection(DriverManager.java:596)
>
> Attached are the full stack trace and logs, in case they reveal some
> insights.
>
> Michael, could you please take some time to look into it?
>
> Regards,
> Deepak
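The stack trace is identical with both 5.1.34 and 5.1.35, so the connector
version is unlikely to be the problem. A small diagnostic (a hypothetical
sketch to run early in the driver) can show whether com.mysql.jdbc.Driver
is visible at all, and which jar it is loaded from:

// Hypothetical diagnostic sketch: check whether the MySQL driver class is
// visible to the driver's context classloader, and report its origin.
// (getCodeSource can be null for bootstrap-loaded classes.)
val cl = Thread.currentThread().getContextClassLoader
try {
  val c = Class.forName("com.mysql.jdbc.Driver", true, cl)
  println("Driver loaded from: " + c.getProtectionDomain.getCodeSource.getLocation)
} catch {
  case _: ClassNotFoundException =>
    println("com.mysql.jdbc.Driver is NOT on the driver classpath")
}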
> On Mon, Mar 30, 2015 at 10:04 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
>
>> Ah, sorry, my bad...
>> http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
>>
>> On 3/30/15 10:24 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>>
>> Hello Lian,
>> Can you share the URL?
>>
>> On Mon, Mar 30, 2015 at 6:12 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
>>
>>> The "mysql" command line doesn't use JDBC to talk to the MySQL server,
>>> so this doesn't verify anything.
>>>
>>> I think this Hive metastore installation guide from Cloudera may be
>>> helpful. Although the document is for CDH4, the general steps are the
>>> same and should help you figure out the relationships here.
>>>
>>> Cheng
>>>
>>> On 3/30/15 3:33 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>>>
>>> I am able to connect to the MySQL Hive metastore from the client
>>> cluster machine:
>>>
>>> -sh-4.1$ mysql --user=hiveuser --password=pass --host=hostname.vip.company.com
>>> Welcome to the MySQL monitor.  Commands end with ; or \g.
>>> Your MySQL connection id is 9417286
>>> Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
>>> Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights
>>> reserved.
>>> Oracle is a registered trademark of Oracle Corporation and/or its
>>> affiliates. Other names may be trademarks of their respective owners.
>>> Type 'help;' or '\h' for help. Type '\c' to clear the current input
>>> statement.
>>>
>>> mysql> use eBayHDB;
>>> Reading table information for completion of table and column names
>>> You can turn off this feature to get a quicker startup with -A
>>>
>>> Database changed
>>> mysql> show tables;
>>> +---------------------------+
>>> | Tables_in_HDB             |
>>> +---------------------------+
>>>
>>> Regards,
>>> Deepak
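As Cheng points out above, the mysql command line bypasses JDBC entirely,
so it proves only network and credential access. An equivalent check at
the JDBC level, sketched with the URL and username from this thread and a
placeholder password, would be:

import java.sql.DriverManager

// Sketch of a JDBC-level connectivity check; this is the path the
// metastore connection pool actually exercises. Run with the MySQL
// connector jar on the classpath.
Class.forName("com.mysql.jdbc.Driver")
val conn = DriverManager.getConnection(
  "jdbc:mysql://hostname.vip.company.com:3306/HDB", "hiveuser", "<password>")
println(conn.getMetaData.getDatabaseProductVersion)
conn.close()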
>>> On Sat, Mar 28, 2015 at 12:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>
>>>> Yes, I am using yarn-cluster, and I did add it via --files. I still
>>>> get the "No suitable driver found" error.
>>>>
>>>> Please share a spark-submit command that shows how the MySQL jar
>>>> (containing the driver class used to connect to the Hive MySQL
>>>> metastore) should be passed.
>>>>
>>>> Even after including it through
>>>>
>>>> --driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>>> OR (AND)
>>>> --jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>>>
>>>> I keep getting "No suitable driver found for".
>>>>
>>>> Command
>>>> ========
>>>>
>>>> ./bin/spark-submit -v --master yarn-cluster --driver-class-path
>>>> /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
>>>> --jars
>>>> /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>>> --files $SPARK_HOME/conf/hive-site.xml --num-executors 1 --driver-memory 4g
>>>> --driver-java-options "-XX:MaxPermSize=2G" --executor-memory 2g
>>>> --executor-cores 1 --queue hdmi-express --class
>>>> com.ebay.ep.poc.spark.reporting.SparkApp spark_reporting-1.0-SNAPSHOT.jar
>>>> startDate=2015-02-16 endDate=2015-02-16
>>>> input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro
>>>> subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2
>>>>
>>>> Logs
>>>> ====
>>>>
>>>> Caused by: java.sql.SQLException: No suitable driver found for
>>>> jdbc:mysql://hostname:3306/HDB
>>>>   at java.sql.DriverManager.getConnection(DriverManager.java:596)
>>>>   at java.sql.DriverManager.getConnection(DriverManager.java:187)
>>>>   at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
>>>>   at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
>>>>   ... 68 more
>>>> ...
>>>>
>>>> 15/03/27 23:56:08 INFO yarn.Client: Uploading resource
>>>> file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar ->
>>>> hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar
>>>> ...
>>>>
>>>> The jar does contain the driver class, and hive-site.xml names it:
>>>>
>>>> -sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep Driver
>>>>     61 Fri Oct 17 08:05:36 GMT-07:00 2014 META-INF/services/java.sql.Driver
>>>>   3396 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/fabric/jdbc/FabricMySQLDriver.class
>>>>    692 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/jdbc/Driver.class
>>>>   1562 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
>>>>  17817 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver.class
>>>>    690 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringReplicationDriver.class
>>>>    731 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/ReplicationDriver.class
>>>>    336 Fri Oct 17 08:05:24 GMT-07:00 2014 org/gjt/mm/mysql/Driver.class
>>>>
>>>> -sh-4.1$ cat conf/hive-site.xml | grep Driver
>>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>>   <value>com.mysql.jdbc.Driver</value>
>>>>   <description>Driver class name for a JDBC metastore</description>
>>>>
>>>> On Sat, Mar 28, 2015 at 1:06 AM, Michael Armbrust <mich...@databricks.com> wrote:
>>>>
>>>>> Are you running on YARN?
>>>>>
>>>>> - If you are running in yarn-client mode, set HADOOP_CONF_DIR to
>>>>> /etc/hive/conf/ (or the directory where your hive-site.xml is located).
>>>>> - If you are running in yarn-cluster mode, the easiest thing to do is
>>>>> to add --files=/etc/hive/conf/hive-site.xml (or the path to your
>>>>> hive-site.xml) to your spark-submit script.
>>>>>
>>>>> On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>>
>>>>>> I can recreate the tables, but what about the data? This looks like
>>>>>> an obvious feature for Spark SQL to have: people will want to use
>>>>>> Spark SQL to transform the tons of data that Hive stores in HDFS.
>>>>>>
>>>>>> The Spark programming guide suggests it's possible:
>>>>>>
>>>>>> Spark SQL also supports reading and writing data stored in Apache
>>>>>> Hive <http://hive.apache.org/>. .... Configuration of Hive is done
>>>>>> by placing your hive-site.xml file in conf/.
>>>>>>
>>>>>> https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
>>>>>>
>>>>>> For some reason it's not working.
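For reference, a minimal sketch of what the guide describes: reading and
transforming a Hive table with a HiveContext in Spark 1.3. The table name
dw_bid appears later in this thread; the column names are made-up
placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Sketch: read a Hive table through the metastore and transform it.
// Needs hive-site.xml on the classpath (conf/ or --files) and the
// datanucleus jars on --jars, as discussed in this thread.
val sc = new SparkContext(new SparkConf().setAppName("HiveTransform"))
val hiveContext = new HiveContext(sc)
// bid_date and item_id are placeholder column names.
val bids = hiveContext.sql("SELECT * FROM dw_bid WHERE bid_date = '2015-02-16'")
bids.groupBy("item_id").count().show()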
>>>>>> On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:
>>>>>>
>>>>>>> It seems Spark SQL accesses some more columns apart from those
>>>>>>> created by Hive.
>>>>>>>
>>>>>>> You can always recreate the tables; you would need to execute the
>>>>>>> table-creation scripts, though it would be good to avoid recreation.
>>>>>>>
>>>>>>> On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I did copy hive-site.xml from the Hive installation into
>>>>>>>> $SPARK_HOME/conf. It does have all the metastore connection
>>>>>>>> details: host, username, password, driver and others.
>>>>>>>>
>>>>>>>> Snippet
>>>>>>>> ======
>>>>>>>>
>>>>>>>> <configuration>
>>>>>>>>
>>>>>>>> <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionURL</name>
>>>>>>>>   <value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>> <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>>>>>>   <value>com.mysql.jdbc.Driver</value>
>>>>>>>>   <description>Driver class name for a JDBC metastore</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>> <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionUserName</name>
>>>>>>>>   <value>hiveuser</value>
>>>>>>>>   <description>username to use against metastore database</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>> <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionPassword</name>
>>>>>>>>   <value>some-password</value>
>>>>>>>>   <description>password to use against metastore database</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>> <property>
>>>>>>>>   <name>hive.metastore.local</name>
>>>>>>>>   <value>false</value>
>>>>>>>>   <description>controls whether to connect to remote metastore
>>>>>>>>   server or open a new metastore server in Hive Client JVM</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>> <property>
>>>>>>>>   <name>hive.metastore.warehouse.dir</name>
>>>>>>>>   <value>/user/hive/warehouse</value>
>>>>>>>>   <description>location of default database for the warehouse</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>> ......
>>>>>>>>
>>>>>>>> When I attempt to read a Hive table, it does not work: it says
>>>>>>>> dw_bid does not exist.
>>>>>>>>
>>>>>>>> I am sure there is a way to read tables stored in HDFS (Hive) from
>>>>>>>> Spark SQL; otherwise how would anyone do analytics, since the
>>>>>>>> source tables are always persisted either directly on HDFS or
>>>>>>>> through Hive?
>>>>>>>>
>>>>>>>> On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:
>>>>>>>>
>>>>>>>>> Hive and Spark SQL internally use HDFS and the Hive metastore;
>>>>>>>>> the only thing you want to change is the processing engine. You
>>>>>>>>> can try to bring your hive-site.xml to
>>>>>>>>> $SPARK_HOME/conf/hive-site.xml (ensure that the hive-site.xml
>>>>>>>>> captures the metastore connection details).
>>>>>>>>>
>>>>>>>>> It's a hack and I haven't tried it myself, but I have played
>>>>>>>>> around with the metastore and it should work.
>>>>>>>>>
>>>>>>>>> On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I have a few tables that are created in Hive, and I want to
>>>>>>>>>> transform the data stored in them using Spark SQL. Is this even
>>>>>>>>>> possible?
>>>>>>>>>>
>>>>>>>>>> So far I have seen that I can create new tables using the Spark
>>>>>>>>>> SQL dialect. However, when I run "show tables" or "desc
>>>>>>>>>> hive_table" it says table not found.
>>>>>>>>>>
>>>>>>>>>> I am now wondering whether this support is present in Spark SQL
>>>>>>>>>> or not.
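One possible explanation for the "table not found" here, offered as a
guess rather than a confirmed diagnosis: tables registered in the Hive
metastore are only visible through a HiveContext, while a plain SQLContext
keeps its own separate catalog. A sketch, assuming hive-site.xml is on the
classpath and an existing SparkContext sc:

import org.apache.spark.sql.hive.HiveContext

// Sketch: metastore tables must be queried through a HiveContext, not a
// plain SQLContext.
val hc = new HiveContext(sc)
hc.sql("SHOW TABLES").show()
hc.sql("DESCRIBE dw_bid").show()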
--
Deepak