Hey Raymond,

The root cause of the problem was that the Hive database location was
'file:/home/hive/spark-warehouse/testdb.db/employee_orc'

I checked that using: desc extended testdb.employee

It might have been some config issue in the cluster at that time that made the
location point to the local filesystem.

I created a new database and confirmed that its location was in HDFS,
i.e. hdfs://xxx:8020/apps/hive/warehouse/
With this database, the code ran fine.
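For anyone hitting the same thing: one way to avoid a bad stored location from the start is to give the database an explicit, fully qualified HDFS location at creation time. A hedged sketch (testdb2 is a made-up name; the host/port are the placeholders from above):

```sql
-- Pin the database location to HDFS explicitly, so the metastore can
-- never record a local file:/ path for it.
CREATE DATABASE testdb2
  LOCATION 'hdfs://xxx:8020/apps/hive/warehouse/testdb2.db';

-- Verify what the metastore actually recorded.
DESCRIBE DATABASE EXTENDED testdb2;
```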

Thanks for the help,
-Nirmal

From: Nirmal Kumar
Sent: 19 June 2019 11:51
To: Raymond Honderdors <raymond.honderd...@sizmek.com>
Cc: user <user@spark.apache.org>
Subject: RE: Unable to run simple spark-sql

Hi Raymond,

I cross-checked hive/conf/hive-site.xml and spark2/conf/hive-site.xml.
The same value is shown by the Ambari Hive config.
The value seems correct here:

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/apps/hive/warehouse</value>
  </property>

Problem:
Spark is trying to create a local directory under the home directory of the hive
user (/home/hive/).
Why is it referring to the local filesystem, and where is that path coming from?
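One illustration of why this matters (an aside, not a confirmed diagnosis for this cluster): a warehouse path without a scheme, like /apps/hive/warehouse, carries no filesystem information of its own, so whichever component resolves it picks a default filesystem, which may be the local one. A tiny self-contained check of the difference:

```java
import java.net.URI;

public class SchemeCheck {
    public static void main(String[] args) {
        // An unqualified path has no scheme; the resolving component must
        // fall back to some default filesystem (HDFS or local).
        URI unqualified = URI.create("/apps/hive/warehouse");
        // A fully qualified URI pins the target filesystem explicitly.
        URI qualified = URI.create("hdfs://xxx:8020/apps/hive/warehouse");

        System.out.println("unqualified scheme: " + unqualified.getScheme()); // null
        System.out.println("qualified scheme: " + qualified.getScheme());     // hdfs
    }
}
```

This is why a fully qualified hdfs:// value is the safer form wherever a warehouse directory is configured.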

Thanks,
Nirmal

From: Raymond Honderdors <raymond.honderd...@sizmek.com>
Sent: 19 June 2019 11:18
To: Nirmal Kumar <nirmal.ku...@impetus.co.in>
Cc: user <user@spark.apache.org>
Subject: Re: Unable to run simple spark-sql

Hi Nirmal,
I came across the following article:
https://stackoverflow.com/questions/47497003/why-is-hive-creating-tables-in-the-local-file-system
(and an updated reference link:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration)
You should check "hive.metastore.warehouse.dir" in the Hive config files.
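For reference, a sketch of the fully qualified form of that property (hedged: the namenode host/port is a placeholder; an unqualified /apps/hive/warehouse is also fine when fs.defaultFS points at HDFS everywhere it is read):

```xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- Fully qualified, so resolution never depends on a default filesystem -->
  <value>hdfs://xxx:8020/apps/hive/warehouse</value>
</property>
```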


On Tue, Jun 18, 2019 at 8:09 PM Nirmal Kumar
<nirmal.ku...@impetus.co.in> wrote:
Just an update on the thread: the cluster is kerberized.

I'm trying to execute the query with a different user, xyz, not hive.
It seems like a permission issue: the user xyz is trying to create a directory
under /home/hive.

Do I need some impersonation setting?
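If impersonation does turn out to be the issue, the usual knobs are hive.server2.enable.doAs in hive-site.xml and the hadoop.proxyuser.* entries in core-site.xml. A hedged sketch (whether these apply depends on how HiveServer2 and the metastore are deployed in HDP; tighten the wildcards for production):

```xml
<!-- hive-site.xml: run queries as the connected end user instead of hive -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>

<!-- core-site.xml: allow the hive service user to impersonate other users -->
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
```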

Thanks,
Nirmal


________________________________
From: Nirmal Kumar
Sent: Tuesday, June 18, 2019 5:56:06 PM
To: Raymond Honderdors; Nirmal Kumar
Cc: user
Subject: RE: Unable to run simple spark-sql

Hi Raymond,

Permissions on HDFS are 777:
drwxrwxrwx   - impadmin hdfs          0 2019-06-13 16:09 
/home/hive/spark-warehouse


But it’s pointing to a local file system:
Exception in thread "main" java.lang.IllegalStateException: Cannot create 
staging directory  
'file:/home/hive/spark-warehouse/testdb.db/employee_orc/.hive-staging_hive_2019-06-18_16-08-21_448_1691186175028734135-1'

Thanks,
-Nirmal


From: Raymond Honderdors <raymond.honderd...@sizmek.com>
Sent: 18 June 2019 17:52
To: Nirmal Kumar <nirmal.ku...@impetus.co.in.invalid>
Cc: user <user@spark.apache.org>
Subject: Re: Unable to run simple spark-sql

Hi,
Can you check the permissions of the user running Spark
on the HDFS folder where it tries to create the table?

On Tue, Jun 18, 2019, 15:05 Nirmal Kumar
<nirmal.ku...@impetus.co.in.invalid> wrote:
Hi List,

I tried running the following sample Java code using Spark2 version 2.0.0 on
YARN (HDP-2.5.0.0):

public class SparkSQLTest {
  public static void main(String[] args) {
    SparkSession sparkSession = SparkSession.builder().master("yarn")
        .config("spark.sql.warehouse.dir", "/apps/hive/warehouse")
        .config("hive.metastore.uris", "thrift://xxxxxxxxx:9083")
        .config("spark.driver.extraJavaOptions", "-Dhdp.version=2.5.0.0-1245")
        .config("spark.yarn.am.extraJavaOptions", "-Dhdp.version=2.5.0.0-1245")
        .config("spark.yarn.jars", "hdfs:///tmp/lib/spark2/*")
        .enableHiveSupport().getOrCreate();

    sparkSession.sql("insert into testdb.employee_orc select * from testdb.employee where empid<5");
  }
}
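One thing that may be worth trying (a hedged sketch, not a confirmed fix; for the metastore warehouse setting to be seen at all, hive-site.xml also has to be visible on the driver's classpath): fully qualify the warehouse directory so it cannot silently resolve against the local filesystem.

```
# spark-defaults.conf (or the equivalent .config(...) call in the builder):
spark.sql.warehouse.dir   hdfs://xxx:8020/apps/hive/warehouse
```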

I get the following error pointing to a local filesystem path
(file:/home/hive/spark-warehouse), and I am wondering where it is being picked up from:

16:08:21.321 [dispatcher-event-loop-7] INFO
org.apache.spark.storage.BlockManagerInfo - Added broadcast_0_piece0 in memory
on 192.168.218.92:40831 (size: 30.6 KB, free: 4.0 GB)
16:08:21.322 [main] DEBUG org.apache.spark.storage.BlockManagerMaster - Updated 
info of block broadcast_0_piece0
16:08:21.323 [main] DEBUG org.apache.spark.storage.BlockManager - Told master 
about block broadcast_0_piece0
16:08:21.323 [main] DEBUG org.apache.spark.storage.BlockManager - Put block 
broadcast_0_piece0 locally took  4 ms
16:08:21.323 [main] DEBUG org.apache.spark.storage.BlockManager - Putting block 
broadcast_0_piece0 without replication took  4 ms
16:08:21.326 [main] INFO org.apache.spark.SparkContext - Created broadcast 0 
from sql at SparkSQLTest.java:33
16:08:21.449 [main] DEBUG 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable - Created staging dir = 
file:/home/hive/spark-warehouse/testdb.db/employee_orc/.hive-staging_hive_2019-06-18_16-08-21_448_1691186175028734135-1
 for path = file:/home/hive/spark-warehouse/testdb.db/employee_orc
16:08:21.451 [main] INFO org.apache.hadoop.hive.common.FileUtils - Creating 
directory if it doesn't exist: 
file:/home/hive/spark-warehouse/testdb.db/employee_orc/.hive-staging_hive_2019-06-18_16-08-21_448_1691186175028734135-1
Exception in thread "main" java.lang.IllegalStateException: Cannot create 
staging directory  
'file:/home/hive/spark-warehouse/testdb.db/employee_orc/.hive-staging_hive_2019-06-18_16-08-21_448_1691186175028734135-1'
        at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.getStagingDir(InsertIntoHiveTable.scala:83)
        at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.getExternalScratchDir(InsertIntoHiveTable.scala:97)
        at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.getExternalTmpPath(InsertIntoHiveTable.scala:105)
        at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:148)
        at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:142)
        at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:313)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
       at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
        at com.xxxx.xxx.xxx.xxx.xxxx.SparkSQLTest.main(SparkSQLTest.java:33)
16:08:21.454 [pool-8-thread-1] INFO org.apache.spark.SparkContext - Invoking 
stop() from shutdown hook
16:08:21.455 [pool-8-thread-1] DEBUG
org.spark_project.jetty.util.component.AbstractLifeCycle - stopping
org.spark_project.jetty.server.Server@620aa4ea
16:08:21.455 [pool-8-thread-1] DEBUG org.spark_project.jetty.server.Server -
Graceful shutdown org.spark_project.jetty.server.Server@620aa4ea by

Thanks,
-Nirmal

________________________________






NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



--

Raymond Honderdors
R&D Tech Lead / Open Source evangelist
raymond.honderd...@sizmek.com
w: +972732535698
Herzliya


