Hi all,

http://spark.apache.org/docs/latest/sparkr.html#sparkr-dataframes
From Hive tables
You can also create SparkR DataFrames from Hive tables. To do this, we will need
to create a HiveContext, which can access tables in the Hive MetaStore. Note
that Spark should have been built with Hive support; more details on the
difference between SQLContext and HiveContext can be found in the SQL
programming guide.

# sc is an existing SparkContext.
hiveContext <- sparkRHive.init(sc)

sql(hiveContext, "CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sql(hiveContext, "LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' 
INTO TABLE src")

# Queries can be expressed in HiveQL.
results <- sql(hiveContext, "FROM src SELECT key, value")

# results is now a DataFrame
head(results)
##  key   value
## 1 238 val_238
## 2  86  val_86
## 3 311 val_311

I used RStudio to run the commands above. When I ran
sql(hiveContext, "CREATE TABLE IF NOT EXISTS src (key INT, value STRING)"),
I got this exception:

16/01/19 12:11:51 INFO FileUtils: Creating directory if it doesn't exist: file:/user/hive/warehouse/src
16/01/19 12:11:51 ERROR DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/src is not a directory or unable to create one)
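
(In case it helps to debug: my assumption is that Spark SQL's SET command can
show the effective value as a one-row DataFrame, e.g.

# Sketch: inspect the warehouse location the HiveContext is actually using
head(sql(hiveContext, "SET hive.metastore.warehouse.dir"))
)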

How can I use HDFS instead of the local file system (the file: scheme)?
Which parameter should I set?
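
From what I can tell, the warehouse location is controlled by
hive.metastore.warehouse.dir, which Spark picks up from a hive-site.xml on its
classpath (e.g. in $SPARK_HOME/conf). A minimal sketch of what I think is
needed; the namenode host and port below are placeholders for the real cluster:

<!-- conf/hive-site.xml: point the Hive warehouse at HDFS instead of file: -->
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <!-- placeholder address; substitute the actual namenode -->
    <value>hdfs://namenode:8020/user/hive/warehouse</value>
  </property>
</configuration>

I assume fs.defaultFS from the Hadoop core-site.xml (via HADOOP_CONF_DIR) must
also point at HDFS so that unqualified paths resolve there. Is this the right
approach?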

Thanks a lot.


Peter Zhang
-- 
Google
Sent with Airmail
