Yes, this is feasible.
You can use the Databricks spark-csv package to load CSV files from a staging directory.
This is pretty standard:

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("hdfs://xx:9000/data/stg/")
You can then create an ORC table in Hive and write the DataFrame into it.
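A minimal sketch of the full flow, assuming a Spark 1.x HiveContext; the database and table names are placeholders:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

// Assumes `sc` is an existing SparkContext.
val hiveContext = new HiveContext(sc)

// Read the staged CSV files (same load as above).
val df = hiveContext.read
  .format("com.databricks.spark.csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("hdfs://xx:9000/data/stg/")

// Persist the DataFrame as an ORC-backed Hive table.
df.write
  .format("orc")
  .mode(SaveMode.Overwrite)
  .saveAsTable("my_db.my_table")

This keeps the whole pipeline in Spark; Hive then sees the result as a regular ORC table it can query directly.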
Hello,
How feasible is it to use Spark to extract CSV files and write their
content to an ORC table in a Hive database?
Is Parquet the best (optimal) format to write to HDFS from a Spark app?
Thanks