You can also insert into an existing table via .insertInto(tableName, overwrite); you just have to import sqlContext._ first.
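For example, here is a minimal sketch against the Spark 1.1 SchemaRDD API, reusing the paths and table name from your code below (untested, and it assumes the schema inferred from the json matches the existing parquet schema):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext._

// Register the existing parquet data as a table.
val parquetRequests = sqlContext.parquetFile("/requests_parquet")
parquetRequests.registerTempTable("parquetRequests")

// Append the new json data directly into the parquet-backed table,
// instead of unioning and rewriting everything through a temp dir.
val jsonRequests = sqlContext.jsonFile("/requests")
jsonRequests.insertInto("parquetRequests", overwrite = false)

This avoids the delete-and-rename step entirely: with overwrite = false the insert appends new files under /requests_parquet rather than replacing it.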

On 19.11.2014, at 09:41, Daniel Haviv <danielru...@gmail.com> wrote:

> Hello,
> I'm writing a process that ingests json files and saves them as parquet files.
> The process is as such:
> 
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> val jsonRequests=sqlContext.jsonFile("/requests")
> val parquetRequests=sqlContext.parquetFile("/requests_parquet")
> 
> jsonRequests.registerTempTable("jsonRequests")
> parquetRequests.registerTempTable("parquetRequests")
> 
> val unified_requests=sqlContext.sql("select * from jsonRequests union select * from parquetRequests")
> 
> unified_requests.saveAsParquetFile("/tempdir")
> 
> and then I delete /requests_parquet and rename /tempdir to /requests_parquet
> 
> Is there a better way to achieve this?
> 
> Another problem I have is that I get a lot of small json files, and as a 
> result a lot of small parquet files. I'd like to merge the json files into a 
> few parquet files. How do I do that?
> 
> Thank you,
> Daniel
> 
> 
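As for the second question above (many small json files turning into many small parquet files): each partition of a SchemaRDD is written out as one parquet part-file, so one common approach is to coalesce down to a few partitions before saving. A rough, untested sketch; the output path and the partition count of 4 are placeholders to tune for your data size:

val jsonRequests = sqlContext.jsonFile("/requests")

// Shrink the partition count so only a few part-files are written;
// each remaining partition becomes one parquet file.
jsonRequests.coalesce(4).saveAsParquetFile("/requests_parquet_merged")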
