Hi Mich,

Following is a sample code snippet:
val userDF = userRecsDF.toDF("idPartitioner", "dtPartitioner",
  "userId", "userRecord").persist()
println("userRecsDF.partitions.size: " + userRecsDF.partitions.size)

userDF.registerTempTable("userRecordsTemp")

sqlContext.sql("SET hive.default.fileformat=Orc")
sqlContext.sql("SET hive.enforce.bucketing = true")
sqlContext.sql("SET hive.enforce.sorting = true")
sqlContext.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS users (userId STRING, userRecord STRING)
    |PARTITIONED BY (idPartitioner STRING, dtPartitioner STRING)
    |STORED AS ORC LOCATION '/user/userId/userRecords'""".stripMargin)
sqlContext.sql(
  """FROM userRecordsTemp ps
    |INSERT OVERWRITE TABLE users PARTITION(idPartitioner, dtPartitioner)
    |SELECT ps.userId, ps.userRecord, ps.idPartitioner, ps.dtPartitioner
    |CLUSTER BY idPartitioner, dtPartitioner""".stripMargin)
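In case it is useful for comparison: the same write could also go through the
DataFrame writer API rather than the Hive INSERT OVERWRITE path. This is only a
sketch (untested; it assumes Spark 1.6+, the same userDF, and the same output
location as above). The repartition on the partition columns plays the same
role as CLUSTER BY, shuffling all rows for a given (idPartitioner,
dtPartitioner) pair into one task so each partition directory gets a few large
ORC files instead of many small ones:

// Sketch only, not tested: writer-API equivalent of the insert above.
// Note that mode("overwrite") replaces the whole output location, unlike a
// dynamic-partition INSERT OVERWRITE, which only replaces the partitions
// it actually writes.
val partitionedDF = userDF.repartition(userDF("idPartitioner"), userDF("dtPartitioner"))

partitionedDF.write
  .format("orc")
  .mode("overwrite")
  .partitionBy("idPartitioner", "dtPartitioner")
  .save("/user/userId/userRecords")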
On Mon, Jun 13, 2016 at 10:57 AM, swetha kasireddy <swethakasire...@gmail.com> wrote:

> Hi Bijay,
>
> If I am hitting this issue,
> https://issues.apache.org/jira/browse/HIVE-11940, what needs to be done?
> Is upgrading to a higher version of Hive the only solution?
>
> Thanks!
>
> On Mon, Jun 13, 2016 at 10:47 AM, swetha kasireddy <swethakasire...@gmail.com> wrote:
>
>> Hi,
>>
>> Following is a sample code snippet:
>>
>> val userDF = userRecsDF.toDF("idPartitioner", "dtPartitioner",
>>   "userId", "userRecord").persist()
>> println("userRecsDF.partitions.size: " + userRecsDF.partitions.size)
>>
>> userDF.registerTempTable("userRecordsTemp")
>>
>> sqlContext.sql("SET hive.default.fileformat=Orc")
>> sqlContext.sql("SET hive.enforce.bucketing = true")
>> sqlContext.sql("SET hive.enforce.sorting = true")
>> sqlContext.sql(
>>   """CREATE EXTERNAL TABLE IF NOT EXISTS users (userId STRING, userRecord STRING)
>>     |PARTITIONED BY (idPartitioner STRING, dtPartitioner STRING)
>>     |STORED AS ORC LOCATION '/user/userId/userRecords'""".stripMargin)
>> sqlContext.sql(
>>   """FROM userRecordsTemp ps
>>     |INSERT OVERWRITE TABLE users PARTITION(idPartitioner, dtPartitioner)
>>     |SELECT ps.userId, ps.userRecord, ps.idPartitioner, ps.dtPartitioner
>>     |CLUSTER BY idPartitioner, dtPartitioner""".stripMargin)
>>
>> On Fri, Jun 10, 2016 at 12:10 AM, Bijay Pathak <bijay.pat...@cloudwick.com> wrote:
>>
>>> Hello,
>>>
>>> Looks like you are hitting this:
>>> https://issues.apache.org/jira/browse/HIVE-11940.
>>>
>>> Thanks,
>>> Bijay
>>>
>>> On Thu, Jun 9, 2016 at 9:25 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Can you provide a code snippet of how you are populating the target
>>>> table from the temp table?
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>> On 9 June 2016 at 23:43, swetha kasireddy <swethakasire...@gmail.com> wrote:
>>>>
>>>>> No, I am reading the data from HDFS, transforming it, registering it
>>>>> in a temp table using registerTempTable, and then doing an insert
>>>>> overwrite using Spark SQL's hiveContext.
>>>>>
>>>>> On Thu, Jun 9, 2016 at 3:40 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> How are you doing the insert? From an existing table?
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>> On 9 June 2016 at 21:16, Stephen Boesch <java...@gmail.com> wrote:
>>>>>>
>>>>>>> How many workers (/CPU cores) are assigned to this job?
>>>>>>>
>>>>>>> 2016-06-09 13:01 GMT-07:00 SRK <swethakasire...@gmail.com>:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> How do I insert data into 2000 partitions (directories) of
>>>>>>>> ORC/Parquet at a time using Spark SQL? It does not seem to be
>>>>>>>> performant when I try to insert into 2000 directories of
>>>>>>>> Parquet/ORC using Spark SQL. Did anyone face this issue?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-insert-data-into-2000-partitions-directories-of-ORC-parquet-at-a-time-using-Spark-SQL-tp27132.html
>>>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
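One more note on the 2000-partition case: a dynamic-partition INSERT OVERWRITE
of that size normally also needs the Hive dynamic-partition limits raised in
the session. A minimal sketch of those settings (the exact numbers are
illustrative assumptions, not tuned values):

// Sketch: dynamic-partition settings usually required before an
// INSERT OVERWRITE that writes ~2000 partitions in one statement.
// Size the limits above the real partition count.
sqlContext.sql("SET hive.exec.dynamic.partition = true")
sqlContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
sqlContext.sql("SET hive.exec.max.dynamic.partitions = 5000")
sqlContext.sql("SET hive.exec.max.dynamic.partitions.pernode = 2000")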