Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Rajesh Balamohan
' AND `event_date` <= '2016-04-02' GROUP BY `event_date` LIMIT 2”) take 8 seconds. thanks From: Mich Talebzadeh Date: Sunday, April 17, 2016 at 2:52 PM To: maurin lenglart Cc: "user @spark" Subject: Re: orc vs

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Mich Talebzadeh
event_date`,sum(`bookings`) as `bookings`,sum(`dealviews`) as `dealviews` FROM myTable WHERE `event_date` >= '2016-01-06' AND `event_date` <= '2016-04-02' GROUP BY `event_date` LIMIT 2”) take 8 seconds. thanks From: Mich Talebzadeh

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Maurin Lenglart
' AND `event_date` <= '2016-04-02' GROUP BY `event_date` LIMIT 2”) take 8 seconds. thanks From: Mich Talebzadeh <mich.talebza...@gmail.com> Date: Sunday, April 17, 2016 at 2:52 PM To: maurin lenglart <mau...@cuberonlabs.com> Cc: "user @spark"

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Mich Talebzadeh
’) - 15 seconds on ORC using sqlContext(‘use myDatabase’). The use case that I have is the second and slowest benchmark. Is there something I can do to speed that up? thanks From: Mich Talebzadeh Date: Sunday, April 17, 2016 at 2:22 PM

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Maurin Lenglart
eh <mich.talebza...@gmail.com> Date: Sunday, April 17, 2016 at 2:22 PM To: maurin lenglart <mau...@cuberonlabs.com> Cc: "user @spark" <user@spark.apache.org> Subject: Re: orc vs parquet aggregation, orc is really slow Hi Maurin, Have you tried to create

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Mich Talebzadeh
).registerAsTable(‘myTable’). The queries done on myTable take at least twice as long as queries done on the table loaded with Hive directly. For technical reasons my pipeline is not fully migrated to use Hive tables, and in a lot of places I still manua

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Maurin Lenglart
Date: Saturday, April 16, 2016 at 4:14 AM To: maurin lenglart <mau...@cuberonlabs.com>, "user @spark" <user@spark.apache.org> Subject: Re: orc vs parquet aggregation, orc is really slow Apologies, that should read desc formatted. Example for table dummy: hive

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Mich Talebzadeh
t: FAILED: SemanticException [Error 10001]: Table not found statistics” Even after doing something like: “ANALYZE TABLE myTable COMPUTE STATISTICS FOR COLUMNS” Thank you for your answer. From: Mich Talebzadeh Date: Saturday, April 16, 2016 at 12:32 AM

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Maurin Lenglart
"user @spark" <user@spark.apache.org> Subject: Re: orc vs parquet aggregation, orc is really slow Generally, a recommendation (besides the issue): do not store dates as strings. I recommend making them ints here; it will be much faster in both cases. It could be that you load th

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Maurin Lenglart
u for your answer. From: Mich Talebzadeh <mich.talebza...@gmail.com> Date: Saturday, April 16, 2016 at 12:32 AM To: maurin lenglart <mau...@cuberonlabs.com> Cc: "user @spark" <user@spark.apache.org> Subject: Re: orc vs parquet aggregation, orc

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Jörn Franke
Generally, a recommendation (besides the issue): do not store dates as strings. I recommend making them ints here; it will be much faster in both cases. It could be that you loaded them differently into the tables. Generally, for these tables you should, in both cases, insert them sorted into the tables
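Jörn's two suggestions (int dates, sorted inserts) can be illustrated with a small plain-Python model. It assumes a yyyymmdd int encoding (Jörn only says "ints") and models the file as fixed-size chunks of rows that each keep a min/max for the date column, a simplified stand-in for columnar per-stripe statistics, not ORC's actual reader. The helper names and the data are hypothetical.

```python
import random
from datetime import date, timedelta


def date_to_int(iso: str) -> int:
    """Encode an ISO 'YYYY-MM-DD' string as a yyyymmdd int, e.g. 20160106.

    Assumed encoding; integer order matches calendar order.
    """
    d = date.fromisoformat(iso)  # also rejects malformed dates
    return d.year * 10000 + d.month * 100 + d.day


def chunk_min_max(values, chunk_size):
    """Keep (min, max) for each fixed-size chunk of rows.

    Simplified stand-in for per-stripe column statistics; not ORC's reader.
    """
    return [
        (min(values[i:i + chunk_size]), max(values[i:i + chunk_size]))
        for i in range(0, len(values), chunk_size)
    ]


def chunks_to_read(stats, lo, hi):
    """Count chunks whose [min, max] overlaps [lo, hi]; the rest can be skipped."""
    return sum(1 for mn, mx in stats if mx >= lo and mn <= hi)


# Hypothetical data: 10 rows per day for 730 days starting 2015-01-01.
first_day = date(2015, 1, 1)
days = [date_to_int((first_day + timedelta(days=i)).isoformat()) for i in range(730)]
rows = [d for d in days for _ in range(10)]  # 7300 rows, already in date order

# The date range from the thread's query.
lo, hi = date_to_int("2016-01-06"), date_to_int("2016-04-02")

shuffled = rows[:]
random.Random(0).shuffle(shuffled)  # models data inserted in arbitrary order

sorted_reads = chunks_to_read(chunk_min_max(sorted(rows), 500), lo, hi)
unsorted_reads = chunks_to_read(chunk_min_max(shuffled, 500), lo, hi)
print(f"sorted: {sorted_reads} of 15 chunks, unsorted: {unsorted_reads} of 15 chunks")
```

With sorted input only 3 of the 15 chunks overlap the query's date range; in the shuffled layout almost every chunk's min/max spans the whole range, so almost nothing can be skipped. How much a real ORC or Parquet reader benefits depends on whether predicate pushdown is enabled and used for the query.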

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Mich Talebzadeh
Have you analysed statistics on the ORC table? How many rows are there? Also send the output of desc formatted statistics. HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Maurin Lenglart
Hi, I am executing one query: “SELECT `event_date` as `event_date`,sum(`bookings`) as `bookings`,sum(`dealviews`) as `dealviews` FROM myTable WHERE `event_date` >= '2016-01-06' AND `event_date` <= '2016-04-02' GROUP BY `event_date` LIMIT 2” My table was created something like: CREATE TA
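To make the query's semantics explicit, here is a plain-Python model of it. It is a sketch, not how Spark executes the query, and the function name and rows are hypothetical. It shows that, in general, every matching row must be grouped before LIMIT 2 applies, so the limit does not shrink the scan, and that without ORDER BY which two dates come back is unspecified.

```python
from collections import defaultdict


def run_query(rows, lo="2016-01-06", hi="2016-04-02", limit=2):
    """Plain-Python model of the thread's query:

    SELECT event_date, sum(bookings), sum(dealviews) FROM myTable
    WHERE event_date >= lo AND event_date <= hi
    GROUP BY event_date LIMIT 2
    """
    totals = defaultdict(lambda: [0, 0])  # event_date -> [bookings, dealviews]
    matched = 0
    for event_date, bookings, dealviews in rows:  # every row is examined
        if lo <= event_date <= hi:  # ISO date strings compare in date order
            matched += 1
            totals[event_date][0] += bookings
            totals[event_date][1] += dealviews
    # LIMIT only trims the finished groups; without ORDER BY, SQL does not
    # define which groups come back (here: first-seen order of the dict).
    result = [(d, b, v) for d, (b, v) in totals.items()][:limit]
    return result, matched


# Hypothetical rows: (event_date, bookings, dealviews)
rows = [
    ("2016-01-05", 1, 10),  # before the range
    ("2016-01-06", 2, 20),
    ("2016-02-01", 4, 40),
    ("2016-01-06", 3, 30),
    ("2016-03-15", 6, 60),
    ("2016-04-03", 5, 50),  # after the range
]
result, matched = run_query(rows)
print(result, matched)
```

Here four rows match and three date groups are built, but only two are returned; adding an ORDER BY on `event_date` would make the choice of groups deterministic.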