thank you for your answer by the way this is explain commands out: EXPLAIN select AVG(dryBulbTempF) FROM testindex WHERE ndate = '20070710'; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1
STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: testindex filterExpr: (ndate = '20070710') (type: boolean) Statistics: Num rows: 4285905 Data size: 445734176 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (ndate = '20070710') (type: boolean) Statistics: Num rows: 2142952 Data size: 222867035 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: drybulbtempf (type: int) outputColumnNames: drybulbtempf Statistics: Num rows: 2142952 Data size: 222867035 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: avg(drybulbtempf) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE value expressions: _col0 (type: struct<count:bigint,sum: double,input:int>) Reduce Operator Tree: Group By Operator aggregations: avg(VALUE._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred. SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io. HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink 2016-10-27 13:58 GMT+01:00 Mich Talebzadeh <mich.talebza...@gmail.com>: > Long answer > > An EXTERNAL TABLE in hive is a stub for table in Hive metastore. Data is > not managed by Hive and it is an HDFS concern. > > For that reason if you drop an EXTERNAL table in Hive you just drop > references for that table in Hive metastore. Data is not deleted. > > The conventional indexes don't work in Hive (although I would like to see > they work) is that HDFS lacks the capability to co-locate index blocks. The > classic index keeps the index blocks contiguous on Disk. > > However, I assume your external table is used for csv files etc. > > What you can do is to create a managed (normal) Hive table in ORC format > and partition it by ndate or Dtsramp = '2016-10-27' > > Then you can do periodic INSERT/SELECT from the external table to ORC > table. In that case you will utilise Store Index in Hive. > > HTH > > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 27 October 2016 at 12:41, Mich Talebzadeh <mich.talebza...@gmail.com> > wrote: > >> Have you checked running SQL with >> >> EXPLAIN EXTENDED SELECT ...... >> >> And post the results. >> >> In general your compact index will not work >> >> HTH >> >> Dr Mich Talebzadeh >> >> >> >> LinkedIn * >> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >> >> >> >> http://talebzadehmich.wordpress.com >> >> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for >> any loss, damage or destruction of data or any other property which may >> arise from relying on this email's technical content is explicitly >> disclaimed. The author will in no case be liable for any monetary damages >> arising from such loss, damage or destruction. >> >> >> >> On 27 October 2016 at 12:34, Hilmi Egemen Ciritoğlu < >> hilmi.egemen.cirito...@gmail.com> wrote: >> >>> Hi guys, >>> >>> I created table and index on this table. Also after I alter that index >>> with rebuild option. But still can't see performance improvements. >>> >>> Only, I can see performance improvements when I query to index table. >>> >>> I used following commands : >>> >>> CREATE EXTERNAL TABLE testindex(wban INT, ndate STRING, time STRING, >>> stationType STRING, skyCondition STRING, skyConditionFlag STRING, >>> visibility INT, >>> visibilityFlag INT, weatherType STRING, weatherTypeFlag STRING, >>> dryBulbTempF INT, dryBulbTempFFlag INT, wetBulbTempF INT, >>> wetBulbTempFFlag INT, wetBulbTempC DOUBLE, webBulbTempCFlag INT, >>> dewPointF INT, dewPointFFlag INT, dewPointC INT, dewPointCFlag INT, >>> relativeHumidity INT, relativeHumidityFlag INT, windSpeed DOUBLE, >>> windSpeedFlag INT, windDirection STRING, windDirectionFlag INT, >>> valueForWindCharacter STRING, valueForWindCharacterFlag INT, >>> stationPressure DECIMAL, stationPressureFlag INT, pressureTendency INT, >>> pressureTendencyFlag INT, pressureChange STRING, pressureChangeFlag INT, >>> seaLevelPressure INT, seaLevelPressureFlag INT, recordType STRING, >>> recordTypeFlag INT, hourlyPrecip DECIMAL, hourlyPrecipFlag INT, >>> altimeter INT, altimeterFlag INT) >>> ROW FORMAT DELIMITED >>> FIELDS TERMINATED BY ',' >>> LINES TERMINATED BY '\n'; >>> >>> LOAD DATA LOCAL INPATH '/home/user1/Downloads/dataset' INTO TABLE >>> testindex; >>> >>> CREATE INDEX test ON TABLE testindex (ndate) AS 'COMPACT' WITH DEFERRED >>> REBUILD; >>> ALTER INDEX test ON testindex REBUILD; >>> >>> Do you know any reason behind this issue ? >>> >>> Any help would be highly appreciated... >>> >>> Thanks. >>> >> >> >