Re: Creating Index and no performance improvements

Hilmi Egemen Ciritoğlu Thu, 27 Oct 2016 07:09:11 -0700

thank you for your answer by the way this is explain commands out:

EXPLAIN select AVG(dryBulbTempF) FROM testindex WHERE ndate = '20070710';
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1


STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: testindex
            filterExpr: (ndate = '20070710') (type: boolean)
            Statistics: Num rows: 4285905 Data size: 445734176 Basic stats:
COMPLETE Column stats: NONE
            Filter Operator
              predicate: (ndate = '20070710') (type: boolean)
              Statistics: Num rows: 2142952 Data size: 222867035 Basic
stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: drybulbtempf (type: int)
                outputColumnNames: drybulbtempf
                Statistics: Num rows: 2142952 Data size: 222867035 Basic
stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: avg(drybulbtempf)
                  mode: hash
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL
Column stats: NONE
                  Reduce Output Operator
                    sort order:
                    Statistics: Num rows: 1 Data size: 0 Basic stats:
PARTIAL Column stats: NONE
                    value expressions: _col0 (type: struct<count:bigint,sum:
double,input:int>)
      Reduce Operator Tree:
        Group By Operator
          aggregations: avg(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column
stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.
SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.
HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink



2016-10-27 13:58 GMT+01:00 Mich Talebzadeh <mich.talebza...@gmail.com>:

> Long answer
>
> An EXTERNAL TABLE in hive is a stub for table in Hive metastore. Data is
> not managed by Hive and it is an HDFS concern.
>
> For that reason if you drop an EXTERNAL table in Hive you just drop
> references for that table in Hive metastore. Data is not deleted.
>
> The conventional  indexes don't work in Hive (although I would like to see
> they work) is that HDFS lacks the capability to co-locate index blocks. The
> classic index keeps the index blocks contiguous on Disk.
>
> However, I assume your external table is used for csv files etc.
>
> What you can do is to create a managed (normal) Hive table in ORC format
> and partition it by ndate or Dtsramp = '2016-10-27'
>
> Then you can do periodic INSERT/SELECT from the external table to ORC
> table. In that case you will utilise Store Index in Hive.
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 27 October 2016 at 12:41, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Have you checked running SQL with
>>
>> EXPLAIN EXTENDED SELECT ......
>>
>> And post the results.
>>
>> In general your compact index will not work
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 27 October 2016 at 12:34, Hilmi Egemen Ciritoğlu <
>> hilmi.egemen.cirito...@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I created table and index on this table. Also after I alter that index
>>> with rebuild option. But still can't see performance improvements.
>>>
>>> Only, I can see performance improvements when I query to index table.
>>>
>>> I used following commands :
>>>
>>> CREATE EXTERNAL TABLE testindex(wban INT, ndate STRING, time STRING,
>>> stationType STRING, skyCondition STRING, skyConditionFlag STRING,
>>> visibility INT,
>>> visibilityFlag INT, weatherType STRING, weatherTypeFlag STRING,
>>> dryBulbTempF INT, dryBulbTempFFlag INT, wetBulbTempF INT,
>>> wetBulbTempFFlag INT, wetBulbTempC DOUBLE, webBulbTempCFlag INT,
>>> dewPointF INT, dewPointFFlag INT, dewPointC INT, dewPointCFlag INT,
>>> relativeHumidity INT, relativeHumidityFlag INT, windSpeed DOUBLE,
>>> windSpeedFlag INT, windDirection STRING, windDirectionFlag INT,
>>> valueForWindCharacter STRING, valueForWindCharacterFlag INT,
>>> stationPressure DECIMAL, stationPressureFlag INT, pressureTendency INT,
>>> pressureTendencyFlag INT, pressureChange STRING, pressureChangeFlag INT,
>>> seaLevelPressure INT, seaLevelPressureFlag INT, recordType STRING,
>>> recordTypeFlag INT, hourlyPrecip DECIMAL, hourlyPrecipFlag INT,
>>> altimeter INT, altimeterFlag INT)
>>> ROW FORMAT DELIMITED
>>> FIELDS TERMINATED BY ','
>>> LINES TERMINATED BY '\n';
>>>
>>> LOAD DATA LOCAL INPATH '/home/user1/Downloads/dataset' INTO TABLE
>>> testindex;
>>>
>>> CREATE INDEX test ON TABLE testindex (ndate) AS 'COMPACT' WITH DEFERRED
>>> REBUILD;
>>> ALTER INDEX test ON testindex REBUILD;
>>>
>>> Do you know any reason behind this issue ?
>>>
>>> Any help would be highly appreciated...
>>>
>>> Thanks.
>>>
>>
>>
>

Re: Creating Index and no performance improvements

Reply via email to