[jira] [Created] (HIVE-7990) With fetch column stats disabled number of elements in grouping set is not taken into account

Mostafa Mokhtar (JIRA) Thu, 04 Sep 2014 17:22:06 -0700

Mostafa Mokhtar created HIVE-7990:
-------------------------------------

             Summary: With fetch column stats disabled number of elements in 
grouping set is not taken into account
                 Key: HIVE-7990
                 URL: https://issues.apache.org/jira/browse/HIVE-7990
             Project: Hive
          Issue Type: Bug
          Components: File Formats
    Affects Versions: 0.13.1
         Environment: Loading into orc
            Reporter: Mostafa Mokhtar
            Assignee: Prasanth J
             Fix For: 0.14.0



When loading into an unpartitioned ORC table, the call to 
WriterImpl$StructTreeWriter.write in addRow is made inside a synchronized block.

When hive.optimize.sort.dynamic.partition is enabled, the current thread will be 
the only writer, so the synchronization is not needed.

Also, checking memory on every row is overkill; the check could be done once 
every 1K rows or so.

{code}
  public void addRow(Object row) throws IOException {
    synchronized (this) {
      treeWriter.write(row);
      rowsInStripe += 1;
      if (buildIndex) {
        rowsInIndex += 1;

        if (rowsInIndex >= rowIndexStride) {
          createRowIndexEntry();
        }
      }
    }
    memoryManager.addedRow();
  }
{code}

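This can improve ORC load performance by about 7%; for reference, 
MemoryManager.addedRow() accounts for 7.487% of samples in the profile below.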
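
The idea of checking memory once per batch of rows rather than on every row could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Hive change: the class BatchedMemoryCheck, the MemoryCheck interface, RowCounter, and countChecks are all hypothetical names, and the 1,000-row interval is taken from the "per 1K rows" suggestion above.

```java
public class BatchedMemoryCheck {

    /** Hypothetical stand-in for the (expensive) memory check callback. */
    interface MemoryCheck {
        void check();
    }

    /** Counts rows and triggers the check only once every `interval` rows. */
    static final class RowCounter {
        private final int interval;
        private final MemoryCheck callback;
        private int rowsSinceCheck = 0;

        RowCounter(int interval, MemoryCheck callback) {
            if (interval <= 0) {
                throw new IllegalArgumentException("interval must be positive");
            }
            this.interval = interval;
            this.callback = callback;
        }

        /** Called once per row; cheap unless the interval has been reached. */
        void addedRow() {
            if (++rowsSinceCheck >= interval) {
                rowsSinceCheck = 0;
                callback.check();
            }
        }
    }

    /** Helper for the demo: how many checks fire for a given number of rows. */
    static int countChecks(int rows, int interval) {
        int[] checks = {0};
        RowCounter counter = new RowCounter(interval, () -> checks[0]++);
        for (int i = 0; i < rows; i++) {
            counter.addedRow();
        }
        return checks[0];
    }

    public static void main(String[] args) {
        // Assumed interval of 1,000 rows: 10,000 rows trigger 10 checks.
        System.out.println(countChecks(10_000, 1_000));
    }
}
```

With an interval of 1, this degenerates to the current per-row behaviour, so the interval could be made configurable if a fixed value turns out to delay stripe flushes too much.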

{code}
Stack Trace     Sample Count    Percentage(%)
WriterImpl.addRow(Object)       5,852   65.782
   WriterImpl$StructTreeWriter.write(Object)    5,163   58.037
   MemoryManager.addedRow()     666     7.487
      MemoryManager.notifyWriters()     648     7.284
         WriterImpl.checkMemory(double) 645     7.25
            WriterImpl.flushStripe()    643     7.228
               
WriterImpl$StructTreeWriter.writeStripe(OrcProto$StripeFooter$Builder, int)     
 584     6.565
{code}







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
