[ 
https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363001#comment-15363001
 ] 

Sahil Takiar commented on HIVE-7224:
------------------------------------

[~vgumashta] it seems the behavior you are seeing is by design. The Beeline docs at 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions
 describe the {{--incremental}} option as follows, which suggests that this is 
expected:

{quote}
Defaults to false. When set to false, the entire result set is fetched and 
buffered before being displayed, yielding optimal display column sizing. When 
set to true, result rows are displayed immediately as they are fetched, 
yielding lower latency and memory usage at the price of extra display column 
padding. Setting --incremental=true is recommended if you encounter an 
OutOfMemory on the client side (due to the fetched result set size being large).
{quote}

So {{--incremental}} involves a tradeoff: the column padding won't be optimal, 
but memory usage will be lower. This makes sense because the 
{{IncrementalRows}} class that controls this logic doesn't buffer any rows. It 
only looks at one row at a time, so it cannot work out what the optimal column 
width should be.

I think a better approach would be for the {{IncrementalRows}} class to buffer 
1000 rows at a time (a default that could be made configurable). That way it 
can set the column width optimally for each set of 1000 rows. This shouldn't 
introduce memory issues unless each row is huge, in which case the user can 
decrease the buffer size to, say, 100 or 10.
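
To make the idea concrete, here is a rough sketch of per-batch width 
calculation. It is not taken from the Beeline code: the names 
{{BatchedRowFormatter}}, {{columnWidths}}, {{formatRow}} and {{formatAll}} are 
made up for illustration, and it assumes non-null cells and a plain " | " 
column separator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; not the actual Beeline IncrementalRows implementation.
// Assumes every cell is a non-null String.
class BatchedRowFormatter {

    // Widest cell per column, looking only at the rows in this batch.
    static int[] columnWidths(List<String[]> batch) {
        int cols = 0;
        for (String[] row : batch) {
            cols = Math.max(cols, row.length);
        }
        int[] widths = new int[cols];
        for (String[] row : batch) {
            for (int i = 0; i < row.length; i++) {
                widths[i] = Math.max(widths[i], row[i].length());
            }
        }
        return widths;
    }

    // Right-pads each cell to its column width and joins cells with " | ".
    static String formatRow(String[] row, int[] widths) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append(" | ");
            }
            sb.append(row[i]);
            for (int pad = row[i].length(); pad < widths[i]; pad++) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    // Drains the iterator batchSize rows at a time and hands each formatted
    // line to the sink, so at most one batch of rows is held in memory.
    static void formatAll(Iterator<String[]> rows, int batchSize, Consumer<String> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<String[]> batch = new ArrayList<>();
        while (rows.hasNext()) {
            batch.add(rows.next());
            // Flush when the batch is full or the result set is exhausted.
            if (batch.size() == batchSize || !rows.hasNext()) {
                int[] widths = columnWidths(batch);
                for (String[] row : batch) {
                    sink.accept(formatRow(row, widths));
                }
                batch.clear();
            }
        }
    }
}
```

Calling {{formatAll(rows, 1000, System.out::println)}} would print with widths 
recomputed every 1000 rows. Widths can still change from one batch to the 
next, so the alignment is only optimal within a batch.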

What do you think?

> Set incremental printing to true by default in Beeline
> ------------------------------------------------------
>
>                 Key: HIVE-7224
>                 URL: https://issues.apache.org/jira/browse/HIVE-7224
>             Project: Hive
>          Issue Type: Bug
>          Components: Beeline, Clients, JDBC
>    Affects Versions: 0.13.0, 1.0.0, 1.2.0, 1.1.0
>            Reporter: Vaibhav Gumashta
>            Assignee: Sahil Takiar
>         Attachments: HIVE-7224.1.patch, HIVE-7224.2.patch, HIVE-7224.2.patch, 
> HIVE-7224.3.patch
>
>
> See HIVE-7221.
> By default beeline tries to buffer the entire output relation before printing 
> it on stdout. This can cause OOM when the output relation is large. However, 
> beeline has the option of incremental prints. We should keep that as the 
> default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
