[
https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873787#comment-17873787
]
ASF GitHub Bot commented on HADOOP-18873:
-----------------------------------------
zzccctv commented on PR #6010:
URL: https://github.com/apache/hadoop/pull/6010#issuecomment-2290467176
@saxenapranav Hello.
ii. All the objects can be GCed and the direct memory allocated may be
returned on GC. But if the process crashes, the memory is never returned and
causes memory issues on the machine.
Under what environment does the OOM occur, and how can it be reproduced?
> ABFS: AbfsOutputStream doesnt close DataBlocks object.
> ------------------------------------------------------
>
> Key: HADOOP-18873
> URL: https://issues.apache.org/jira/browse/HADOOP-18873
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.3.4
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.4
>
>
> AbfsOutputStream doesn't close the dataBlock object created for the upload.
> The implications of not doing so:
> DataBlocks has three implementations:
> # ByteArrayBlock
> ## This creates an object of DataBlockByteArrayOutputStream (a child of
> ByteArrayOutputStream: a wrapper around a byte array for populating and
> reading the array).
> ## This gets GCed.
> # ByteBufferBlock:
> ## There is a defined *DirectBufferPool* from which it tries to request the
> directBuffer.
> ## If nothing in the pool, a new directBuffer is created.
> ## the `close` method on this object is responsible for returning the
> buffer to the pool so it can be reused.
> ## Since we are not calling `close`:
> ### The pool is rendered less useful, since each request creates a new
> directBuffer from memory.
> ### All the objects can be GCed and the direct memory allocated may be
> returned on GC. But if the process crashes, the memory is never returned
> and causes memory issues on the machine.
> # DiskBlock:
> ## This creates a file on disk on which the data-to-upload is written. This
> file gets deleted in startUpload().close().
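The pool mechanics behind the ByteBufferBlock point above can be sketched as follows. This is a simplified stand-in, not Hadoop's DirectBufferPool: the class names (Pool, Block) and counters are hypothetical, but it shows why skipping close() defeats buffer reuse — only a closed block returns its direct buffer to the pool.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Queue;

public class BufferPoolSketch {
    // Hypothetical pool: hands out a pooled buffer if one is free,
    // otherwise allocates a fresh direct buffer (and counts it).
    static class Pool {
        final Queue<ByteBuffer> free = new ArrayDeque<>();
        int allocations = 0;
        ByteBuffer acquire(int size) {
            ByteBuffer b = free.poll();
            if (b == null) {
                allocations++;
                b = ByteBuffer.allocateDirect(size);
            }
            b.clear();
            return b;
        }
        void release(ByteBuffer b) { free.offer(b); }
    }

    // Hypothetical block: close() is what returns the buffer to the pool.
    static class Block implements AutoCloseable {
        final Pool pool;
        final ByteBuffer buf;
        Block(Pool pool, int size) { this.pool = pool; this.buf = pool.acquire(size); }
        @Override public void close() { pool.release(buf); }
    }

    public static void main(String[] args) {
        // Closed properly: the second block reuses the first block's buffer.
        Pool pool = new Pool();
        try (Block b1 = new Block(pool, 1024)) { }
        try (Block b2 = new Block(pool, 1024)) { }
        System.out.println("allocations with close: " + pool.allocations);    // 1

        // Never closed: every block forces a fresh direct-buffer allocation.
        Pool leaky = new Pool();
        Block b3 = new Block(leaky, 1024);
        Block b4 = new Block(leaky, 1024);
        System.out.println("allocations without close: " + leaky.allocations); // 2
    }
}
```

In the leaky case the buffers are still reclaimable on GC, which matches the description above: the cost shows up as repeated direct-memory allocation, and as memory never returned if the process dies first.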
>
> startUpload() returns a BlockUploadData object, whose `toByteArray()` method
> is used in AbfsOutputStream to get the byte array in the dataBlock.
>
> Method which uses the DataBlock object:
> https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298
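The fix pattern the issue implies can be sketched like this. The names DataBlock and BlockUploadData mirror the Hadoop classes, but the bodies are simplified stand-ins, not the real implementation: the point is that the caller closes the block after extracting the upload data, so whatever resource the block holds (pooled direct buffer, temp file) is released deterministically.

```java
import java.io.IOException;

public class CloseBlockSketch {
    // Simplified stand-in for the upload-data holder.
    static class BlockUploadData {
        private final byte[] data;
        BlockUploadData(byte[] data) { this.data = data; }
        byte[] toByteArray() { return data; }
    }

    // Simplified stand-in for a DataBlock: close() must release its resource.
    interface DataBlock extends AutoCloseable {
        BlockUploadData startUpload() throws IOException;
        @Override void close();
    }

    static class SimpleBlock implements DataBlock {
        boolean closed = false;
        @Override public BlockUploadData startUpload() {
            return new BlockUploadData(new byte[]{1, 2, 3});
        }
        @Override public void close() { closed = true; }
    }

    // try-with-resources guarantees close() runs even if startUpload() throws.
    static byte[] upload(DataBlock block) throws IOException {
        try (DataBlock b = block) {
            return b.startUpload().toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        SimpleBlock block = new SimpleBlock();
        byte[] payload = upload(block);
        System.out.println("payload length: " + payload.length
            + ", closed: " + block.closed);
    }
}
```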
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]