[ 
https://issues.apache.org/jira/browse/PIG-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy reassigned PIG-5357:
---------------------------------------

         Assignee: Jacob Tolar
     Hadoop Flags: Reviewed
    Fix Version/s: 0.18.0

{quote}All of the internal code now uses InternalDistinctBag instead of 
DistinctDataBag.
{quote}
The difference is that InternalDistinctBag proactively spills based on memory 
usage and the configured caching limit. It also spills when spill() is called, 
provided a read has not already started. DistinctDataBag does not spill 
proactively, but it does handle spilling when spill() is called in the middle 
of a read. So it is fine to still use it.
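
To make the two spill policies concrete, here is a toy model of the behaviour described above. It is only a sketch: the class names (SpillPolicySketch, ToyDistinctBag, ProactiveBag, OnDemandBag) are hypothetical stand-ins, not Pig's classes, and an item-count limit is assumed in place of Pig's memory-based caching limit.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only; none of these types are part of Pig. */
public class SpillPolicySketch {

    /** In-memory distinct items plus a list of chunks that were "spilled". */
    public abstract static class ToyDistinctBag {
        protected Set<String> inMemory = new LinkedHashSet<>();
        protected final List<Set<String>> spilled = new ArrayList<>();
        protected boolean readStarted = false;

        public void add(String item) { inMemory.add(item); }

        public void startRead() { readStarted = true; }

        public int spillCount() { return spilled.size(); }

        /** Moves the in-memory chunk to the spilled list (stands in for a disk write). */
        protected boolean moveToDisk() {
            if (inMemory.isEmpty()) {
                return false;
            }
            spilled.add(inMemory);
            inMemory = new LinkedHashSet<>();
            return true;
        }

        /** Called from outside when memory is low; true if a chunk was spilled. */
        public abstract boolean spill();
    }

    /** Models the InternalDistinctBag policy: proactive, but no spill once reading. */
    public static class ProactiveBag extends ToyDistinctBag {
        private final int cacheLimit; // assumption: a simple item-count limit

        public ProactiveBag(int cacheLimit) { this.cacheLimit = cacheLimit; }

        @Override
        public void add(String item) {
            super.add(item);
            if (inMemory.size() >= cacheLimit) {
                moveToDisk(); // proactive spill, without waiting for spill()
            }
        }

        @Override
        public boolean spill() {
            // An explicit spill is honoured only if no read has started.
            return !readStarted && moveToDisk();
        }
    }

    /** Models the DistinctDataBag policy: spills only on request, even mid-read. */
    public static class OnDemandBag extends ToyDistinctBag {
        @Override
        public boolean spill() {
            return moveToDisk();
        }
    }

    public static void main(String[] args) {
        ToyDistinctBag proactive = new ProactiveBag(2);
        proactive.add("a");
        proactive.add("b"); // reaches the limit, so a chunk is spilled here
        proactive.add("c");
        proactive.startRead();
        System.out.println("proactive spill() mid-read: " + proactive.spill());

        ToyDistinctBag onDemand = new OnDemandBag();
        onDemand.add("a");
        onDemand.add("b");
        onDemand.startRead();
        System.out.println("on-demand spill() mid-read: " + onDemand.spill());
    }
}
```

The model leaves out de-duplication across spilled chunks and the merge on read. It only shows when each policy decides to spill.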

 

+1. Committed to trunk. Thanks [~jtolar] for this enhancement.

> BagFactory interface should support creating a distinct bag from a set
> ----------------------------------------------------------------------
>
>                 Key: PIG-5357
>                 URL: https://issues.apache.org/jira/browse/PIG-5357
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jacob Tolar
>            Assignee: Jacob Tolar
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: PIG-5357-1.patch, PIG-5357-2.patch
>
>
> It would be nice if BagFactory supported creating a distinct bag from a set 
> of tuples, similar to:
> {code:java}
> newDefaultBag(List<Tuple> listOfTuples);
> {code}
> [https://github.com/apache/pig/blob/trunk/src/org/apache/pig/data/BagFactory.java]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
