[ https://issues.apache.org/jira/browse/PIG-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy reassigned PIG-5357:
---------------------------------------

         Assignee: Jacob Tolar
     Hadoop Flags: Reviewed
    Fix Version/s: 0.18.0

{quote}All of the internal code now uses InternalDistinctBag instead of DistinctDataBag.
{quote}
The difference is that InternalDistinctBag spills proactively, based on memory usage and the configured caching limit. It also spills when spill() is called, provided a read has not already started. DistinctDataBag does not spill proactively, but it does handle spilling when spill() is called in the middle of a read. So it is fine to keep using it.

+1. Committed to trunk. Thanks [~jtolar] for this enhancement.

> BagFactory interface should support creating a distinct bag from a set
> ----------------------------------------------------------------------
>
>                 Key: PIG-5357
>                 URL: https://issues.apache.org/jira/browse/PIG-5357
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jacob Tolar
>            Assignee: Jacob Tolar
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: PIG-5357-1.patch, PIG-5357-2.patch
>
>
> It would be nice if BagFactory supported creating a distinct bag from a set of tuples, similar to:
> {code:java}
> newDefaultBag(List<Tuple> listOfTuples);
> {code}
> [https://github.com/apache/pig/blob/trunk/src/org/apache/pig/data/BagFactory.java]