Re: [PR] feat: add `head`, `tail` methods [datafusion-python]

via GitHub Sun, 13 Oct 2024 06:11:10 -0700


timsaucer commented on code in PR #915:
URL: https://github.com/apache/datafusion-python/pull/915#discussion_r1798327582



##########
python/datafusion/dataframe.py:
##########
@@ -223,6 +223,30 @@ def limit(self, count: int, offset: int = 0) -> DataFrame:
         """
         return DataFrame(self.df.limit(count, offset))
 
+    def head(self, n: int) -> DataFrame:

Review Comment:
   Would it be helpful to have a default `n`?
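
To make the suggestion concrete, here is a minimal sketch of a defaulted `n`. `ToyFrame` is a hypothetical list-backed stand-in, not the datafusion class, and the default of 5 is an assumption borrowed from pandas' `DataFrame.head`; the PR may well settle on a different value.

```python
from __future__ import annotations


class ToyFrame:
    """Hypothetical stand-in for DataFrame, backed by a plain list of rows."""

    def __init__(self, rows: list) -> None:
        self.rows = list(rows)

    def limit(self, count: int, offset: int = 0) -> ToyFrame:
        # Same (count, offset) shape as the limit() signature in the diff.
        return ToyFrame(self.rows[offset:offset + count])

    def head(self, n: int = 5) -> ToyFrame:
        # Assumed default of 5 rows; callers can still pass n explicitly.
        return self.limit(n, 0)
```

With a default in place, `frame.head()` and `frame.head(2)` would both be valid calls.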



##########
python/datafusion/dataframe.py:
##########
@@ -223,6 +223,30 @@ def limit(self, count: int, offset: int = 0) -> DataFrame:
         """
         return DataFrame(self.df.limit(count, offset))
 
+    def head(self, n: int) -> DataFrame:
+        """Return a new :py:class:`DataFrame` with a limited number of rows.
+
+        Args:
+            n: Number of rows to take from the head of the DataFrame.
+
+        Returns:
+            DataFrame after limiting.
+        """
+        return DataFrame(self.df.limit(n, 0))
+
+    def tail(self, n: int) -> DataFrame:
+        """Return a new :py:class:`DataFrame` with a limited number of rows.
+
+        Be aware this could be potentially expensive due to the size of the frame.
+

Review Comment:
   Is there a better way we could do this? Maybe add something upstream if necessary?
   
   Thinking about it more, I'm not sure this operation is well defined. Just as calling `limit` multiple times on a large dataframe can return different results, I would expect multiple calls to `tail` to return different results.
   
   If we do put this in, I would suggest adding more text to the description to explain why this is an expensive operation: it performs a collect to determine the size of the dataframe.
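
For illustration, the cost comes from needing the total row count before the window can be placed. The sketch below assumes `tail` is expressed as `limit(n, offset)` with `offset = max(0, total_rows - n)`; `tail_window` is a hypothetical helper, not part of datafusion, and the PR's actual implementation may differ.

```python
from __future__ import annotations


def tail_window(total_rows: int, n: int) -> tuple[int, int]:
    """Return the (count, offset) pair that selects the last n rows."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # The offset depends on the total row count, which is why the frame
    # has to be counted (or collected) before the limit can be applied.
    offset = max(0, total_rows - n)
    return n, offset


# Usage on a plain list standing in for collected rows.
rows = list(range(10))
count, offset = tail_window(len(rows), 3)
last_rows = rows[offset:offset + count]
```

If the frame has fewer than `n` rows, the offset clamps to 0 and the whole frame is returned.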



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]