paleolimbot commented on code in PR #176:
URL: https://github.com/apache/sedona-db/pull/176#discussion_r2400522763


##########
python/sedonadb/python/sedonadb/dataframe.py:
##########
@@ -151,6 +151,29 @@ def count(self) -> int:
         """
         return self._impl.count()
 
+    def __len__(self) -> int:
+        """Compute the number of rows in the DataFrame"""
+        return self.count()
+
+    @property
+    def columns(self) -> list[str]:
+        """Return the column names in the DataFrame"""
+        columns = list()
+        field_index = 0
+        while True:
+            try:
+                columns.append(self._impl.schema().field(field_index).name)
+                field_index += 1
+            except IndexError:
+                break
+
+        return columns

Review Comment:
   Agreed...I mostly just didn't expose enough from the schema object when I 
wrote it 😬 
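For reference, a minimal sketch of what the property could reduce to if the schema exposed its field names directly. `FakeSchema`, `FakeField`, and the `names` accessor are hypothetical stand-ins for illustration, not the current sedonadb schema API:

```python
class FakeField:
    """Hypothetical stand-in for a schema field."""

    def __init__(self, name: str):
        self.name = name


class FakeSchema:
    """Hypothetical stand-in for the schema object; not the sedonadb API."""

    def __init__(self, names: list[str]):
        self._fields = [FakeField(n) for n in names]

    def field(self, i: int) -> FakeField:
        # Raises IndexError past the last field, which the loop in the PR relies on
        return self._fields[i]

    @property
    def names(self) -> list[str]:
        # Assumed accessor: the piece that would need to be exposed
        return [f.name for f in self._fields]


def columns_via_probe(schema) -> list[str]:
    """The approach in the PR: index fields until IndexError."""
    columns = []
    field_index = 0
    while True:
        try:
            columns.append(schema.field(field_index).name)
            field_index += 1
        except IndexError:
            break
    return columns


def columns_via_names(schema) -> list[str]:
    """What the property could look like with a names accessor."""
    return list(schema.names)
```

Both helpers return the same list for the same schema; the second just avoids using an exception for control flow.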



##########
python/sedonadb/python/sedonadb/dataframe.py:
##########
@@ -151,6 +151,29 @@ def count(self) -> int:
         """
         return self._impl.count()
 
+    def __len__(self) -> int:
+        """Compute the number of rows in the DataFrame"""
+        return self.count()
+
+    @property
+    def columns(self) -> list[str]:
+        """Return the column names in the DataFrame"""
+        columns = list()
+        field_index = 0
+        while True:
+            try:
+                columns.append(self._impl.schema().field(field_index).name)
+                field_index += 1
+            except IndexError:
+                break
+
+        return columns
+
+    @property
+    def shape(self) -> tuple[int, int]:
+        """Return the shape of the DataFrame as a tuple of integers corresponding to (rows, columns)"""
+        return self.count(), len(self.columns)

Review Comment:
   Neither Ibis nor DuckDB implements a `.shape` accessor. Given that computing 
the row count would also trigger execution of the query, I don't think it's a 
good idea to include this 😬 



##########
python/sedonadb/python/sedonadb/dataframe.py:
##########
@@ -151,6 +151,29 @@ def count(self) -> int:
         """
         return self._impl.count()
 
+    def __len__(self) -> int:
+        """Compute the number of rows in the DataFrame"""
+        return self.count()

Review Comment:
   I wonder if we should include this or not. Our "DataFrame" hasn't been 
materialized at this point, and materializing it just to count rows might take 
quite a long time.
   
   As a data point, Ibis implements `__len__()` but raises an error:
   
   ```
   ExpressionError: Use .count() instead
   ```
   
   Another data point...DuckDB implements `__len__()` and executes the query, 
like this implementation. I'm not sure what PySpark does here.
   
   I would personally lean towards the Ibis approach (raise an error, forcing 
the user to count explicitly).
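
A rough sketch of the Ibis-style behaviour, using a hypothetical stand-in class rather than the actual sedonadb `DataFrame`. The choice of `TypeError` and the message wording are assumptions (Ibis raises its own `ExpressionError`):

```python
class LazyDataFrame:
    """Hypothetical stand-in for a lazy DataFrame; not the sedonadb class."""

    def __init__(self, rows):
        self._rows = rows
        self.executions = 0  # tracks how often the "query" was run

    def count(self) -> int:
        # Stand-in for executing the query plan and counting the rows
        self.executions += 1
        return len(self._rows)

    def __len__(self) -> int:
        # Refuse to execute implicitly; point the user at the explicit call
        raise TypeError(
            "len() would execute the query; use .count() instead"
        )
```

With this, `df.count()` still works and `len(df)` fails fast without running anything. One side effect to be aware of: because the class defines no `__bool__`, a truth test such as `if df:` also goes through `__len__` and would raise as well.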



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

