[
https://issues.apache.org/jira/browse/BEAM-12533?focusedWorklogId=614782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614782
]
ASF GitHub Bot logged work on BEAM-12533:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Jun/21 00:25
Start Date: 25/Jun/21 00:25
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on a change in pull request
#15072:
URL: https://github.com/apache/beam/pull/15072#discussion_r658370349
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -1843,9 +1873,69 @@ def repeat(self, repeats, axis):
f"DeferredSeries (encountered {type(repeats)}).")
+def _justify_str_column(objs, rjust=True):
+  strs = [str(o) for o in objs]
+  maxlen = max(len(s) for s in strs)
+  return [s.rjust(maxlen) if rjust else s.ljust(maxlen) for s in strs]
+
+
+def _ljustify_str_column(objs):
+  strs = [str(o) for o in objs]
+  maxlen = max(len(s) for s in strs)
+  return [s.ljust(maxlen) for s in strs]
+
+
+def _justify_columns_and_transpose(columns, rjust=True):
+  for row in zip(*[_justify_str_column(objs, rjust) for objs in columns]):
+    yield ' '.join(row)
+
+
+
@populate_not_implemented(pd.DataFrame)
@frame_base.DeferredFrame._register_for(pd.DataFrame)
class DeferredDataFrame(DeferredDataFrameOrSeries):
Review comment:
DataFrames are just concatenated Series, but they also have a common,
shared index, so the index-rendering logic only needs to run once in the
DataFrame case.
I tried to share as much code as possible by pulling out the justification
logic. It's probably possible to pull out some common logic for rendering the
index, though; I'll see what I can come up with there.
WDYT about the general approach of using ":" in the columns and "??" for the
length to indicate this is a deferred object?
Issue Time Tracking
-------------------
Worklog Id: (was: 614782)
Time Spent: 1h 10m (was: 1h)
> DeferredSeries and DeferredDataFrame should have a useful repr
> --------------------------------------------------------------
>
> Key: BEAM-12533
> URL: https://issues.apache.org/jira/browse/BEAM-12533
> Project: Beam
> Issue Type: Improvement
> Components: dsl-dataframe
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: P2
> Fix For: 2.32.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> DeferredSeries and DeferredDataFrame just use the default __repr__
> implementation right now, which means outputting them in a notebook is not
> useful at all. Users will need to inspect columns, dtypes, index, name, etc.
> manually. We should include basic information about the frames in a simple
> __repr__ implementation.