korbit-ai[bot] commented on code in PR #34519:
URL: https://github.com/apache/superset/pull/34519#discussion_r2249267636
##########
superset/common/utils/dataframe_utils.py:
##########
@@ -43,11 +43,19 @@ def left_join_df(
 def full_outer_join_df(
     left_df: pd.DataFrame,
     right_df: pd.DataFrame,
+    join_keys: list[str] | None = None,
     lsuffix: str = "",
     rsuffix: str = "",
 ) -> pd.DataFrame:
-    df = left_df.join(right_df, lsuffix=lsuffix, rsuffix=rsuffix, how="outer")
-    df.reset_index(inplace=True)
+    if join_keys:
+        df = left_df.set_index(join_keys).join(
+            right_df.set_index(join_keys), lsuffix=lsuffix, rsuffix=rsuffix, how="outer"
+        )
+        df.reset_index(inplace=True)
+
+    else:
+        df = left_df.join(right_df, lsuffix=lsuffix, rsuffix=rsuffix, how="outer")
Review Comment:
### Unsafe default join behavior for time series alignment
<details>
<summary>Tell me more</summary>
###### What is the issue?
When `join_keys` is not provided, the function falls back to pandas' default
`DataFrame.join` behavior, which joins on the index. That may not be
appropriate for aligning time series data.
###### Why this matters
If the DataFrames' indices are not set up for time series data (for example,
a default integer index instead of the time column), the join could align rows
incorrectly or leave dates unmatched, contradicting the developer's intent to
fix misalignment issues.
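A minimal sketch of the concern, using hypothetical sample data (the `ds` and `metric` column names and the `__right` suffix are assumptions for illustration and do not come from the PR). It contrasts an index-based outer join with a key-based one on two frames that both carry the default integer index:

```python
import pandas as pd

# Hypothetical sample data: two series covering overlapping but different dates.
left_df = pd.DataFrame({"ds": ["2024-01-01", "2024-01-02"], "metric": [1, 2]})
right_df = pd.DataFrame({"ds": ["2024-01-02", "2024-01-03"], "metric": [20, 30]})

# Index-based outer join: both frames carry the default RangeIndex (0, 1),
# so rows are paired by position rather than by date.
by_index = left_df.join(right_df, lsuffix="", rsuffix="__right", how="outer")

# Key-based outer join: rows are paired on the shared "ds" value.
by_key = (
    left_df.set_index(["ds"])
    .join(right_df.set_index(["ds"]), lsuffix="", rsuffix="__right", how="outer")
    .reset_index()
)

print(by_index)  # two rows; "ds" and "ds__right" disagree within each row
print(by_key)    # three rows, one per distinct date
```

In this sketch the index-based join pairs `2024-01-01` with `2024-01-02` simply because both sit at position 0, while the key-based join yields one row per distinct date.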
###### Suggested change ∙ *Feature Preview*
Make the `join_keys` parameter mandatory so that the join criteria for time
series alignment are always explicit:
```python
import pandas as pd


def full_outer_join_df(
    left_df: pd.DataFrame,
    right_df: pd.DataFrame,
    join_keys: list[str],  # Remove None default
    lsuffix: str = "",
    rsuffix: str = "",
) -> pd.DataFrame:
    # Index both frames by the join keys so rows align on key values.
    df = left_df.set_index(join_keys).join(
        right_df.set_index(join_keys), lsuffix=lsuffix, rsuffix=rsuffix, how="outer"
    )
    # Restore the join keys as regular columns.
    df.reset_index(inplace=True)
    return df
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]