Always-prog commented on PR #33300:
URL: https://github.com/apache/superset/pull/33300#issuecomment-2897460497

   
   After hours of trying to pass all the test cases, I realized that the entire 
direction I was going in — trying to temporarily close the session and "bring 
everything back" — is flawed and won’t work.
   
   ---
   
   ### What I tried and what I discovered
   
   I attempted to temporarily close the SQLAlchemy session during queries to 
datasources to avoid idle-in-transaction sessions. My approach was:
   
   1. Save all objects from the current session via `identity_map`.
   2. Call `db.session.close()` to release the connection.
   3. After the task finishes, reopen the session and call `db.session.merge()` 
on each object to re-attach them.
   
   This worked in simple, isolated cases — such as when loading chart data — 
where no new objects were created and no state was reused across the session 
boundary.
   
   However, I ran into serious issues in datasource-related test cases and in 
SQL Lab, especially around the `Query` model.
   
   ---
   
   ### A representative issue with the `Query` object
   
   In SQL Lab, a `Query` instance is created and added to the session as seen 
here:
   
[https://github.com/apache/superset/blob/e2a22d481c0560eb3afb317367765b62dd46c310/superset/commands/sql\_lab/execute.py#L144](https://github.com/apache/superset/blob/e2a22d481c0560eb3afb317367765b62dd46c310/superset/commands/sql_lab/execute.py#L144)
   
   When I inspect the identity map before closing the session:
   
   ```python
   logger.info(list(db.session.identity_map.values()))
   # Output: [..., <Query at 0x7f14453c7310>, ...]
   ```
   
   This shows the `Query` object in memory at `0x7f14453c7310`.
   
   But once I call `db.session.close()`, the instance becomes detached — 
effectively lost from SQLAlchemy's tracking system.
   
   Later, when I call:
   
   ```python
   db.session.merge(query)
   ```
   …it creates a **new** `Query` instance with a different memory address. Any 
existing references to the old object now point to a stale, invalid object. 
Trying to use that variable (`query`) in the code leads to errors like:
   
   > `Instance '<Query at 0x7f14453c7310>' is not persistent within this 
Session.`
   > `Object was deleted or detached.`
   
   ---
   
   ### Why it fails
   
   This breaks because **SQLAlchemy sessions are not designed to be paused and 
resumed**.
   
   Once you close a session:
   
   * All attached objects become detached.
   * They cannot be used unless re-merged.
   * Any code still holding a reference to an old instance is now working with 
an invalid object.
   
   Even calling `flush()` before closing the session doesn’t help. It pushes 
changes to the DB, but the objects are still bound to the session.
   
   And this problem doesn’t just affect the `Query` model — it will happen with 
**any SQLAlchemy model**, whether it exists today or is added in the future.
   
   ---
   
   ### Conclusion
   
   The key takeaway:
   
   > **Closing a session and trying to "restore everything as it was" is 
fundamentally incompatible with how SQLAlchemy works.**
   
   This kind of logic might work in small, single-purpose tasks, but it breaks 
completely when:
   
   * new objects are created after the session starts,
   * objects are reused across the codebase,
   * or internal state depends on session identity.
   
   This leads to detached references, subtle bugs, and hard-to-diagnose 
failures.
   
   ---
   
   ### So what *can* be done to fix idle-in-transaction issue?
   
   The proper long-term solution is to **refactor the code around fetch data 
from datasources** — making it safe to:
   * close the session without losing any critical objects,
   * reopen it later, and
   * *explicitly re-fetch all required objects from the database* (instead of 
trying to preserve state in memory).
   
   But that’s a significant amount of work — and not something that can be 
hacked around with `merge()` and `identity_map`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to