Hello everyone, this is Aira.

I would like to share a progress update on my GSoC 2026 project: "Apache
Fineract Business Intelligence".

The project builds a downstream, containerized BI and analytics stack that
extracts data from Fineract, processes it in a dedicated analytics
warehouse, and displays interactive dashboards in Apache Superset—without
impacting the transactional database's performance.

Here is a summary of the architecture, modules, progress, and current state.

1. Architecture & Modules

The stack consists of three downstream layers running via Docker Compose:
a. Ingestion (Python Extractor CLI): Streams data from Fineract
read-replica views in batches using server-side cursors and upserts it into
the warehouse raw schema. Includes safety gates that check PostgreSQL
replica lag (<300s) and Close-of-Business (COB) completion before a run.
b. Transformation (dbt): Cleans and models raw data through a 4-layer DAG
(Raw → Staging → Intermediate → Mart). Implements client PII
pseudonymization (MD5 hashing), generates calendar-date spines for daily
snapshots, and calculates financial metrics following the CGAP/MIX Market
standards.
c. Visualization (Apache Superset): Displays dashboards using virtual SQL
datasets that enforce branch-level Row-Level Security (RLS) via Jinja2
templates. Provisioning is fully automated and bootstrapped on startup via
custom Python scripts using SQLAlchemy.
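
To make the ingestion flow easier to picture, here is a minimal sketch of
the gate-then-batch-then-upsert idea in plain Python. It is not the
extractor's actual code: the function names (`gates_pass`, `batches`,
`upsert`, `run_extract`) are hypothetical, the warehouse is stood in for by
a dict keyed on the first column, and treating an unknown lag as unsafe is
an assumption. Only the 300-second threshold comes from the description
above.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

MAX_REPLICA_LAG_SECONDS = 300  # threshold quoted in the summary above

Row = Tuple  # first element is assumed to be the primary key


def gates_pass(replica_lag_seconds: Optional[float], cob_completed: bool) -> bool:
    """Return True only if the replica is fresh enough and COB has finished."""
    # Assumption: an unknown lag (e.g. the lag query returned NULL) is unsafe.
    if replica_lag_seconds is None:
        return False
    return replica_lag_seconds < MAX_REPLICA_LAG_SECONDS and cob_completed


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows without loading the whole source."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def upsert(target: Dict[object, Row], batch: List[Row]) -> None:
    """Dict stand-in for an upsert: insert new keys, overwrite existing ones."""
    for row in batch:
        target[row[0]] = row


def run_extract(
    rows: Iterable[Row],
    replica_lag_seconds: Optional[float],
    cob_completed: bool,
    batch_size: int = 2,
) -> Dict[object, Row]:
    """Load rows into a fresh 'warehouse' dict, or load nothing if a gate fails."""
    warehouse: Dict[object, Row] = {}
    if not gates_pass(replica_lag_seconds, cob_completed):
        return warehouse
    for batch in batches(rows, batch_size):
        upsert(warehouse, batch)
    return warehouse
```

Calling `run_extract` with a lag of 500 seconds returns an empty dict, while
a healthy run keeps the latest version of each key, which is the behaviour
an upsert into the raw schema is meant to give.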


2. Completed Milestones (Pull Requests)

1. PR 1: Extractor & Ingestion Infrastructure: Built the CLI supporting
`backfill` and `incremental` runs with watermark tracking. Set up safety
gates, pipeline audit logging, and a Docker Compose configuration for the
PostgreSQL instances.
2. PR 2: dbt Transformation Layer: Established the dbt models, custom macros
(safe division), PII hashing, and daily snapshot logic. Calculated
PAR 30/60/90 ratios and NPA flags. Added 57 automated data quality tests.
3. PR 3: Superset Bootstrap & Security: Set up automated provisioning for
database connections, virtual datasets, charts, and dashboards. Implemented
RLS so branch managers are restricted to their branch's data on login.
4. PR 4: Portfolio Health & Repayment Marts: Added modeling and automated
charts for stock/flow metrics (GLP, active loan counts, disbursements,
collection efficiency, and repayments split by
principal/interest/fees/penalties).
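
For readers less familiar with the PAR ratios mentioned in PR 2, the
following is a rough Python illustration of the arithmetic, paired with a
safe-division helper. The project implements this in dbt/SQL, so this is
only a sketch: `safe_divide` and `par_ratio` are hypothetical names, and
counting a loan as at risk when its days past due are greater than or equal
to the threshold is an assumption (some definitions use strictly greater).

```python
from typing import Iterable, Optional, Tuple

# (outstanding_principal, days_past_due); a simplified, assumed loan shape
Loan = Tuple[float, int]


def safe_divide(numerator: float, denominator: Optional[float]) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is 0 or None."""
    if denominator is None or denominator == 0:
        return None
    return numerator / denominator


def par_ratio(loans: Iterable[Loan], threshold_days: int) -> Optional[float]:
    """Share of outstanding principal held by loans at or past the threshold."""
    loans = list(loans)
    gross_portfolio = sum(balance for balance, _ in loans)
    at_risk = sum(balance for balance, dpd in loans if dpd >= threshold_days)
    return safe_divide(at_risk, gross_portfolio)


if __name__ == "__main__":
    demo = [(1000.0, 0), (500.0, 35), (500.0, 95)]  # made-up balances
    for days in (30, 60, 90):
        print(f"PAR {days}:", par_ratio(demo, days))
```

Returning `None` for an empty portfolio, rather than raising, mirrors what a
safe-division macro typically does in SQL, where the result would be NULL.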


3. Built Dashboards

Two fully configured dashboards are programmatically bootstrapped:
a. Delinquency and PAR: Key metrics (PAR 30/60/90, NPA Exposure), trend
charts, bucket distributions (Current, Watch-list, PAR 30-59, etc.),
branch/product bar breakdowns, and a detailed summary table.
b. Portfolio Health: Gross Loan Portfolio (GLP), Active Loans, Borrowers,
Average Loan Size, composition trends, transaction flow analysis
(disbursements vs collections), and branch concentration.
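
As an illustration of how a bucket distribution like the one on the
Delinquency dashboard can be derived, here is a small Python sketch. The
bucket boundaries are assumptions made for the example: only "Current",
"Watch-list", and "PAR 30-59" are named above, so the 1-29 day range for
Watch-list and the "PAR 60-89" and "PAR 90+" labels are hypothetical, as are
the function names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def delinquency_bucket(days_past_due: int) -> str:
    """Map days past due to a bucket label (boundaries are assumed)."""
    if days_past_due < 0:
        raise ValueError("days_past_due cannot be negative")
    if days_past_due == 0:
        return "Current"
    if days_past_due < 30:
        return "Watch-list"   # assumed range: 1-29 days
    if days_past_due < 60:
        return "PAR 30-59"
    if days_past_due < 90:
        return "PAR 60-89"    # hypothetical label
    return "PAR 90+"          # hypothetical label


def bucket_distribution(loans: Iterable[Tuple[float, int]]) -> Dict[str, float]:
    """Sum outstanding principal per bucket for (balance, days_past_due) pairs."""
    totals: Dict[str, float] = defaultdict(float)
    for balance, days_past_due in loans:
        totals[delinquency_bucket(days_past_due)] += balance
    return dict(totals)
```

Each loan lands in exactly one bucket, so the bucket totals add up to the
portfolio total.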



4. Current Project State

* Operational: The entire pipeline runs end-to-end with a single
`docker compose up`.
* Tested: The dbt DAG successfully runs and passes all 57 data quality
tests against seeded demo data.
* Compliant: The codebase passes Apache RAT license scans and lint checks.
* Documented: A detailed README covers project setup.

Repo: https://github.com/apache/fineract-business-intelligence

I look forward to your feedback or any questions you may have!

Best regards,
Aira Jena
GSoC 2026 Contributor
