Hello everyone, I'm Aira, and I would like to share a progress update on my GSoC 2026 project, "Apache Fineract Business Intelligence".
The project builds a downstream, containerized BI and analytics stack that extracts data from Fineract, processes it in a dedicated analytics warehouse, and displays interactive dashboards in Apache Superset, without impacting the transactional database's performance. Here is a summary of the architecture, modules, progress, and current state.

1. Architecture & Modules

The stack consists of three downstream layers running via Docker Compose:

a. Ingestion (Python Extractor CLI): Streams data from Fineract read-replica views in batches using server-side cursors and upserts it into the warehouse raw schema. Includes safety gates for PostgreSQL replica lag (<300s) and Close-of-Business (COB) completion checks.

b. Transformation (dbt): Cleans and models raw data through a 4-layer DAG (Raw → Staging → Intermediate → Mart). Implements client PII pseudonymization (MD5 hashing), generates calendar-date spines for daily snapshots, and calculates CGAP/MIX Market standard financial metrics.

c. Visualization (Apache Superset): Displays dashboards using virtual SQL datasets that enforce branch-level Row-Level Security (RLS) via Jinja2 templates. Fully automated and bootstrapped on startup via custom Python scripts using SQLAlchemy.

2. Completed Milestones (Pull Requests)

1. PR 1: Extractor & Ingestion Infrastructure: Built the CLI supporting `backfill` and `incremental` runs with watermark tracking. Set up safety gates, pipeline audit logging, and Docker Compose setup for PostgreSQL instances.

2. PR 2: dbt Transformation Layer: Established the dbt models, custom macros (safe division), PII hashing, and daily snapshotting algorithms. Calculated PAR 30/60/90 ratios and NPA flags. Added 57 automated data quality tests.

3. PR 3: Superset Bootstrap & Security: Set up automated provisioning for database connections, virtual datasets, charts, and dashboards. Implemented RLS so branch managers are restricted to their branch's data on login.

4. PR 4: Portfolio Health & Repayment Marts: Added modeling and automated charts for stock/flow metrics (GLP, active loan counts, disbursements, collection efficiency, repayments split by principal/interest/fees/penalties).

3. Built Dashboards

Two fully configured dashboards are programmatically bootstrapped:

a. Delinquency and PAR: Key metrics (PAR 30/60/90, NPA Exposure), trend charts, bucket distributions (Current, Watch-list, PAR 30-59, etc.), branch/product bar breakdowns, and a detailed summary table.

b. Portfolio Health: Gross Loan Portfolio (GLP), Active Loans, Borrowers, Average Loan Size, composition trends, transaction flow analysis (disbursements vs collections), and branch concentration.

4. Current Project State

* Operational: The entire pipeline runs end-to-end with a single `docker compose up`.
* Tested: The dbt DAG successfully runs and passes all 57 data quality tests against seeded demo data.
* Compliant: The codebase passes Apache RAT license scans and lint checks.
* Documented: A detailed README has been created for project setup.

Repo: https://github.com/apache/fineract-business-intelligence

I look forward to your feedback or any questions you may have!

Best regards,
Aira Jena
GSoC 2026 Contributor
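
P.S. For readers less familiar with the PAR metrics mentioned above, here is a minimal Python sketch of how a PAR ratio with safe division could be computed. It is an illustration only and not the project's dbt code: the `Loan` class, its field names, and the `safe_divide` and `par_ratio` helpers are hypothetical, and it assumes PAR N means the outstanding principal of loans more than N days overdue divided by the gross loan portfolio.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Loan:
    # Hypothetical record shape; the real mart columns may differ.
    outstanding_principal: float
    days_overdue: int


def safe_divide(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def par_ratio(loans: list[Loan], threshold_days: int) -> Optional[float]:
    """Share of outstanding principal held by loans overdue more than threshold_days.

    Assumption: "more than" (strictly greater) is used at the threshold.
    """
    gross_portfolio = sum(loan.outstanding_principal for loan in loans)
    at_risk = sum(
        loan.outstanding_principal
        for loan in loans
        if loan.days_overdue > threshold_days
    )
    return safe_divide(at_risk, gross_portfolio)


# Made-up sample portfolio: one current loan and two overdue loans.
loans = [Loan(1000.0, 0), Loan(500.0, 45), Loan(500.0, 95)]
par30 = par_ratio(loans, 30)   # both overdue loans count
par60 = par_ratio(loans, 60)   # only the 95-day loan counts
no_loans = par_ratio([], 30)   # empty portfolio: no ratio instead of an error
```

Returning `None` for an empty portfolio mirrors the idea behind a safe-division macro: a branch or product with no outstanding loans yields a missing ratio rather than a division-by-zero failure.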
