The Challenge
FinServe Analytics, a financial compliance consulting firm, had a critical problem: their analysts spent 40% of their time searching through regulatory documents, SEC filings, and internal compliance guidelines to answer client questions.
With 100,000+ documents across multiple regulatory frameworks (SEC, FINRA, CFTC), finding relevant information was slow and error-prone. They needed a system that could:
- Deliver accurate answers with source citations for audits
- Search across multiple document types (PDFs, Word docs, HTML)
- Handle complex financial terminology and acronyms
- Meet strict audit and compliance requirements (SOC 2, SEC)
- Provide data residency options for sensitive client data
Critical Constraints:
- Zero tolerance for hallucinations or incorrect citations
- Must maintain complete audit trail of all queries
- Needed on-premises deployment option for sensitive data
- Had to explain its reasoning process to compliance officers
Our Approach
We designed a multi-stage RAG (Retrieval-Augmented Generation) system with strong emphasis on accuracy, auditability, and explainability.
Phase 1: Document Processing & Embedding (Weeks 1-3)
Challenge: Financial documents are complex—nested tables, footnotes, cross-references, and domain-specific terminology.
Solution:
- Built custom document parsers for PDFs, Word docs, and HTML using PyMuPDF and python-docx
- Chunking strategy: 512-token chunks with 50-token overlap, preserving document structure
- Created a financial domain-specific embedding model by fine-tuning sentence-transformers on a regulatory text corpus
- Indexed 100K+ documents into the Pinecone vector database with metadata (document ID, section, date, source)
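The overlapping chunking described above can be sketched roughly as follows (a minimal illustration over a pre-tokenized document; the real parsers also preserve section boundaries, tables, and footnotes):

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Split a token sequence into overlapping chunks.

    Each chunk shares `overlap` tokens with its predecessor so that
    sentences straddling a boundary stay retrievable from both sides.
    """
    chunks = []
    step = size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks
```

With a 1,000-token document this yields three chunks, and each chunk's first 50 tokens repeat the previous chunk's last 50.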
Phase 2: Retrieval & Reranking (Week 4)
Challenge: Vector search alone wasn't precise enough for regulatory queries—we needed hybrid search.
Solution:
- Implemented hybrid search combining:
- Vector similarity (embeddings) for semantic matching
- BM25 keyword search for exact term matching (critical for regulations with specific identifiers)
- Metadata filtering (date ranges, document types, agencies)
- Added a reranking layer using a cross-encoder model to refine the top 20 results down to the top 5
- Achieved 92% recall@5 on test queries
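One common way to merge the vector and BM25 rankings is reciprocal rank fusion (RRF); the sketch below is illustrative rather than our exact fusion formula:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of doc IDs into one ranking.

    Each list contributes 1 / (k + rank + 1) per document, so a doc
    ranked highly by either retriever rises in the fused result.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears in both lists (even at middling ranks) beats one that tops only a single list, which is exactly the behavior we wanted for regulatory queries.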
Phase 3: Generation & Citation (Week 5)
Challenge: LLMs can hallucinate. In finance, that's unacceptable.
Solution:
- Used GPT-4 with strict prompt engineering:
- "Only answer based on provided context. If unsure, say so."
- Required inline citations in the [Source: Doc ID, Section X] format
- Implemented citation verification: every statement in the response was mapped back to source chunks
- Added confidence scoring: flagged low-confidence answers for human review
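The core of citation verification is checking that every cited document ID actually appears in the retrieved set. A minimal sketch (`verify_citations` is a hypothetical helper built around the [Source: Doc ID, Section X] convention above):

```python
import re

# Matches the inline citation convention, e.g. [Source: SEC-1042, Section 3]
CITE_RE = re.compile(r"\[Source:\s*([^,\]]+),\s*Section\s+([^\]]+)\]")

def verify_citations(answer, retrieved_ids):
    """Split cited doc IDs into those present in the retrieved chunks
    and those the LLM invented (which trigger human review)."""
    cited = [m.group(1).strip() for m in CITE_RE.finditer(answer)]
    valid = [c for c in cited if c in retrieved_ids]
    unknown = [c for c in cited if c not in retrieved_ids]
    return valid, unknown
```

Any `unknown` citation is treated as a potential hallucination and the answer is blocked pending review.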
Phase 4: Audit & Compliance (Week 6)
Challenge: SEC auditors need to trace every answer back to its source.
Solution:
- Built complete audit trail in PostgreSQL:
- Query text, timestamp, user ID
- Retrieved documents with relevance scores
- Generated response with inline citations
- User feedback (thumbs up/down)
- Created admin dashboard showing:
- Query volume and patterns
- Most-cited documents
- Low-confidence responses flagged for review
- Implemented role-based access control (RBAC) with Okta SSO integration
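The audit-trail write reduces to one insert per query. A self-contained sketch (sqlite3 stands in for PostgreSQL here, and the schema is a simplified version of the fields listed above):

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """CREATE TABLE IF NOT EXISTS audit_log (
    ts TEXT, user_id TEXT, query TEXT,
    retrieved TEXT, response TEXT, feedback INTEGER)"""

def log_query(conn, user_id, query, retrieved, response):
    """Append one audit record: who asked what, which chunks were
    retrieved (with scores), and the cited response."""
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, NULL)",
        (datetime.now(timezone.utc).isoformat(), user_id, query,
         json.dumps(retrieved), response),
    )
    conn.commit()
```

Feedback (thumbs up/down) is written later as an update to the same row, keeping one row per query for audit reports.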
Phase 5: Production Deployment (Weeks 7-8)
Deployed as a containerized microservice architecture:
- FastAPI backend with async processing
- React frontend with chat interface
- PostgreSQL for audit logs and user data
- Pinecone for vector search (SaaS)
- Optional: Self-hosted vector DB (Weaviate) for on-prem deployments
Architecture
┌──────────────────┐
│ Document Store │
│ (S3 / On-Prem) │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Doc Processing │
│ (Chunking + │
│ Embedding) │
└────────┬─────────┘
│
▼
┌──────────────────┐ ┌─────────────────┐
│ Vector DB │ │ PostgreSQL │
│ (Pinecone / │◄──────┤ (Metadata + │
│ Weaviate) │ │ Audit Logs) │
└────────┬─────────┘ └─────────────────┘
│
▼
┌──────────────────┐
│ RAG API │
│ (FastAPI) │
│ │
│ 1. Query │
│ 2. Hybrid Search│
│ 3. Rerank │
│ 4. LLM Generate │
│ 5. Verify Cites │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Web App │
│ (React) │
└──────────────────┘
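The five numbered stages inside the RAG API box can be sketched as one orchestration function (`retriever`, `reranker`, `llm`, and `verify` are hypothetical callables standing in for the real components):

```python
def answer_query(query, retriever, reranker, llm, verify):
    """Run the pipeline stages from the diagram above (sketch)."""
    candidates = retriever(query, top_k=20)      # 2. hybrid search
    top_docs = reranker(query, candidates)[:5]   # 3. rerank to top 5
    answer = llm(query, top_docs)                # 4. LLM generation
    if not verify(answer, top_docs):             # 5. citation check
        return None                              # flag for human review
    return answer
```

Keeping the stages as injectable callables is also what lets the on-prem deployment swap Pinecone for Weaviate without touching the orchestration logic.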
Results
After 8 weeks of development and 2 weeks of testing:
- 85% time savings: Analysts found answers in seconds instead of hours
- 100K+ documents indexed: Sub-second retrieval across entire corpus
- Passed SEC audit: Complete audit trail met all regulatory requirements
- 92% user satisfaction: Internal survey showed high confidence in answers
- Zero hallucination incidents: Citation verification caught all potential errors
Usage Statistics (First 3 Months):
- 15,000+ queries processed
- Average response time: 3.2 seconds
- 94% of answers required no human correction
- Most common query types: SEC filing interpretations, FINRA rule lookups
Key Technical Decisions
Why Hybrid Search?
Pure vector search missed exact regulation codes (e.g., "Rule 10b-5"). Combining BM25 keyword search with vector similarity gave us the best of both worlds.
Why Reranking?
Initial retrieval is fast but imprecise. Reranking with a cross-encoder model improved relevance significantly with minimal latency cost.
Why PostgreSQL for Audit Logs?
We needed ACID transactions and complex queries for audit reports. NoSQL wasn't suitable.
Why Not Train a Custom LLM?
GPT-4 with strong prompting and retrieval was more cost-effective than training a domain-specific model. We invested in retrieval quality instead.
Security & Compliance Features
- Data Encryption: At-rest (AES-256) and in-transit (TLS 1.3)
- Access Controls: Role-based permissions with SSO via Okta
- Audit Logging: Every query, response, and user action logged
- Data Residency: On-premises deployment option for sensitive data
- Citation Verification: Automated checks to prevent hallucinations
- PII Redaction: Automatic detection and masking of sensitive information
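Pattern-based masking covers the most common PII before text is logged or indexed; this sketch shows the idea with two illustrative patterns (the production system uses a broader detector):

```python
import re

# Illustrative patterns only; real coverage needs many more (names,
# account numbers, phone numbers, addresses, ...)
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact_pii(text):
    """Replace each detected PII span with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text
```

Redaction runs before both audit logging and indexing, so sensitive values never reach the vector store or the logs.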
Lessons Learned
- Chunking strategy matters: We iterated multiple times to balance context window size with retrieval precision.
- Hybrid search is essential for regulatory text: Pure semantic search missed important exact-match queries.
- Citation verification is non-negotiable: Automated checks caught several edge cases where the LLM embellished beyond the source text.
- User feedback loops improve quality: We collected thumbs up/down feedback and used it to refine prompts and retrieval parameters.
- Compliance requirements drive architecture: Audit trail and access controls weren't afterthoughts; they shaped the entire system design.
Technology Stack
- LLM: OpenAI GPT-4, GPT-3.5-Turbo
- Embeddings: Fine-tuned Sentence-Transformers (all-mpnet-base-v2)
- Vector DB: Pinecone (SaaS), Weaviate (on-prem option)
- Search: BM25 via Elasticsearch, Pinecone for vector search
- Backend: Python, FastAPI, LangChain, Pydantic
- Database: PostgreSQL, Redis (caching)
- Frontend: React, TypeScript, TailwindCSS
- Deployment: Docker, Kubernetes, AWS (EKS) / On-Premises
Conclusion
Building a production RAG system for financial services requires more than just plugging an LLM into a vector database. It demands:
- Domain-specific embedding models
- Hybrid search combining semantic and keyword approaches
- Rigorous citation verification
- Complete audit trails for compliance
- Security and access controls from day one
By focusing on these fundamentals, we delivered a system that not only passed regulatory scrutiny but became an indispensable tool for compliance analysts.
Need a compliance-ready RAG system? Let's talk about your requirements.