Invoice Fraud Detector

🚀 Invoice Fraud Detector Service

An end-to-end machine learning service that detects fraudulent invoices using XGBoost. The project covers the full pipeline: synthetic data generation, model training with SMOTE oversampling to offset the rarity of fraud cases, and a Flask-based web dashboard with a real-time risk speedometer.
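For readers new to SMOTE, here is a minimal, stdlib-only sketch of the idea: each synthetic fraud sample is placed on the line between a real fraud sample and one of its nearest fraud neighbours. This is an illustration only, not the project's training code and not the imbalanced-learn API; the function name `smote_like`, the two features, and the sample values are all made up.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical fraud rows as (amount, days_until_due) pairs.
fraud_rows = [(9800.0, 1.0), (9500.0, 2.0), (9900.0, 1.0)]
new_rows = smote_like(fraud_rows, n_new=4)
print(len(new_rows))  # 4
```

Because every new point lies between two real fraud rows, the synthetic samples stay inside the region the real fraud cases already occupy instead of being exact copies.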


🛠️ Setup Instructions

1. Create a Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

2. Install Dependencies

pip install -r requirements.txt

3. Run the Pipeline (Sequence Is Important!)

The app needs a trained model (its "brain"), so run the pipeline steps in order: generate the synthetic data, train the model, then start the app.
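One way to enforce the order is a small runner that stops at the first failing step. The sketch below is only an illustration: the script names in `PIPELINE` are hypothetical placeholders, not this repository's actual files, and `run_in_order` is a helper invented for this example.

```python
import subprocess
import sys


def run_in_order(commands, runner=subprocess.run):
    """Run each command in sequence and stop at the first failure.

    Returns the number of commands that completed successfully.
    """
    completed = 0
    for cmd in commands:
        print("running:", " ".join(cmd))
        # With check=True, subprocess.run raises CalledProcessError on a
        # non-zero exit code, so later steps never run after a failure.
        runner(cmd, check=True)
        completed += 1
    return completed


# Hypothetical script names; substitute the project's real entry points.
PIPELINE = [
    [sys.executable, "generate_data.py"],  # 1. create the synthetic invoices
    [sys.executable, "train_model.py"],    # 2. train and save the model
    [sys.executable, "app.py"],            # 3. start the Flask dashboard
]

# Uncomment once the names above match your checkout:
# run_in_order(PIPELINE)
```

Stopping on the first error matters here because a failed data or training step would otherwise leave the app starting without a usable model.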


🕵️‍♂️ How to Use the Dashboard

The AI is trained to recognize specific patterns of risk. To see the "Speedometer" in action, try these test cases:

✅ Scenario 1: The Trusted Partner (Low Risk)

⚠️ Scenario 2: High-Value Fraud (High Risk)

🧪 Pro Tip: Find Your Own Test Cases

Open data/raw/fake_invoices.csv. Any row where is_fraud is 1 should trigger a high risk score, and any row where is_fraud is 0 should come back clear.
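If you would rather not scroll through the file by hand, a short script can pull one example of each class. This is a sketch under assumptions: it relies only on the `is_fraud` column mentioned above and assumes the labels are stored as the text values `0` and `1`; the helper name `pick_examples` is made up for this example.

```python
import csv


def pick_examples(csv_path, label_column="is_fraud", per_class=1):
    """Return up to per_class rows for each label value ("0" and "1")."""
    picked = {"0": [], "1": []}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Missing cells come back as None, so fall back to "".
            label = (row.get(label_column) or "").strip()
            if label in picked and len(picked[label]) < per_class:
                picked[label].append(row)
            # Stop reading as soon as both classes are covered.
            if all(len(rows) >= per_class for rows in picked.values()):
                break
    return picked
```

Call it with the CSV path from the tip above, then type the values from `picked["1"][0]` and `picked["0"][0]` into the dashboard form to compare the two readings.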


✨ Interactive Features


💻 Tech Stack


🧪 Automated Testing

This project includes a test suite that checks that the data generator and the AI API stay in sync. Run it with:

pytest
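As a rough picture of what an "in sync" check can look like, the sketch below asserts that every row the generator emits carries the fields the API expects. Everything in it is hypothetical: the field names, `generate_invoice`, and `API_REQUIRED_FIELDS` are stand-ins for illustration and are not taken from this project's code.

```python
# Hypothetical list of fields the API is assumed to require.
API_REQUIRED_FIELDS = {"vendor_id", "amount", "invoice_date"}


def generate_invoice():
    """Stand-in for the project's data generator (hypothetical fields)."""
    return {
        "vendor_id": "V-001",
        "amount": 1250.0,
        "invoice_date": "2024-01-15",
        "is_fraud": 0,
    }


def missing_api_fields(row, required=API_REQUIRED_FIELDS):
    """Return the required API fields that the generated row lacks."""
    return sorted(required - set(row))


def test_generator_matches_api():
    # pytest collects functions named test_*; a plain assert is enough.
    assert missing_api_fields(generate_invoice()) == []
```

A check like this fails as soon as someone renames a column in the generator without updating the API, which is the kind of drift such a test is meant to catch.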


🀝 Contributing


📝 Notes


🗺️ Roadmap

[ ] Batch Processing: Ability to upload an entire CSV for bulk fraud scanning.

[ ] User Auth: Secure login for finance team members.

[ ] Email Alerts: Auto-notify admins when a "High Risk" invoice is detected.


❀️ Thanks

Scikit-learn & XGBoost: For the heavy lifting in the ML pipeline.

Faker: For helping create the fake data.


Built by Roy Peters. Click here for contact details 😁