An end-to-end machine learning service that detects fraudulent invoices using XGBoost. The project covers the full pipeline: synthetic data generation, model training with SMOTE (oversampling of the rare fraud class), and a Flask-based web dashboard with a real-time risk speedometer.
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Run these steps in order to generate the data and train the model (the "brain") that the app loads:
Generate Data: python core/generator.py (creates a 100-row data/raw/fake_invoices.csv)
Train AI: python core/trainer.py (trains the model and saves it as .pkl files)
Start Service: python app.py (launches the dashboard at http://127.0.0.1:5000)
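The data generation step can be sketched as follows. This is a minimal, stdlib-only illustration, not the contents of core/generator.py: the project uses Faker, while this sketch uses the random module, and the vendor lists, column names (invoice_id, vendor, amount, is_fraud) and fraud rule (suspicious vendor plus unusually high amount) are assumptions chosen to mirror the test cases in this README.

```python
import csv
import random

# Hypothetical vendor lists: the real core/generator.py uses Faker, and its
# vendor names and fraud rules may differ from these.
NORMAL_VENDORS = ["Small Ltd", "Acme Supplies", "Northwind Traders"]
SUSPICIOUS_VENDORS = ["QuickPay UK", "Fast Cash Services"]
COLUMNS = ["invoice_id", "vendor", "amount", "is_fraud"]


def generate_invoices(n=100, fraud_rate=0.1, seed=42):
    """Return n synthetic invoice rows with a binary is_fraud label."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    rows = []
    for i in range(n):
        is_fraud = 1 if rng.random() < fraud_rate else 0
        if is_fraud:
            # Assumed fraud pattern: suspicious vendor and an unusually high amount.
            vendor = rng.choice(SUSPICIOUS_VENDORS)
            amount = round(rng.uniform(20_000, 50_000), 2)
        else:
            vendor = rng.choice(NORMAL_VENDORS)
            amount = round(rng.uniform(50, 5_000), 2)
        rows.append({"invoice_id": i + 1, "vendor": vendor,
                     "amount": amount, "is_fraud": is_fraud})
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

For example, write_csv(generate_invoices(n=1000), "data/raw/fake_invoices.csv") produces a larger training set, assuming the data/raw directory already exists.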
The AI is trained to recognize specific risk patterns, such as suspicious vendor names and unusually high amounts. To see the "Speedometer" in action, try these test cases:
Vendor: Small Ltd
Amount: 250
Verdict: The needle will stay in the Green (Low Risk).
Vendor: QuickPay UK
Amount: 45000
Verdict: The needle will swing to Red (High Risk) because the AI recognizes the suspicious vendor name and unusually high amount.
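To run the same two checks without the dashboard, a small Python client could post each invoice to the service. This is a hedged sketch: the /predict route and the vendor/amount JSON keys are hypothetical, so check app.py for the real endpoint and field names before relying on it.

```python
import json
import urllib.request

# Hypothetical endpoint: the actual route and JSON keys are defined in app.py.
API_URL = "http://127.0.0.1:5000/predict"


def build_request(vendor, amount, url=API_URL):
    """Build a JSON POST request for one invoice without sending it."""
    payload = json.dumps({"vendor": vendor, "amount": amount}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_invoice(vendor, amount, url=API_URL, timeout=5):
    """Send the request to the running service and return its decoded JSON reply."""
    with urllib.request.urlopen(build_request(vendor, amount, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With python app.py running, score_invoice("QuickPay UK", 45000) returns whatever JSON the endpoint sends back. build_request is kept separate so the payload can be checked without a live server.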
Open data/raw/fake_invoices.csv for more test cases. Any row where is_fraud is 1 should produce a high risk score, and any row where is_fraud is 0 should come back clear.
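To pull those rows out programmatically, the CSV can be split by its label column. A minimal sketch using only the standard library, assuming the file has a header row with an is_fraud column holding 0 or 1:

```python
import csv


def load_test_cases(path="data/raw/fake_invoices.csv"):
    """Split the generated CSV into (fraud_rows, clean_rows) using is_fraud."""
    fraud, clean = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # csv returns every value as a string, so compare the label as an int.
            if int(row["is_fraud"]) == 1:
                fraud.append(row)
            else:
                clean.append(row)
    return fraud, clean
```

Each returned row is a dict of strings keyed by column name, so convert the amount with float() before sending it anywhere that expects a number.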
Backend: Pydantic, Joblib, Pandas, Faker
Machine Learning: XGBoost, Scikit-learn, Imbalanced-learn (SMOTE)
Frontend: Flask, HTML5/CSS3 (Animated Gauge), JavaScript (Fetch API)
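SMOTE oversamples the minority (fraud) class before training. The sketch below illustrates the core idea in plain Python rather than reproducing the imbalanced-learn implementation the project uses: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority neighbors. The function name smote and its parameters are illustrative.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority-class
    points and their k nearest minority neighbors (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbors of x among the other minority samples (Euclidean).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

In practice you would call imblearn.over_sampling.SMOTE(k_neighbors=5).fit_resample(X, y) on the training split only, so that synthetic rows never leak into the evaluation data.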
This project includes a test suite that checks that the data generator and the AI API stay in sync. Run it with:
pytest
Contributions are welcome! If you have ideas to improve the fraud detection logic or the dashboard UI:
Fork the Project.
Create your Feature Branch (git checkout -b feature/AmazingFeature).
Commit your Changes (git commit -m 'Add some AmazingFeature').
Push to the Branch (git push origin feature/AmazingFeature).
Open a Pull Request.
Data Privacy: This project uses synthetic data generated by Faker. No real invoice data is included or required to run the demo.
Model Accuracy: The XGBoost model is trained on a small synthetic sample (100 rows by default). For higher accuracy in a production setting, increase the n value in generator.py and retrain.
CORS: If you host the frontend and backend on different origins (for example, different ports), make sure Flask-CORS is enabled.
[ ] Batch Processing: Ability to upload an entire CSV for bulk fraud scanning.
[ ] User Auth: Secure login for finance team members.
[ ] Email Alerts: Auto-notify admins when a "High Risk" invoice is detected.
Scikit-learn & XGBoost: For the heavy lifting in the ML pipeline.
Faker: For generating the synthetic invoice data.
Built by Roy Peters.