An interactive intelligence dashboard and automated reporting tool built to analyze linguistic patterns and global etymology. This project transforms raw text input into actionable forensic insights using a modern Python stack.
Note: Click the badge above to view the full feature walkthrough and linguistic analysis demo on LinkedIn.
This project is divided into two main components to balance real-time user interaction with deep-dive analytical processing:
(views.py & templates/)
The "Frontend" logic of the project. It provides a real-time interface for users to explore their text data.
* Dynamic Geospatial Mapping: Visualizes the "geographic DNA" of a text by pinpointing word origins on a global map using Folium.
* Instant Linguistic KPIs: Computes lexical diversity (type-token ratio, TTR), flags overused words, and detects passive voice on the fly.
* User Vault: A persistent history system that lets users search, review, and manage their analysis records securely.
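The lexical KPIs are easiest to see with a small worked example. The following is a minimal sketch, not the project's actual code: the function names `tokenize`, `type_token_ratio`, and `overused_words`, the regex, and the `min_count` threshold are all assumptions made for illustration. TTR is taken here as the number of unique tokens divided by the total number of tokens, with no stop-word filtering.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (hypothetical normalization)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def type_token_ratio(tokens):
    """Unique tokens divided by total tokens; 0.0 for empty input."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def overused_words(tokens, min_count=3, top_n=5):
    """Most frequent tokens that appear at least min_count times (assumed threshold)."""
    counts = Counter(tokens)
    return [(word, n) for word, n in counts.most_common(top_n) if n >= min_count]


tokens = tokenize("The cat saw the dog. The dog ran!")
print(type_token_ratio(tokens))  # 5 unique tokens out of 8
print(overused_words(tokens))
```

A higher ratio suggests a more varied vocabulary. Because TTR tends to fall as texts get longer, scores are most meaningful when compared across texts of similar length.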
(services/ & models.py)
The "Analytical Backend." This handles the heavy lifting of data management and document generation.
* Dual-State Storage: Manages persistent user history in SQLite while offloading high-speed etymological lookups to a DuckDB OLAP engine.
* Global Etymology Pipeline: A custom ingestion layer that maps 500+ words to global coordinates (Latin, Germanic, Arabic, Sanskrit, and more).
* Automated Document Generation: Compiles findings into professional PDF reports (via WeasyPrint) and Word documents (via python-docx) for offline review.
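To make the ingestion idea concrete, here is a rough sketch of how etymology records might be loaded and matched against the words of a text. It is an illustration under assumptions: the record fields (`word`, `origin`, `lat`, `lon`), the helper names, and the sample coordinates are hypothetical and are not taken from the project's data file. A plain dictionary stands in for the DuckDB lookup table so the example runs with the standard library alone.

```python
import json

# Illustrative records only; the real schema and values may differ.
SAMPLE_JSON = """
[
  {"word": "algebra", "origin": "Arabic", "lat": 24.0, "lon": 45.0},
  {"word": "Karma", "origin": "Sanskrit", "lat": 23.0, "lon": 80.0},
  {"word": "", "origin": "Latin", "lat": 41.9, "lon": 12.5}
]
"""


def load_origins(raw_json):
    """Build a word -> (origin, lat, lon) index, skipping records with no word."""
    index = {}
    for record in json.loads(raw_json):
        word = record.get("word", "").strip().lower()
        if not word:
            continue  # ignore malformed entries rather than failing the whole load
        index[word] = (record["origin"], float(record["lat"]), float(record["lon"]))
    return index


def map_text_to_origins(words, index):
    """Return the origin entry for every word of the text that the index knows."""
    return {w.lower(): index[w.lower()] for w in words if w.lower() in index}


index = load_origins(SAMPLE_JSON)
print(map_text_to_origins(["Algebra", "is", "karma"], index))
```

Normalizing case at both ingestion and lookup time keeps matches consistent regardless of how a word is capitalized in the input text.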
* word_counter/settings.py: Core configuration for the Django environment.
* counter/views.py: Logic for text processing, regex normalization, and dashboard rendering.
* counter/services/seed_origins.py: Data pipeline script for ingesting the global word library.
* counter/services/word_data.json: The "Source of Truth" containing 500+ global etymology records.
* word_vault_analytics.duckdb: High-performance database for geospatial word lookups.

To run this project locally:
1. Clone the repo: git clone https://github.com/reory/Word-Counter-Vault.git
2. Install dependencies: pip install -r requirements.txt
3. Seed the Global Vault: python -m counter.services.seed_origins
4. Launch the app: python manage.py runserver
This project implements a comprehensive automated testing suite using Pytest to ensure data integrity and security across the analytical pipeline.
* pytest-mock: simulates DuckDB OLAP connections, allowing high-speed testing without a disk I/O dependency.
* File uploads: covers .txt, .pdf, and .docx uploads using Django's SimpleUploadedFile.

To run the tests: pytest
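The mocking approach can be sketched as follows. This is a hedged illustration, not the project's test code: `lookup_origin`, the `word_origins` table, its columns, and the sample row are hypothetical. It uses `unittest.mock.MagicMock` from the standard library, which is what pytest-mock's `mocker` fixture wraps, so the same pattern should carry over to a pytest-mock test.

```python
from unittest.mock import MagicMock


def lookup_origin(conn, word):
    """Return (language, lat, lon) for a word, or None if it is unknown.

    `conn` is expected to behave like a DuckDB connection, where
    conn.execute(sql, params).fetchone() returns a single row or None.
    """
    row = conn.execute(
        "SELECT language, lat, lon FROM word_origins WHERE word = ?",
        [word.lower()],
    ).fetchone()
    return tuple(row) if row else None


def test_lookup_origin_uses_mocked_connection():
    # No database file is opened: the connection is replaced by a mock.
    conn = MagicMock()
    conn.execute.return_value.fetchone.return_value = ("Arabic", 24.0, 45.0)

    assert lookup_origin(conn, "Algebra") == ("Arabic", 24.0, 45.0)

    # The query should run once, with the word normalized to lowercase.
    conn.execute.assert_called_once()
    assert conn.execute.call_args[0][1] == ["algebra"]
```

Because the connection is injected as an argument, the test never touches disk and stays fast even when the real lookup table is large.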
Faker for large-scale stress testing.

This project is licensed under the MIT License - see the LICENSE file for details.