Rag Library AI


Python LangChain Streamlit ChromaDB

This project is a Retrieval-Augmented Generation (RAG) application designed to act as an intelligent librarian. By indexing technical Python books, it allows users to ask complex questions and receive answers grounded in the specific text of those books, complete with context.


📸 Screenshots


🚀 How It Works

The application follows a standard RAG pipeline:

- Ingestion: PDF books are loaded from the data/books/ directory.
- Chunking: Documents are split into 1000-character segments with a 150-character overlap to maintain context.
- Vectorization: Text chunks are converted into numerical embeddings and stored in a local ChromaDB instance.
- Retrieval: When a user asks a question, the system searches the database for the most relevant text chunks.
- Generation: The retrieved chunks and the user's question are sent to Gemini 2.5 Flash to generate a precise, grounded answer.
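The steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the project's actual code: the function names are hypothetical, the "embedding" is a simple word count standing in for a real embedding model, and the brute-force similarity search stands in for the ChromaDB lookup. Only the 1000-character chunk size and 150-character overlap come from the description above.

```python
import math
from collections import Counter

CHUNK_SIZE = 1000     # characters per chunk, as described above
CHUNK_OVERLAP = 150   # characters shared between neighbouring chunks


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question (brute-force search)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into one grounded prompt."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In a real run, the string returned by `build_prompt` would be sent to the language model. The overlap means each chunk repeats the last 150 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.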


📖 Example Queries


📂 Project Structure

├── data/
│   └── books/              # PDF source files
├── vectorstore/
│   └── db/                 # Local ChromaDB persistent storage
├── app.py                  # Main Streamlit UI
├── ingest.py               # Script to process and embed PDFs
├── query.py                # CLI tool for testing queries
├── requirements.txt        # Python dependencies
├── pyproject.toml          # Project metadata and dependencies
├── .env                    # API Keys
└── .gitignore              # Files excluded from version control

💻 Getting Started

Clone the Repo

git clone https://github.com/reory/Rag_Library_AI.git
cd Rag_Library_AI

Set Up Environment

Create a .env file in the root directory and add your Google API key:

GOOGLE_API_KEY=your_actual_key_here
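To confirm the key is readable before running ingestion, a small check like the following may help. It is a standard-library sketch only: the helper names `parse_env_text` and `require_api_key` are hypothetical, it assumes the variable is named `GOOGLE_API_KEY`, and the project itself may load the file differently (for example through a dotenv helper).

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require_api_key(env_path: str = ".env", name: str = "GOOGLE_API_KEY") -> str:
    """Return the key from the process environment or the .env file, else raise."""
    value = os.environ.get(name)
    path = Path(env_path)
    if not value and path.is_file():
        value = parse_env_text(path.read_text(encoding="utf-8")).get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to {env_path}")
    return value
```

Calling `require_api_key()` from the project root either returns the key or fails with a clear message, which is easier to diagnose than an authentication error in the middle of a query.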

Install Dependencies

uv add -r requirements.txt

Ingest Data

Place your PDFs in data/books/ and run the ingestion script to build the vector database:

uv run python ingest.py

Run the App

uv run streamlit run app.py

⚒️ Tech Stack:


🛣️ Roadmap Features


📝 Notes


Built by Roy Peters 😁 LinkedIn