first commit
CLAUDE.md
@@ -0,0 +1,67 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Juicepyter is a Pokémon card generator pipeline that takes a natural language description, cleans it, extracts structured JSON metadata, and generates a card image using a LoRA-finetuned Stable Diffusion model. A Streamlit UI (`app.py`) ties it all together.

## Architecture: Three-Stage Pipeline

The pipeline (`prompt_to_card_pipeline.py`) orchestrates three stages:

1. **Text cleaning** (`text-cleaner/text_cleaning_pipeline.py`): NLTK-based pipeline that performs lowercasing, punctuation/slang removal, stopword filtering, and POS-aware lemmatization. Entry point: `get_clean_text(raw_text) -> str`.

2. **Keyword extraction + JSON inference** (`clean-text-to-keywords/`): spaCy + YAKE keyword extraction (`keyword_extractor.py`) → rule-based JSON inference (`json_inference.py`) that populates a TCG-style card template. CLI: `infer_json_usage.py`. No LLM calls: this stage is deterministic and rule-based.

3. **Card image generation** (`card_generator_adapter.py`): Loads `runwayml/stable-diffusion-v1-5` with a LoRA adapter (PEFT) from `pokemon_card_lora/`, converts metadata to an SD prompt via `metadata_to_conditioning()`, and runs inference. The generator module is pluggable via `--generator-module`.

`fetch_card.py` is a standalone data collection script that downloads real Pokémon TCG card images with embedded metadata using the TCGdex SDK.
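
The stages are passed to the pipeline as file paths, and directories such as `text-cleaner/` are not valid Python package names, so some form of path-based import is needed. The sketch below shows one way this could be done with the standard library; it is not taken from `prompt_to_card_pipeline.py`. The helper names `load_module_from_path` and `check_generator_contract` are hypothetical and used only for illustration.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_module_from_path(path: str, name: str = "plugin_module") -> ModuleType:
    """Import a Python file by filesystem path (hypothetical helper).

    This works even when the parent directory contains a hyphen,
    which a regular `import` statement cannot express.
    """
    spec = importlib.util.spec_from_file_location(name, Path(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def check_generator_contract(module: ModuleType) -> bool:
    """Validate a generator module against the documented contract.

    `build_pipeline` is required; `metadata_to_conditioning` is optional.
    Returns True when the optional hook is present.
    """
    if not callable(getattr(module, "build_pipeline", None)):
        raise AttributeError(
            "generator module must define build_pipeline(checkpoint_path, device)"
        )
    return callable(getattr(module, "metadata_to_conditioning", None))
```

Under these assumptions, `load_module_from_path("text-cleaner/text_cleaning_pipeline.py").get_clean_text(raw_text)` would reach the stage 1 entry point, and the same loader could serve the module passed via `--generator-module`.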

## Commands

### Run the Streamlit app

```bash
streamlit run app.py
```

### Run the full pipeline CLI

```bash
python prompt_to_card_pipeline.py "description text" \
  --text-cleaner-path text-cleaner/text_cleaning_pipeline.py \
  --infer-script-path clean-text-to-keywords/infer_json_usage.py \
  --checkpoint pokemon_card_lora \
  --template clean-text-to-keywords/json_template_example.json \
  --generator-module card_generator_adapter.py \
  --device cpu \
  --save-path generated_card.png \
  --print-json
```
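
When the pipeline needs to be driven from another Python script, the same invocation can be assembled as an argument list and handed to `subprocess`. This is a minimal sketch that reuses only the flags shown above; the function names `build_pipeline_command` and `run_pipeline` are hypothetical, and the paths assume the script is run from the repository root.

```python
import subprocess
import sys


def build_pipeline_command(
    description: str,
    save_path: str = "generated_card.png",
    device: str = "cpu",
) -> list[str]:
    """Assemble the CLI invocation as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "prompt_to_card_pipeline.py", description,
        "--text-cleaner-path", "text-cleaner/text_cleaning_pipeline.py",
        "--infer-script-path", "clean-text-to-keywords/infer_json_usage.py",
        "--checkpoint", "pokemon_card_lora",
        "--template", "clean-text-to-keywords/json_template_example.json",
        "--generator-module", "card_generator_adapter.py",
        "--device", device,
        "--save-path", save_path,
        "--print-json",
    ]


def run_pipeline(description: str, **kwargs: str) -> subprocess.CompletedProcess:
    """Run the pipeline and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_pipeline_command(description, **kwargs), check=True)
```

Passing a list rather than a single shell string avoids quoting problems when the description contains spaces or punctuation.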

### Run keyword extraction + JSON inference only

```bash
cd clean-text-to-keywords
python infer_json_usage.py --template json_template_example.json "your pokemon description"
```

### Tests

```bash
cd clean-text-to-keywords
python -m unittest -q
```

## Dependencies

- **text-cleaner**: `nltk` (punkt, stopwords, wordnet, averaged_perceptron_tagger)
- **clean-text-to-keywords**: `spacy>=3.7.0`, `yake>=0.4.2`, spaCy model `en_core_web_sm`
- **card generation**: `diffusers`, `torch`, `peft`, `transformers`, `accelerate`, `safetensors`
- **app**: `streamlit`, `Pillow`
- **fetch_card**: `tcgdexsdk`, `Pillow`

Python 3.13 or lower is recommended for spaCy compatibility.

## Key Design Decisions

- The generator module pattern is pluggable: any module with `build_pipeline(checkpoint_path, device)` and optionally `metadata_to_conditioning(meta)` can be swapped in via `--generator-module`.
- The JSON inference stage preserves non-empty fields in the provided template; only empty fields get populated.
- The LoRA base model is `runwayml/stable-diffusion-v1-5` with PEFT adapter weights in `pokemon_card_lora/`.
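
The template-preservation rule can be illustrated with a small merge function. This is a sketch of the behavior described above, not the code in `json_inference.py`. The function names, the definition of "empty" (`None`, `""`, `[]`, `{}`), the recursion into nested objects, and the field names in the example are all assumptions made for illustration.

```python
from copy import deepcopy
from typing import Any


def is_empty(value: Any) -> bool:
    """Assumed notion of 'empty': None, empty string, empty list, or empty dict."""
    return value is None or value == "" or value == [] or value == {}


def fill_empty_fields(template: dict, inferred: dict) -> dict:
    """Return a copy of `template` where only empty fields take inferred values."""
    result = deepcopy(template)
    for key, current in result.items():
        if key not in inferred:
            continue  # nothing was inferred for this field
        candidate = inferred[key]
        if isinstance(current, dict) and current:
            # Non-empty nested object: apply the same rule one level down.
            if isinstance(candidate, dict):
                result[key] = fill_empty_fields(current, candidate)
        elif is_empty(current):
            result[key] = deepcopy(candidate)
        # Otherwise the template value is non-empty and is preserved.
    return result


# Hypothetical field names, for illustration only.
template = {"name": "Pikachu", "hp": "", "types": [], "attack": {"name": "", "damage": 30}}
inferred = {"name": "Raichu", "hp": 60, "types": ["Lightning"],
            "attack": {"name": "Thunder Shock", "damage": 10}}
card = fill_empty_fields(template, inferred)
# card["name"] stays "Pikachu"; card["hp"] becomes 60;
# card["attack"] becomes {"name": "Thunder Shock", "damage": 30}
```

Note that values such as `0` or `False` count as non-empty under this definition, so a pre-filled numeric field is never overwritten.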