# Pokemon Text-to-JSON Pipeline

This project converts free-form Pokemon description text into:

1. A normalized keyword list
2. A populated Pokemon JSON object (from a blank/key-only template)

The pipeline is deterministic and rule-based.

## Architecture

### Stage 1: Keyword Extraction

File: `keyword_extractor.py`

Input: raw text description

Core logic:

- spaCy tokenization and POS tagging
- POS filtering (`NOUN`, `ADJ`, `VERB`)
- stopword and punctuation removal
- lemma-based normalization
- domain synonym normalization (example: `flames -> fire`)
- optional YAKE relevance scoring
- conservative retention policy so detail is not over-pruned

Output: ordered list of normalized keywords

### Stage 2: JSON Inference

File: `json_inference.py`

Input: keyword list + optional JSON template

Core logic:

- infer primary/secondary type
- infer name candidate
- infer attacks, abilities, habitat, personality
- infer basic stats (`hp`, `attack`, `defense`, `speed`)
- fill nested TCG-like template fields (`types`, `attacks`, `weaknesses`, `stage`, `retreat`, etc.)
- preserve already non-empty values in the provided template

Output: inferred JSON profile

### Stage 3: Orchestration CLI

File: `infer_json_usage.py`

This is the main entrypoint for end-to-end usage.

Default behavior:

1. prints extracted keyword list
2. prints inferred JSON

## Project Structure

- `keyword_extractor.py`: keyword extraction engine
- `json_inference.py`: keyword-to-JSON inference logic
- `infer_json_usage.py`: end-to-end CLI
- `example_usage.py`: keyword extraction only CLI
- `json_template_example.json`: sample blank/key-only template
- `test_keyword_extractor.py`: extraction tests
- `test_json_inference.py`: inference tests
- `requirements.txt`: Python dependencies

## Requirements

- Python 3.13 or lower is recommended for spaCy compatibility
- pip

Dependencies in `requirements.txt`:

- `spacy>=3.7.0`
- `yake>=0.4.2`

## Setup

1. Create and activate a virtual environment (recommended)

```bash
python -m venv .venv
source .venv/bin/activate
```

2. Install dependencies

```bash
pip install -r requirements.txt
```

3. Install spaCy English model

```bash
python -m spacy download en_core_web_sm
```

## How To Run

### A) Extract keywords only

```bash
python example_usage.py "furret long slender agile creature with soft fur"
```

Output: JSON list of keywords.

### B) End-to-end: text -> keywords -> JSON

```bash
python infer_json_usage.py --template json_template_example.json "furret long slender agile creature with soft fur"
```

Output order:

1. keyword list
2. inferred JSON

### C) End-to-end but JSON only

```bash
python infer_json_usage.py --json-only --template json_template_example.json "furret long slender agile creature with soft fur"
```

### D) Start from keywords directly

```bash
python infer_json_usage.py --template json_template_example.json --keywords furret normal tail smash tunnel agile cheerful explore endurance
```

Tip: If you pass `--keywords`, text extraction is skipped.

## Template Behavior

If `--template` is omitted, inference returns a full inferred profile object.

If `--template` is provided:

- empty fields are populated from inferred values
- non-empty fields are preserved

Current sample template supports nested card-like data including:

- `types`
- `attacks` with `cost`, `name`, `effect`, `damage`
- `weaknesses` with `type`, `value`
- `stage`, `retreat`, `legal`

## Tests

Run all tests:

```bash
python -m unittest -q
```

## Troubleshooting

### 1) spaCy model not found

Error mentions `en_core_web_sm` not installed.

Fix:

```bash
python -m spacy download en_core_web_sm
```

### 2) spaCy import/runtime problems on very new Python versions

Use Python 3.13 or lower and reinstall requirements.
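
Both of the problems above can be caught before the pipeline runs. The following is a minimal pre-flight sketch using only the standard library; it is not part of this project. The names `preflight`, `MAX_SUPPORTED`, and `MODEL_NAME` are hypothetical, and the check assumes the model was installed as a regular package (which is what `python -m spacy download` does) rather than loaded from a path.

```python
import importlib.util
import sys

# Assumption: 3.13 is the newest recommended minor version, per the Requirements section.
MAX_SUPPORTED = (3, 13)
MODEL_NAME = "en_core_web_sm"


def preflight(version=None, model=MODEL_NAME):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    version = tuple(version or sys.version_info[:2])
    problems = []

    # Troubleshooting case 2: interpreter newer than the recommended range.
    if version > MAX_SUPPORTED:
        problems.append(
            f"Python {version[0]}.{version[1]} is newer than the recommended "
            f"{MAX_SUPPORTED[0]}.{MAX_SUPPORTED[1]}; spaCy may fail to import."
        )

    # find_spec locates a top-level package without importing it.
    if importlib.util.find_spec("spacy") is None:
        problems.append("spaCy is not installed: pip install -r requirements.txt")
    elif importlib.util.find_spec(model) is None:
        # Troubleshooting case 1: spaCy is present but the model package is missing.
        problems.append(f"Model '{model}' not found: python -m spacy download {model}")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Returning a list rather than raising lets a caller report every problem in one pass.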
### 3) `--template` path errors

Ensure `--template` points to a valid file path, for example:

```bash
--template json_template_example.json
```

If your input is already a keyword list, use `--keywords` instead of putting the list in `--template`.

## Design Notes

- deterministic and explainable (no LLM calls)
- domain mappings are easy to extend in `keyword_extractor.py` and `json_inference.py`
- scoring and template fill rules are intentionally simple and stable for game-content generation
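
To make the template fill rule concrete (empty fields are populated, non-empty fields are preserved), here is a minimal sketch of one way such a merge could work. It is not the code in `json_inference.py`. The helper names `is_empty` and `fill_template` are hypothetical, the sample values are made up for illustration, and two behaviors are assumptions: `0` and `False` count as filled, and keys that appear only in the inferred data are not added to the template.

```python
import copy


def is_empty(value):
    """Treat None, "", [], and {} as unfilled. Assumption: 0 and False count as filled."""
    return value is None or value == "" or value == [] or value == {}


def fill_template(template, inferred):
    """Return a copy of `template` whose empty fields are filled from `inferred`."""
    result = copy.deepcopy(template)
    for key, current in result.items():
        if key not in inferred:
            continue  # nothing inferred for this field; leave it as-is
        candidate = inferred[key]
        if isinstance(current, dict) and current and isinstance(candidate, dict):
            # Non-empty nested object: apply the same rule one level down.
            result[key] = fill_template(current, candidate)
        elif is_empty(current):
            result[key] = copy.deepcopy(candidate)
        # Otherwise the template value is non-empty and is preserved.
    return result


# Illustrative values only; not output from the real pipeline.
template = {"name": "", "hp": None, "types": ["Colorless"], "stage": "", "retreat": 1}
inferred = {"name": "Furret", "hp": 90, "types": ["Normal"], "stage": "Stage 1", "retreat": 2}

filled = fill_template(template, inferred)
print(filled)
```

In this sketch `types` and `retreat` keep their template values because they were already non-empty, while `name`, `hp`, and `stage` are taken from the inferred data. The input template is deep-copied, so it is never mutated.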