Pokemon Text-to-JSON Pipeline
This project converts free-form Pokemon description text into:
- A normalized keyword list
- A populated Pokemon JSON object (from a blank/key-only template)
The pipeline is deterministic and rule-based.
Architecture
Stage 1: Keyword Extraction
File: keyword_extractor.py
Input: raw text description
Core logic:
- spaCy tokenization and POS tagging
- POS filtering (NOUN, ADJ, VERB)
- stopword and punctuation removal
- lemma-based normalization
- domain synonym normalization (example: flames -> fire)
- optional YAKE relevance scoring
- conservative retention policy so detail is not over-pruned
Output: ordered list of normalized keywords
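The filtering and normalization steps above can be sketched in plain Python. This is a minimal illustration, not the code in keyword_extractor.py: it assumes tokens have already been tagged (in the real pipeline spaCy would supply the lemma, POS tag, and stopword flag), and the names KEEP_POS, SYNONYMS, and normalize_keywords are hypothetical. De-duplicating while keeping first-seen order is also an assumption about what "ordered list" means here.

```python
# Hypothetical sketch: each token is a pre-tagged tuple of
# (lemma, pos, is_stop), standing in for what spaCy would provide.
KEEP_POS = {"NOUN", "ADJ", "VERB"}

# Assumed synonym table; only "flames -> fire" is taken from this README.
SYNONYMS = {"flame": "fire", "flames": "fire"}


def normalize_keywords(tagged_tokens):
    """Return an ordered, de-duplicated list of normalized keywords."""
    seen = set()
    keywords = []
    for lemma, pos, is_stop in tagged_tokens:
        if pos not in KEEP_POS or is_stop:
            continue  # POS filter, stopword and punctuation removal
        word = lemma.lower()
        word = SYNONYMS.get(word, word)  # domain synonym normalization
        if word not in seen:  # keep the first occurrence, preserve order
            seen.add(word)
            keywords.append(word)
    return keywords


tagged = [
    ("breathe", "VERB", False),
    ("flame", "NOUN", False),
    ("with", "ADP", True),
    ("fire", "NOUN", False),
    (".", "PUNCT", False),
]
print(normalize_keywords(tagged))  # ['breathe', 'fire']
```

Because the function only filters, maps, and de-duplicates, the same input always yields the same list, which matches the deterministic design goal.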
Stage 2: JSON Inference
File: json_inference.py
Input: keyword list + optional JSON template
Core logic:
- infer primary/secondary type
- infer name candidate
- infer attacks, abilities, habitat, personality
- infer basic stats (hp, attack, defense, speed)
- fill nested TCG-like template fields (types, attacks, weaknesses, stage, retreat, etc.)
- preserve already non-empty values in the provided template
Output: inferred JSON profile
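As a rough illustration of rule-based type inference, the sketch below counts keyword hits per type and picks the top two. The TYPE_HINTS table, the infer_types function, and the "normal" fallback are all hypothetical; the actual mappings and tie-breaking rules in json_inference.py may differ.

```python
# Hypothetical keyword-to-type hints, for illustration only.
TYPE_HINTS = {
    "fire": "fire",
    "burn": "fire",
    "water": "water",
    "swim": "water",
    "leaf": "grass",
}


def infer_types(keywords, default="normal"):
    """Pick (primary, secondary) type by counting keyword hits per type."""
    counts = {}
    for keyword in keywords:
        hinted = TYPE_HINTS.get(keyword)
        if hinted is not None:
            counts[hinted] = counts.get(hinted, 0) + 1
    # Highest count first; ties keep first-seen order because dicts
    # preserve insertion order and sorted() is stable.
    ranked = sorted(counts, key=lambda name: -counts[name])
    primary = ranked[0] if ranked else default
    secondary = ranked[1] if len(ranked) > 1 else None
    return primary, secondary


print(infer_types(["fire", "swim", "burn", "tail"]))  # ('fire', 'water')
print(infer_types(["tail", "agile"]))  # ('normal', None)
```

A stable sort over insertion-ordered counts keeps the result reproducible for a given keyword list, with no randomness involved.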
Stage 3: Orchestration CLI
File: infer_json_usage.py
This is the main entrypoint for end-to-end usage.
Default behavior:
- prints extracted keyword list
- prints inferred JSON
Project Structure
- keyword_extractor.py: keyword extraction engine
- json_inference.py: keyword-to-JSON inference logic
- infer_json_usage.py: end-to-end CLI
- example_usage.py: keyword extraction only CLI
- json_template_example.json: sample blank/key-only template
- test_keyword_extractor.py: extraction tests
- test_json_inference.py: inference tests
- requirements.txt: Python dependencies
Requirements
- Python 3.13 or lower is recommended for spaCy compatibility
- pip
Dependencies in requirements.txt:
spacy>=3.7.0
yake>=0.4.2
Setup
- Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate
- Install dependencies
pip install -r requirements.txt
- Install spaCy English model
python -m spacy download en_core_web_sm
How To Run
A) Extract keywords only
python example_usage.py "furret long slender agile creature with soft fur"
Output: JSON list of keywords.
B) End-to-end: text -> keywords -> JSON
python infer_json_usage.py --template json_template_example.json "furret long slender agile creature with soft fur"
Output order:
- keyword list
- inferred JSON
C) End-to-end but JSON only
python infer_json_usage.py --json-only --template json_template_example.json "furret long slender agile creature with soft fur"
D) Start from keywords directly
python infer_json_usage.py --template json_template_example.json --keywords furret normal tail smash tunnel agile cheerful explore endurance
Tip: If you pass --keywords, text extraction is skipped.
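The flag combinations in A) through D) could be handled with argparse roughly as follows. This is a sketch under assumptions, not the contents of infer_json_usage.py: the flag names come from the commands above, while build_parser, choose_input, and the exact argument settings are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Text -> keywords -> JSON (sketch)")
    parser.add_argument("text", nargs="?", help="free-form description")
    parser.add_argument("--template", help="path to a JSON template file")
    parser.add_argument("--json-only", action="store_true",
                        help="print only the inferred JSON")
    parser.add_argument("--keywords", nargs="+",
                        help="pre-extracted keywords; skips text extraction")
    return parser


def choose_input(args):
    """Return ("keywords", list) or ("text", str) depending on the flags."""
    if args.keywords:
        return "keywords", args.keywords  # extraction is skipped
    if args.text:
        return "text", args.text
    raise SystemExit("provide a text description or --keywords")


args = build_parser().parse_args(
    ["--template", "json_template_example.json", "--keywords", "furret", "normal"]
)
print(choose_input(args))  # ('keywords', ['furret', 'normal'])
```

Checking --keywords first mirrors the tip above: when keywords are supplied, the text argument is never consulted.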
Template Behavior
If --template is omitted, inference returns a full inferred profile object.
If --template is provided:
- empty fields are populated from inferred values
- non-empty fields are preserved
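The fill rule can be pictured as a recursive merge in which the template defines the shape and inferred values only land in empty slots. The sketch below is an assumption about how such a merge might look: is_empty and fill_template are hypothetical names, and treating None, "", [], and {} as "empty" is a guess at the actual rule.

```python
import copy


def is_empty(value):
    """Treat None, empty strings, and empty containers as unfilled."""
    return value is None or value == "" or value == [] or value == {}


def fill_template(template, inferred):
    """Return a copy of template with empty fields filled from inferred."""
    result = copy.deepcopy(template)
    for key, value in result.items():
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested objects.
            nested = inferred.get(key)
            result[key] = fill_template(value, nested if isinstance(nested, dict) else {})
        elif is_empty(value) and key in inferred:
            result[key] = copy.deepcopy(inferred[key])
        # Non-empty values are left untouched.
    return result


template = {"name": "Furret", "stage": "", "types": [], "retreat": None}
inferred = {"name": "Sentret", "stage": "Stage 1", "types": ["Normal"], "retreat": 1}
print(fill_template(template, inferred))
# {'name': 'Furret', 'stage': 'Stage 1', 'types': ['Normal'], 'retreat': 1}
```

In this sketch, keys that appear only in the inferred profile are not added, so the output keeps exactly the template's structure.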
Current sample template supports nested card-like data including:
- types
- attacks with cost, name, effect, damage
- weaknesses with type, value
- stage, retreat, legal
Tests
Run all tests:
python -m unittest -q
Troubleshooting
1) spaCy model not found
The error message says that en_core_web_sm is not installed.
Fix:
python -m spacy download en_core_web_sm
2) spaCy import/runtime problems on very new Python versions
Use Python 3.13 or lower and reinstall requirements.
3) --template path errors
Ensure --template points to a valid file path, for example:
--template json_template_example.json
If your input is already a keyword list, use --keywords instead of putting the list in --template.
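One way to surface these path mistakes early is to validate the template before inference starts. The helper below is a hypothetical sketch (load_template is not confirmed to exist in this project), and rejecting a top-level JSON list is an assumption based on the note above about keyword lists.

```python
import json
from pathlib import Path


def load_template(path):
    """Load a JSON template, failing with a readable message."""
    template_path = Path(path)
    if not template_path.is_file():
        raise FileNotFoundError(f"--template is not a file: {path}")
    with template_path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        # Assumed rule: a template is a JSON object, not a keyword list.
        raise ValueError(
            "--template must contain a JSON object; "
            "pass a keyword list with --keywords instead"
        )
    return data
```

Calling load_template("json_template_example.json") from the project root would return the template as a dict, while a missing path raises FileNotFoundError with the offending value in the message.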
Design Notes
- deterministic and explainable (no LLM calls)
- domain mappings are easy to extend in keyword_extractor.py and json_inference.py
- scoring and template fill rules are intentionally simple and stable for game-content generation