# Pokemon Text-to-JSON Pipeline

This project converts free-form Pokemon description text into:

1. A normalized keyword list
2. A populated Pokemon JSON object (from a blank/key-only template)

The pipeline is deterministic and rule-based.
## Architecture

### Stage 1: Keyword Extraction

File: `keyword_extractor.py`

Input: raw text description

Core logic:

- spaCy tokenization and POS tagging
- POS filtering (`NOUN`, `ADJ`, `VERB`)
- stopword and punctuation removal
- lemma-based normalization
- domain synonym normalization (example: `flames -> fire`)
- optional YAKE relevance scoring
- conservative retention policy so detail is not over-pruned

Output: ordered list of normalized keywords
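The filtering and normalization steps above can be sketched in plain Python. This is a minimal illustration, not the code in `keyword_extractor.py`: it assumes tokenization, lemmatization, and POS tagging have already happened (in the real pipeline spaCy provides these, e.g. via `Token.text`, `Token.lemma_`, and `Token.pos_`) and that tokens arrive as `(text, lemma, pos)` tuples. The `STOPWORDS` and `SYNONYMS` tables, the `scores` argument, and the "reorder but never drop" retention rule are hypothetical placeholders.

```python
from typing import Iterable, Optional

# Hypothetical stand-ins; the real tables in keyword_extractor.py may differ.
KEEP_POS = {"NOUN", "ADJ", "VERB"}
STOPWORDS = {"be", "have", "do", "get", "make"}
SYNONYMS = {"flame": "fire", "blaze": "fire", "inferno": "fire"}


def normalize_keywords(
    tokens: Iterable[tuple[str, str, str]],
    scores: Optional[dict[str, float]] = None,
) -> list[str]:
    """Build an ordered keyword list from (text, lemma, pos) tokens.

    `scores` stands in for optional relevance scores where lower means more
    relevant (YAKE's convention). Scored keywords move to the front, but
    unscored ones are kept instead of pruned.
    """
    keywords: list[str] = []
    seen: set[str] = set()
    for _text, lemma, pos in tokens:
        if pos not in KEEP_POS:
            continue  # POS filter: keep nouns, adjectives, verbs
        word = lemma.lower()
        if not word.isalpha() or word in STOPWORDS:
            continue  # drop punctuation/number debris and stopwords
        word = SYNONYMS.get(word, word)  # domain synonym normalization
        if word not in seen:  # de-duplicate while keeping first-seen order
            seen.add(word)
            keywords.append(word)

    if scores:
        # Stable sort: scored words by ascending score first, then the
        # unscored words in their original order.
        keywords.sort(key=lambda w: (w not in scores, scores.get(w, 0.0)))
    return keywords


# Tokens as a tagger such as spaCy might emit them (illustrative data).
sample = [
    ("furret", "furret", "NOUN"),
    ("breathes", "breathe", "VERB"),
    ("flames", "flame", "NOUN"),
    ("with", "with", "ADP"),
    ("agile", "agile", "ADJ"),
    ("fire", "fire", "NOUN"),
    (".", ".", "PUNCT"),
]
print(normalize_keywords(sample))  # ['furret', 'breathe', 'fire', 'agile']
```

Keeping unscored words (rather than cutting everything below a score threshold) is one way to read "conservative retention": scoring only changes the order, so no detail from the description is lost.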
### Stage 2: JSON Inference

File: `json_inference.py`

Input: keyword list + optional JSON template

Core logic:

- infer primary/secondary type
- infer a name candidate
- infer attacks, abilities, habitat, and personality
- infer basic stats (`hp`, `attack`, `defense`, `speed`)
- fill nested TCG-like template fields (`types`, `attacks`, `weaknesses`, `stage`, `retreat`, etc.)
- preserve values that are already non-empty in the provided template

Output: inferred JSON profile
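As a rough illustration of keyword-driven inference, the sketch below derives a primary/secondary type, a single weakness, and basic stats from a keyword list. Everything in it is hypothetical: the lookup tables, the `"Normal"` fallback, the flat base stats of 50, the `+20` bonuses, and the `"x2"` weakness value are placeholders, and the real rules live in `json_inference.py`.

```python
from typing import Optional

# Hypothetical lookup tables; json_inference.py defines its own rules.
TYPE_KEYWORDS = {
    "fire": "Fire",
    "water": "Water",
    "grass": "Grass",
    "electric": "Electric",
    "normal": "Normal",
}
# Simplified: one weakness per type (real type charts list several).
WEAKNESS = {
    "Fire": "Water",
    "Water": "Electric",
    "Grass": "Fire",
    "Electric": "Ground",
    "Normal": "Fighting",
}
BASE_STATS = {"hp": 50, "attack": 50, "defense": 50, "speed": 50}
STAT_BOOSTS = {
    "agile": ("speed", 20),
    "fast": ("speed", 20),
    "strong": ("attack", 20),
    "sturdy": ("defense", 20),
    "endurance": ("hp", 20),
}


def infer_types(keywords: list[str]) -> tuple[str, Optional[str]]:
    """Primary type = first type keyword seen; secondary = next distinct one."""
    found: list[str] = []
    for kw in keywords:
        type_name = TYPE_KEYWORDS.get(kw)
        if type_name and type_name not in found:
            found.append(type_name)
    if not found:
        found = ["Normal"]  # assumed fallback when no type keyword appears
    return found[0], (found[1] if len(found) > 1 else None)


def infer_stats(keywords: list[str]) -> dict[str, int]:
    """Start from flat base stats and add a bonus per matching keyword."""
    stats = dict(BASE_STATS)
    for kw in keywords:
        if kw in STAT_BOOSTS:
            stat, bonus = STAT_BOOSTS[kw]
            stats[stat] += bonus
    return stats


def infer_profile(keywords: list[str]) -> dict:
    """Combine the pieces into a small TCG-like profile."""
    primary, secondary = infer_types(keywords)
    return {
        "types": [primary] + ([secondary] if secondary else []),
        "weaknesses": [{"type": WEAKNESS[primary], "value": "x2"}],
        "stats": infer_stats(keywords),
    }


print(infer_profile(["furret", "normal", "tail", "smash", "agile", "endurance"]))
```

Because every rule is a table lookup applied in keyword order, the same keyword list always yields the same profile, which matches the deterministic design described in this README.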
### Stage 3: Orchestration CLI

File: `infer_json_usage.py`

This is the main entrypoint for end-to-end usage.

Default behavior:

1. prints extracted keyword list
2. prints inferred JSON
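A hedged sketch of how an orchestration CLI with these flags could be wired with `argparse`. The flag names (`--template`, `--json-only`, `--keywords`) come from the usage examples in this README; `build_parser`, `run`, and the injected `extract`/`infer` callables are hypothetical stand-ins for the real functions in `keyword_extractor.py` and `json_inference.py`.

```python
import argparse
import json
import sys
from typing import Callable, Optional, TextIO

# Hypothetical signatures for the two pipeline stages.
Extractor = Callable[[str], list]
Inferrer = Callable[[list, Optional[dict]], dict]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Pokemon text -> keywords -> JSON")
    parser.add_argument("text", nargs="*", help="free-form description text")
    parser.add_argument("--template", help="path to a blank/key-only JSON template")
    parser.add_argument("--json-only", action="store_true",
                        help="print only the inferred JSON")
    parser.add_argument("--keywords", nargs="+",
                        help="use these keywords and skip text extraction")
    return parser


def run(argv: list[str], extract: Extractor, infer: Inferrer,
        out: Optional[TextIO] = None) -> int:
    if out is None:
        out = sys.stdout
    parser = build_parser()
    args = parser.parse_args(argv)

    if args.keywords:
        keywords = list(args.keywords)  # Stage 1 is skipped entirely
    else:
        text = " ".join(args.text).strip()
        if not text:
            parser.error("provide description text or --keywords")
        keywords = extract(text)

    template = None
    if args.template:
        with open(args.template, encoding="utf-8") as fh:
            template = json.load(fh)

    profile = infer(keywords, template)
    if not args.json_only:
        print(json.dumps(keywords), file=out)  # 1. keyword list
    print(json.dumps(profile, indent=2), file=out)  # 2. inferred JSON
    return 0
```

For a dry run you can pass stand-ins, e.g. `run(["--keywords", "fire", "agile"], extract=str.split, infer=lambda kws, tpl: {"keywords": kws})`. Note that `nargs="+"` makes `--keywords` consume every remaining argument, which is why it is safest to put it last on the command line, as in the examples below.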
## Project Structure

- `keyword_extractor.py`: keyword extraction engine
- `json_inference.py`: keyword-to-JSON inference logic
- `infer_json_usage.py`: end-to-end CLI
- `example_usage.py`: keyword extraction only CLI
- `json_template_example.json`: sample blank/key-only template
- `test_keyword_extractor.py`: extraction tests
- `test_json_inference.py`: inference tests
- `requirements.txt`: Python dependencies
## Requirements

- Python 3.13 or lower is recommended for spaCy compatibility
- pip

Dependencies in `requirements.txt`:

- `spacy>=3.7.0`
- `yake>=0.4.2`
## Setup

1. Create and activate a virtual environment (recommended)

```bash
python -m venv .venv
source .venv/bin/activate
# On Windows (cmd.exe): .venv\Scripts\activate
```

2. Install dependencies

```bash
pip install -r requirements.txt
```

3. Install the spaCy English model

```bash
python -m spacy download en_core_web_sm
```
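To confirm the setup before running the pipeline, a small check script (hypothetical, e.g. saved as `check_env.py`; it is not part of the project) can test the Python version against this README's recommendation and look for the installed packages using only the standard library:

```python
import importlib.util
import sys

# From this README: Python 3.13 or lower is recommended for spaCy.
MAX_RECOMMENDED = (3, 13)


def python_version_ok(version: tuple = tuple(sys.version_info)) -> bool:
    """Compare only (major, minor) against the recommended maximum."""
    return tuple(version[:2]) <= MAX_RECOMMENDED


def module_installed(name: str) -> bool:
    # Installed packages (including spaCy model packages such as
    # en_core_web_sm) can be located without importing them.
    return importlib.util.find_spec(name) is not None


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if not python_version_ok():
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {found} is newer than 3.13; spaCy may not support it yet")
    for package in ("spacy", "yake"):
        if not module_installed(package):
            problems.append(f"{package} is missing: pip install -r requirements.txt")
    if not module_installed("en_core_web_sm"):
        problems.append("model missing: python -m spacy download en_core_web_sm")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

Run it with the virtual environment activated, since the checks only see packages installed for the interpreter that runs the script.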
## How To Run

### A) Extract keywords only

```bash
python example_usage.py "furret long slender agile creature with soft fur"
```

Output: JSON list of keywords.

### B) End-to-end: text -> keywords -> JSON

```bash
python infer_json_usage.py --template json_template_example.json "furret long slender agile creature with soft fur"
```

Output order:

1. keyword list
2. inferred JSON
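If another program needs to consume this two-part output, one option is to decode the two JSON values back to back with the standard library's `json.JSONDecoder.raw_decode`. This assumes both the keyword list and the inferred profile are printed as JSON (the README states this explicitly only for `example_usage.py`); `split_cli_output` is a hypothetical helper, and the captured string below is illustrative data.

```python
import json


def split_cli_output(stdout: str) -> tuple[list, dict]:
    """Split '<keyword JSON list> <profile JSON object>' into two Python values."""
    decoder = json.JSONDecoder()
    text = stdout.strip()
    keywords, end = decoder.raw_decode(text)  # first JSON value: keyword list
    # raw_decode does not skip leading whitespace, so strip the separator.
    profile, _ = decoder.raw_decode(text[end:].lstrip())
    if not isinstance(keywords, list) or not isinstance(profile, dict):
        raise ValueError("expected a JSON list followed by a JSON object")
    return keywords, profile


# Illustrative captured output; in practice it could come from, e.g.:
#   subprocess.run(["python", "infer_json_usage.py", "--template",
#                   "json_template_example.json", "furret long slender"],
#                  capture_output=True, text=True, check=True).stdout
captured = '["furret", "agile"]\n{\n  "name": "Furret"\n}\n'
keywords, profile = split_cli_output(captured)
print(keywords, profile["name"])  # ['furret', 'agile'] Furret
```

With `--json-only` there is only one JSON value on stdout, so a plain `json.loads` is enough there.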
### C) End-to-end but JSON only

```bash
python infer_json_usage.py --json-only --template json_template_example.json "furret long slender agile creature with soft fur"
```

### D) Start from keywords directly

```bash
python infer_json_usage.py --template json_template_example.json --keywords furret normal tail smash tunnel agile cheerful explore endurance
```

Tip: If you pass `--keywords`, text extraction is skipped.
## Template Behavior

If `--template` is omitted, inference returns a full inferred profile object.

If `--template` is provided:

- empty fields are populated from inferred values
- non-empty fields are preserved

The current sample template (`json_template_example.json`) supports nested card-like data, including:

- `types`
- `attacks` with `cost`, `name`, `effect`, `damage`
- `weaknesses` with `type`, `value`
- `stage`, `retreat`, `legal`
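The fill rule can be sketched as a recursive merge. This is an assumption-laden illustration, not the code in `json_inference.py`: it treats `None`, `""`, and containers holding only such values as empty; preserves everything else (including `0` and `false`); keeps only the keys the template declares; and replaces a blank list (for example a key-only `attacks` entry) wholesale with the inferred list. `fill_template` and `is_blank` are hypothetical names, and the sample data is illustrative.

```python
import copy
from typing import Any


def is_blank(value: Any) -> bool:
    """True for unfilled template slots: None, "", and containers of blanks.

    Numbers and booleans (including 0 and False) count as filled.
    """
    if value is None or value == "":
        return True
    if isinstance(value, dict):
        return all(is_blank(v) for v in value.values())
    if isinstance(value, list):
        return all(is_blank(v) for v in value)
    return False


def fill_template(template: Any, inferred: Any) -> Any:
    """Fill blank template slots from `inferred`; never overwrite filled ones."""
    if isinstance(template, dict) and isinstance(inferred, dict):
        # Walk the template's own keys so its shape is kept; inferred keys
        # that the template does not declare are left out.
        return {
            key: (fill_template(value, inferred[key]) if key in inferred
                  else copy.deepcopy(value))
            for key, value in template.items()
        }
    if is_blank(template):
        return copy.deepcopy(inferred)
    return copy.deepcopy(template)  # non-empty value: preserved as-is


template = {
    "name": "Furret",
    "hp": "",
    "types": [],
    "attacks": [{"name": "", "damage": ""}],
    "legal": {"standard": None},
}
inferred = {
    "name": "furret",
    "hp": 70,
    "types": ["Normal"],
    "attacks": [{"name": "Tail Smash", "damage": "30"}],
    "legal": {"standard": True},
    "habitat": "tunnel",
}
print(fill_template(template, inferred))
```

Here `name` keeps the template's `"Furret"`, the blank `hp`, `types`, and `attacks` slots take the inferred values, and `habitat` is dropped because the template does not declare it. A list that already holds any filled entry is preserved as a unit rather than merged item by item.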
## Tests

Run all tests:

```bash
python -m unittest -q
```
## Troubleshooting

### 1) spaCy model not found

Symptom: the error message says `en_core_web_sm` is not installed.

Fix:

```bash
python -m spacy download en_core_web_sm
```

### 2) spaCy import/runtime problems on very new Python versions

Use Python 3.13 or lower, recreate the virtual environment with that interpreter, and reinstall the requirements.

### 3) `--template` path errors

Ensure `--template` points to a valid file path, for example:

```bash
--template json_template_example.json
```

If your input is already a keyword list, use `--keywords` instead of putting the list in `--template`.
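A defensive loader along these lines turns the common `--template` mistakes into clear messages: a path that does not exist (often a keyword list passed to `--template` by accident), a file that is not valid JSON, or JSON that is not an object. `load_template` is hypothetical rather than the project's code, and requiring a JSON object is an assumption based on the card-like sample template.

```python
import json
from pathlib import Path


def load_template(path: str) -> dict:
    """Read a --template file or exit with an actionable message."""
    candidate = Path(path)
    if not candidate.is_file():
        raise SystemExit(
            f"--template: {path!r} is not a file. "
            "If you have a keyword list, pass it with --keywords instead."
        )
    try:
        data = json.loads(candidate.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise SystemExit(f"--template: {path!r} is not valid JSON ({exc})") from exc
    if not isinstance(data, dict):
        # Assumption: templates are JSON objects, like the sample card template.
        raise SystemExit(
            f"--template: expected a JSON object, got {type(data).__name__}"
        )
    return data
```

Raising `SystemExit` with a string prints that message to stderr and exits with status 1, so a CLI can call this helper directly without extra error handling.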
## Design Notes

- deterministic and explainable (no LLM calls)
- domain mappings are easy to extend in `keyword_extractor.py` and `json_inference.py`
- scoring and template fill rules are intentionally simple and stable for game-content generation