Pipeline uses subprocess calls instead of direct Python imports between stages #18

Open
opened 2026-03-19 17:31:31 +00:00 by llabeyrie · 0 comments
Owner

Description

The pipeline orchestration in prompt_to_card_pipeline.py uses subprocess calls and dynamic file-based module loading instead of direct Python imports:

# Stage 1: loads module from file path
get_clean_text = _load_function_from_file(args.text_cleaner_path, "get_clean_text")

# Stage 2: subprocess call to another Python script
result = subprocess.run([sys.executable, str(infer_script), "--json-only", ...])

# Stage 3: dynamically loads generator module from file path
module = _load_module_from_file(self.generator_module_path)

Problems

  • Error handling is fragmented: each stage's errors are trapped differently (Python exceptions vs subprocess exit codes vs dynamic import errors)
  • Performance: spawning subprocesses has overhead (new Python interpreter, re-importing all modules)
  • Debugging is painful: stack traces don't span across subprocess boundaries
  • Testing is hard: can't mock individual stages easily
  • The _load_module_from_file() function reinvents importlib in a fragile way
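
If file-based loading has to stay until the package structure from #17 exists, the standard library already documents a recipe for it. The sketch below follows that recipe; load_module_from_file is a hypothetical stand-in for illustration, not the project's actual _load_module_from_file(), whose body is not shown in this issue.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path
from types import ModuleType


def load_module_from_file(path: str | Path, name: str | None = None) -> ModuleType:
    """Load a module from an explicit file path (hypothetical helper).

    Uses only documented importlib.util calls, so import errors surface
    as ordinary Python exceptions with a normal traceback.
    """
    path = Path(path)
    module_name = name or path.stem
    spec = importlib.util.spec_from_file_location(module_name, path)
    # spec is None when no loader can be found for the path (e.g. not a .py file)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file's top-level code
    return module
```

A single function can then be fetched with getattr(module, "get_clean_text"), which raises AttributeError if the name is missing instead of failing later at call time.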

Fix

Once the project has a proper package structure (issue #17), replace the subprocess calls and file-based module loading with direct imports and function calls:

from juicepyter.text_cleaner import get_clean_text
from juicepyter.keyword_inference import extract_and_infer

cleaned = get_clean_text(raw_text)
metadata = extract_and_infer(cleaned, template)
image = generator.generate(metadata)
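
With the stages as plain function calls, the testing and error-handling problems above become easier to address. The following is a minimal sketch under stated assumptions: run_pipeline and PipelineError are hypothetical names, and the stage callables are passed in as arguments so the example runs without the juicepyter package being installed.

```python
from unittest.mock import Mock


class PipelineError(RuntimeError):
    """Hypothetical single error type for a failure in any stage."""


def run_pipeline(raw_text, template, *, clean, infer, generator):
    """Run the three stages in-process (hypothetical orchestration function).

    Every stage fails with a normal Python exception, so one except
    clause covers all of them and the original traceback is preserved
    through exception chaining.
    """
    try:
        cleaned = clean(raw_text)
        metadata = infer(cleaned, template)
        return generator.generate(metadata)
    except Exception as exc:
        raise PipelineError(f"pipeline failed: {exc}") from exc


# Each stage can be replaced by a mock, which subprocess calls do not allow.
clean = Mock(return_value="clean text")
infer = Mock(return_value={"title": "demo"})
generator = Mock()
generator.generate.return_value = "image-bytes"

result = run_pipeline("  raw  ", "template", clean=clean, infer=infer, generator=generator)
infer.assert_called_once_with("clean text", "template")
generator.generate.assert_called_once_with({"title": "demo"})
```

In the real pipeline the defaults would presumably be the imported get_clean_text and extract_and_infer; injecting them as parameters is one design option among several, and unittest.mock.patch on the imported names would work as well.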
llabeyrie added the priority: low and structure labels 2026-03-19 17:31:52 +00:00