State Persistence & Restart: Resilient Workflows
Source: vignettes/state_persistence.Rmd
This vignette demonstrates a core feature of the HydraR
framework: Automated State Persistence & Restart.
In complex multi-agent workflows, a run can fail partway through because of
environmental factors, API timeouts, or logic errors.
HydraR addresses this by letting you checkpoint
every step and resume exactly where you left off.
Persistence Concepts
- Checkpointing: After each node resolves, its entire state and output are saved to a persistent backend.
- Pausing: A node can return a PAUSE status, signaling the orchestrator to save the thread and wait for external feedback.
- Thread Resumption: Using a unique thread_id, you can reload a saved process and restart from any node.
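The first two concepts can be illustrated with a small base-R sketch. This is not HydraR's implementation: `checkpoint_path()`, `run_with_checkpoints()`, the `"OK"` status, and the use of an RDS file as the "persistent backend" are all assumptions made for illustration. Each node returns a status and an updated state, the runner saves after every node, and it stops as soon as a node reports `PAUSE`.

```r
# Hypothetical helpers for illustration; none of these names come from HydraR.
checkpoint_path <- function(dir, thread_id) {
  file.path(dir, paste0(thread_id, ".rds"))
}

run_with_checkpoints <- function(nodes, state, dir, thread_id) {
  for (name in names(nodes)) {
    result <- nodes[[name]](state)
    state <- result$state
    # Persist after every node so a crash loses at most one step of work.
    saveRDS(
      list(last_node = name, status = result$status, state = state),
      checkpoint_path(dir, thread_id)
    )
    # A paused node ends the run early; the checkpoint is already on disk.
    if (identical(result$status, "PAUSE")) {
      return(list(status = "PAUSE", paused_at = name, state = state))
    }
  }
  list(status = "completed", paused_at = NULL, state = state)
}

nodes <- list(
  Step1 = function(state) {
    state$initialised <- TRUE
    list(status = "OK", state = state)
  },
  Step2 = function(state) {
    # The "risky" node: pause until the caller marks the problem as fixed.
    status <- if (isTRUE(state$fixed)) "OK" else "PAUSE"
    list(status = status, state = state)
  },
  Step3 = function(state) {
    state$finalised <- TRUE
    list(status = "OK", state = state)
  }
)

dir <- tempfile("checkpoints-")
dir.create(dir)
res <- run_with_checkpoints(nodes, list(fixed = FALSE), dir, "session-001")
saved <- readRDS(checkpoint_path(dir, "session-001"))
```

Here `res$status` and `saved$last_node` record where the run stopped, and `saved$state` holds everything Step1 produced, which is the information a later run needs in order to continue.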
Declarative Pipeline
We define a 3-step pipeline with a “Risky” node in the middle that will pause until a condition is met.
library(HydraR)
# Load the resilient workflow from YAML
wf <- load_workflow("state_persistence.yml")
# Spawn and compile the DAG
dag <- spawn_dag(wf)
#> [HydraR Warning] Logic 'init_proc': 'state' object is not referenced. Ensure your logic interacts with the AgentState.
#> [HydraR Warning] Logic 'init_proc' [Lint]: Put spaces around all infix operators. (line 1)
#> [HydraR Warning] Logic 'check_fixed': 'state' object is not referenced. Ensure your logic interacts with the AgentState.
#> [HydraR Warning] Logic 'check_fixed' [Lint]: Put spaces around all infix operators. (line 1)
#> [HydraR Warning] Logic 'finalize_proc': 'state' object is not referenced. Ensure your logic interacts with the AgentState.
#> [HydraR Warning] Logic 'finalize_proc' [Lint]: Put spaces around all infix operators. (line 1)
#> Graph compiled successfully.
Running the Scenario
1. Initial Run (The Pause)
We run the DAG with fixed = FALSE. The second node will
pause the pipeline and save the state to DuckDB.
# Configure DuckDB Persistence
saver <- DuckDBSaver$new(db_path = "history.duckdb")
tid <- "session-001"
# Run 1: Expected to pause at Step2
res1 <- dag$run(
thread_id = tid,
checkpointer = saver,
initial_state = list(fixed = FALSE)
)
#> Graph compiled successfully.
#> [2026-04-05 10:15:53] [Iteration] Restored state from checkpoint for thread: session-001
#> [2026-04-05 10:15:53] [Linear] Running Node: Step1
#> [Step1] Executing R logic...
#> [2026-04-05 10:15:53] [Linear] Running Node: Step2
#> [Step2] Executing R logic...
#> [2026-04-05 10:15:53] [Linear] Running Node: Step3
#> [Step3] Executing R logic...
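print(res1$status) # "PAUSE"
#> [1] "completed"
Note that in the output captured above, the log reports a checkpoint for session-001 being restored before Step1 ran, all three nodes executed, and the printed status is "completed" rather than "PAUSE". One plausible cause is a history.duckdb file left over from an earlier run; with a fresh database, the run is expected to pause at Step2.
2. The Restart (The Success)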
Next, we “fix” the state by setting fixed = TRUE and
restart from Step2. HydraR detects the
checkpoint, restores the state from DuckDB, and re-executes from the
paused node.
# Run 2: Resume from Step2 using the SAME thread_id
final_results <- dag$run(
thread_id = tid,
checkpointer = saver,
initial_state = list(fixed = TRUE),
resume_from = "Step2"
)
#> Graph compiled successfully.
#> [2026-04-05 10:15:53] [Iteration] Restored state from checkpoint for thread: session-001
#> [2026-04-05 10:15:53] [Resuming] Linear DAG Execution from node: Step2
#> [2026-04-05 10:15:53] [Linear] Running Node: Step2
#> [Step2] Executing R logic...
#> [2026-04-05 10:15:53] [Linear] Running Node: Step3
#> [Step3] Executing R logic...
# Step 1 was skipped; the process started from Step 2!
print(final_results$status) # "completed"
#> [1] "completed"
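The resume behaviour shown in this log can be sketched in base R. The helper `resume_linear()` is hypothetical and is not HydraR's code; in particular, letting caller-supplied values override checkpointed ones is an assumption about how `initial_state` combines with the restored state. The sketch only shows why Step1 does not run again: execution starts at the position of the named node.

```r
# Hypothetical helper for illustration; not part of HydraR.
resume_linear <- function(nodes, saved_state, overrides, resume_from) {
  start <- match(resume_from, names(nodes))
  if (is.na(start)) stop("Unknown node: ", resume_from)
  # Assumption: caller-supplied values take precedence over checkpointed ones.
  state <- utils::modifyList(saved_state, overrides)
  executed <- character(0)
  # Run only the named node and everything after it.
  for (name in names(nodes)[start:length(nodes)]) {
    state <- nodes[[name]](state)
    executed <- c(executed, name)
  }
  list(executed = executed, state = state)
}

nodes <- list(
  Step1 = function(state) {
    state$step1_runs <- state$step1_runs + 1
    state
  },
  Step2 = function(state) {
    # Fails loudly if the problem has not been fixed.
    stopifnot(isTRUE(state$fixed))
    state
  },
  Step3 = function(state) {
    state$done <- TRUE
    state
  }
)

# State as it might look in a checkpoint written when Step2 paused.
saved_state <- list(fixed = FALSE, step1_runs = 1)
out <- resume_linear(nodes, saved_state, list(fixed = TRUE), "Step2")
```

After this call, `out$executed` lists only the nodes that ran during the resume, and `out$state$step1_runs` is unchanged from the checkpoint because Step1 was never re-entered.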