Agents are Decision-Makers First: Leveraging Graph of Decisions for Intermediate Reward Modeling
GoD-IRM (Graph of Decisions for Intermediate Reward Modeling) brings intermediate reward modeling to structured decision-making in language models by assigning a reward at each divergence point in a reasoning trajectory. This enables fine-grained credit assignment, which improves model robustness in long-horizon problem-solving. By reinforcing individual decisions rather than only final outputs, GoD-IRM aligns language models more closely with traditional agent-based RL.
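To make the idea of rewarding divergence points concrete, the following is a minimal sketch, not the method's actual implementation. It assumes that sampled trajectories sharing a prefix are merged into a tree, that a divergence point is any prefix with at least two distinct next steps, and that the reward for a decision is the mean final outcome after taking it minus the mean final outcome over all branches at that point. The function names, the step labels, and this advantage-style reward are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_decision_tree(trajectories):
    """Merge trajectories that share a prefix (illustrative assumption).

    trajectories: list of (steps, outcome) pairs, where steps is a tuple of
    step labels and outcome is the final scalar reward of that trajectory.
    """
    outcomes = defaultdict(list)  # prefix -> final outcomes reached through it
    children = defaultdict(set)   # prefix -> distinct next steps observed
    for steps, outcome in trajectories:
        for i in range(len(steps) + 1):
            prefix = tuple(steps[:i])
            outcomes[prefix].append(outcome)
            if i < len(steps):
                children[prefix].add(steps[i])
    return outcomes, children


def mean(values):
    return sum(values) / len(values)


def intermediate_rewards(trajectories):
    """Assign a reward to every decision taken at a divergence point.

    Hypothetical reward: mean outcome after the chosen step minus the mean
    outcome over all branches at that prefix.
    """
    outcomes, children = build_decision_tree(trajectories)
    rewards = {}
    for prefix, options in children.items():
        if len(options) < 2:
            continue  # only one continuation seen: not a divergence point
        baseline = mean(outcomes[prefix])
        for step in sorted(options):
            rewards[(prefix, step)] = mean(outcomes[prefix + (step,)]) - baseline
    return rewards


# Toy data: four sampled trajectories with binary final outcomes.
trajectories = [
    (("parse", "factor", "answer"), 1.0),
    (("parse", "factor", "guess"), 0.0),
    (("parse", "expand", "answer"), 0.0),
    (("parse", "expand", "guess"), 0.0),
]

for (prefix, step), reward in intermediate_rewards(trajectories).items():
    print(prefix, "->", step, reward)
```

In this toy example the first step never diverges, so it receives no reward, while the choice between the two second steps and each final choice are scored separately. This is one simple way to assign credit to individual decisions rather than only to the final output.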