Rejection Sampling

Beyond Correctness: Generating New Problems from Divergent Solutions for Reasoning with Rearrangement Sampling

Rearrangement Sampling turns solutions rejected by the correctness filter into new problem statements. Instead of discarding a divergent completion, it asks which problem that completion would solve correctly, expanding the problem-solution space beyond the original correctness constraint. A larger model acts as a judge: it assesses whether a plausible alternative problem can be inferred from each divergent completion and assigns a structured reward to that earlier generation. This lets otherwise-discarded samples be reused, broadens distributional coverage, and aims to improve reasoning generalization across domains.
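The rejection-then-rearrangement step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method's actual implementation. `rearrangement_sampling`, `Sample`, `threshold`, `toy_is_correct`, and `toy_judge` are hypothetical names. `toy_judge` is a deterministic stand-in for the larger judge model, working in a toy arithmetic domain where a divergent answer y to "a + b" is re-posed as the problem "a + (y - a)".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Sample:
    problem: str
    solution: str
    reward: float


# A judge maps (original problem, divergent completion) to an inferred
# problem plus a reward, or None if no plausible problem exists.
Judge = Callable[[str, str], Optional[Tuple[str, float]]]


def rearrangement_sampling(
    problem: str,
    completions: List[str],
    is_correct: Callable[[str, str], bool],
    judge: Judge,
    threshold: float = 0.5,  # hypothetical cutoff on the judge's reward
) -> Tuple[List[Sample], List[Sample]]:
    """Split completions into accepted pairs and rearranged (new-problem) pairs."""
    accepted: List[Sample] = []
    rearranged: List[Sample] = []
    for completion in completions:
        if is_correct(problem, completion):
            # Standard rejection sampling: keep correct completions as-is.
            accepted.append(Sample(problem, completion, 1.0))
            continue
        # Rejected completion: ask the judge which problem it would solve.
        verdict = judge(problem, completion)
        if verdict is None:
            continue
        new_problem, reward = verdict
        if reward >= threshold:
            rearranged.append(Sample(new_problem, completion, reward))
    return accepted, rearranged


# --- Toy stand-ins (hypothetical): problems are "a + b", completions are integers.
def parse_int(text: str) -> Optional[int]:
    try:
        return int(text.strip())
    except ValueError:
        return None


def toy_is_correct(problem: str, completion: str) -> bool:
    a, b = (int(t) for t in problem.split("+"))
    return parse_int(completion) == a + b


def toy_judge(problem: str, completion: str) -> Optional[Tuple[str, float]]:
    """Infer "a + c" such that the completion y is correct (c = y - a)."""
    y = parse_int(completion)
    if y is None:
        return None
    a = int(problem.split("+")[0])
    c = y - a
    # Assumed scoring: negative operands are treated as less natural problems.
    return f"{a} + {c}", (1.0 if c >= 0 else 0.3)


accepted, rearranged = rearrangement_sampling(
    "12 + 30", ["42", "40", "5", "abc"], toy_is_correct, toy_judge
)
print([s.solution for s in accepted])                 # ['42']
print([(s.problem, s.solution) for s in rearranged])  # [('12 + 28', '40')]
```

In practice the judge would be a model call that returns both the inferred problem and a structured reward. The threshold decides which rearranged pairs are kept for training. Here "5" is dropped because its inferred problem scores below the cutoff, and "abc" is dropped because no problem can be inferred from it.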

Closing the Loop: Execution-Guided Continuous Generation for Adaptive Model Reasoning

We propose a feedback-driven decoding method in which each generated candidate is iteratively refined using execution traces or reward signals from previous attempts. Because each new generation is conditioned on structured feedback about earlier failures, errors can be corrected progressively instead of being carried forward. This approach aims to strengthen model reasoning, reduce compounding failure modes, and improve convergence in both code generation and reinforcement-learning-based post-training.
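A minimal sketch of the execution-guided loop, assuming the feedback is a list of failing-test messages collected by running each candidate in-process. `refine`, `run_tests`, `Attempt`, `Case`, and `toy_generate` are hypothetical names. `toy_generate` is a scripted stand-in for a model conditioned on prior attempts. A reward-based variant would replace the failure messages with a scalar score such as the fraction of tests passed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Each test case is (positional args, expected return value).
Case = Tuple[tuple, object]


@dataclass
class Attempt:
    source: str
    failures: List[str]  # execution feedback; empty means every test passed


def run_tests(source: str, fn_name: str, tests: List[Case]) -> List[str]:
    """Execute a candidate program and collect one message per failing test."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # untrusted code: use a real sandbox in practice
        fn = namespace[fn_name]
    except Exception as exc:
        return [f"load error: {type(exc).__name__}: {exc}"]
    failures: List[str] = []
    for args, expected in tests:
        try:
            got = fn(*args)
        except Exception as exc:
            failures.append(f"{fn_name}{args} raised {type(exc).__name__}: {exc}")
            continue
        if got != expected:
            failures.append(f"{fn_name}{args} returned {got!r}, expected {expected!r}")
    return failures


def refine(
    generate: Callable[[List[Attempt]], str],
    fn_name: str,
    tests: List[Case],
    max_rounds: int = 4,
) -> List[Attempt]:
    """Generate, execute, and feed failures back until all tests pass or the budget runs out."""
    history: List[Attempt] = []
    for _ in range(max_rounds):
        source = generate(history)  # generation is conditioned on all prior feedback
        failures = run_tests(source, fn_name, tests)
        history.append(Attempt(source, failures))
        if not failures:
            break
    return history


# --- Toy stand-in for the model (hypothetical): it emits a buggy first draft,
# then switches to a fix once the feedback reports a wrong return value.
def toy_generate(history: List[Attempt]) -> str:
    if history and any("expected" in f for f in history[-1].failures):
        return "def absolute(x):\n    return -x if x < 0 else x\n"
    return "def absolute(x):\n    return x\n"


TESTS: List[Case] = [((3,), 3), ((-2,), 2), ((0,), 0)]
history = refine(toy_generate, "absolute", TESTS)
for i, attempt in enumerate(history, 1):
    print(i, attempt.failures)
# 1 ['absolute(-2,) returned -2, expected 2']
# 2 []
```

Capping `max_rounds` bounds the cost of refinement, and keeping the full `history` lets a real generator condition on every earlier failure rather than only the most recent one.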