Beyond Correctness: Generating New Problems from Divergent Solutions for Reasoning with Rearrangement Sampling
Rearrangement Sampling transforms rejected solutions into new problem statements, expanding the problem-solution space beyond what correctness filtering alone retains. A larger model acts as a judge: it assesses whether an alternative problem can be inferred from each divergent completion and assigns a structured reward to that prior generation. This enables efficient data reuse, improves distributional coverage, and enhances reasoning generalization across domains.
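The loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Sample`, `Verdict`, `rearrangement_sample`, `toy_judge`, and the `min_reward` threshold are all hypothetical, the reward is assumed to be a single score in [0, 1], and the judge is a rule-based stand-in for the larger model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Sample:
    problem: str      # original problem statement
    completion: str   # model-generated solution
    correct: bool     # verdict of the original correctness check


@dataclass
class Verdict:
    new_problem: Optional[str]  # None when no alternative problem can be inferred
    reward: float               # assumed scalar reward in [0, 1]


Judge = Callable[[Sample], Verdict]


def rearrangement_sample(
    samples: List[Sample], judge: Judge, min_reward: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Turn rejected completions into (new_problem, solution, reward) triples."""
    recovered = []
    for s in samples:
        if s.correct:
            continue  # accepted solutions are already usable as they are
        v = judge(s)
        # Keep the completion only if the judge inferred a problem it solves
        # and scored the pairing at or above the (hypothetical) threshold.
        if v.new_problem is not None and v.reward >= min_reward:
            recovered.append((v.new_problem, s.completion, v.reward))
    return recovered


def toy_judge(sample: Sample) -> Verdict:
    """Rule-based stand-in for the larger judge model (assumption)."""
    # A completion that works out a sum is re-labelled as the answer
    # to the addition problem it actually solves.
    if "+" in sample.completion and "=" in sample.completion:
        expr = sample.completion.split("=")[0].strip()
        return Verdict(new_problem=f"Compute {expr}.", reward=1.0)
    return Verdict(new_problem=None, reward=0.0)
```

For example, a rejected completion `3 + 4 = 7` for the problem `Compute 3 * 4.` would be paired by `toy_judge` with the new problem `Compute 3 + 4.`, while a rejected completion with no inferable problem is dropped.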