Preliminary Abstract
Language models construct implicit world models as they process information, yet their latent space reasoning remains disconnected from explicit external representations. We propose a framework that bridges latent space reasoning and structured world model representations by analyzing the iterative hypothesis cycles a model forms as it refines its understanding of a complex system. Our approach examines the evolving structure of hypotheses generated within the latent space, identifying recurring patterns that point to the system’s fundamental governing rules. By capturing these patterns, we aim to translate the model’s internal abstractions into explicit, verifiable representations, enabling structured reasoning over dynamic environments. We outline a methodology for extracting and refining these latent-world mappings, incorporating external memory, hypothesis evaluation mechanisms, and structured priors. Through targeted experiments, we assess whether a system’s fundamental principles can be recovered, analyzed, and aligned with human-interpretable structures. This work advances our ability to build autonomous reasoning agents capable of adaptive world model construction, a crucial step toward AI systems that continuously learn and refine their understanding of reality.
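
As a rough illustration only, the sketch below shows one possible shape of the hypothesis cycle outlined above: candidate rules are made explicit, evaluated against observations, and retained in an external memory across refinement cycles. Every name here (Hypothesis, ExternalMemory, evaluate, refine_world_model) and the toy scoring rule are hypothetical placeholders assumed for this sketch, not components of the proposed framework.

```python
# Minimal sketch of an iterative hypothesis-refinement loop with external
# memory and hypothesis evaluation. All names and the toy scoring rule are
# hypothetical illustrations, not the framework's actual implementation.

from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """A candidate governing rule expressed as an explicit, checkable predicate."""
    description: str
    predicate: callable          # maps an observation to True/False
    score: float = 0.0           # fraction of observations the rule explains


@dataclass
class ExternalMemory:
    """Stores hypotheses that survive evaluation across refinement cycles."""
    retained: list = field(default_factory=list)

    def keep(self, hypothesis: Hypothesis, threshold: float = 0.9) -> None:
        # Retain a hypothesis once it explains enough of the observations.
        if hypothesis.score >= threshold and hypothesis not in self.retained:
            self.retained.append(hypothesis)


def evaluate(hypothesis: Hypothesis, observations: list) -> float:
    """Score a hypothesis by the fraction of observations it accounts for."""
    hits = sum(1 for obs in observations if hypothesis.predicate(obs))
    return hits / len(observations) if observations else 0.0


def refine_world_model(candidates: list, observations: list,
                       memory: ExternalMemory, cycles: int = 3) -> ExternalMemory:
    """Run several hypothesis cycles: evaluate, retain, and re-rank candidates."""
    for _ in range(cycles):
        for hypothesis in candidates:
            hypothesis.score = evaluate(hypothesis, observations)
            memory.keep(hypothesis)
        # Re-rank so later cycles (e.g. with new observations) revisit the
        # strongest candidates first, acting as structured priors.
        candidates.sort(key=lambda h: h.score, reverse=True)
    return memory


if __name__ == "__main__":
    # Toy environment: observations are (x, y) pairs from a hidden rule y = 2x.
    observations = [(x, 2 * x) for x in range(10)]
    candidates = [
        Hypothesis("y equals 2x", lambda obs: obs[1] == 2 * obs[0]),
        Hypothesis("y equals x + 1", lambda obs: obs[1] == obs[0] + 1),
    ]
    memory = refine_world_model(candidates, observations, ExternalMemory())
    for h in memory.retained:
        print(f"retained: {h.description} (score {h.score:.2f})")
```

Running the example retains only the rule consistent with the toy observations, mirroring the intended recovery of a system’s governing principles in an explicit, verifiable form.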