Closing the Loop: Execution-Guided Continuous Generation for Adaptive Model Reasoning
We propose a feedback-driven decoding method in which each generated candidate is iteratively refined using execution traces or reward-based adjustments. By conditioning each new generation on structured feedback from previous attempts, the method drives progressive error reduction and adaptive correction. This approach strengthens model reasoning, mitigates compounding failure modes, and improves convergence in both code generation and reinforcement-learning-based post-training.
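To make the feedback loop concrete, the sketch below shows one minimal form of execution-guided refinement, assuming a caller-supplied `generate_fn` that maps a prompt to candidate code; the function name, prompt format, and feedback template are illustrative assumptions, not the method's actual interface. Each round executes the candidate against tests, captures the resulting execution trace, and conditions the next generation on that trace.

```python
# Illustrative sketch only: `generate_fn`, the prompt format, and the feedback
# template are assumptions for exposition, not the proposed method's API.
import subprocess
import sys
import tempfile
from typing import Callable, Optional

def refine_with_execution_feedback(
    prompt: str,
    test_code: str,
    generate_fn: Callable[[str], str],  # hypothetical: maps a prompt to candidate source code
    max_rounds: int = 4,
) -> Optional[str]:
    """Regenerate a candidate, conditioning each round on the execution
    trace (stderr or a failing assertion) produced by the previous round."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate_fn(prompt + feedback)

        # Run the candidate together with the tests in a fresh interpreter.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(candidate + "\n\n" + test_code)
            path = f.name
        try:
            result = subprocess.run(
                [sys.executable, path],
                capture_output=True, text=True, timeout=10,
            )
            passed, trace = result.returncode == 0, result.stderr
        except subprocess.TimeoutExpired:
            passed, trace = False, "Execution timed out after 10 seconds."

        if passed:
            return candidate  # all tests passed: stop refining

        # Structured feedback for the next attempt: the tail of the failing trace.
        tail = trace.splitlines()[-10:]
        feedback = (
            "\n\n# The previous attempt failed with this execution trace:\n"
            + "\n".join("# " + line for line in tail)
            + "\n# Fix the error and regenerate the complete solution.\n"
        )
    return None  # no passing candidate within the refinement budget
```

In the reward-based variant mentioned above, the execution trace would be replaced or augmented by a reward signal from the post-training environment, while the same refine-or-stop loop structure is retained.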