Formalizing the Future of Agentic Systems
A focused hackathon exploring how Lean can become the foundation for building provably reliable, multi-agent systems.
April 17–18, 2026 | IISc Bengaluru + Online
Agentic systems powered by LLMs are moving from prototypes to production. Coding agents are writing, testing, and deploying software. But one boundary remains fragile: Human intent → Machine execution. When intent is written in natural language, ambiguity creeps in. At scale, that drift becomes dangerous. The next evolution of AI systems requires Formal Specification as the interface to autonomy.
Verified autonomy will define the next decade of AI. India can lead this transformation—our mathematical rigor and engineering talent are exactly what trustworthy AI demands. That transformation starts here, with builders like you.
This is the highest-graded project. The judges noted it has a "realistic model of k8s" and proves "sensible theorems," making it the clear winner.
I worked on making a formally verified model of Kubernetes in Lean, that was used as a judge during reinforcement learning to train a model that can be used as an SRE agent.
This was my first time working with Lean (or any theorem proving language) and it seemed to fill most of the gaps that I saw as potential issues with the rise of AI agents.
Tied with Kleisli Triple on both Grade and Expertise, it takes 2nd prize because it achieved "end-to-end verification" and a "working LLM → jq → SQL pipeline," whereas Kleisli Triple was noted as needing "further furnishing" and having potential scalability issues.
The project gave me the opportunity to apply formal verification in a practical setting while learning Lean.
I'm deeply grateful to Dr. Siddhartha Gadgil for the mentorship and resources that made learning Lean4 possible for us.
A toy model of a version control system (VCS) with verified behavior to flag merge conflicts and certify safe actions for collaborating agents.
Despite some concerns about scalability, it is technically very strong and was one of the highest-rated projects overall.
A formal verification framework that proves a foundational safety property: individual agent safety does not guarantee compositional safety. It detects emergent capabilities in multi-agent systems before deployment.
Highly praised by the judges for its theoretical approach to multi-agent systems and for mechanizing "impossibility results."
A bridge between LLMs and Lean4 that rigorously verifies the legality and soundness of chess mate-in-N puzzles.
The highest expertise score among the Grade 2 projects, providing a "nice demonstration of what's possible with Lean + verification."
A capability envelope for agentic coding pipelines, ensuring agents only invoke operations within their declared scope.
A very comprehensive submission that delivered all three core objectives, including a bonus Python middleware layer and a Cursor/MCP integration.
A Lean 4 library that turns redaction policies into proof-carrying artefacts to ensure safe data handling in multi-agent pipelines.
The judges specifically highlighted that the "framing, implementation, and theorems are nice," making it stand out among other Grade 2 projects.
Use Lean as the grounding layer for AI agents—proving that verification can be embedded at every level of autonomous decision-making.
Formalize system invariants before code generation, proving that transformations preserve semantics even as systems evolve.
Attach machine-checkable guarantees to autonomous workflows and agent-generated code—moving from hope to proof.
This hackathon isn't about theoretical exercises. You'll build working systems that demonstrate how verification makes autonomy trustworthy.
You'll work hands-on in teams, guided by experts, tackling real-world challenges at the intersection of formal verification and autonomous AI. The theme for this hackathon is:
Multi-agent systems such as those built on OpenClaw have become extremely popular because of their power across many domains. However, they also introduce significant risks. These risks must be addressed without sacrificing the capabilities that make such systems valuable. Guardrails backed by formal proofs are a natural way to ensure safety without being excessively restrictive. This hackathon is focused on building such verification systems in the Lean Prover.
The hackathon will begin with a one-day workshop on Lean, focused mainly on programming and the basic metaprogramming needed to build these systems. This will be followed by one day of in-person guided project work, and then 10 days of online interaction to complete the projects.
Participants will develop a model of a class of multi-agent systems like OpenClaw in Lean, together with the basic theory needed to reason about them. They will define types, terms, and functions that capture the semantics of these systems in Lean, and prove the core theorems on which proof automation can build. To interface with the Lean code, they may also extend Lean's syntax and build connections to proof automation tools. Participants may choose the style of agentic system they want to formalize, including specialized systems such as those for mathematics.
A successful project will be able to verify correct behaviour and flag risks within a powerful agentic system, without being so defensive that it weakens the system's usefulness. All code will be open-sourced (with acknowledgement of the hackathon sponsor) and should be well documented. After the hackathon, we aim to aggregate the various contributions into a common system.
This hackathon is for builders who believe "move fast and break things" is no longer enough.
Sponsored by Emergence
For exceptional formal specification and practical application
For strong verification work and innovation
For outstanding effort and promising verification work
If you are serious about building AI systems that scale safely, preserve semantics, respect invariants, and earn trust — this is for you.
April 17–18, 2026 | IISc Bengaluru + Online