Is 93 > 89.1?

Browser Use vs General Agency

Apr 07, 2025


Unless you’re living under a rock, you’ve heard of Browser Use. Basically, it’s an open source dev tool that allows AI systems like Manus to surf the web through browser interactions. It has just announced a $17mm seed round, and developers love it, as you can tell from the ~54k stars on GitHub. I mean, why shouldn’t they? After all, Browser Use boasts 89.1 on the WebVoyager benchmark, crushing AI powerhouses like Anthropic and OpenAI.

However, quietly emerging from the same YC batch is another contender: General Agency, whose state-of-the-art model currently scores 93 on WebVoyager, yet it hasn’t received the same recognition as Browser Use. How come?! Quickly checking my maths… but 93 is clearly higher than 89.1, right? With those questions, I sat down with Harvey and Mo, the founders of General Agency, to learn more about their company.

Before we dive in, here is a sample side-by-side comparison between General Agency and Browser Use, with both given the following prompt:

Find out the price of The Jay, Autograph Collection SF on March 9th, 2025. Go to http://booking.com and click through all the steps to give me the final price after taxes and fees.

General Agency retrieved the most accurate pricing by actually attempting to reserve the room, whereas Browser Use stopped prematurely and only reported the price without taxes and fees. When I asked whether there is a set of real-world tasks at which General Agency would beat Browser Use, Mo said:

“Any task requiring piping data through API integrations. One of our customers is using our system to interface with Intercom, pull data from there, browse the internet using that context, and also sending emails. Most importantly, we can do this with basically 99.9%+ reliability.”

Despite achieving state-of-the-art performance on a web browsing benchmark, General Agency has a bigger vision. Just as its name suggests, General Agency is not satisfied with just being a dev tool, and web browsing is simply one element of a bigger picture. General Agency wants to be your trusted teammate that helps you with the busywork, and in order to do that, it developed a new architecture called the Neurosymbolic Self-Model.

The compute graph is the orchestration layer, consisting of multiple LLMs that can reason, plan, and prioritize. However, what sets the architecture apart is its use of a procedural graph, which allows the system to continuously learn without constant fine-tuning. While the knowledge graph helps with ingestion and retrieval, the procedural graph captures state-action pairs, effectively creating a “world model” for the agent. This enables the system to remember and adapt to unexpected outcomes (something an LLM alone cannot do) by retaining episodic replay without modifying model weights.
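To make the idea of “state-action pairs” a bit more concrete, here is a toy sketch in Python. It is not General Agency’s implementation, which isn’t public; the class name `ProceduralGraph`, its methods, the state and action labels, and the 0.5 default score for unseen actions are all my own assumptions for illustration. It only shows how an agent could record what happened after each action and use that record later, without touching any model weights.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def _new_edges():
    # Hypothetical layout: state -> action -> list of (next_state, succeeded)
    return defaultdict(lambda: defaultdict(list))


@dataclass
class ProceduralGraph:
    """Toy memory of what happened when an action was taken in a state."""
    edges: dict = field(default_factory=_new_edges)

    def record(self, state, action, next_state, succeeded):
        """Store one episode. Nothing is fine-tuned; we only append to memory."""
        self.edges[state][action].append((next_state, succeeded))

    def success_rate(self, state, action):
        """Fraction of recorded episodes that succeeded, or None if unseen."""
        episodes = self.edges.get(state, {}).get(action, [])
        if not episodes:
            return None
        return sum(ok for _, ok in episodes) / len(episodes)

    def best_action(self, state, candidates):
        """Pick the candidate with the best record (unseen actions score 0.5)."""
        def score(action):
            rate = self.success_rate(state, action)
            return 0.5 if rate is None else rate
        return max(candidates, key=score)


# Usage, loosely modeled on the booking example above (labels are made up).
graph = ProceduralGraph()
graph.record("booking_results", "report_listed_price", "missing_fees", False)
graph.record("booking_results", "click_reserve", "checkout_page", True)
choice = graph.best_action(
    "booking_results", ["report_listed_price", "click_reserve"]
)
print(choice)  # click_reserve
```

In this sketch, an unexpected outcome simply becomes another recorded episode, so the next decision in the same state changes while the underlying LLMs stay frozen.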

In terms of GTM, General Agency is first setting its sights on business process outsourcing (BPO), which you can learn more about from a recent interview with Kimberly Tan, Investing Partner at a16z.

In terms of their growth strategy, while the team is keenly aware of the benefits of open source, they’re choosing to block out the noise and overhead that come with maintaining an open source project.

Overall, I’m excited to see where 2025 will take General Agency and to revisit the “rivalry” at the end of the year. Currently, the team is working with design partners in supply chain, finance, and consulting, so if you are an enterprise looking to automate part of your BPO stack, talk to the team to learn more!

Renaissance Fish
