Written by humans
Exploits, in-depth write-ups, and game-breaking bugs explored by our team.
by Tyler Edwards
Frontier models are overkill for most agent tasks. Specialised small language models beat them on cost, latency, and accuracy.
The agent can still delete a production database and then gleefully report its action back to you, the way a dog might excitedly drop a dead bird at your feet.
Your agent is the most thoroughly documented system in your engineering org, and yet you still can't answer simple questions about it.