Reducing Token Burn Rate With A Well-Designed Architecture
Trying to put out the AI token fire - or at least manage it as a controlled burn by using deterministic scripts for gathering inputs and directing agents
I’ve been reading how companies have already burned through their AI budget for the year. It’s mid-April. I just spoke at the Computer History Museum on How I use AI for Penetration Testing and part of that talk was a consideration of how the cost of tokens is not sustainable due to lack of ROI. Throwing more hardware at the problem is like throwing gasoline on a fire at this point. But what alternatives do we have?
Burning the midnight tokens
Here’s my latest token burning scenario. I was up way too long trying to prompt my way to a solution that wasn’t cooperating. I had pretty much gotten Opus 4.6 to behave with a few tweaks of my system prompt after reviewing things Anthropic changed and telling the model not to do those things. There’s no way to know if that is actually what fixed the problem or if Anthropic changed something on their end. I only had a few of those “but wait…but wait…but wait…” nonsense responses, which I think have something to do with their adaptive thinking.
Great. But my problem that I had right before taking some time off to present at AWS Security Community Day in Mountain View was still not fixed. I still couldn’t access an API served up by an API Gateway. Forbidden. I prompted and prompted and prompted to no avail…
I had previously created a Lambda Troubleshooter that queries a bunch of logs like CloudWatch and related configuration data - deterministically, not with AI. After obtaining the data it can optionally send the output to an AI agent to troubleshoot the problem. Before I left for my trip it suggested the problem was with VPC Endpoint networking not being configured properly. That was part of the problem. I took a look and the security groups were missing on the VPC Endpoints. Evidently the model couldn’t figure out how to correctly set up AWS networking with VPC endpoints without explicit instructions. I left that to fix until after I got back.
I was able to successfully fix the networking in a reasonable amount of time. However, that did not fix the problem. One of the frustrating things was that I was restoring security groups to their original locations after the model gave me some incorrect or at least incomplete information. I set this all up before as you can see from my older blog posts on AWS networking but I forget things because I am always working on so many different things all the time. I had to redo something I had undone in the name of efficiency which backfired.
The bigger problems occurred after that. The agent and I were spinning in circles trying to figure out why I still couldn’t get to a Lambda endpoint. I presumed it was networking so that’s what I was focused on. I was getting various unhelpful and repetitive responses. I was being told the problem was likely an SCP over and over again among other things. I was feeding the relevant logs and configuration (or so I thought) to an agent but it could not figure out the problem.
After way too many hours the agent couldn’t find the problem. I went to bed frustrated. But this was all a mistake. It was my fault for not stopping earlier to THINK. Like with my own brain. What was I feeding the agent and why was I in this loop? I knew I could just go read all the API documentation and pore over that like I did in my old posts where I had similar problems that took hours or days to resolve:
https://medium.com/cloud-security/automating-cybersecurity-metrics-890dfabb6198
I supposed I would have to bite the bullet and log into all the relevant accounts, review all the logs, look at the configuration of the network, VPC endpoints, API Gateway, Lambda functions, and SCPs manually myself. I’d have to possibly even go back and read my old posts on private networking and related DNS issues among other things. What was it? I would save that project for some point after getting sleep.
Reducing speculation with improved inputs
I woke up in the morning determined not to waste any more time or tokens. Here’s the problem. The agent was focused on certain issues and speculating about answers. But why? Why was it speculating? I started to think about that.
If you have the proper logs and information - you know what the problem is. You don’t need to guess. Was I actually providing the logs I thought I was? What was missing that would immediately answer the question?
I must not be giving the agent enough information, the right information, or it’s getting too much information and getting sidetracked. I started to think about how I could get the agent the right information and possibly narrow the focus.
First I asked Google’s AI Mode for potential issues that could be causing my problems because the agent that was performing the troubleshooting was not saying anything like “I can’t tell what the problem is because I need X”. Google’s AI tool came up with a list of potential issues including some that the Lambda troubleshooting agent had not suggested. I did tell it that I was specifically running Lambda with an API gateway in a private network but for the Lambda troubleshooting agent that should have been obvious with the information I knew it did have. At any rate I had some more things to explore as a potential cause for the problem.
Now I needed to know what logs the agent was and was not getting. So at this point I stopped and had the code summarize the list of data sources and whether they had been successfully retrieved or not.
CloudWatch logs in Lambda account - y
CloudTrail data events - n
VPC Flow Logs - n
Etc.
When I run the troubleshooter it allows me to select an agent to send the logs to at the end and optionally add some additional context. I ran the troubleshooter, sent the logs to the agent and added context to ask what logs were missing to help troubleshoot the problem including the potential errors in the list provided by Google’s AI Mode. That gave me a few more log sources and some additional configuration to query and provide to the agent.
The agent at the end of the troubleshooter is a different agent from the one I was using to write the troubleshooter code. I went back to the agent that was working on the troubleshooter code and I asked it - why is the Lambda troubleshooter agent still guessing? What logs are missing or what information is causing it to do that?
Then I also expanded my list of what logs and configuration were retrieved or not to include the following:
Any errors that occurred that were not allowing the logs to be retrieved
Any logs that were not configured
Logs that had no errors so nothing was returned
Resources that were not configured (like a WAF which I’m not using in this case)
One of the findings that came to light was that CloudTrail logs related to Service Control Policies (SCPs) were not being returned due to a permissions error. Rather than tell me there were no SCP logs the agent was guessing that the SCP might be the problem. By fixing that error and reporting that there were no errors related to SCPs in CloudTrail I got the agent to stop guessing on that point.
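A retrieval summary along these lines can be sketched in a few lines of Python. This is only an illustration, not the actual troubleshooter code: the `Status` values, the `SourceResult` class, and the sample entries are all assumptions made up for the example. The point is that "retrieval failed" and "queried, nothing found" are reported as different states, so the agent has no reason to fill the gap with a guess.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status names; each maps to one case from the list above.
    RETRIEVED = "retrieved"
    EMPTY = "queried successfully, no matching entries"
    NOT_CONFIGURED = "logging not configured"
    RESOURCE_ABSENT = "resource not in use"
    ERROR = "retrieval failed"


@dataclass
class SourceResult:
    name: str
    status: Status
    detail: str = ""


def render_summary(results):
    """Return one line per data source so gaps are stated, not implied."""
    lines = []
    for r in results:
        line = f"{r.name}: {r.status.value}"
        if r.detail:
            line += f" ({r.detail})"
        lines.append(line)
    return "\n".join(lines)


# Invented sample data for illustration only.
results = [
    SourceResult("CloudWatch logs in Lambda account", Status.RETRIEVED),
    SourceResult("CloudTrail SCP events", Status.ERROR, "AccessDenied"),
    SourceResult("VPC Flow Logs", Status.NOT_CONFIGURED),
    SourceResult("WAF", Status.RESOURCE_ABSENT),
]
print(render_summary(results))
```

Putting this summary at the top of whatever gets sent to the agent tells it up front which conclusions the data can and cannot support.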
Eventually going back and forth between the two agents for a while - the troubleshooter and the code agent - I came to the realization that some base URL mapping data in the API gateway configuration was missing from the data fed into the troubleshooting agent. It eventually found a related bug which I had yet another agent fix in my deployment script.
After digging into network logs and configuration I determined the script wasn’t correctly identifying the IP addresses and reporting that traffic may or may not be blocked. The queries were not clearly retrieving and reporting a list of blocked IP addresses and the IP addresses which the domain resolves to and API gateway IPs used to call the functions. Turns out even though I thought VPC flow logs were properly configured, they were not. In addition, my script to retrieve them had a bug where the agent created the script to retrieve them but never sourced that script in the parent project.
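To illustrate the kind of clear "what is blocked" report I mean, here is a rough Python sketch. It assumes the flow logs use the default version 2 record format (14 space-separated fields with the action in the 13th position). The function names, sample records, and IP addresses are invented for the example and are not from my script.

```python
def rejected_flows(lines):
    """Return (srcaddr, dstaddr, dstport) for each REJECT record.

    Assumes the default flow log field order:
    version account-id interface-id srcaddr dstaddr srcport dstport
    protocol packets bytes start end action log-status
    """
    rejected = []
    for line in lines:
        fields = line.split()
        if len(fields) != 14 or fields[0] == "version":
            continue  # skip header rows and anything not in the default format
        srcaddr, dstaddr, dstport, action = fields[3], fields[4], fields[6], fields[12]
        if action == "REJECT":
            rejected.append((srcaddr, dstaddr, dstport))
    return rejected


def blocked_expected_ips(lines, expected_ips):
    """Return the expected destination IPs that show up in REJECT records."""
    return sorted({dst for _src, dst, _port in rejected_flows(lines)} & set(expected_ips))


# Invented sample records for illustration only.
SAMPLE = [
    "2 123456789012 eni-0abc 10.0.1.5 10.0.2.9 49152 443 6 10 840 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0abc 10.0.1.5 10.0.2.10 49153 443 6 10 840 1700000000 1700000060 ACCEPT OK",
]

# expected_ips stands in for the addresses the domain resolves to.
print(blocked_expected_ips(SAMPLE, expected_ips=["10.0.2.9", "10.0.2.10"]))
```

A report like this replaces "traffic may or may not be blocked" with a concrete list that either contains the API's addresses or does not.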
At some point I had changed the API gateway from V2 (HTTP) to V1 (REST) so I could use a proper policy that can block IP addresses without adding a whole WAF. At that point the CNAME was still pointing to the old API gateway configuration somehow. Although an agent correctly identified the old stale record, when I removed it the domain didn’t resolve anymore. In fact the domain completely stopped working. Or did it have me delete the correct record instead of the stale record? I don’t know but I fixed the code and redeployed the CNAME and that resolved the problem.
FINALLY no more forbidden error. I could get to my API in the browser that serves up a web page to register a Yubikey. Great. I registered the Yubikey. The next problem was that I still could not authenticate.
I added more log queries including logs for a Lambda-to-Lambda call and queries of CloudTrail in additional accounts involved to determine at what point the error occurs. Since there are no CloudWatch errors I presume this is a networking or permissions error. I need to know if it’s networking, an SCP, the API gateway resource policy or something else. The agent that writes the Lambda code used by the troubleshooter has already figured out that a Lambda endpoint was missing that is required to use a Lambda Layer.
How long would that have taken me to figure out in the past? And now with AI - how long would it take AI to figure that out in a never ending loop without the complete data? If the agent doesn’t know there’s a Lambda layer and can’t see the network logs to determine what is blocked and can’t rule out the SCP or other policy issues it will guess. Or fail. And it might burn many tokens before it gives up or as it produces many incorrect answers.
Reducing token burn rate and achieving ROI
I used to work with some venture capitalists back in the dot-bomb days for those old enough to know what that is or have learned the history. What a crazy time of overinvestment in companies that likely had value but not that much value. The token burn problem is like the startup burn rate problem. How much cash are they burning through as they try to achieve profitability and ROI?
Think of a token budget the same way as the startup burn rate on the road to profitability and return on investment. How many tokens are you burning while you try to create something valuable that makes your organization money or solves whatever problem you set out to solve? Think back to all those companies who already ran out of AI budget for the year and it’s April. Did they get the resulting value from that AI spend that they were expecting? What exactly were they expecting?
As for my little project, I should have stopped going in circles earlier and just thought through the problem. But honestly I was so frustrated and I just wanted it to work so I kept prompting and prompting. I was honestly a bit stressed out that I was not getting this done faster. Yeah, forget that. You can only do so much in a day. I should have just gone to bed and saved myself some money.
In the end, I did achieve the objective I set out to achieve and it is going to save me a lot of time and tokens in the future. That’s because my script to query all the logs is deterministic and doesn’t use AI at all. I do not have to use A SINGLE TOKEN. That right there will save me a lot of money compared to an AI-only solution. But the AI analysis of the logs is going to save me a lot of time. It already has - and that is the ROI for AI. Using AI appropriately has value. Using it incorrectly is a waste of money.
How a good architecture helps when AI falters
I’ve already written about how I improve AI outcomes with the way I architect my multi-agent development process, but that doesn’t solve every problem. And it never will. AI is not deterministic and it never will be. A good architecture will also help with security and the token burning problem.
Obviously when I had the AI agent generate my troubleshooter it made numerous mistakes writing the code to query the logs and configuration. It did not query all relevant logs and configurations to solve the problem. It queried the wrong accounts and failed to accurately assess what data it needed, and the code was full of bugs which I fixed after many iterations. AWS API gateway configuration with Lambda functions, layers and VPC endpoints in multiple accounts is complex.
Imagine if I simply asked AI every time - why isn’t this working? It may or may not look at the right data, report issues retrieving data, or even know what to look at without the complete architectural overview, which in and of itself would take a lot of information, tokens and context to explain each time. I’d have to provide all that context and repeatedly fix AI hallucinations and query errors each time I try to troubleshoot issues.
If you use AI to do things you can do deterministically you’re going to burn tokens unnecessarily.
Tokens = $$$$$$$$.
Thinking through your AI architecture and designing it to efficiently use tokens is part of the challenge with AI development. The agent is not going to do everything for you in that regard. With this solution I have saved myself a lot of tokens and in the future, a lot of time troubleshooting Lambda functions. What took me a couple of days just now will take me minutes in the future.
Key points:
Now I have a deterministic way (read: no AI cost) to query and see the problem in the logs. Because I am using a deterministic solution, once I have complete and working code, the correct data will be retrieved every time with no incorrectly summarized, skipped, or hallucinated data.
If the logs are complete the answer to the problem is in the logs 100% of the time. It’s just a matter of deciphering it. No need to guess, hallucinate, or iterate trying to come up with non-existent answers to problems that cannot be efficiently solved without the relevant data.
When the script runs the queries it doesn’t immediately feed the output to an agent. It stops and asks the user for any additional prompt and whether to send all that information to the agent. It can be verbose. If I can pinpoint a single problem in the logs I can give that one problem with as few tokens as possible to whatever agent needs to fix it.
If the problem is more complex or I prefer to save time rather than weeding through all the logs I can push them all to an agent to tell me what the problem is - but I’ve optimized the log output to be complete and to provide all the necessary information to minimize guessing, speculation, and iterative loops that don’t provide a solution. The answer is clearly there in the logs.
Now every time I need to troubleshoot a Lambda function, it should take far less time to do that.
I do not need to give the agent writing the code or the agent troubleshooting my Lambda and API Gateway architecture any credentials. That’s another crucial bit I’ve written about before and I’m sure I’ll write about again later.
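The pause before anything goes to an agent, described in the key points above, can be sketched roughly like this in Python. It is a simplified illustration under my own assumptions, not the real troubleshooter: `confirm_and_build`, its prompts, and the sample report are hypothetical names and data. The `ask` parameter defaults to `input` and is only there so the example can run with scripted answers.

```python
def confirm_and_build(report, ask=input):
    """Pause after the deterministic queries; return agent input or None.

    report: text already collected by the no-AI query step.
    ask:    callable that prompts the user (input by default).
    """
    print(f"Collected {len(report.splitlines())} lines of logs and configuration.")
    choice = ask("Send to agent? [a]ll / [e]xcerpt / [n]o: ").strip().lower()
    if choice not in ("a", "e"):
        return None  # nothing is sent, so no tokens are spent

    context = ask("Additional context (optional): ").strip()
    body = report
    if choice == "e":
        # Send only the lines that matter instead of the whole report.
        needle = ask("Only include lines containing: ")
        body = "\n".join(line for line in report.splitlines() if needle in line)
    return f"{context}\n\n{body}" if context else body


# Scripted answers stand in for a person at the keyboard (invented sample data).
scripted = iter(["e", "Why is this forbidden?", "403"])
sample_report = "GET /register 403 Forbidden\nVPC endpoint policy: allow"
payload = confirm_and_build(sample_report, ask=lambda _prompt: next(scripted))
print(payload)
```

Only the returned text would ever reach an agent. The queries themselves, and the credentials they need, stay in the deterministic part of the script.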
AWS DevOps Agent
On a related note, AWS has released a new DevOps agent that may do something similar if you want to check it out. I imagine it performs in a similar way to what I’ve written about, though I haven’t tried it. It just makes sense to be able to query all relevant information and resolve issues quickly with AI now.
I would just take a look at where your credentials end up and how they are used to make sure you configure it correctly if you choose to use this tool. Note that I do not give the AI agent any permission to do things directly on my AWS infrastructure in the above solution. Given the number of mistakes outlined above and others I have experienced, I will probably not do that any time soon. But I will use AI where it makes sense to solve problems effectively. This tool is likely doing the same.
https://aws.amazon.com/devops-agent/
Subscribe for more stories like this and follow Good Vibes.
— Teri Radichel