I Built Three LLM Apps for Real Work. Here’s What Actually Helped.

I build stuff for a living. I also break stuff. Both happened a lot here.

Over the last few months, I built three small AI apps to help my team (the full breakdown is in this deeper dive). I used them daily and made real changes based on what worked and what blew up in my face. You know what? I learned fast.

Quick take: LLM apps can save time and feel smart. They also get weird unless you add guardrails, logs, and clear tasks.


App 1: The Slack Meeting Buddy That Doesn’t Miss Action Items

  • Tools I used: Slack app, OpenAI’s gpt-4o-mini for speed, Vercel for hosting, and Supabase for storage (if you're curious about plugging an OpenAI key into a native project, this guide shows what actually worked).

  • What it does: It listens to meeting notes, then posts a summary (sketched in code right after this list) with:

    • Decisions
    • Next steps
    • Owners and due dates
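
For the curious, here’s a minimal sketch of the core loop, assuming the OpenAI Node SDK and a Slack incoming-webhook URL. The names and the prompt are illustrative, not the production code:

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Summarize raw meeting notes, then post the recap to Slack via a webhook.
async function postRecap(notes: string, webhookUrl: string): Promise<void> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "Summarize these meeting notes in three sections: " +
          "Decisions, Next steps, Owners and due dates. Be terse. No filler.",
      },
      { role: "user", content: notes },
    ],
  });

  const summary = completion.choices[0].message.content ?? "(no summary)";

  // Slack incoming webhooks take a simple JSON payload with a `text` field.
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: summary }),
  });
}
```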

If you’re just starting to blend AI with your chat workflow, Slack has a handy primer on best practices for AI features in channels — worth skimming before you wire things up (Tips for working with AI in Slack).

What went right:

  • It cut note-taking time by a lot. My 30-minute standup recap dropped to 10 lines. No fluff.
  • It tagged people in Slack by name. Folks can’t dodge tasks now.

What went wrong:

  • The first week, it skipped dates. It wrote “soon.” Soon? No thanks.
  • It also softened tough notes. It turned “Blocker: API down” into “Minor issue.” Not helpful.

How I fixed it:

  • I added a strict format: “Task / Owner / Date / Risks.” The model followed it well. (See the prompt sketch after this list.)
  • I made it ask one follow-up in-thread: “Are the dates right?” That tiny check cut misses.
  • I logged every message. When it went off script, I could see why and patch fast.
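
The post above doesn’t show the exact prompt, but the strict format looked roughly like this sketch, plus a cheap regex guard to decide when to fire the in-thread follow-up:

```ts
// Strict output contract: one line per action item, fixed fields, no vague dates.
const SYSTEM_PROMPT = `
Extract action items from the meeting notes.
Output one line per item, exactly in this format:
Task: <task> / Owner: <@slack-handle> / Date: <YYYY-MM-DD> / Risks: <risks or "none">
If a date is missing or vague (e.g. "soon"), write "Date: UNKNOWN". Never guess.
`.trim();

// If the model still produced a vague date, trigger the in-thread check:
// "Are the dates right?"
function needsDateCheck(summary: string): boolean {
  return /Date:\s*(UNKNOWN|soon|TBD)/i.test(summary);
}
```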

Result:

  • People finished tasks faster. I know because Slack threads got shorter, not longer.
  • Cost last month: about $11 for model calls. Worth it.

Tiny digression: I built the first version on a Sunday with iced coffee and a dog at my feet. It felt like a tiny win with big vibes.


App 2: A Support Bot That Knows Our Docs (and Knows When To Ask For Help)

I wanted a bot that reads our Notion docs and helps with support tickets. But it had to say “I’m not sure” when the docs didn’t match.

  • Tools I used: LlamaIndex for doc indexing, Pinecone for search, Claude 3.5 Sonnet for long answers, FastAPI for the service.
  • Data: Notion help center and a small Zendesk export.

If you plan to hook Notion content into your ticketing flow, the official page on the Notion–Zendesk integration gives a quick overview of how the two tools talk to each other.

What went right:

  • It found the right page most of the time and quoted key lines. Clear, short, and linked to sources. I love source quotes.
  • First reply time dropped a lot. Agents started from a good draft instead of a blank box.

What went wrong:

  • When docs were old, it made stuff up. Once it told a customer we had same-day refunds. We don’t. I felt that one in my gut.

How I fixed it:

  • I set a confidence floor. If scores were low, it said: “I’m not sure. Want me to tag support?” It asked first, then routed. (A sketch of this check follows the list.)
  • I added nightly crawls so the index stays fresh.
  • I pinned “hard rules” for money things. If a refund came up, it read from a short, strict policy first.
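
Here’s roughly what the confidence floor looks like. The `Match` shape and the 0.75 floor are illustrative; real field names depend on your vector store, and the floor should come from replaying old tickets:

```ts
// Hypothetical shape of a retrieval hit; real fields depend on your vector store.
interface Match {
  text: string;
  sourceUrl: string;
  score: number; // similarity score, higher is better
}

const CONFIDENCE_FLOOR = 0.75; // illustrative; tune it against replayed tickets

// Answer from the retrieved passage, or punt to a human when confidence is low.
function draftOrEscalate(matches: Match[]): { escalate: boolean; reply: string } {
  const best = matches[0];
  if (!best || best.score < CONFIDENCE_FLOOR) {
    return { escalate: true, reply: "I'm not sure. Want me to tag support?" };
  }
  // High confidence: quote the source and link it, so agents can verify fast.
  return { escalate: false, reply: `${best.text}\n\nSource: ${best.sourceUrl}` };
}
```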

How I tested it:

  • I replayed 50 past tickets. I tagged each answer as correct, off, or risky. Old me would’ve skipped this. New me is a believer.
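
A replay harness can be very small. This sketch assumes hypothetical `askBot` and `judge` functions; in practice the judging was me, tagging by hand:

```ts
type Verdict = "correct" | "off" | "risky";

interface ReplayCase {
  question: string; // the original customer question
  expected: string; // what the human agent actually answered back then
}

// Replay past tickets through the bot and tally the verdicts.
async function replay(
  cases: ReplayCase[],
  askBot: (q: string) => Promise<string>, // hypothetical: retrieval + model call
  judge: (got: string, expected: string) => Verdict, // hypothetical: a human, really
): Promise<Record<Verdict, number>> {
  const tally: Record<Verdict, number> = { correct: 0, off: 0, risky: 0 };
  for (const c of cases) {
    const got = await askBot(c.question);
    tally[judge(got, c.expected)] += 1;
  }
  return tally;
}
```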

Result:

  • Agents used the bot for drafts in about 70% of cases.
  • Wrong-answer rate dropped after the rules. My stress did too.
  • Cost: around $19 a month, mostly model and Pinecone. Still fine.

App 3: A Lead Qualifier That Runs on the Site Without Feeling Pushy

We needed a chat box on our site that asks a few smart questions and tags the lead in HubSpot.

  • Tools I used: Vercel AI SDK for the chat UI, OpenAI function calling to tag fields (sketched after this list), HubSpot API, and a tiny rate limit with Upstash.
  • What it does: It asks three things, classifies fit (high, medium, low), and creates a lead with notes.
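
Function calling is what keeps the tags structured. A minimal sketch with the OpenAI Node SDK; the function name, fields, and enum values are illustrative:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Force a structured result via a function call instead of parsing free text.
async function tagLead(transcript: string) {
  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Classify this sales chat transcript." },
      { role: "user", content: transcript },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "tag_lead", // hypothetical name
          description: "Record the lead's fit and the next sales action.",
          parameters: {
            type: "object",
            properties: {
              fit: { type: "string", enum: ["high", "medium", "low"] },
              next_action: {
                type: "string",
                enum: ["Needs demo", "Send case study", "Nurture"],
              },
              reason: { type: "string" },
            },
            required: ["fit", "next_action", "reason"],
          },
        },
      },
    ],
    tool_choice: { type: "function", function: { name: "tag_lead" } },
  });

  const call = res.choices[0].message.tool_calls?.[0];
  if (!call) throw new Error("model did not call tag_lead");
  return JSON.parse(call.function.arguments); // { fit, next_action, reason }
}
```

The parsed arguments then go straight into the HubSpot lead notes, reason lines and all.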

What went right:

  • It felt polite. It didn’t interrogate people. It used short questions and paused like a human.
  • Sales loved the tags. They got “Needs demo” or “Send case study” with reason lines.


What went wrong:

  • Trolls. Of course.
  • Also, it sometimes asked extra questions. It got nosy.

How I fixed it:

  • I added a content filter and a hard cap of four messages. After that, it says: “Thanks! A human will follow up.” (Sketch after this list.)
  • I set “quiet hours” so it doesn’t ping the team at 2 a.m. Simple cron. Simple life.
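
Both guardrails fit in a few lines. The cap of four is from the real setup; the quiet-hours window below is illustrative:

```ts
const MAX_BOT_MESSAGES = 4;

// After four bot turns, hand off instead of interrogating the visitor.
async function nextReply(
  botTurnsSoFar: number,
  generate: () => Promise<string>, // hypothetical: the usual model call
): Promise<string> {
  if (botTurnsSoFar >= MAX_BOT_MESSAGES) {
    return "Thanks! A human will follow up.";
  }
  return generate();
}

// Quiet hours: hold team pings overnight. The exact window is illustrative.
function inQuietHours(now: Date = new Date()): boolean {
  const hour = now.getHours();
  return hour >= 22 || hour < 7;
}
```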

Result:

  • More leads, fewer form drop-offs. Not huge, but steady.
  • Cost: about $6 last month. Hosting stayed cheap.

What I Liked Across Tools

  • Vercel AI SDK: Fast to ship. Nice streaming. The chat felt smooth.
  • LlamaIndex: Easy to wire Notion and build RAG. Less glue code.
  • Claude 3.5 Sonnet: Great on long context. Calm tone.
  • gpt-4o-mini: Cheap and quick for summaries and tags.
  • Pinecone: Simple setup. Good search. The free tier carried me for a while.

What bugged me:

  • LangChain got heavy in bigger flows. I kept losing track of state. I moved logic into small functions and kept it plain.
  • Rate limits pop up at the worst time. Log them. Retry with backoff (sketch after this list). Or you’ll chase ghosts.
  • Hallucinations never fully vanish. You need rules, sources, and a polite “I don’t know.”
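
The backoff tip deserves code, since it’s the one everyone skips. A minimal retry wrapper with exponential backoff and jitter, logging each failure:

```ts
// Retry a flaky call with exponential backoff plus jitter.
// Logging each failure is what turns "ghosts" into visible rate limits.
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      const delayMs = Math.min(30_000, 2 ** attempt * 500) + Math.random() * 250;
      console.error(`attempt ${attempt + 1} failed; retrying in ${Math.round(delayMs)} ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```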

Real Tips I Wish I Knew Sooner

  • Give the model a job, not a vibe. “Make a task list with owner and date” beats “Summarize the meeting.”
  • Ground answers in your data. Quote sources. Show links. People trust receipts.
  • Add an “I’m not sure” path right away. It saves your bacon.
  • Log everything. Prompts, outputs, errors. You can’t fix what you can’t see.
  • Start tiny. One use case. One team. Then grow. (I wrote up what happened when I built two tiny tools in this story.)
  • Watch cost by model choice. Use small models for tags and routing. Save big models for deep work.
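
Model routing can be as dumb as a switch. A sketch of that last tip; the model IDs are the ones I’d reach for today and will drift:

```ts
type Task = "tag" | "route" | "summary" | "long_answer";

// Cheap, structured tasks go to a small model; long-context work gets the big one.
function pickModel(task: Task): string {
  switch (task) {
    case "long_answer":
      return "claude-3-5-sonnet-20240620"; // model IDs drift; check current docs
    default:
      return "gpt-4o-mini";
  }
}
```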

Who Should Build These

  • Small teams that live in Slack, Notion, and HubSpot.
  • Solo devs who like fast loops and clear wins.
  • Not a fit for work with strict compliance rules or heavy review. You need humans in the loop for money, health, or legal.

The Money and the Feel

  • My total monthly spend across the three apps stayed under $40 most months.
  • Average answer time felt snappy: 2–4 seconds for short tasks, 6–8 seconds for long ones.
  • Did it change my week? Yes. Fewer pings. Cleaner notes. Less stress.

I won’t lie: I had to babysit these apps for a bit. But once the guardrails were in, they felt steady. And when the bot said “I’m not sure,” I smiled. That’s trust.

If you build one thing first, make the Slack recap bot. People notice clean notes. Then add the support bot with a hard “I don’t know.” Your future self will thank you.