I Built Three LLM Apps for Real Work. Here’s What Actually Helped.

I build stuff for a living. I also break stuff. Both happened a lot here.

Over the last few months, I built three small AI apps to help my team (the full breakdown is in this deeper dive). I used them daily and made real changes based on what worked and what blew up in my face. You know what? I learned fast.

Quick take: LLM apps can save time and feel smart. They also get weird unless you add guardrails, logs, and clear tasks.


App 1: The Slack Meeting Buddy That Doesn’t Miss Action Items

  • Tools I used: Slack app, OpenAI’s gpt-4o-mini for speed, Vercel for hosting, and Supabase for storage (if you're curious about plugging an OpenAI key into a native project, this guide shows what actually worked).

  • What it does: It listens to meeting notes, then posts a summary (sketched in code right after this list) with:

    • Decisions
    • Next steps
    • Owners and due dates
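
For the curious, here’s a minimal sketch of the core loop, assuming the OpenAI Node SDK and a Slack incoming-webhook URL. The names and the prompt are illustrative, not the production code:

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Summarize raw meeting notes, then post the recap to Slack via a webhook.
async function postRecap(notes: string, webhookUrl: string): Promise<void> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "Summarize these meeting notes in three sections: " +
          "Decisions, Next steps, Owners and due dates. Be terse. No filler.",
      },
      { role: "user", content: notes },
    ],
  });

  const summary = completion.choices[0].message.content ?? "(no summary)";

  // Slack incoming webhooks take a simple JSON payload with a `text` field.
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: summary }),
  });
}
```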

If you’re just starting to blend AI with your chat workflow, Slack has a handy primer on best practices for AI features in channels — worth skimming before you wire things up (Tips for working with AI in Slack).

What went right:

  • It cut note-taking time by a lot. My 30-minute standup recap dropped to 10 lines. No fluff.
  • It tagged people in Slack by name. Folks can’t dodge tasks now.

What went wrong:

  • The first week, it skipped dates. It wrote “soon.” Soon? No thanks.
  • It also softened tough notes. It turned “Blocker: API down” into “Minor issue.” Not helpful.

How I fixed it:

  • I added a strict format: “Task / Owner / Date / Risks.” The model followed it well. (See the prompt sketch after this list.)
  • I made it ask one follow-up in-thread: “Are the dates right?” That tiny check cut misses.
  • I logged every message. When it went off script, I could see why and patch fast.
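
The post above doesn’t show the exact prompt, but the strict format looked roughly like this sketch, plus a cheap regex guard to decide when to fire the in-thread follow-up:

```ts
// Strict output contract: one line per action item, fixed fields, no vague dates.
const SYSTEM_PROMPT = `
Extract action items from the meeting notes.
Output one line per item, exactly in this format:
Task: <task> / Owner: <@slack-handle> / Date: <YYYY-MM-DD> / Risks: <risks or "none">
If a date is missing or vague (e.g. "soon"), write "Date: UNKNOWN". Never guess.
`.trim();

// If the model still produced a vague date, trigger the in-thread check:
// "Are the dates right?"
function needsDateCheck(summary: string): boolean {
  return /Date:\s*(UNKNOWN|soon|TBD)/i.test(summary);
}
```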

Result:

  • People finished tasks faster. I know because Slack threads got shorter, not longer.
  • Cost last month: about $11 for model calls. Worth it.

Tiny digression: I built the first version on a Sunday with iced coffee and a dog at my feet. It felt like a tiny win with big vibes.


App 2: A Support Bot That Knows Our Docs (and Knows When To Ask For Help)

I wanted a bot that reads our Notion docs and helps with support tickets. But it had to say “I’m not sure” when the docs didn’t match.

  • Tools I used: LlamaIndex for doc indexing, Pinecone for search, Claude 3.5 Sonnet for long answers, FastAPI for the service.
  • Data: Notion help center and a small Zendesk export.

If you plan to hook Notion content into your ticketing flow, the official page on the Notion–Zendesk integration gives a quick overview of how the two tools talk to each other.

What went right:

  • It found the right page most of the time and quoted key lines. Clear, short, and linked to sources. I love source quotes.
  • First reply time dropped a lot. Agents started from a good draft instead of a blank box.

What went wrong:

  • When docs were old, it made stuff up. Once it told a customer we had same-day refunds. We don’t. I felt that one in my gut.

How I fixed it:

  • I set a confidence floor. If scores were low, it said: “I’m not sure. Want me to tag support?” It asked first, then routed. (A sketch of this check follows the list.)
  • I added nightly crawls so the index stays fresh.
  • I pinned “hard rules” for money things. If a refund came up, it read from a short, strict policy first.
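
Here’s roughly what the confidence floor looks like. The `Match` shape and the 0.75 floor are illustrative; real field names depend on your vector store, and the floor should come from replaying old tickets:

```ts
// Hypothetical shape of a retrieval hit; real fields depend on your vector store.
interface Match {
  text: string;
  sourceUrl: string;
  score: number; // similarity score, higher is better
}

const CONFIDENCE_FLOOR = 0.75; // illustrative; tune it against replayed tickets

// Answer from the retrieved passage, or punt to a human when confidence is low.
function draftOrEscalate(matches: Match[]): { escalate: boolean; reply: string } {
  const best = matches[0];
  if (!best || best.score < CONFIDENCE_FLOOR) {
    return { escalate: true, reply: "I'm not sure. Want me to tag support?" };
  }
  // High confidence: quote the source and link it, so agents can verify fast.
  return { escalate: false, reply: `${best.text}\n\nSource: ${best.sourceUrl}` };
}
```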

How I tested it:

  • I replayed 50 past tickets. I tagged each answer as correct, off, or risky. Old me would’ve skipped this. New me is a believer.
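
A replay harness can be very small. This sketch assumes hypothetical `askBot` and `judge` functions; in practice the judging was me, tagging by hand:

```ts
type Verdict = "correct" | "off" | "risky";

interface ReplayCase {
  question: string; // the original customer question
  expected: string; // what the human agent actually answered back then
}

// Replay past tickets through the bot and tally the verdicts.
async function replay(
  cases: ReplayCase[],
  askBot: (q: string) => Promise<string>, // hypothetical: retrieval + model call
  judge: (got: string, expected: string) => Verdict, // hypothetical: a human, really
): Promise<Record<Verdict, number>> {
  const tally: Record<Verdict, number> = { correct: 0, off: 0, risky: 0 };
  for (const c of cases) {
    const got = await askBot(c.question);
    tally[judge(got, c.expected)] += 1;
  }
  return tally;
}
```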

Result:

  • Agents used the bot for drafts in about 70% of cases.
  • Wrong-answer rate dropped after the rules. My stress did too.
  • Cost: around $19 a month, mostly model and Pinecone. Still fine.

App 3: A Lead Qualifier That Runs on the Site Without Feeling Pushy

We needed a chat box on our site that asks a few smart questions and tags the lead in HubSpot.

  • Tools I used: Vercel AI SDK for the chat UI, OpenAI function calling to tag fields (sketched after this list), HubSpot API, and a tiny rate limit with Upstash.
  • What it does: It asks three things, classifies fit (high, medium, low), and creates a lead with notes.
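
Function calling is what keeps the tags structured. A minimal sketch with the OpenAI Node SDK; the function name, fields, and enum values are illustrative:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Force a structured result via a function call instead of parsing free text.
async function tagLead(transcript: string) {
  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Classify this sales chat transcript." },
      { role: "user", content: transcript },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "tag_lead", // hypothetical name
          description: "Record the lead's fit and the next sales action.",
          parameters: {
            type: "object",
            properties: {
              fit: { type: "string", enum: ["high", "medium", "low"] },
              next_action: {
                type: "string",
                enum: ["Needs demo", "Send case study", "Nurture"],
              },
              reason: { type: "string" },
            },
            required: ["fit", "next_action", "reason"],
          },
        },
      },
    ],
    tool_choice: { type: "function", function: { name: "tag_lead" } },
  });

  const call = res.choices[0].message.tool_calls?.[0];
  if (!call) throw new Error("model did not call tag_lead");
  return JSON.parse(call.function.arguments); // { fit, next_action, reason }
}
```

The parsed arguments then go straight into the HubSpot lead notes, reason lines and all.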

What went right:

  • It felt polite. It didn’t interrogate people. It used short questions and paused like a human.
  • Sales loved the tags. They got “Needs demo” or “Send case study” with reason lines.


What went wrong:

  • Trolls. Of course.
  • Also, it sometimes asked extra questions. It got nosy.

How I fixed it:

  • I added a content filter and a hard cap of four messages. After that, it says: “Thanks! A human will follow up.” (Sketch after this list.)
  • I set “quiet hours” so it doesn’t ping the team at 2 a.m. Simple cron. Simple life.
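
Both guardrails fit in a few lines. The cap of four is from the real setup; the quiet-hours window below is illustrative:

```ts
const MAX_BOT_MESSAGES = 4;

// After four bot turns, hand off instead of interrogating the visitor.
async function nextReply(
  botTurnsSoFar: number,
  generate: () => Promise<string>, // hypothetical: the usual model call
): Promise<string> {
  if (botTurnsSoFar >= MAX_BOT_MESSAGES) {
    return "Thanks! A human will follow up.";
  }
  return generate();
}

// Quiet hours: hold team pings overnight. The exact window is illustrative.
function inQuietHours(now: Date = new Date()): boolean {
  const hour = now.getHours();
  return hour >= 22 || hour < 7;
}
```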

Result:

  • More leads, fewer form drop-offs. Not huge, but steady.
  • Cost: about $6 last month. Hosting stayed cheap.

What I Liked Across Tools

  • Vercel AI SDK: Fast to ship. Nice streaming. The chat felt smooth.
  • LlamaIndex: Easy to wire Notion and build RAG. Less glue code.
  • Claude 3.5 Sonnet: Great on long context. Calm tone.
  • gpt-4o-mini: Cheap and quick for summaries and tags.
  • Pinecone: Simple setup. Good search. The free tier carried me for a while.

What bugged me:

  • LangChain got heavy in bigger flows. I kept losing track of state. I moved logic into small functions and kept it plain.
  • Rate limits pop up at the worst time. Log them. Retry with backoff (sketch after this list). Or you’ll chase ghosts.
  • Hallucinations never fully vanish. You need rules, sources, and a polite “I don’t know.”
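
The backoff tip deserves code, since it’s the one everyone skips. A minimal retry wrapper with exponential backoff and jitter, logging each failure:

```ts
// Retry a flaky call with exponential backoff plus jitter.
// Logging each failure is what turns "ghosts" into visible rate limits.
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      const delayMs = Math.min(30_000, 2 ** attempt * 500) + Math.random() * 250;
      console.error(`attempt ${attempt + 1} failed; retrying in ${Math.round(delayMs)} ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```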

Real Tips I Wish I Knew Sooner

  • Give the model a job, not a vibe. “Make a task list with owner and date” beats “Summarize the meeting.”
  • Ground answers in your data. Quote sources. Show links. People trust receipts.
  • Add an “I’m not sure” path right away. It saves your bacon.
  • Log everything. Prompts, outputs, errors. You can’t fix what you can’t see.
  • Start tiny. One use case. One team. Then grow. (I wrote up what happened when I built two tiny tools in this story.)
  • Watch cost by model choice. Use small models for tags and routing. Save big models for deep work.
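
Model routing can be as dumb as a switch. A sketch of that last tip; the model IDs are the ones I’d reach for today and will drift:

```ts
type Task = "tag" | "route" | "summary" | "long_answer";

// Cheap, structured tasks go to a small model; long-context work gets the big one.
function pickModel(task: Task): string {
  switch (task) {
    case "long_answer":
      return "claude-3-5-sonnet-20240620"; // model IDs drift; check current docs
    default:
      return "gpt-4o-mini";
  }
}
```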

Who Should Build These

  • Small teams that live in Slack, Notion, and HubSpot.
  • Solo devs who like fast loops and clear wins.
  • Not a fit for work with strict compliance rules or heavy review. You need humans in the loop for money, health, or legal.

The Money and the Feel

  • My total monthly spend across the three apps stayed under $40 most months.
  • Average answer time felt snappy: 2–4 seconds for short tasks, 6–8 seconds for long ones.
  • Did it change my week? Yes. Fewer pings. Cleaner notes. Less stress.

I won’t lie: I had to babysit these apps for a bit. But once the guardrails were in, they felt steady. And when the bot said “I’m not sure,” I smiled. That’s trust.

If you build one thing first, make the Slack recap bot. People notice clean notes. Then add the support bot with a hard “I don’t know.” Your future self will thank you.