Trained on massive amounts of text. Uses a transformer architecture to predict the next token. Good at writing, reasoning, code, Q&A, summarization, and chat. An LLM doesn't "know" things — it predicts what text should come next based on patterns learned during training.
Think of it as a very well-read generalist that can talk about almost anything but doesn't automatically know what's happening right now.
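A minimal sketch of what "predict the next token" means in practice, using GPT-2 via the Hugging Face transformers library (assumed installed); frontier models do exactly this, just with far more parameters:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits        # a score for every token in the vocab
next_id = logits[0, -1].argmax().item()    # highest-scoring next token
print(tok.decode(next_id))                 # e.g. " Paris"
```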
Trained on image data. Can detect objects, classify scenes, count things, read text in images (OCR), track movement, and identify anomalies. A different stack from LLMs: convolutional nets, or vision transformers (ViTs) that process images as grids of patches rather than as text tokens.
Great for: retail shrinkage detection, factory defect inspection, store foot traffic, security camera analysis.
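A hedged sketch of the basic vision workflow, using torchvision's pretrained ResNet-50 (assumed installed; "shelf.jpg" is a hypothetical input image):

```python
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()            # resize/normalize for this model

img = preprocess(Image.open("shelf.jpg")).unsqueeze(0)    # add batch dimension
with torch.no_grad():
    probs = model(img).softmax(dim=1)
print(weights.meta["categories"][probs.argmax().item()])  # top-1 class label
```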
Converts text into a dense vector — a list of numbers that encodes meaning. Similar-meaning text produces similar vectors. This is how AI can search for "customer complaints about billing" and find relevant records even if those words aren't in the document.
The backbone of RAG (Retrieval-Augmented Generation) systems. You need this anytime you want AI to search your own data intelligently.
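A quick sketch with the sentence-transformers library (assumed installed) showing why this works: the billing complaint matches the query without sharing its exact words:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "My invoice was wrong two months in a row.",
    "Great product, shipping was fast.",
]
query_vec = model.encode("customer complaints about billing", convert_to_tensor=True)
doc_vecs = model.encode(docs, convert_to_tensor=True)
print(util.cos_sim(query_vec, doc_vecs))  # first doc scores noticeably higher
```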
The original AI. Decision trees, random forests, gradient boosting (XGBoost), logistic regression. Trained on labeled tabular data. Not neural networks. Excellent for fraud detection, credit risk scoring, churn prediction, spend classification, and anything with clean structured data.
Customer spend analysis and default risk — this is the right tool. Faster, cheaper, and more explainable than LLMs for those tasks.
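A minimal sketch of that kind of model, using XGBoost on synthetic data; your real features would come from the warehouse (spend, tenure, payment history):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for labeled tabular data: 1 = defaulted, 0 = did not
X, y = make_classification(n_samples=5000, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = XGBClassifier(n_estimators=300, max_depth=5, learning_rate=0.1)
model.fit(X_train, y_train)
risk_scores = model.predict_proba(X_test)[:, 1]  # probability of default per row
```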
An LLM that also has vision (and sometimes audio). It can look at a photo and describe it, answer questions about a chart, read a document image, or analyze surveillance footage. Modern frontier models (GPT-4o, Gemini, Claude) are all multimodal.
When you need one model to handle both text reasoning AND visual analysis, this is it.
A compressed or purpose-built LLM that can run on a laptop, phone, or edge chip without a cloud connection. Much lower compute cost. Less general capability, but if fine-tuned on your specific domain, can be surprisingly powerful for narrow tasks.
Relevant for you: running models on a Qualcomm SoC for robotics or embedded devices. Phi-3, Llama 3.2 1B, and Gemma 2B all qualify.
Structured rows and columns. Your system of record — orders, customers, inventory, payments. Optimized for read/write speed, not analytics. PostgreSQL, MySQL, Oracle.
Cleaned, structured data organized for business questions. Optimized for complex queries across time. "What was average customer spend by region over Q1?" This is where your customer spend analysis and default risk models draw from.
Store everything first, figure out schema later. Text files, logs, audio, video, JSON, CSV — all raw. Low cost. Used for AI training data, ML pipelines, and storing data you don't know how to use yet. Sentiment analysis of customer conversations starts here.
Stores embeddings (number vectors). Answers questions like "find me the 20 customer reviews most similar to this complaint." Enables RAG — giving an LLM access to your private data without retraining it. Critical for AI search applications.
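A sketch of similarity search with FAISS (one common vector index library, assumed installed); the random vectors are stand-ins for real review embeddings:

```python
import faiss
import numpy as np

dim = 384                                   # e.g. all-MiniLM-L6-v2 output size
index = faiss.IndexFlatIP(dim)              # inner product = cosine on unit vectors

review_vecs = np.random.rand(1000, dim).astype("float32")  # stand-in embeddings
faiss.normalize_L2(review_vecs)
index.add(review_vecs)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 20)       # the 20 most similar reviews
```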
The "USB-C for AI." Before MCP, connecting an AI model to an external tool (a database, a CRM, a file system) required custom code for every single pair. MCP standardizes that connection — you write one MCP server for your tool, and any MCP-compatible AI can use it.
Agent is the client. Tool (database, API, calendar, Slack) is the server. MCP is the wire between them.
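A minimal MCP server sketch using the official Python SDK's FastMCP helper (assumed installed as the mcp package); get_stock_level is a hypothetical tool:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory")

@mcp.tool()
def get_stock_level(sku: str) -> int:
    """Return current stock for a SKU."""
    return 42  # stand-in for a real database lookup

if __name__ == "__main__":
    mcp.run()  # any MCP-compatible agent can now discover and call this tool
```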
MCP handles agent ↔ tool. A2A handles agent ↔ agent. When you have a system with a planner agent, a data retrieval agent, and a reporting agent, A2A is how they hand off tasks to each other — even if they're built on different frameworks or running in different places.
Before A2A, an agent built on LangChain had no standard way to talk to an agent built on Google ADK. A2A fixes this. It's the inter-agent routing layer.
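To make the handoff concrete, here's the rough shape of an A2A-style delegation. A2A speaks JSON-RPC over HTTP; the method and field names below are simplified illustrations, not the normative schema:

```python
# Illustrative only: a planner agent delegating a task to a retrieval agent.
handoff = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",                  # simplified method name
    "params": {
        "id": "task-42",
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "Pull Q1 spend by region"}],
        },
    },
}
```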
LangChain is the most popular framework for building AI applications. It gives you building blocks: connect an LLM, add memory, plug in tools, chain operations together. LangGraph is the extension for multi-step, stateful agent workflows — where the agent needs to loop, branch, or pass state between steps.
If MCP is the protocol, LangChain is the plumbing. Your application logic and agent behavior live here.
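A minimal LangChain sketch (assuming langchain-core and langchain-openai are installed and OPENAI_API_KEY is set): prompt in, model call, parsed string out:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

print(chain.invoke({"text": "Long customer call transcript goes here..."}))
```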
RAG is a technique, not a tool. When a user asks a question, instead of relying only on what the LLM learned during training, RAG first searches your data (via a vector database), retrieves the relevant pieces, and feeds them into the LLM as context. The LLM then answers using your actual data.
This is how you give an LLM access to your company's private documents, customer records, or real-time data without retraining the entire model.
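A bare-bones RAG sketch, assuming sentence-transformers is installed; call_llm is a hypothetical stub standing in for any chat-model call:

```python
from sentence_transformers import SentenceTransformer, util

def call_llm(prompt: str) -> str:
    return f"[model would answer from: {prompt[:60]}...]"  # stub; swap in a real model

embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunks = [
    "Refund policy: customers may return items within 30 days.",
    "Billing cycles start on the 1st of each month.",
]
chunk_vecs = embedder.encode(chunks, convert_to_tensor=True)

def answer(question: str) -> str:
    q_vec = embedder.encode(question, convert_to_tensor=True)
    best = int(util.cos_sim(q_vec, chunk_vecs)[0].argmax())   # retrieve best chunk
    prompt = f"Answer using this context:\n{chunks[best]}\n\nQuestion: {question}"
    return call_llm(prompt)                                   # generate with context

print(answer("How long do customers have to return an item?"))
```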
Pull customer conversations, reviews, support tickets, call transcripts from all sources.
Normalize text, chunk it, convert each chunk to a vector using an embedding model.
Run each chunk through a classifier or fine-tuned LLM: positive/negative/neutral + topic (see the sketch after this pipeline).
Orchestrator agent watches for trend shifts, surfaces insights, generates recommended actions.
Scheduled pipeline re-runs on new data. Insight dashboard updates automatically.
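Sketch of the classification step using the transformers pipeline helper (assumed installed; it downloads a default sentiment model on first run):

```python
from transformers import pipeline

classify = pipeline("sentiment-analysis")
chunks = [
    "The billing portal keeps rejecting my card.",
    "Support resolved my issue in minutes.",
]
for chunk in chunks:
    print(classify(chunk))  # e.g. [{'label': 'NEGATIVE', 'score': 0.99}]
```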
Customer transaction history, account data, payment records — structured rows.
Build features: avg spend, spend variance, days since last payment, product mix, etc. (see the pandas sketch after this pipeline).
Train an ML model (XGBoost) on labeled data: defaulted / not defaulted. Outputs probability score.
LLM + RAG over warehouse. "What's avg spend for customers with risk score > 0.7?" → SQL → answer.
Scores, segments, risk buckets, spend tiers surfaced in BI tool or agent interface.
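The pandas sketch of the feature-building step; txns is a tiny stand-in for transaction rows pulled from the warehouse:

```python
import pandas as pd

txns = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [120.0, 80.0, 300.0, 250.0, 310.0],
    "ts": pd.to_datetime(["2024-01-05", "2024-02-01", "2024-01-20",
                          "2024-02-10", "2024-02-28"]),
})

feats = txns.groupby("customer_id").agg(
    avg_spend=("amount", "mean"),
    spend_var=("amount", "var"),
    last_txn=("ts", "max"),
)
feats["days_since_last"] = (pd.Timestamp("2024-03-01") - feats["last_txn"]).dt.days
```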
Video feeds from store cameras. Edge device captures and preprocesses frames locally.
Vision model runs on-device: detects people, tracks movement paths, identifies products being handled (see the sketch after this pipeline).
Separate model watches for anomalies — items not scanned, concealment behavior, dwell patterns near exits.
Only flagged events (not raw video) sent to cloud for review. Saves bandwidth. Privacy-preserving.
LLM agent synthesizes patterns: "Shrinkage events cluster near aisle 4, 5–7pm Fridays." Triggers alert or restock recommendation.
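A hedged sketch of the on-device detection step, using the ultralytics YOLO package (an assumption; any detector works) and a local camera as the feed:

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                  # nano model, sized for edge hardware
cap = cv2.VideoCapture(0)                   # store camera feed

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, classes=[0])     # COCO class 0 = person
    if len(results[0].boxes) > 0:
        pass  # hand detections to tracking / anomaly logic; flag events only
cap.release()
```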
5–80 TOPS. Runs small models (SLMs) locally. Private, offline, no cloud cost. For prototyping, edge demos, or privacy-sensitive use cases. Limited to smaller models.
75–275 TOPS. Runs full vision models and medium LLMs locally. Powers robots, smart cameras, industrial automation. Real-time with no cloud round-trip. 10–30W power draw.
High-end inference or fine-tuning you want to keep private. More capital cost upfront, lower per-token cost over time. Relevant for regulated industries or IP-sensitive workloads.
Effectively unlimited scale. Where frontier model training happens (runs costing millions of dollars). Pay per token / per hour. Where most production AI inference lives today. 300–700W per GPU.
An agent is an LLM that can take actions — not just answer questions. It has access to tools (search, database, API, code runner), can decide which tool to call, observe the result, and then decide what to do next. It loops until the task is complete.
Single-agent: one LLM doing everything. Multi-agent: specialized agents (planner, researcher, writer, validator) handing tasks to each other via A2A. More reliable for complex workflows.
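A minimal sketch of the loop that makes an agent an agent; llm and the tool functions are hypothetical stand-ins for your model call and MCP-exposed tools:

```python
import json

def run_agent(task: str, llm, tools: dict) -> str:
    """ReAct-style loop: the model picks a tool, we run it, feed the result back."""
    history = [{"role": "user", "content": task}]
    while True:
        reply = llm(history)                # assumed to return a JSON action string
        action = json.loads(reply)
        if action["tool"] == "finish":      # model decides the task is done
            return action["answer"]
        result = tools[action["tool"]](**action["args"])  # e.g. query_db(sql=...)
        history.append({"role": "tool", "content": str(result)})
```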
Context window — what the agent can "see" right now. Limited. Gets expensive fast.
Short-term memory — conversation history stored in the prompt. Fades when the session ends.
Long-term memory — stored in a vector DB or database. Persists. The agent can retrieve relevant past interactions using semantic search.
Tools / skills — what the agent can do: search web, query DB, write file, call API, send Slack message, run code. Each tool is exposed via MCP.