AI ECOSYSTEM
FIELD GUIDE

Ranjit Kohli / YYZdata
May 2026 Edition
Protocols · Models · Data · Compute
// FULL AI APPLICATION STACK — READ BOTTOM TO TOP
YOUR APP / USER
Web UI / Mobile / API
Business Logic
Output: insights, alerts, actions

ORCHESTRATION
Agent / Orchestrator
LangChain / LangGraph
Task Router / Planner

PROTOCOLS
MCP (agent ↔ tools)
+
A2A (agent ↔ agent)
+
AG-UI (agent ↔ frontend)

MODELS
LLM (reasoning/text)
+
Vision Model (images/video)
+
Embedding Model (search)
+
Classifier (labels)

DATA LAYER
Data Lake (raw, all types)
Data Warehouse (structured, analytics)
Vector DB (semantic search)

COMPUTE
Cloud GPU (train / big inference)
Edge GPU / Jetson (local server)
Qualcomm SOC / NPU (device/robot)
§ 02 — Model Types: What Each One Does
LLM — Large Language Model
Text in, text out. Reasoning, generation, summarization.

Trained on massive text. Uses a transformer architecture to predict the next token. Good at writing, reasoning, code, Q&A, summarization, and chat. An LLM doesn't "know" things — it predicts what text should come next based on patterns learned during training.

Think of it as a very well-read generalist that can talk about almost anything but doesn't automatically know what's happening right now.

Claude · GPT-4o · Llama 3 · Gemini
Vision Model
Images / video in, labels or text out.

Trained on image data. Can detect objects, classify scenes, count things, read text in images (OCR), track movement, and identify anomalies. Dedicated vision models use a different architecture from text LLMs: convolutional nets or vision transformers (ViT) that process images as grids of pixel patches.

Great for: retail shrinkage detection, factory defect inspection, store foot traffic, security camera analysis.

YOLO · SAM · GPT-4V · Gemini Vision
Embedding Model
Text in, numbers out. Powers semantic search.

Converts text into a dense vector — a list of numbers that encodes meaning. Similar-meaning text produces similar vectors. This is how AI can search for "customer complaints about billing" and find relevant records even if those words aren't in the document.

The backbone of RAG (Retrieval-Augmented Generation) systems. You need this anytime you want AI to search your own data intelligently.

text-embedding-3 · BGE · CLIP
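The "similar meaning → similar vectors" idea can be made concrete with cosine similarity. This is a minimal, dependency-free sketch: the hand-picked 3-dimensional lists below stand in for real embeddings (which have ~1536 dimensions), and the variable names are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models emit ~1536 dimensions).
billing_complaint = [0.9, 0.1, 0.0]
invoice_dispute   = [0.8, 0.2, 0.1]   # similar meaning -> similar vector
store_hours_q     = [0.0, 0.1, 0.9]   # unrelated topic -> distant vector

print(cosine_similarity(billing_complaint, invoice_dispute))  # close to 1
print(cosine_similarity(billing_complaint, store_hours_q))    # close to 0
```

A vector database is essentially this comparison run efficiently over millions of stored vectors.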
Traditional ML Model
Structured data in, predictions out.

The original AI. Decision trees, random forests, gradient boosting (XGBoost), logistic regression. Trained on labeled tabular data. Not neural networks. Excellent for fraud detection, credit risk scoring, churn prediction, spend classification, and anything with clean structured data.

Customer spend analysis and default risk — this is the right tool. Faster, cheaper, and more explainable than LLMs for those tasks.

XGBoost · scikit-learn · LightGBM
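For intuition about "structured data in, predictions out," here is a hand-rolled logistic regression trained by gradient descent on synthetic data. In practice you would reach for XGBoost or scikit-learn; the feature names and numbers here are invented for illustration.

```python
import math

def train_logistic(rows, labels, lr=0.1, epochs=200):
    """Minimal logistic regression: a stand-in for XGBoost/scikit-learn
    on the same kind of structured-data task."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1 / (1 + math.exp(-z))      # predicted probability
            err = p - y                     # gradient of log-loss wrt z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 / (1 + math.exp(-z))

# Synthetic features: [late_payments, spend_variance]; label 1 = defaulted.
rows   = [[0, 0.1], [1, 0.2], [5, 0.9], [6, 0.8], [0, 0.2], [7, 0.95]]
labels = [0, 0, 1, 1, 0, 1]
w, b = train_logistic(rows, labels)

print(predict(w, b, [6, 0.9]))  # many late payments: high default probability
print(predict(w, b, [0, 0.1]))  # clean history: low default probability
```

The whole model is a handful of floats, which is why these models are cheap to run and easy to explain compared to LLMs.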
Multimodal Model
Text + image + audio in, mixed output.

An LLM that also has vision (and sometimes audio). It can look at a photo and describe it, answer questions about a chart, read a document image, or analyze surveillance footage. Modern frontier models (GPT-4o, Gemini, Claude) are all multimodal.

When you need one model to handle both text reasoning AND visual analysis, this is it.

GPT-4o · Gemini 2.0 · Claude 4 · LLaVA
Small Language Model (SLM)
Lightweight LLM — runs on device or edge.

A compressed or purpose-built LLM that can run on a laptop, phone, or edge chip without a cloud connection. Much lower compute cost. Less general capability, but if fine-tuned on your specific domain, can be surprisingly powerful for narrow tasks.

Relevant for you: running models on a Qualcomm SOC for robotics or embedded devices. Phi-3, Llama 3.2 1B, Gemma 2B all qualify.

Phi-3 · Gemma · Llama 3.2 1B
§ 03 — The Data Layer: Where Your Information Lives
Database
Live transactions, real-time ops

Structured rows and columns. Your system of record — orders, customers, inventory, payments. Optimized for read/write speed, not analytics. PostgreSQL, MySQL, Oracle.

OLTP · Real-time
Data Warehouse
Historical analytics, structured queries

Cleaned, structured data organized for business questions. Optimized for complex queries across time. "What was average customer spend by region over Q1?" This is where your customer spend analysis and default risk models draw from.

Snowflake · BigQuery · Redshift
Data Lake
Raw everything — structured + unstructured

Store everything first, figure out schema later. Text files, logs, audio, video, JSON, CSV — all raw. Low cost. Used for AI training data, ML pipelines, and storing data you don't know how to use yet. Sentiment analysis of customer conversations starts here.

S3 · Azure ADLS · Databricks
Vector Database
Semantic similarity search

Stores embeddings (number vectors). Answers questions like "find me the 20 customer reviews most similar to this complaint." Enables RAG — giving an LLM access to your private data without retraining it. Critical for AI search applications.

Pinecone · Weaviate · pgvector
§ 04 — Protocols & Frameworks: How Things Talk to Each Other
MCP — Model Context Protocol
Anthropic (Nov 2024) · De facto industry standard

The "USB-C for AI." Before MCP, connecting an AI model to an external tool (a database, a CRM, a file system) required custom code for every single pair. MCP standardizes that connection — you write one MCP server for your tool, and any MCP-compatible AI can use it.

Agent is the client. Tool (database, API, calendar, Slack) is the server. MCP is the wire between them.

Think: Your AI agent asks "What's this customer's order history?" → MCP routes that to your CRM → CRM returns the data → agent reasons with it.
Adopted by: OpenAI, Google, AWS, IBM, Microsoft
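MCP messages are JSON-RPC 2.0 under the hood. A request for the CRM example above might look roughly like the following sketch; the tool name (`get_order_history`) and customer ID are hypothetical, chosen to match the example, not taken from any real server.

```python
import json

# Hypothetical MCP tool call: the agent (client) asks the CRM (server)
# to run one of the tools it advertises. Framing is JSON-RPC 2.0.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_order_history",            # tool name the server exposes
        "arguments": {"customer_id": "C-1042"}, # tool-specific arguments
    },
}
print(json.dumps(request, indent=2))
```

The server runs the tool and returns a JSON-RPC response with the result, which the agent then reasons over.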
A2A — Agent to Agent (Google)
Google · April 2025 · Open standard

MCP handles agent ↔ tool. A2A handles agent ↔ agent. When you have a system with a planner agent, a data retrieval agent, and a reporting agent, A2A is how they hand off tasks to each other — even if they're built on different frameworks or running in different places.

Before A2A: agents built on LangChain couldn't talk to agents built on Google ADK. A2A fixes this. It's the inter-agent routing layer.

Your question on "AAA" — this is A2A. That was the term you were reaching for.
JSON-RPC over HTTPS · Discovery via well-known URLs
LangChain / LangGraph
Orchestration framework · Python/JS

LangChain is the most popular framework for building AI applications. It gives you building blocks: connect an LLM, add memory, plug in tools, chain operations together. LangGraph is the extension for multi-step, stateful agent workflows — where the agent needs to loop, branch, or pass state between steps.

If MCP is the protocol, LangChain is the plumbing. Your application logic and agent behavior live here.

Open source · Chains · Agents · Memory · RAG
RAG — Retrieval-Augmented Generation
Pattern, not a product

RAG is a technique, not a tool. When a user asks a question, instead of relying only on what the LLM learned during training, RAG first searches your data (via a vector database), retrieves the relevant pieces, and feeds them into the LLM as context. The LLM then answers using your actual data.

This is how you give an LLM access to your company's private documents, customer records, or real-time data without retraining the entire model.

Flow: Question → Embed the question → Search vector DB → Retrieve top matches → LLM reads matches + answers.
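That flow can be sketched end to end with a toy corpus. Here, bag-of-words counts stand in for a real embedding model and a Python list stands in for the vector database; the documents and wording are invented.

```python
import math
from collections import Counter

# Toy RAG: a list is the "vector DB", word counts are the "embeddings".
DOCS = [
    "Customer complained that the billing statement was wrong.",
    "Store hours are 9am to 9pm on weekdays.",
    "Refund issued after a duplicate charge on the invoice.",
]

def embed(text):
    """Stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def retrieve(question, k=2):
    """Search the 'vector DB' and return the top-k matches."""
    q = embed(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

question = "complaints about billing"
context = retrieve(question)
context_text = "\n".join(context)
prompt = f"Answer using only this context:\n{context_text}\n\nQ: {question}"
print(prompt)  # this assembled prompt is what gets sent to the LLM
```

The LLM never sees the whole corpus, only the retrieved chunks, which is what keeps RAG cheap and grounded.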
§ 05 — Your Three Use Cases, Architected
USE CASE A: Sentiment Analysis + Ongoing Insight Engine
01
Ingest Raw Data

Pull customer conversations, reviews, support tickets, call transcripts from all sources.

Data Lake (S3/Azure) — raw unstructured text
02
Clean + Embed

Normalize text, chunk it, convert each chunk to a vector using an embedding model.

Embedding Model + Vector DB (Pinecone/Weaviate)
03
Classify Sentiment

Run each chunk through a classifier or fine-tuned LLM: positive/negative/neutral + topic.

Fine-tuned LLM or Classifier Model
04
Agent Monitors + Acts

Orchestrator agent watches for trend shifts, surfaces insights, generates recommended actions.

LangGraph Agent + MCP → CRM / Slack / Dashboard
05
Loop Updates

Scheduled pipeline re-runs on new data. Insight dashboard updates automatically.

Airflow / cron + streaming pipeline
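A toy version of steps 01 through 04: a keyword lookup stands in for the fine-tuned classifier in step 03, and a threshold check stands in for the monitoring agent in step 04. The keywords, threshold, and sample texts are all invented.

```python
# Placeholder keyword sets; a real system would use a trained classifier.
NEGATIVE = {"broken", "refund", "angry", "late", "wrong"}
POSITIVE = {"great", "love", "fast", "helpful"}

def classify(text):
    """Stand-in for the fine-tuned LLM/classifier in step 03."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def run_pipeline(raw_conversations):
    """Steps 01-04: ingest, classify, watch for a trend, surface results."""
    results = [{"text": t, "sentiment": classify(t)} for t in raw_conversations]
    negatives = sum(1 for r in results if r["sentiment"] == "negative")
    if negatives / len(results) > 0.5:          # step 04: agent acts on a trend
        print("ALERT: negative sentiment spike")
    return results

batch = [
    "Delivery was late and the item arrived broken",
    "Love the new store layout, great staff",
]
out = run_pipeline(batch)
```

Step 05 is just this function re-run on each new batch by a scheduler such as Airflow or cron.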
USE CASE B: Customer Spend + Risk Analysis (Structured Data)
01
Source Data

Customer transaction history, account data, payment records — structured rows.

Database → ETL → Data Warehouse (Snowflake)
02
Feature Engineering

Build features: avg spend, spend variance, days since last payment, product mix, etc.

SQL + dbt + Python (pandas)
03
Train Risk Model

Train an ML model (XGBoost) on labeled data: defaulted / not defaulted. Outputs probability score.

XGBoost / scikit-learn — traditional ML
04
Query with LLM

LLM + RAG over warehouse. "What's avg spend for customers with risk score > 0.7?" → SQL → answer.

LLM + Text-to-SQL + Data Warehouse
05
Dashboard

Scores, segments, risk buckets, spend tiers surfaced in BI tool or agent interface.

Tableau / Superset / custom React UI
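Step 02 (feature engineering) sketched in plain Python. In the real pipeline this logic lives in SQL/dbt over the warehouse; the customer IDs and amounts below are made up.

```python
from statistics import mean, pvariance

# Toy transaction history keyed by customer (step 01 output).
transactions = {
    "C-1042": [120.0, 80.0, 95.0, 300.0],
    "C-2001": [40.0, 42.0, 39.0],
}

def build_features(txns):
    """Turn raw amounts into model-ready features (step 02)."""
    return {
        cust: {
            "avg_spend": mean(amounts),
            "spend_variance": pvariance(amounts),
            "txn_count": len(amounts),
        }
        for cust, amounts in txns.items()
    }

features = build_features(transactions)
print(features["C-1042"])  # one feature row per customer
```

These rows are exactly what the XGBoost model in step 03 trains on: one row per customer, one column per feature.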
USE CASE C: Storefront Intelligence — Vision + Behavior Analysis
01
Cameras + Sensors

Video feeds from store cameras. Edge device captures and preprocesses frames locally.

IP Cameras + Edge device (Jetson / Qualcomm RB5)
02
Object Detection

Vision model runs on-device: detects people, tracks movement paths, identifies products being handled.

YOLO / SAM running on-edge NPU
03
Anomaly Detection

Separate model watches for anomalies — items not scanned, concealment behavior, dwell patterns near exits.

Classifier / anomaly model — trained on labeled shrinkage events
04
Events → Cloud

Only flagged events (not raw video) sent to cloud for review. Saves bandwidth. Privacy-preserving.

Edge → cloud pipeline (MQTT / HTTP)
05
Insights + Alerts

LLM agent synthesizes patterns: "Shrinkage events cluster near aisle 4, 5–7pm Fridays." Triggers alert or restock recommendation.

LLM Agent + MCP → Slack / POS system
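Steps 03 and 04 as a toy filter: a stand-in anomaly score decides which events leave the edge device, so raw video never does. The zones, threshold, and scoring rules below are invented placeholders for a trained anomaly model.

```python
import json

THRESHOLD = 0.8  # illustrative cutoff, tuned in a real deployment

def anomaly_score(event):
    """Placeholder for the trained anomaly model running on the edge NPU."""
    score = 0.0
    if event["zone"] == "exit" and event["dwell_seconds"] > 30:
        score += 0.5
    if not event["item_scanned"]:
        score += 0.4
    return score

def to_cloud(events):
    """Step 04: serialize only flagged events; drop everything else on-device."""
    flagged = [e for e in events if anomaly_score(e) >= THRESHOLD]
    return [json.dumps({"id": e["id"], "score": anomaly_score(e)}) for e in flagged]

events = [
    {"id": 1, "zone": "aisle-4", "dwell_seconds": 10, "item_scanned": True},
    {"id": 2, "zone": "exit", "dwell_seconds": 45, "item_scanned": False},
]
payloads = to_cloud(events)  # only the flagged event is serialized
```

Shipping a few hundred bytes of JSON per flagged event, instead of continuous video, is what makes the design both cheap and privacy-preserving.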
§ 06 — Compute: Where Models Actually Run
PC / Laptop NPU
Qualcomm Snapdragon X / Apple M-series

5–80 TOPS. Runs small models (SLMs) locally. Private, offline, no cloud cost. For prototyping, edge demos, or privacy-sensitive use cases. Limited to smaller models.

Edge Server / Robot
NVIDIA Jetson AGX Orin / Qualcomm RB5

75–275 TOPS. Runs full vision models and medium LLMs locally. Powers robots, smart cameras, industrial automation. Real-time with no cloud round-trip. 10–30W power draw.

On-Prem / Private GPU
NVIDIA A100 / H100 on-prem cluster

High-end inference or fine-tuning you want to keep private. More capital cost upfront, lower per-token cost over time. Relevant for regulated industries or IP-sensitive workloads.

Cloud Data Center
AWS / Azure / GCP — H100 / A100 clusters

Effectively unlimited scale. Where frontier model training happens (runs costing millions of dollars). Pay per token / per hour. Where most production AI inference lives today. 300–700W per GPU.

§ 07 — Agents: What They Actually Are
What Is an AI Agent?

An agent is an LLM that can take actions — not just answer questions. It has access to tools (search, database, API, code runner), can decide which tool to call, observe the result, and then decide what to do next. It loops until the task is complete.

Single-agent: one LLM doing everything. Multi-agent: specialized agents (planner, researcher, writer, validator) handing tasks to each other via A2A. More reliable for complex workflows.

Simple mental model: A chatbot responds. An agent acts. An agent with tools is like giving an intern access to your computer, your databases, and a set of approved actions.
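The loop is the whole trick. A toy version with a hard-coded "policy" in place of the LLM shows the structure: choose a tool, observe the result, repeat until a terminal action. The tool names and the task are invented for illustration.

```python
# Toy tools the agent can call; real ones would hit a DB, API, or Slack.
TOOLS = {
    "lookup_orders": lambda cust: ["order-17", "order-23"],
    "send_alert": lambda msg: f"sent: {msg}",
}

def decide(task, observations):
    """Stand-in for the LLM's tool choice, based on what it has seen so far."""
    if not observations:
        return ("lookup_orders", task["customer"])
    return ("send_alert", f"{task['customer']} has {len(observations[-1])} orders")

def run_agent(task, max_steps=5):
    """The agent loop: act, observe, decide again, until done."""
    observations = []
    for _ in range(max_steps):
        tool, arg = decide(task, observations)
        result = TOOLS[tool](arg)
        observations.append(result)
        if tool == "send_alert":           # terminal action ends the loop
            return result
    return None

print(run_agent({"customer": "C-1042"}))
```

Swap `decide` for an LLM call and `TOOLS` for MCP servers and this is, structurally, a production agent.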
Agent Memory & State

Context window — what the agent can "see" right now. Limited. Gets expensive fast.

Short-term memory — conversation history stored in the prompt. Fades when the session ends.

Long-term memory — stored in a vector DB or database. Persists. The agent can retrieve relevant past interactions using semantic search.

Tools / skills — what the agent can do: search web, query DB, write file, call API, send Slack message, run code. Each tool is exposed via MCP.

ReAct pattern · Plan + Execute · Reflection loops
§ 08 — Quick Reference Glossary
Transformer
The architecture behind most modern AI. Learns relationships between words/tokens at scale. The "T" in GPT.
Token
The unit LLMs process — roughly ¾ of a word. "Hello world" ≈ 2 tokens. Models charge per token and have context limits measured in tokens.
Context Window
How much text an LLM can "read" at once. GPT-4o: 128k tokens. Claude: 200k. Bigger window = can read more docs at once, but costs more.
Vector / Embedding
A list of numbers (e.g. 1536 floats) that encodes the meaning of text. Similar meaning → similar numbers. Foundation of semantic search.
Fine-tuning
Further training a pre-trained model on your specific data. Like giving a generalist intern domain-specific training. Results in a specialized model.
RAG
Retrieval-Augmented Generation. Search your data first → feed results to LLM → LLM answers using your actual data. No retraining needed.
Inference
Running a trained model to get predictions. Training is expensive (days/weeks/millions). Inference is cheaper (milliseconds per query).
TOPS
Tera Operations Per Second. How fast a chip can run AI math. More TOPS = can run bigger models or process faster. The NPU in Qualcomm's Snapdragon X Elite delivers 45 TOPS.
NPU
Neural Processing Unit. A chip specialized for AI math (matrix multiplication). Found in Qualcomm SOCs, Apple Silicon. Far more efficient than CPU for AI workloads.
SOC
System on a Chip. CPU + GPU + NPU + memory on one chip. Qualcomm Snapdragon is an SOC. Power-efficient. Used in phones, robots, edge devices.
MoE — Mixture of Experts
Model architecture where only a subset of the network activates per query. Allows very large models to run efficiently. Used in Mixtral and Gemini; GPT-4 is widely reported to use it as well.
Hallucination
When a model generates confident but false information. Not lying — it doesn't know what it doesn't know. RAG and grounding reduce this.
Structured Data
Rows and columns. SQL databases, spreadsheets, CSVs. Has a schema. Easy for ML models to process directly.
Unstructured Data
Text, images, audio, video, PDFs. No schema. Requires AI/ML to extract structure. Most of the world's data is unstructured.
Data Lakehouse
Hybrid of lake + warehouse. Store everything (lake) but add structure and governance for analytics (warehouse). Databricks, Snowflake, BigQuery all do this.
Orchestration
Coordinating multiple AI components — models, agents, tools, pipelines — so they work together reliably. LangChain, LangGraph, and Apache Airflow are orchestration tools.
Model Onboarding
The process of adapting a model to run efficiently on specific hardware — optimizing formats, quantizing weights, benchmarking performance. YYZdata's core service.
Quantization
Reducing model precision (e.g., 32-bit → 4-bit) to shrink size and speed up inference. Required to run larger models on edge hardware.
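A toy symmetric 8-bit quantization of one weight list shows the core trade: smaller storage in exchange for a bounded rounding error. Real schemes add per-channel scales, zero-points, and 4-bit packing; the weights below are made up.

```python
# One "tensor" of fp32 weights (values invented for illustration).
weights = [0.52, -1.73, 0.004, 0.91, -0.33]

scale = max(abs(w) for w in weights) / 127          # map [-max, max] onto [-127, 127]
quantized = [round(w / scale) for w in weights]     # stored as int8: 4x smaller than fp32
dequantized = [q * scale for q in quantized]        # approximate originals at inference

max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
print(quantized)   # small integers instead of 32-bit floats
print(max_error)   # rounding error bounded by half the scale step
```

Cutting each weight from 32 bits to 8 (or 4) is what lets a model that needs a data-center GPU fit on an edge NPU.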