AI literacy for the builder’s mind

  • Android Studio Panda 4 and the Rise of AI-First Kotlin Development

    Why students should learn to build AI-enabled Android apps now

    Mobile Development Is Not Declining — It Is Becoming the Edge of AI

    There is a persistent myth floating around that mobile application development is somehow a declining skill to learn.

    Nothing could be further from the truth.

    With edge computing, the Internet of Things, and the rising need to put AI-powered intelligent applications everywhere, building Android apps is one of the fastest lanes for new developers to learn the software engineering principles that put intelligence in everyone’s pocket.

    Back in IBM’s heyday in the early 1990s, everyone was talking about “ubiquitous computing.” We did not fully know what it meant then, but it had that cool technical panache. It sounded like the future was coming, even if the shape of that future was still hidden in the fog.

    Now the fog has lifted.

    We are all living on the edge now.

    Our phones are not just communication devices. They are sensors, cameras, wallets, identity systems, learning tools, business dashboards, AI clients, and personal command centres. The mobile platform is where cloud intelligence, local data, human attention, and real-world context all meet.

    That is why Android development matters.

    Mobile platform computing is not yesterday’s skill. It is the next enabler.

    And for students who want to become serious developers in the AI age, Android Kotlin development offers something rare: a practical, hands-on way to learn user interface design, APIs, cloud integration, databases, secure architecture, edge-aware thinking, and AI-powered business logic — all inside one platform that people actually carry with them every day.

    The future of AI will not live only in research labs, enterprise dashboards, or browser windows.

    It will live in apps.

    It will live in pockets.

    And the developers who understand how to build those apps will be the ones who help bring intelligence to the edge of everyday life.

    Android development has entered a new era.

    https://docs.google.com/presentation/d/1crk3WAhsI5V9iRj7OXPBgMmvWXtxTank/edit?usp=sharing&ouid=103411675731117310047&rtpof=true&sd=true

    Now Android Studio Panda 4 adds something new to that picture: AI is becoming part of the development environment itself.

    Android Studio Panda 4 is now stable and includes major AI-assisted development features such as Planning Mode, Next Edit Prediction, Ask Mode, and Agent Web Search. Google describes Planning Mode as a way for the agent to create a detailed project plan before making code changes, while Next Edit Prediction is designed to suggest related edits even away from the current cursor position. (Android Developers)

    That matters deeply for students.

    Because the winning student of the next few years will not merely know how to “use AI.” The winning student will know how to build AI into applications.

    And in Android Kotlin development, that means learning to place AI where it belongs: not as a toy chatbot pasted onto the side of the app, but as a properly designed service inside the business logic layer.


    The new Android developer is an AI systems builder

    Here is the shift I want my students to understand:

    The app is no longer just a user interface connected to a database.
    The modern app is a user interface connected to memory, reasoning, retrieval, workflow, and AI services.

    That is why AI should now sit at the centre of your development efforts.

    Not because AI writes all the code for you.

    That is the cheap interpretation.

    The serious interpretation is this:

    A modern Kotlin Android app may now include:

    Layer                  | Traditional purpose                | AI-first purpose
    Compose UI             | Display screens and receive input  | Let users interact with intelligent workflows
    ViewModel              | Manage state and events            | Coordinate AI calls, loading states, retrieved context, and generated responses
    Repository             | Fetch and store data               | Retrieve documents, notes, embeddings, and AI outputs
    Business logic         | Apply rules                        | Decide when to call Gemini, ChatGPT, Grok, or another model
    Backend/Firebase       | Authentication, storage, functions | Secure key management, model routing, AI service orchestration
    MongoDB / vector store | Store documents                    | Support retrieval-augmented generation, or RAG

    This is where the gold is.

    Students who learn this early can build apps that do more than display information. They can build apps that reason over information.


    1. Planning Mode

    Planning Mode is the big one.

    Instead of asking AI to immediately produce code, students can ask the agent to create an implementation plan first. This supports the teaching principle we have been developing for Android classes: deliberation before coding.

    That lines up directly with our Planning Mode teaching module: students should read the specification, identify UI responsibilities, data responsibilities, navigation, state, and risks before touching the code. The teaching material emphasizes that Planning Mode is a “no code edits yet” phase where students produce a written implementation plan before implementation begins.

    This is exactly how we stop students from treating AI as a vending machine.

    The student should not say:

    “Build me the app.”

    The student should say:

    “Here is the app specification. Create a plan showing screens, state, repositories, API calls, data models, error handling, and testing steps. Do not write code yet.”

    That is the difference between AI dependency and AI-augmented engineering.


    2. Next Edit Prediction

    Next Edit Prediction, or NEP, is especially useful for Kotlin students because Android development often involves related changes across multiple files.

    Change a data class, and you may need to update:

    • a ViewModel
    • a repository
    • a mapper
    • a Compose screen
    • a test
    • a Firebase DTO
    • a serialization model

    Google describes NEP as an evolution of code completion that anticipates edits away from the current cursor position, not just at the line where you are typing. (Android Developers)

    For teaching, this is beautiful.

    It helps students see that professional code is connected. A change in one file has consequences elsewhere. NEP becomes a kind of “codebase radar.”


    3. Agent Web Search

    Agent Web Search lets the Gemini agent pull current documentation for third-party libraries directly into Android Studio. Google’s release notes describe this as expanding Gemini beyond the Android knowledge base so it can fetch current reference material from the web for external libraries such as Coil, Koin, or Moshi. (Android Developers)

    This matters because students often work from outdated tutorials.

    Agent Web Search helps keep the student closer to current practice.


    The real win: AI inside the app, not just inside the IDE

    The IDE is only half the story.

    The more important teaching move is this:

    Use AI to build the app, then build AI into the app.

    That means teaching students to integrate model APIs into Kotlin apps through a clean architecture.

    Do not hardwire “Gemini” or “ChatGPT” all over the UI.

    Instead, teach a stable abstraction:

    interface AIClient {
        suspend fun complete(prompt: String): String
    }
    

    Then you can have different implementations:

    class GeminiClient : AIClient {
        override suspend fun complete(prompt: String): String {
            // Call Gemini through Firebase AI Logic
            return "Gemini response"
        }
    }
    
    class OpenAIClient : AIClient {
        override suspend fun complete(prompt: String): String {
            // Call OpenAI Responses API through backend or secure service
            return "OpenAI response"
        }
    }
    
    class GrokClient : AIClient {
        override suspend fun complete(prompt: String): String {
            // Call xAI Grok API through backend or secure service
            return "Grok response"
        }
    }
    

    This teaches students one of the most valuable professional patterns in AI application development:

    Your app should depend on an AI capability, not on a single vendor.

    Firebase AI Logic supports Gemini model access from mobile and web apps, including Kotlin and Java SDKs for Android. (Firebase) OpenAI’s platform exposes APIs for text, structured output, multimodal workflows, tools, and stateful interactions through the Responses API. (OpenAI Developers) xAI also provides API access for integrating Grok models into applications. (xAI Docs)

    That means students can learn a vendor-neutral design:

    Compose UI
       ↓
    ViewModel
       ↓
    AIUseCase / Business Logic
       ↓
    AIClient interface
       ↓
    Gemini / ChatGPT / Grok / other model provider
    

    That is serious architecture.

    That is employable knowledge.
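
    To make that wiring concrete, here is a minimal sketch of the use-case layer, assuming the AIClient interface above; the class name and prompt are illustrative choices, not a prescribed API:

    class AIUseCase(private val client: AIClient) {

        // Business logic frames the request; the UI never learns which vendor answered
        suspend fun explain(topic: String): String {
            val prompt = "Explain $topic to a first-year Android student in three sentences."
            return client.complete(prompt)
        }
    }

    // Swapping vendors is now a one-line change:
    // val useCase = AIUseCase(GeminiClient())
    // val useCase = AIUseCase(GrokClient())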


    Firebase as the RAD backbone for AI apps

    Firebase is now one of the fastest ways to teach students how to build serious AI-enabled mobile apps without forcing them to become backend infrastructure engineers on day one.

    Firebase AI Logic is designed to let developers build generative AI features into mobile and web apps using Gemini models, with Android support through Kotlin and Java SDKs. (Firebase) Firebase also provides a Gemini API template through Firebase Studio for building apps with the Gemini API pre-loaded. (Firebase)

    For students, Firebase can serve as a RAD environment: Rapid Application Development.

    It gives them a practical path to:

    • authenticate users
    • store app data
    • call cloud functions
    • manage AI access more safely
    • avoid embedding raw API keys directly into the Android app
    • connect app logic to Gemini-powered features

    This is a major professionalism point.

    One of the pitfalls in AI-assisted Android development is leaking sensitive data or keys to third-party APIs, or sending user data without proper masking and consent. Our AI-assisted Android pitfall guide explicitly flags weak privacy handling, bad key practices, and poor review/testing habits as recurring problems students must learn to avoid.

    So the classroom message is simple:

    Do not build “AI toy apps.”
    Build AI apps with architecture, privacy, testing, and secure backend thinking.


    Lab: Build an AI Study Coach with Android Studio Panda 4, Kotlin, Firebase, Gemini, and MongoDB RAG

    Project theme

    Students will build a simple AI-powered Android app called:

    StudyForge AI

    The app helps a student save study notes and then ask questions about those notes.

    Example:

    The student saves notes like:

    Kotlin coroutines let us run asynchronous work without blocking the main thread.
    

    Then the student asks:

    Why should I not make network calls on the main thread?
    

    The app retrieves relevant notes from MongoDB, sends them as context to an AI model, and returns a study explanation.

    That gives students a practical introduction to RAG: Retrieval-Augmented Generation.

    MongoDB Atlas Vector Search supports semantic search by storing vector representations of data and retrieving relevant documents for generative AI applications. (MongoDB) MongoDB’s own RAG tutorials show how to create vector search indexes, store embeddings, and retrieve relevant documents for LLM-powered applications. (MongoDB)

    For a student lab, I would keep MongoDB on the backend side rather than embedding database credentials directly into the Android app. The Android app should call Firebase or a small backend endpoint, and that backend should talk to MongoDB.

    That keeps the app cleaner and safer.


    What students will build

    The app will include:

    Feature          | Purpose
    Add study note   | User saves short study notes
    View saved notes | Compose displays a list
    Ask AI           | User asks a question
    Retrieve context | Backend searches MongoDB for relevant notes
    Generate answer  | Gemini, ChatGPT, Grok, or another model answers using retrieved notes
    Display answer   | Compose UI shows the AI response

    Following current development trends, we showcase the modern Compose way of doing Android.

    The benefit? Many, but mainly this: if you want to play in this space, you need to get on board with Docker and especially the CI/CD practice of generating your apps directly from Git. Git is built for code; XML declaration files for UI, not so much.

    Architecture

    Android Kotlin App
       ↓
    Jetpack Compose UI
       ↓
    StudyCoachViewModel
       ↓
    StudyCoachRepository
       ↓
    Firebase Callable Function or HTTPS endpoint
       ↓
    MongoDB notes collection + vector search
       ↓
    AI model provider: Gemini / ChatGPT / Grok
       ↓
    Answer returned to Android app
    

    This is the key teaching point:

    The Android app is not “the whole system.”
    The Android app is the mobile front end of an AI-enabled system.

    That is how modern apps increasingly work.


    Step 1: Create the Android Studio Panda 4 project

    1. Open Android Studio Panda 4.
    2. Create a new project.
    3. Choose a Kotlin + Jetpack Compose project.
    4. Use the Gemini API Starter template where available.
    5. Run the starter app on an emulator.

    Now pause.

    Before coding, students must use Planning Mode.

    Prompt:

    I am building an Android Kotlin Jetpack Compose app called StudyForge AI.
    
    The app lets users save short study notes, view them in a list, ask a question, retrieve relevant notes from a MongoDB-backed RAG service, and send the question plus retrieved notes to an AI model.
    
    Create an implementation plan only. Do not write code yet.
    
    Include:
    - screens
    - composables
    - ViewModel state
    - repository methods
    - backend API calls
    - data models
    - loading and error states
    - testing steps
    

    Students should save the plan as part of the assignment.

    This matches the teaching strategy from our earlier Planning Mode module: students should submit not only working code, but also the plan, prompts, AI responses, and their own edits to the plan.


    Step 2: Create the core data model

    Create a Kotlin data class:

    data class StudyNote(
        val id: String,
        val text: String,
        val createdAt: Long
    )
    

    Then create a second model for AI answers:

    data class StudyAnswer(
        val question: String,
        val answer: String,
        val sources: List<StudyNote>
    )
    

    Teaching note:

    This is a good moment to use Next Edit Prediction. After changing the data model, students should watch how Android Studio suggests related updates in ViewModels, repositories, or UI files.


    Step 3: Build the Compose screen

    Create a simple Compose screen:

    @Composable
    fun StudyForgeScreen(
        viewModel: StudyForgeViewModel = viewModel()
    ) {
        val notes by viewModel.notes.collectAsState()
        val newNote by viewModel.newNote.collectAsState()
        val question by viewModel.question.collectAsState()
        val answer by viewModel.answer.collectAsState()
        val isLoading by viewModel.isLoading.collectAsState()
    
        Column(modifier = Modifier.padding(16.dp)) {
            Text("StudyForge AI")
    
            OutlinedTextField(
                value = newNote,
                onValueChange = viewModel::onNewNoteChanged,
                label = { Text("Add a study note") }
            )
    
            Button(onClick = viewModel::saveNote) {
                Text("Save Note")
            }
    
            LazyColumn {
                items(notes) { note ->
                    Text(note.text)
                }
            }
    
            OutlinedTextField(
                value = question,
                onValueChange = viewModel::onQuestionChanged,
                label = { Text("Ask a question") }
            )
    
            Button(onClick = viewModel::askQuestion) {
                Text("Ask AI")
            }
    
            if (isLoading) {
                Text("Thinking...")
            }
    
            if (answer.isNotBlank()) {
                Text("AI Answer")
                Text(answer)
            }
        }
    }
    

    This is not meant to be visually perfect.

    It is meant to teach structure.

    Students can improve the UI later.


    Step 4: Create the ViewModel

    class StudyForgeViewModel(
        private val repository: StudyForgeRepository = StudyForgeRepository()
    ) : ViewModel() {
    
        private val _notes = MutableStateFlow<List<StudyNote>>(emptyList())
        val notes: StateFlow<List<StudyNote>> = _notes
    
        private val _newNote = MutableStateFlow("")
        val newNote: StateFlow<String> = _newNote
    
        private val _question = MutableStateFlow("")
        val question: StateFlow<String> = _question
    
        private val _answer = MutableStateFlow("")
        val answer: StateFlow<String> = _answer
    
        private val _isLoading = MutableStateFlow(false)
        val isLoading: StateFlow<Boolean> = _isLoading
    
        fun onNewNoteChanged(value: String) {
            _newNote.value = value
        }
    
        fun onQuestionChanged(value: String) {
            _question.value = value
        }
    
        fun saveNote() {
            val text = _newNote.value.trim()
            if (text.isBlank()) return
    
            val note = StudyNote(
                id = UUID.randomUUID().toString(),
                text = text,
                createdAt = System.currentTimeMillis()
            )
    
            _notes.value = _notes.value + note
            _newNote.value = ""
    
            viewModelScope.launch {
                repository.saveNote(note)
            }
        }
    
        fun askQuestion() {
            val currentQuestion = _question.value.trim()
            if (currentQuestion.isBlank()) return
    
        viewModelScope.launch {
            _isLoading.value = true
            try {
                _answer.value = repository.askAI(currentQuestion)
            } catch (e: Exception) {
                // Surface a friendly message instead of leaving the user stuck on "Thinking..."
                _answer.value = "Something went wrong. Please try again."
            } finally {
                _isLoading.value = false
            }
        }
        }
    }
    

    Teaching note:

    Students must understand why the AI call runs inside viewModelScope.launch.

    One of the common Android AI pitfalls is running inference or network calls on the main thread, causing freezes or ANRs. Our pitfall guide specifically recommends lifecycle-aware background work such as coroutines, WorkManager, and lifecycle-aware scopes for AI integration labs.
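
    A minimal illustration of the difference, assuming a hypothetical blockingInference() call that does slow network or model work (kotlinx.coroutines imports omitted, matching the listings above):

    // Wrong: runs on the main thread, freezes the UI, risks an ANR
    val answer = blockingInference(question)

    // Right: suspend inside a lifecycle-aware scope, shifting work off the main thread
    viewModelScope.launch {
        val answer = withContext(Dispatchers.IO) { blockingInference(question) }
        _answer.value = answer
    }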


    Step 5: Create the repository

    class StudyForgeRepository(
        private val apiClient: StudyForgeApiClient = StudyForgeApiClient()
    ) {
        suspend fun saveNote(note: StudyNote) {
            apiClient.saveNote(note)
        }
    
        suspend fun askAI(question: String): String {
            return apiClient.askQuestion(question)
        }
    }
    

    The repository keeps the ViewModel clean.

    This is where students learn separation of concerns.

    The UI should not know whether the answer came from Gemini, ChatGPT, Grok, or a future model that has not been invented yet.


    Step 6: Connect to Firebase or backend endpoint

    For teaching, keep this part simple.

    The Android app calls:

    POST /saveNote
    POST /askQuestion
    

    The backend handles:

    1. storing notes in MongoDB
    2. embedding the note
    3. retrieving relevant notes
    4. calling the selected AI model
    5. returning the answer

    A simplified Android API client might look like:

    class StudyForgeApiClient {
    
        suspend fun saveNote(note: StudyNote) {
            // Send note to Firebase function or backend endpoint
        }
    
        suspend fun askQuestion(question: String): String {
            // Send question to Firebase function or backend endpoint
            // Receive AI answer as String
            return "AI answer will appear here"
        }
    }
    

    In a production-quality version, students should use Retrofit, Ktor Client, Firebase Functions, or Firebase AI Logic depending on the teaching path.
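
    For example, a Ktor-based client might look like the sketch below. The base URL and JSON shape are assumptions for illustration; Ktor’s request calls are suspending, so they are safe to invoke from viewModelScope:

    import io.ktor.client.HttpClient
    import io.ktor.client.engine.cio.CIO
    import io.ktor.client.request.post
    import io.ktor.client.request.setBody
    import io.ktor.client.statement.bodyAsText
    import io.ktor.http.ContentType
    import io.ktor.http.contentType

    class StudyForgeApiClient(
        private val baseUrl: String = "https://your-backend.example.com" // hypothetical endpoint
    ) {
        private val client = HttpClient(CIO)

        suspend fun askQuestion(question: String): String {
            val response = client.post("$baseUrl/askQuestion") {
                contentType(ContentType.Application.Json)
                setBody("""{"question":"$question"}""") // real code should JSON-encode properly
            }
            return response.bodyAsText()
        }
    }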


    Step 7: Implement the backend RAG sequence

    The backend should perform this sequence:

    Receive question
       ↓
    Generate embedding for the question
       ↓
    Search MongoDB for similar note embeddings
       ↓
    Retrieve top 3–5 relevant notes
       ↓
    Build prompt with retrieved notes
       ↓
    Call Gemini / ChatGPT / Grok
       ↓
    Return answer to Android app
    

    Example prompt sent to the model:

    You are a helpful study coach.
    
    Use only the notes below as your source material.
    If the answer is not present in the notes, say what is missing.
    
    Student question:
    {question}
    
    Relevant notes:
    {retrieved_notes}
    
    Answer in clear student-friendly language.
    

    This teaches students that RAG is not magic.

    It is a workflow:

    Store knowledge. Retrieve relevant knowledge. Add it to the prompt. Ask the model to answer from that context.
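
    A minimal Kotlin sketch of that sequence, assuming hypothetical backend helpers embed() and vectorSearchNotes(), plus the AIClient abstraction from earlier:

    suspend fun answerQuestion(question: String, ai: AIClient): String {
        val queryVector = embed(question)                     // 1. embed the question
        val notes = vectorSearchNotes(queryVector, limit = 5) // 2. MongoDB Atlas vector search
        val context = notes.joinToString("\n") { it.text }    // 3. gather retrieved notes

        val prompt = """
            You are a helpful study coach.
            Use only the notes below as your source material.
            If the answer is not present in the notes, say what is missing.

            Student question:
            $question

            Relevant notes:
            $context
        """.trimIndent()

        return ai.complete(prompt)                            // 4. call the selected model
    }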


    Step 8: Add a model switcher

    Once the Gemini path works, students can add a provider setting:

    enum class AIProvider {
        GEMINI,
        OPENAI,
        GROK
    }
    

    Then the backend can route the request:

    if provider == GEMINI → call Gemini
    if provider == OPENAI → call OpenAI
    if provider == GROK → call xAI Grok
    

    This reinforces vendor-neutral architecture.
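
    In Kotlin, that routing is a small factory over the AIClient interface from earlier:

    fun clientFor(provider: AIProvider): AIClient = when (provider) {
        AIProvider.GEMINI -> GeminiClient()
        AIProvider.OPENAI -> OpenAIClient()
        AIProvider.GROK -> GrokClient()
    }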

    The lesson is not “learn one AI API.”

    The lesson is:

    Learn how AI APIs fit into application architecture.

    That is a much more durable skill.


    Student deliverables

    Students submit:

    1. Screenshot of the running app
    2. Kotlin data models
    3. Compose screen
    4. ViewModel
    5. Repository/API client
    6. Planning Mode document
    7. AI prompts used
    8. Short reflection: “What did AI help with, and what did I have to verify?”

    Assessment should not reward blind copying. Our prior Android teaching outline stresses that students should be graded on planning, AI prompt quality, edits, final code clarity, and their ability to critique AI output.


    Common warnings for students

    Do not put raw API keys in your Android app

    Mobile apps can be inspected. Secrets embedded in APKs are not truly secret.

    Use Firebase, backend functions, or secure server-side routing.

    Do not paste private user data into prompts without thinking

    AI apps must be designed with privacy awareness.

    Do not accept generated code blindly

    AI can create code that looks professional but contains lifecycle mistakes, outdated APIs, bad threading, or weak error handling.

    Do not start with multi-agent complexity

    For student projects, begin with one clean API call.

    Then add retrieval.

    Then add model switching.

    Then add advanced orchestration.

    In that order.


    Conclusion: this is the moment for AI-enabled Android students

    Android Studio Panda 4 is not just another IDE update.

    It is a signal.

    The development environment is becoming AI-assisted. The applications are becoming AI-enabled. The student who understands both sides of that equation has a real advantage.

    This is why I am bringing this into my teaching practice.

    Students should not graduate knowing only how to build static screens and simple CRUD apps. They should graduate understanding how to build apps where AI is part of the reasoning layer, the business logic layer, and the user value proposition.

    The next wave of Android apps will not merely ask:

    “What button did the user press?”

    They will ask:

    “What does the user need to understand, decide, retrieve, summarize, automate, or create?”

    That is the opportunity.

    Android Studio Panda 4 gives us the development environment.

    Kotlin gives us the app architecture.

    Firebase gives us the rapid backend.

    MongoDB gives us the memory and retrieval layer.

    Gemini, ChatGPT, Grok, and other models give us the reasoning engines.

    Now the job of the student is to learn how to connect them intelligently.

    That is where the next generation of AI-enabled Android developers will win.

    Show Me The Money: The Android Job Scene in Toronto

    Let’s be blunt: you are not studying late at night and grinding through labs for a gold star sticker. You want a career, rent money, travel money, and—yes—some room for fun. Android development in Toronto/GTA can absolutely give you that.

    Right now there are around a hundred Android‑focused roles and many more “mobile developer (iOS/Android)” postings in the Toronto area, across banks, consultancies, and product companies. That means real demand, not just hype. Companies like General Motors, TD, Tangerine, and dozens of startups and fintechs list Android and Kotlin as core skills for their mobile teams. (Glassdoor)

    The pay is serious even at the junior level. Glassdoor data for Toronto shows Android developers earning a typical base range of about 66,000–101,000 CAD, with an average around 88,000 CAD once you have some experience under your belt. PayScale puts an entry‑level Android developer (less than one year) around 51,000 CAD and early‑career (1–4 years) around 73,000 CAD in Toronto. In other words, if you put a couple of focused years into building skills and a portfolio, seeing numbers in the 70k–90k range is realistic, not a fantasy. (Glassdoor, PayScale)

    As you level up, the ceiling gets much higher. Senior and staff Android roles in Toronto regularly advertise six‑figure salaries, with some postings showing 140,000–160,000 CAD or more for specialized Android work. Crypto, fintech, and big‑tech‑adjacent companies sometimes push even higher, with some data sources reporting averages above 120,000 CAD for experienced Android developers in the city. (Glassdoor)


    Why This Matters For Your Life (Not Just Your Resume)

    Money isn’t everything, but it changes your options. A solid Android or mobile developer salary in Toronto can mean:

    • Moving out sooner and choosing where you want to live, instead of taking whatever is cheapest.
    • Paying off OSAP or other loans on your terms.
    • Having the budget for travel, festivals, hobbies, and the kind of social life that makes your twenties and thirties memorable.
    • The confidence that comes from being in demand—recruiters reach out to you, not the other way around.

    Whether you’re a pragmatic young woman who wants independence and career security, or a young guy who wants enough income to impress himself and everyone around him, the equation is the same: tech skills that employers actually pay for. Android is one of those skills.


    How George Brown Full Stack Leads Into These Jobs

    Here’s the good news: the George Brown Full Stack program already teaches most of the building blocks Toronto employers are paying for.

    Job ads for Android and junior mobile developers in the GTA consistently mention:

    • Kotlin or Java, plus Android Studio, as the main programming environment. (Glassdoor)
    • REST APIs, JSON, and cloud platforms like Firebase or AWS. (LinkedIn)
    • Databases and data modeling, skills you practice in your back‑end and SQL courses. (PayScale)
    • Version control with Git and working in agile teams. (Indeed)

    When you add a couple of focused Android projects on top of your Full Stack coursework—especially AI‑powered apps built in Android Studio Panda 4—you suddenly match the wish list in real Toronto job postings. The difference between “I took some courses” and “I can show you a working Android app that talks to a cloud backend and uses AI” is the difference between hoping for a job and walking into interviews with leverage.


    Android + AI: An Edge In a Crowded Market

    Toronto is competitive, which means you want something that makes your resume jump out of the pile. Right now, that “something” is clearly AI.

    Employers are already asking mobile teams to integrate chatbots, recommendation systems, and smart in‑app assistants. When you can say, “I’ve built an AI‑powered Android app in Kotlin using Jetpack Compose, Firebase, and an external model like Gemini or ChatGPT,” you are no longer just another junior dev—you are the person who can help them ship the next generation of their product.

    That’s exactly what we practice in my labs: Android Studio Panda 4, AI agents in the IDE, Firebase for secure backends, and MongoDB/RAG for intelligent data retrieval. It’s not just a cool classroom exercise; it’s training for the job descriptions that are live in Toronto right now. (Glassdoor)


    Bottom Line

    If your goal is financial independence, career flexibility, and the ability to build things people actually use every day, Android development is a very pragmatic path—especially when combined with the George Brown Full Stack program. The market is there, the salary bands are real, and the skills you learn in class map directly to what Toronto employers are hiring for.

    Toronto Android developer roles expect a mix of solid Kotlin/Android fundamentals, modern architecture, cloud/API skills, and collaboration practices. (Indeed)


    Core Android & Kotlin skills

    • Strong Kotlin (and often some Java) with Android Studio and the Android SDK. (Indeed)
    • Experience building screens with Jetpack Compose and modern UI toolkits. (Indeed)
    • Understanding of Android components (activities, fragments, services), app lifecycle, and manifest configuration. (Indeed)
    • Familiarity with design patterns like MVVM, MVP, or Clean Architecture. (Indeed)

    Architecture, data, and networking

    • Comfortable using coroutines and Flow or other reactive patterns for async work. (Indeed)
    • Consuming RESTful APIs and JSON, including authentication and error handling. (Indeed)
    • Local data storage with Room/SQLite or similar, and awareness of caching strategies. (Indeed)
    • Basic understanding of app performance, memory, and responsiveness on mobile devices. (Indeed)

    Testing, tooling, and DevOps habits

    • Unit and UI testing using tools like JUnit and Espresso; some roles mention test automation and TDD. (Indeed)
    • Git proficiency (branches, pull requests, code review) and experience with CI/CD is commonly requested. (Indeed)
    • Ability to debug, troubleshoot crashes, and stay on top of security updates and vulnerabilities. (Indeed)

    Cloud, cross‑platform, and AI‑adjacent expectations

    • Experience with cloud services such as Firebase or AWS (auth, analytics, serverless functions, etc.). (Indeed)
    • Many “mobile developer” postings want Android plus iOS or React Native, so awareness of Swift/Objective‑C or cross‑platform frameworks is a plus. (LinkedIn)
    • Increasingly, job ads mention AI‑enhanced workflows or modern tooling, and some junior roles (e.g., at Intuit) explicitly reference AI‑assisted coding and UX‑focused Android development. (Talent.com)

    Professional and soft skills

    • Ability to understand a mobile app end‑to‑end: from UI, through business logic, to backend integration. (Indeed)
    • Collaboration with designers, product owners, and other developers in agile teams, often using Jira/Confluence. (Indeed)
    • Clear written and verbal communication; a portfolio of apps or Play Store contributions is frequently listed as “strongly preferred.” (Indeed)

    If you can:

    • build a Kotlin/Compose app,
    • talk to a cloud backend (e.g., Firebase),
    • integrate REST APIs,
    • write basic tests, and
    • work in Git with a team,

    then you already cover the core requirements in live Toronto postings like these:

    https://ca.indeed.com/q-android-kotlin-l-toronto,-on-jobs.html?vjk=ef5b5150148db027

  • From Stone Tablets to Grok Connectors: 5 Best Practices to 1000x Your Productivity


    In the beginning, there was the stone tablet—and it was great.

    It allowed persistent memory across generations.

    Somebody who figured out a cool way to capture a mastodon could share it with the young learners of the tribe.

    But chisels are difficult to wield and stone tablets are expensive, so after a while a smart guy named Gutenberg said,

    “I have an idea. Let’s make a movable-type press, ink, and paper.” Suddenly tracts could be mass-produced, the Renaissance spread across Europe, and everyone gained access to knowledge.

    If your thinking is still stuck in the stone-tablet-and-chisel era, you’re missing massive leverage. Today we’re re-gearing our minds to take full advantage of Grok’s live connectors to Gmail, Google Calendar, Notion, and more.

    Best Practice: Stop reading every email. Ask Grok to search, summarize, and extract action items instead.

    Example prompts that work incredibly well:

    • “Search my Gmail for everything about the Q3 budget and give me a decision-ready summary with open questions.”
    • “Show me all unread emails from clients in the last 48 hours, ranked by urgency.”

    Why it 1000x’s you: You go from reactive inbox slave to strategic commander.

    Best Practice: Treat your calendar as a living database, not a static list.

    Powerful examples:

    • “Find 90 minutes of deep work blocks I can protect this week.”
    • “Scan my calendar and Gmail for anything related to ‘Toronto real estate’ and surface conflicts.”

    Best Practice: Stop copying and pasting. Create a unified knowledge layer.

    Try this:

    • “Pull the last three emails about the CTO from Gmail, the relevant Notion pages, and my calendar notes—then write a one-page briefing.”

    Best Practice: Never let important relationships fall through the cracks.

    • “Find every email I sent to prospects in the last 30 days that hasn’t received a reply. Draft friendly follow-ups.”

    Best Practice: Build simple recurring commands that replace entire productivity systems.

    • Morning: “Good morning briefing — unread high-priority emails, today’s calendar, top 3 Notion tasks.”
    • End-of-day: “Close the loop — what got done, what needs follow-up tomorrow.”

    The New Renaissance Starts Now

    We’re living through another Gutenberg moment—except this time the “printing press” thinks, searches, summarizes, and acts across all your tools in real time.

    Start small. Pick one of the five practices above and try it today. Then come back and tell the community what changed for you.

    What’s the first prompt you’re going to try? Drop it in the comments!

    — Peter & Grok
    🚀


    Productivity Workbook: 10 Grok Connector Prompts to Amplify Your Information Universe

    Print this, copy it into Notion, or keep it in this chat. Work through one prompt per day for the next 10 days. After each exercise, note what surprised you, what you saved, and how much time you reclaimed.

    These prompts assume your Gmail, Google Calendar, and Notion are connected to Grok. Just paste them directly into our chat and watch the magic happen.

    Day 1: Inbox Zero with Intelligence

    Prompt: “Search my Gmail for all unread emails from the last 7 days. Group them by priority (high/medium/low), extract key action items and deadlines, and give me a 5-bullet decision dashboard.”

    Goal: Move from overwhelm to clarity in under 60 seconds.

    Follow-up you can add: “Draft replies to the top 3 high-priority ones.”

    Day 2: Calendar Audit & Protection

    Prompt: “Analyze my Google Calendar for the next 14 days. Identify conflicts, back-to-back meetings, and unprotected deep work time. Suggest an optimized schedule with at least 2 hours of focused blocks per day and move or cancel low-value items.”

    Goal: Reclaim your time and energy. Follow-up: “Block the suggested deep work slots on my calendar.”

    Day 3: Cross-Tool Knowledge Synthesis

    Prompt: “Pull the last 5 emails about [specific topic, e.g., ‘CTO’ or ‘Q3 budget’] from Gmail, relevant pages from my Notion workspace, and any related calendar events. Synthesize everything into a one-page briefing with risks, opportunities, and recommended next actions.”

    Goal: Stop hunting across apps. Pro tip: Replace the topic with whatever you’re working on.

    Day 4: Automatic Follow-Up Engine

    Prompt: “Find every email I sent in the last 30 days that has no reply. List the top 5 most important ones with context from previous threads, then draft personalized, friendly follow-up messages for each.”

    Goal: Never lose momentum on relationships or deals.

    Day 5: Morning Chief-of-Staff Briefing

    Prompt: “Good morning briefing: Show unread high-priority emails, today’s calendar events with prep notes, top 3 open Notion tasks, and one powerful focus question for the day.”

    Goal: Start every day like you have a world-class assistant. Variation: Change to “End-of-day close the loop” at night.

    Day 6: Meeting Superpowers

    Prompt: “After my [meeting name/time] today, summarize the key decisions, action items with owners and deadlines, and create corresponding Notion tasks plus calendar reminders for follow-ups in 7 and 30 days.”

    Goal: Never lose what was said in a meeting again.

    Day 7: Weekly Review on Steroids

    Prompt: “Run my full weekly review: Major accomplishments this week from email + calendar + Notion, open loops or risks, calendar conflicts for next week, and three key lessons or insights.”

    Goal: Turn reflection into rocket fuel.

    Day 8: Idea Capture & Development

    Prompt: “Anytime I have a new idea, create a new Notion page in my ‘Ideas’ database. Pull any related emails or calendar context, expand the idea with pros/cons, and suggest first 3 action steps.”

    Goal: Capture and grow ideas instead of losing them.

    Day 9: Relationship Nurturing

    Prompt: “Who in my key network (personal or professional) have I not connected with in over 45 days? Pull context from past emails or meetings and suggest short, warm outreach messages or coffee catch-up invites.”

    Goal: Strengthen your network without extra effort.

    Day 10: Custom System Builder

    Prompt: “Help me design a custom productivity system. I want [describe your needs, e.g., ‘daily email triage + weekly Notion review + automatic client follow-ups’]. Suggest 3–5 recurring Grok prompts I can use and show me how to combine Gmail, Calendar, and Notion.”

    Goal: Build your own personalized AI operating system.


    How to Use This Workbook

    1. Do it live — Paste the prompts exactly as written, then refine them in follow-up messages.
    2. Track results — After each prompt, write down: Time saved / Insight gained / Output created.
    3. Scale it — Once comfortable, combine prompts (e.g., morning briefing + follow-up engine).
    4. Share wins — Reply here or on X with your favorite prompt and results. We’ll feature the best ones.

    You now have everything you need to move from stone-tablet thinking to commanding an intelligent, connected productivity universe.

    Which prompt are you starting with today?

    Paste it here and I’ll run it with you right now, or refine it for your exact needs.

  • The AI Agent Accessibility Imperative: Don’t Be the Sears of the Agentic Web


    The web is bifurcating.

    The time to build for the new channel is before your competitors realize the channel exists.

    Before We Talk About AI Agents, Let’s Talk About a Catalogue

    If you grew up in Canada before the millennium, you probably remember the Sears catalogue. Not as a historical artifact — as furniture. It sat on the kitchen table, the coffee table, the shelf beside the phone. It was how Canadians shopped for everything from refrigerators to hockey equipment to school clothes. For generations, Sears wasn’t just a retailer. It was infrastructure.

    At its peak, Sears Canada operated over 100 full-line department stores and more than 1,700 catalogue pick-up locations across the country. It employed approximately 17,000 people. It was, by any reasonable measure, one of the most trusted and deeply embedded commercial institutions in Canadian life. The Sears name carried the weight of reliability, range, and reach that no competitor could match.

    It closed permanently in 2017.

    The bankruptcy filing cited the usual suspects — pension shortfalls, declining foot traffic, aggressive competition — but the forensic cause of death was something more specific and more instructive:

    Two failures compounded each other with fatal efficiency.

    First, Sears entered e-commerce late — not slightly late, but strategically late, in the way that signals an organization that treated the new channel as an experiment rather than an existential imperative. Amazon, Best Buy, and a generation of born-digital retailers had already built the logistics networks, the customer trust, and the user experience standards that would define what “shopping online” meant. Sears arrived at that table after the food was gone.

    Second, and more quietly devastating, Sears underinvested in mobile. As the smartphone became the primary device through which Canadians browsed, compared, and purchased, Sears’s digital presence remained optimized for a desktop experience that fewer and fewer people were using.

    They were building for the audience of five years ago while their competitors were building for the audience of five years ahead.

    The lesson is not that Sears was incompetent. They were not.

    They were large, experienced, and resource-rich.

    The lesson is that competence accumulated under one set of channel assumptions does not automatically transfer when the channel itself transforms. Sears knew retail. They never fully learned the new retail.

    That history matters now, because the channel is transforming again, and most web developers are making exactly the same category of error Sears made.

    Not the technical error.

    The attitudinal error.

    The assumption that the primary consumer of your web content is a human being sitting at a browser, making deliberate navigational choices.

    That assumption is becoming less true every quarter. And the rate at which it is becoming less true is accelerating.

    This series is about not making that mistake.

    Sears had two fatal blind spots: they assumed the channel for commerce was still stores, and they assumed the device for web interaction was still a desktop.

    By the time they corrected both assumptions, the market had already structurally reorganized around their competitors.

    Web developers today face an identical structural risk — and the analogues map precisely:

    Sears Failure                                  | 2026 Equivalent
    Ignored e-commerce as a channel                | Ignoring AI agents as primary web consumers
    Built for desktop, not mobile                  | Building for humans, not AI agent parsing
    “We’ll get to it later”                        | “SEO and structured data can wait”
    Too many entrenched stakeholders to pivot fast | Legacy JS-heavy SPA architecture that breaks agent crawlers

    The signal is already in the data.

    Perplexity, ChatGPT, Google AI Overviews, Claude, Copilot, and dozens of enterprise AI agents are right now replacing direct human browsing for information retrieval.

    When a user asks an AI agent “compare the top three project management tools,” no human opens five tabs.

    The agent does — or more likely, it never opens tabs at all. It synthesizes from indexed, structured, accessible content.

    The sites that get cited are the ones that were built to be parseable.


    What AI Agents Actually Need From Your Web Content

    This is the technical literacy gap. Most developers instinctively think “accessibility” means screen readers and WCAG compliance. AI agent accessibility is a different surface of concerns entirely, though it overlaps in important ways.

    1. llms.txt — The New robots.txt You Aren’t Implementing Yet

    llms.txt is an emerging standard (championed by Answer.AI’s Jeremy Howard, among others) that provides a structured, Markdown-formatted summary of what your site contains and how an LLM should navigate it.

    Think of it as a machine-readable table of contents and intent declaration for your site.

    # YourSite
    > A platform for full-stack developer education
    
    ## Core Content
    - [Course Catalog](/courses): All available courses with descriptions
    - [Documentation](/docs): Technical reference for all tools
    - [Blog](/blog): Weekly articles on web development
    
    ## Key Concepts
    This site covers React, Node.js, PostgreSQL, and DevOps for working developers.
    

    Place it at yourdomain.com/llms.txt. It’s to AI agents what sitemap.xml was to Google crawlers in 2005. Early movers will benefit disproportionately.

    2. JSON-LD Structured Data / Schema.org — The Semantic Layer You’re Probably Underusing

    Search engines have required this for years for rich snippets. AI agents use it to understand entity relationships, not just index keywords. Every page on your site should have appropriate Schema markup:

    • Article / BlogPosting for editorial content
    • Course for educational content
    • FAQPage for knowledge bases (this one is especially powerful for RAG systems)
    • HowTo for procedural content
    • Product / Service for commercial offerings
    • Organization and Person for entity disambiguation

    The developer who implements FAQPage schema today is creating structured training signal that AI agents will preferentially surface when answering user questions in their domain.
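
    As a minimal sketch, FAQPage markup looks like this; the question and answer text here are placeholders:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "FAQPage",
      "mainEntity": [{
        "@type": "Question",
        "name": "Why do AI agents need server-side rendering?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Most AI crawlers do not execute JavaScript, so content must exist in the initial HTML payload."
        }
      }]
    }
    </script>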

    3. Server-Side Rendering (SSR) / Static Site Generation (SSG) — Not Just a Performance Win

    Here’s the dirty secret of the React/Next.js/Vue ecosystem: most AI agents and crawlers cannot execute JavaScript.

    A client-side rendered SPA returns essentially an empty <div id="root"> to a crawler. Your content is invisible.

    The shift to Next.js App Router, Astro, Nuxt, and SvelteKit isn’t just about Core Web Vitals.

    It’s about ensuring your content exists in the initial HTML payload that any agent, crawler, or parser receives.

    Action today: Audit your site with JavaScript disabled.

    What you see with JavaScript off is roughly what an AI agent sees.

    4. Semantic HTML — The Foundation That Still Gets Ignored

    AI agents parse document structure. A page where everything is <div> and <span> is informationally flat. A page with proper <article>, <section>, <h1> through <h3> hierarchy, <nav>, <main>, <aside>, and <figure> with <figcaption> gives an agent a navigable knowledge structure.
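
    A minimal before-and-after sketch of the same content:

    <!-- Informationally flat -->
    <div class="post"><span>SSR Explained</span><div>Crawlers read only the initial HTML.</div></div>

    <!-- Agent-legible structure -->
    <article>
      <h1>SSR Explained</h1>
      <section>
        <h2>Why crawlers need initial HTML</h2>
        <p>Crawlers read only the initial HTML.</p>
      </section>
    </article>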

    This is what your full-stack tech stack diagram should be teaching under “Frontend Basics” — not as an accessibility checkbox, but as agent-legibility infrastructure.

    5. The Model Context Protocol (MCP) — The API Layer for Agentic Integration

    Anthropic’s MCP is rapidly becoming the standard by which AI agents interact with external services and data sources. If your web application exposes functionality — booking, querying, transacting, retrieving — building an MCP server for it means AI agents can use your service, not just read about it.

    This is the mobile-first moment for agentic integration. The platforms that built MCP endpoints in 2025 will be the ones that appear in “use this tool” recommendations by AI assistants in 2026-2027. Shopify, Stripe, Linear, and others are already there.

    6. Content Architecture for Retrieval-Augmented Generation (RAG)

    AI agents that power enterprise tools don’t just crawl the open web — they ingest, chunk, and embed your content into vector databases for retrieval.

    Content that is modular, clearly scoped, and self-contained at the section level embeds well and retrieves accurately.

    Practically this means:

    • Write headings that are complete declarative statements, not clever one-word labels
    • Each section should answer one question fully without requiring adjacent context
    • Avoid pronouns that reference content from a previous section (“As we discussed above…”)
    • Use definition-first writing: state the concept, then elaborate
    • Explicit summaries and conclusions at section and page level

    This is writing for chunking. An AI agent slicing your article into 512-token windows will either surface coherent, useful segments — or it will surface confusing fragments. The architecture of your prose determines which.

    7. Metadata Completeness — OpenGraph, Twitter Cards, and Beyond

    When an AI agent synthesizes a response and needs to attribute or recommend a source, it reads metadata to understand what the page is, who wrote it, when it was published, and whether it’s authoritative. Incomplete metadata = lower confidence = lower citation frequency.

    Every page needs:

    • og:title, og:description, og:image, og:type
    • article:author, article:published_time, article:modified_time
    • Canonical URLs (duplicate content confuses agent indexing)
    • <meta name="description"> that accurately summarizes the page

    8. Explicit robots.txt Governance for AI Crawlers

    The AI crawler landscape is fragmented. GPTBot, ClaudeBot, PerplexityBot, Bytespider, and dozens of others follow robots.txt conventions — to varying degrees. You need a deliberate policy:

    • Decide which AI crawlers you want to allow and which to block
    • Be aware that blocking all AI crawlers means invisibility in AI-powered search
    • Selectively expose high-value content and protect proprietary/paywalled material

    Not having a policy is a policy — one made by default and likely not in your interest.
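
    A deliberate policy might look like the sketch below; which crawlers you allow is a business decision, and these three are only common examples:

    # robots.txt: allow answer engines on public content
    User-agent: GPTBot
    Allow: /blog/
    Disallow: /members/

    User-agent: PerplexityBot
    Allow: /

    # Block bulk scrapers that return no citation value
    User-agent: Bytespider
    Disallow: /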


    Sears-mode thinking has a characteristic internal monologue:

    “Our core users are still human.

    AI agents are a niche.

    We’ll address it in a future sprint. It’s not urgent yet. Let’s not over-engineer.”

    This is precisely the logic that Sears used about mobile commerce in 2010. It wasn’t urgent. Until it was catastrophically late.

    The structural difference this time is the speed of channel adoption.

    Mobile adoption took roughly a decade to become dominant.

    AI agent-mediated information retrieval is moving in an 18-to-36-month window.

    The S-curve is steeper.

    The developers and teams building for agent-accessibility now are not over-engineering — they are future-proofing their distribution channel.


    The Pragmatic Starting Checklist for Today

    For the developer who wants to start this week, not after a full architectural review:

    1. Add llms.txt to your domain root — 30 minutes, zero dependencies
    2. Audit with JS disabled — Chrome DevTools → Settings → Disable JavaScript. Photograph what you see.
    3. Add FAQPage JSON-LD to your highest-traffic content pages — immediate RAG pickup
    4. Verify SSR — if you’re on Create React App with no SSR, plan your Next.js or Astro migration
    5. Review heading hierarchy — use a browser extension like HeadingsMap to visualize your <h> structure
    6. Complete your OpenGraph metadata — use opengraph.xyz to preview what agents see
    7. Set a deliberate robots.txt AI crawler policy — even if that policy is “allow all for now”
    8. Write one page explicitly for chunking — restructure your best-performing article using the RAG writing principles above, then monitor its citation frequency in AI tools

    The web is bifurcating into:

    (1) content that AI agents cite and surface to their users — and

    (2) content that exists on the web but never appears in the answers those users actually see.


  • Google Willow: This Week’s Quantum Computing Breakthrough

    AI With Peter: Business AI Literacy

    Here’s what you need to know about Google’s Willow quantum processor — without the hype, without the science fiction, and without pretending this is going to replace your data center next quarter.

    What Google Actually Built

    Google Quantum AI has built a 105-qubit superconducting quantum processor called Willow. The breakthrough is not that it’s big. The breakthrough is that it works better as it gets bigger.

    That sentence might not sound revolutionary, but it solves one of the fundamental problems that has kept quantum computing in the lab for decades.

    The Real Achievement: Error Correction That Scales

    Classical computers are reliable. You can store a bit, read it back, copy it a million times, and it stays the same.

    That’s why your laptop doesn’t randomly corrupt your files.

    Every previous attempt to scale up quantum systems ran into the same wall: adding more qubits meant adding more noise. The system got worse, not better.

    Google’s published results show below-threshold quantum error correction.

    In plain English: as they increased the size of their error-correcting quantum memory systems, the logical error rate improved rather than deteriorating.

    If errors decrease as you scale up, you have a path to building quantum computers that can actually complete useful calculations before they fall apart.
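
    For the technically minded, there is one formula behind that sentence. In surface-code error correction, with physical error rate p, error-correction threshold p_th, and code distance d, the logical error rate is expected to scale roughly as (a standard approximation from the error-correction literature, not a Willow-specific result):

    \epsilon_L \approx A \left( \frac{p}{p_{\mathrm{th}}} \right)^{(d+1)/2}

    Once p drops below p_th, every increase in code distance multiplies the logical error rate by a factor smaller than one. That is the mathematical meaning of “works better as it gets bigger.”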

    How Willow’s Error Correction Works (The Business Version)

    Think of a regular qubit like a single employee trying to remember a complex instruction while sitting in a noisy restaurant. They’re going to make mistakes.

    Error correction is like having a team of people who cross-check each other. But in previous quantum systems, adding more people to the team just meant more confusion — more chances for someone to mishear, more coordination overhead, more chaos.

    Willow’s breakthrough is that the cross-checking team actually reduces errors as the team grows. More qubits, properly configured, means less noise in the final answer.

    That’s counterintuitive. It’s also essential.

    What This Means for Business Today

    Short Answer: Nothing immediate. Willow is not a product you can buy. It’s a research milestone.

    Slightly Longer Answer: This is the foundation for everything that comes next.

    Google has opened a Willow Early Access Program for selected researchers.

    Scientific proposals are due May 15, 2026, with selection notifications planned for July 1, 2026.

    The hardware is being made available to serious researchers who want to run experiments on circuits, quantum simulations, and error-correction protocols.

    The Business Implications That Matter

    If you’re a business leader, manager, or investor trying to understand where quantum computing fits in your strategic horizon, here’s the framework.

    Timeline Reality Check

    Timeframe   | What’s Happening                                                              | What Business Can Do
    2026-2027   | Research-grade quantum processors available to select institutions           | Track developments; build quantum literacy in technical teams
    2028-2030   | Early specialized applications in pharma, materials science, optimization   | Identify high-value use cases; establish partnerships with quantum vendors
    2031-2035   | Quantum advantage in specific domains; hybrid classical-quantum workflows    | Pilot programs for applicable problems; infrastructure planning
    Beyond 2035 | Potentially transformative quantum computing for chemistry, cryptography, AI | Strategic integration; competitive positioning

    Where Quantum Computing May Actually Help

    Quantum computers are not faster classical computers. They solve different kinds of problems using different physics.

    The business applications most likely to benefit are:

    1. Drug Discovery and Materials Science
    Simulating molecular interactions and chemical reactions is exponentially hard for classical computers. Quantum systems can model quantum chemistry natively.

    Implication: Pharmaceutical companies, materials manufacturers, and energy companies should track quantum simulation capabilities.

    2. Optimization Problems
    Portfolio optimization, logistics routing, supply chain configuration, network design — problems where you’re searching massive solution spaces for optimal configurations.

    Implication: Financial services, logistics companies, and manufacturing operations may see early quantum advantage here.

    3. Cryptography and Security
    Quantum computers will eventually break current encryption standards. That’s a threat and an opportunity.

    Implication: IT security teams need post-quantum cryptography roadmaps now. The NSA and NIST have already published quantum-resistant standards.

    4. Machine Learning and AI
    Quantum machine learning is speculative, but certain optimization and pattern-recognition tasks may benefit from quantum acceleration.

    Implication: AI-heavy companies should watch this space but not count on it for current roadmaps.

    Let’s clear the air on what quantum computers — even breakthrough ones like Willow — cannot and will not do:

    Replace your cloud infrastructure
    Classical computers will remain dominant for almost everything.

    Run your ERP system faster
    Quantum computers are not general-purpose speed machines.

    Solve NP-complete problems instantly
    Quantum advantage is real but bounded. It’s not magic.

    Work at room temperature in your data center
    Willow operates at millikelvin temperatures in specialized quantum facilities.

    Deliver immediate ROI for typical business software
    This is scientific and engineering infrastructure, not enterprise SaaS.

    Three Questions for Quantum Investors

    If you’re evaluating quantum computing as an investment opportunity, ask these three questions:

    1. Is the company solving a real problem or selling quantum buzzwords?

    Real: “We are developing quantum algorithms for molecular simulation in drug discovery.”
    Buzzword fog: “Our quantum-powered AI will revolutionize all industries with quantum advantage.”

    2. What is the error correction strategy?

    Willow’s milestone matters because error correction is the hard problem. Any quantum computing company that doesn’t have a credible error-correction roadmap is not serious.

    3. What is the classical baseline?

    Quantum advantage only matters if the quantum approach actually beats the best classical algorithm.

    Many “quantum advantage” claims dissolve when compared to optimized classical computing.

    A company that can’t clearly articulate its classical baseline doesn’t understand its own value proposition.

    What You Should Do This Week

    Here are the practical moves for business leaders and technology managers.

    Step 1: Build Quantum Literacy

    You don’t need a physics PhD. You need to understand:

    • What quantum computers are good at (simulation, certain optimization problems, cryptography)
    • What they’re not good at (everything else)
    • Where your business intersects quantum-relevant problems

    Step 2: Audit Your Cryptography

    Even if you never use a quantum computer, quantum computers will affect you through post-quantum cryptography.

    Action item: Ask your security team if your encryption systems are quantum-resistant. If they don’t know, that’s your answer.
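    If your team wants a crude starting point, the sketch below records the TLS version and cipher suite each public endpoint negotiates today. It is written in Python, the hostnames are placeholders, and it is an inventory aid rather than a post-quantum assessment: interpreting the results against NIST’s post-quantum standards is still your security team’s job.

    # Hypothetical first-pass TLS inventory for a post-quantum audit.
    # Records the protocol version and cipher suite each endpoint negotiates;
    # compare the output against your post-quantum migration guidance.
    import socket
    import ssl

    ENDPOINTS = ["example.com", "api.example.com"]  # replace with your hosts

    context = ssl.create_default_context()

    for host in ENDPOINTS:
        try:
            with socket.create_connection((host, 443), timeout=5) as sock:
                with context.wrap_socket(sock, server_hostname=host) as tls:
                    name, protocol, bits = tls.cipher()
                    print(f"{host}: {tls.version()} / {name} ({bits}-bit)")
        except OSError as exc:
            print(f"{host}: connection failed ({exc})")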

    Step 3: Identify Your Quantum-Relevant Problems

    Make a list of hard computational problems in your business:

    • Molecular simulations?
    • Complex optimization?
    • Cryptographic security?
    • High-dimensional pattern matching?

    If you have problems on that list, quantum computing might eventually matter to you. If you don’t, you can watch the field develop without panic.

    Step 4: Track, Don’t Chase

    Quantum computing is advancing. Willow proves that. But it’s advancing from research milestones toward engineering challenges toward eventual commercial applications.

    That journey takes years, sometimes decades.

    The winning strategy is not to throw money at quantum projects because they sound futuristic. The winning strategy is to understand the trajectory, identify where it intersects your domain, and position yourself to adopt when the technology actually delivers advantage.

    The Bottom Line

    Google’s Willow chip is a genuine breakthrough in quantum error correction. It demonstrates that quantum systems can become more reliable as they scale up — which is the opposite of what happened in every previous generation of quantum hardware.

    That’s important.

    It’s also not ready to run your business.

    What business leaders should understand is this:

    Quantum computing is real. It’s advancing. It will eventually matter for specific, valuable problems in chemistry, materials science, optimization, and cryptography. But it’s not replacing classical computing, and it’s not a magic solution to generic business challenges.

    The leaders who come out ahead will be the ones who can tell the difference between:

    • Real capability and science fiction
    • Useful application and buzzword marketing
    • Strategic positioning and premature commitment

    And frankly, that’s a more interesting story than the hype would suggest.

    Because the real revolution isn’t that quantum computers will be magic.

    The real revolution is that we’re learning how to make them work.


    About AI With Peter: Practical technology analysis for business leaders who want to understand what’s real, what’s hype, and what to do about it. None of the noise. Just the signal.

  • Beyond the Chat Window: Why Grok 4.3’s API Changes the Cognition Economics for Business Users

    In the past 24 hours, xAI released Grok 4.3—a base LLM with December 2025 knowledge cutoff, 1M token context, and aggressive pricing ($1.25/M input, $2.50/M output).

    AI with Peter | May 1, 2026

    The headlines focus on benchmark scores: #1 on CaseLaw v2 (79.3%), 1500 ELO on GDPval-AA agentic tasks, 98% on τ²-Bench Telecom.

    But the real story isn’t what Grok 4.3 can do—it’s what happens when you stop treating AI as a conversation partner and start treating it as programmable infrastructure.

    The chat interface is a demonstration environment.

    The API is the production environment.

    And for business users—project leaders managing organizational memory, data analysts automating insight pipelines, educators scaling personalized learning—this distinction is the difference between experimenting with AI and embedding intelligence into operational workflows.

    The Chat Trap: Why Conversational AI Doesn’t Scale

    Here’s the problem with chat interfaces: they optimize for single-session convenience at the expense of cross-session composability. Every conversation starts from zero. There’s no state persistence, no workflow integration, no programmatic control over temperature, top-k sampling, or system prompts. You can’t A/B test prompt strategies. You can’t batch-process 500 customer support tickets overnight. You can’t version-control your inference logic or deploy it behind authenticated endpoints.

    Chat is fine for prototyping. But it’s the computational equivalent of writing production code in a REPL—useful for exploration, catastrophic for operations.

    The API, by contrast, lets you:

    1. Separate concerns: Decouple prompt engineering from delivery UI
    2. Compose workflows: Chain LLM calls with deterministic logic, external data sources, and validation layers
    3. Control cost: Run batch jobs at 50% lower rates, cache system prompts, dynamically adjust token limits
    4. Monitor quality: Log input/output pairs, track latency/cost per request, build feedback loops
    5. Scale horizontally: Process concurrent requests, integrate with existing CI/CD pipelines, deploy multi-tenant solutions

    This is cognition as infrastructure—not as dialogue, but as a computational primitive you can orchestrate alongside databases, message queues, and business logic.

    Grok 4.3’s pricing model makes this particularly compelling: $1.25/M input tokens is 20% cheaper than previous versions while delivering better performance on agentic benchmarks. For high-volume workflows—legal document review, financial report generation, curriculum personalization—this shifts the ROI calculation from “interesting experiment” to “operational necessity.”


    Three Winning Use Cases: API-First Cognition in Practice

    Use Case 1: Business/Project Leaders — Organizational Memory as Code

    The Problem: Project leaders manage fragmented institutional knowledge—meeting notes, decision logs, technical documentation, Slack threads. This knowledge degrades over time: context gets lost, decisions are re-litigated, new hires can’t find the “why” behind legacy systems.

    The API Solution: Build a queryable organizational memory that ingests artifacts (meeting transcripts, technical specs, product roadmaps), embeds them in vector space, and exposes a REST API for natural language retrieval.

    Grok 4.3’s 1M token context window means you can stuff entire project histories into a single prompt without chunking/retrieval fragility. The agentic performance (1500 ELO on GDPval-AA) means it can reason over multi-document sets to synthesize answers like:

    • “What were the trade-offs we considered when choosing React over Vue in Q2 2025?”
    • “Show me all decisions that assumed our Series B would close by October 2025.”
    • “Generate an onboarding doc for the payments team that explains our fraud detection pipeline.”

    This isn’t just search—it’s institutional reasoning. And because it’s API-driven, you can:

    • Integrate with Slack/Teams to answer questions inline
    • Trigger weekly summary emails of key decisions
    • Version-control prompt templates as your organizational knowledge evolves
    • Enforce access controls (Finance team gets finance docs, Engineering gets technical specs)

    Cost Analysis: Assume 500 queries/month, averaging 50K input tokens (context) + 1K output tokens (response):

    • Input: 500 × 50K tokens = 25M tokens → 25 × $1.25/M = $31.25
    • Output: 500 × 1K tokens = 0.5M tokens → 0.5 × $2.50/M = $1.25
    • Total: $32.50/month for a system that replaces 10+ hours of “digging through Notion/Confluence” labor per employee.
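    If you want to sanity-check these numbers, here is a small Python sketch of the same arithmetic. The rates come from the published pricing above; the query volume and token counts are the scenario’s assumptions, not measurements.

    # Cost model: (tokens / 1M) × per-million rate, as in the scenario above.
    INPUT_RATE = 1.25 / 1_000_000   # USD per input token
    OUTPUT_RATE = 2.50 / 1_000_000  # USD per output token

    def monthly_cost(queries, input_tokens_per_query, output_tokens_per_query):
        input_cost = queries * input_tokens_per_query * INPUT_RATE
        output_cost = queries * output_tokens_per_query * OUTPUT_RATE
        return input_cost, output_cost, input_cost + output_cost

    print(monthly_cost(500, 50_000, 1_000))  # (31.25, 1.25, 32.5)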

    Use Case 2: Data Analysts — Automated Insight Pipelines

    The Problem: Analysts spend 60% of their time on data janitorial work—cleaning CSVs, normalizing column names, writing SQL to join disparate sources—and only 40% on insight generation. The cognition-intensive parts (pattern detection, anomaly explanation, stakeholder reporting) are bottlenecked by manual preprocessing.

    The API Solution: Build a self-serve analytics pipeline where non-technical stakeholders upload raw data, describe what they need, and receive publication-ready reports.

    Grok 4.3’s multimodal capabilities + domain specialization (98% on τ²-Bench Telecom, #1 on CorpFin) mean it can:

    1. Ingest messy data: Parse Excel files with merged cells, footnotes, and color-coded categories
    2. Generate analysis code: Write Python/SQL to clean, transform, and visualize data
    3. Explain findings: Produce executive summaries that connect statistical patterns to business decisions

    Example workflow:

    User uploads: Q1_sales_messy.xlsx
    User prompt: "Which regions underperformed, and why?"
    
    API pipeline:
    1. Grok reads Excel → identifies structure issues (e.g., "Region" column has typos: "North East" vs "Northeast")
    2. Generates pandas code to normalize, aggregate by region, compute variance
    3. Runs code, produces chart + insights:
       - "Northeast underperformed by 18% vs forecast due to delayed product launch in Feb"
       - "Southwest overperformed by 12%, driven by Q1 marketing campaign"
    4. Returns Markdown report + chart PNG
    

    Why API > Chat: Analysts can batch-process 50 datasets overnight, log all transformations for auditability, and integrate this into existing BI dashboards (e.g., Tableau’s Python API). The chat interface forces manual uploads, non-reproducible interactions, and zero version control.
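    As a rough illustration of that overnight batch, here is a hedged Python sketch (the lab below uses Node.js, but the pattern is language-agnostic). The endpoint and model name match the lab’s client configuration; the dataset previews and prompt are illustrative.

    # Concurrent batch analysis against an OpenAI-compatible chat endpoint.
    import asyncio
    import os

    import httpx

    API_URL = "https://api.x.ai/v1/chat/completions"
    HEADERS = {"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"}

    async def analyze(client: httpx.AsyncClient, preview: str) -> str:
        resp = await client.post(API_URL, headers=HEADERS, json={
            "model": "grok-4.3-2025-12",
            "messages": [{"role": "user",
                          "content": f"Which regions underperformed, and why?\n\n{preview}"}],
        })
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    async def main():
        previews = ["...dataset 1 preview...", "...dataset 2 preview..."]  # 50 in practice
        async with httpx.AsyncClient(timeout=120) as client:
            reports = await asyncio.gather(*(analyze(client, p) for p in previews))
        for report in reports:
            print(report[:200])  # log the first lines of each report

    asyncio.run(main())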

    Cost Analysis: 100 datasets/month, 20K input tokens (data preview + prompt) + 5K output tokens (code + report):

    • Input: 100 × 20K tokens = 2M tokens → 2 × $1.25/M = $2.50
    • Output: 100 × 5K tokens = 0.5M tokens → 0.5 × $2.50/M = $1.25
    • Total: $3.75/month to eliminate 20-30 hours of manual data prep.

    Use Case 3: Content Creators (Teachers/Educators) — Personalized Learning at Scale

    The Problem: Educators want to personalize instruction (adaptive problem sets, differentiated reading levels, targeted feedback), but doing this manually is cognitively expensive. Creating 5 difficulty tiers for a single algebra problem set takes hours. Grading 30 essays with individualized feedback is a weekend job.

    The API Solution: Build a learning orchestration platform where the API generates:

    1. Adaptive assessments: Student answers Question 1 incorrectly → API generates a scaffolded follow-up at lower difficulty
    2. Multi-tier content: One lesson plan → API produces versions for grade levels 6-12
    3. Personalized feedback: Batch-grade essays, flagging conceptual gaps and suggesting resources

    Grok 4.3’s domain specialization + fast inference (197 tokens/second) makes this feasible at classroom scale. Example: A teacher uploads a biology unit on cellular respiration. The API:

    • Generates 3 versions: Honors (college-level terminology), Standard (age-appropriate), Remedial (ELL-friendly)
    • Creates 15 formative assessment questions per tier
    • Provides answer keys + worked explanations
    • Flags common misconceptions based on simulated student errors

    Cost Analysis: Generate 50 lessons/semester, 30K input tokens (source material + instructions) + 15K output tokens (3 tiers × 5K tokens each):

    • Input: 50 × 30K tokens = 1.5M tokens → 1.5 × $1.25/M = $1.88
    • Output: 50 × 15K tokens = 0.75M tokens → 0.75 × $2.50/M = $1.88
    • Total: $3.76/semester to create content that would take 40+ hours manually.

    Code Lab: Building a Production-Ready Grok 4.3 Integration

    This lab walks through building an organizational knowledge API (Use Case 1) with Node.js, covering:

    1. API authentication and basic completion
    2. Document chunking for 1M token context
    3. Streaming responses for UX
    4. Error handling and rate limiting
    5. Prompt versioning and A/B testing

    Prerequisites

    • Node.js 18+
    • xAI API key (get from console.x.ai)
    • Basic familiarity with Express.js

    Step 1: Project Setup

    mkdir grok-knowledge-api
    cd grok-knowledge-api
    npm init -y
    npm install express dotenv axios
    

    Create .env:

    XAI_API_KEY=your_api_key_here
    PORT=3000
    

    Step 2: Basic API Client

    Create lib/grokClient.js:

    const axios = require('axios');
    
    class GrokClient {
      constructor(apiKey) {
        this.apiKey = apiKey;
        this.baseURL = 'https://api.x.ai/v1';
        this.model = 'grok-4.3-2025-12';
      }
    
      async complete(messages, options = {}) {
        const {
          temperature = 0.7,
          max_tokens = 4000,
          stream = false,
        } = options;
    
        try {
          const response = await axios.post(
            `${this.baseURL}/chat/completions`,
            {
              model: this.model,
              messages,
              temperature,
              max_tokens,
              stream,
            },
            {
              headers: {
                'Authorization': `Bearer ${this.apiKey}`,
                'Content-Type': 'application/json',
              },
              responseType: stream ? 'stream' : 'json',
            }
          );
    
          return stream ? response.data : response.data.choices[0].message.content;
        } catch (error) {
          console.error('Grok API Error:', error.response?.data || error.message);
          throw new Error(`API request failed: ${error.response?.status || 'Unknown'}`);
        }
      }
    
      // Calculate cost for a given request
      calculateCost(inputTokens, outputTokens) {
        const inputCost = (inputTokens / 1_000_000) * 1.25;
        const outputCost = (outputTokens / 1_000_000) * 2.50;
        return {
          inputCost: inputCost.toFixed(4),
          outputCost: outputCost.toFixed(4),
          totalCost: (inputCost + outputCost).toFixed(4),
        };
      }
    }
    
    module.exports = GrokClient;
    

    Step 3: Document Chunking Strategy

    Grok 4.3 supports 1M tokens, but you still want smart chunking for:

    • Cost control: Only send relevant context
    • Latency optimization: Smaller prompts = faster responses
    • Logical boundaries: Preserve semantic coherence

    Create lib/documentProcessor.js:

    class DocumentProcessor {
      constructor() {
        // Rough heuristic: 1 token ≈ 4 characters for English
        this.charsPerToken = 4;
      }
    
      // Estimate token count (rough approximation)
      estimateTokens(text) {
        return Math.ceil(text.length / this.charsPerToken);
      }
    
      // Chunk document by semantic boundaries (paragraphs/sections)
      chunkByParagraphs(text, maxTokensPerChunk = 50000) {
        const paragraphs = text.split(/\n\s*\n/);
        const chunks = [];
        let currentChunk = [];
        let currentTokens = 0;
    
        for (const para of paragraphs) {
          const paraTokens = this.estimateTokens(para);
          
          if (currentTokens + paraTokens > maxTokensPerChunk && currentChunk.length > 0) {
            chunks.push(currentChunk.join('\n\n'));
            currentChunk = [para];
            currentTokens = paraTokens;
          } else {
            currentChunk.push(para);
            currentTokens += paraTokens;
          }
        }
    
        if (currentChunk.length > 0) {
          chunks.push(currentChunk.join('\n\n'));
        }
    
        return chunks;
      }
    
      // Prepare context for a query (retrieve top-k relevant chunks)
      prepareContext(allDocuments, query, maxContextTokens = 100000) {
        // Simple keyword-based relevance (in production, use embeddings + vector search)
        const queryTerms = query.toLowerCase().split(/\s+/);
        
        const scoredChunks = allDocuments.map(doc => {
          const docLower = doc.content.toLowerCase();
          const score = queryTerms.reduce((acc, term) => {
            // Escape regex metacharacters so terms like "c++" don't throw
            const safeTerm = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
            const matches = (docLower.match(new RegExp(safeTerm, 'g')) || []).length;
            return acc + matches;
          }, 0);
    
          return { ...doc, relevanceScore: score };
        });
    
        // Sort by relevance, take top chunks within token budget
        scoredChunks.sort((a, b) => b.relevanceScore - a.relevanceScore);
        
        const selectedDocs = [];
        let totalTokens = 0;
    
        for (const doc of scoredChunks) {
          const docTokens = this.estimateTokens(doc.content);
          if (totalTokens + docTokens <= maxContextTokens) {
            selectedDocs.push(doc);
            totalTokens += docTokens;
          } else {
            break;
          }
        }
    
        return {
          documents: selectedDocs,
          totalTokens,
          coverage: `${selectedDocs.length}/${allDocuments.length} documents`,
        };
      }
    }
    
    module.exports = DocumentProcessor;
    

    Step 4: Knowledge Query API

    Create server.js:

    require('dotenv').config();
    const express = require('express');
    const GrokClient = require('./lib/grokClient');
    const DocumentProcessor = require('./lib/documentProcessor');
    
    const app = express();
    app.use(express.json());
    
    const grok = new GrokClient(process.env.XAI_API_KEY);
    const processor = new DocumentProcessor();
    
    // Mock document store (in production: use vector DB like Pinecone, Weaviate)
    const knowledgeBase = [
      {
        id: 'doc_001',
        title: 'Q2 2025 Product Roadmap',
        content: `Our Q2 2025 roadmap focuses on three pillars:\n\n1. Mobile-first redesign: Complete React Native migration by June 15\n2. AI-powered search: Integrate semantic search using vector embeddings\n3. Enterprise SSO: Support Okta, Auth0, and Azure AD\n\nKey trade-offs:\n- Delayed Android tablet support to prioritize iPhone parity\n- Chose React Native over Flutter due to team expertise\n- SSO implementation blocks v2.0 launch by 3 weeks`,
      },
      {
        id: 'doc_002',
        title: 'Tech Stack Decision Log - Feb 2025',
        content: `React vs Vue Debate (Resolved 2025-02-12):\n\nDecision: React\n\nRationale:\n- 4/5 senior engineers have production React experience\n- Better ecosystem for mobile (React Native)\n- Hiring pool 2x larger (per LinkedIn data)\n\nDissent (from @alice): Vue has better DX for junior devs\nCounter: Training cost < hiring risk in current market\n\nDependencies: This assumes Series B closes Q3 2025 (approved headcount: 8 engineers)`,
      },
      {
        id: 'doc_003',
        title: 'Payments Architecture RFC',
        content: `Fraud Detection Pipeline:\n\nWe use a 3-tier approach:\n1. Rule-based filtering (blocks 80% of obvious fraud)\n2. ML model (XGBoost, retrained weekly on labeled data)\n3. Manual review queue (human analysts for edge cases)\n\nPerformance:\n- False positive rate: 2.3% (industry avg: 5%)\n- False negative rate: 0.8% (acceptable per CFO)\n\nKnown limitations:\n- Model doesn't handle cryptocurrency transactions well\n- Manual queue SLA is 4 hours (compliance requires <2 hours)`,
      },
    ];
    
    // Endpoint: Query knowledge base
    app.post('/api/query', async (req, res) => {
      const { question, max_tokens = 2000 } = req.body;
    
      if (!question) {
        return res.status(400).json({ error: 'Missing required field: question' });
      }
    
      try {
        // Step 1: Retrieve relevant documents
        const { documents, totalTokens, coverage } = processor.prepareContext(
          knowledgeBase,
          question,
          100000 // Use up to 100K tokens for context (well under 1M limit)
        );
    
        if (documents.length === 0) {
          return res.json({
            answer: "I couldn't find relevant information in the knowledge base for that question.",
            sources: [],
            cost: null,
          });
        }
    
        // Step 2: Build prompt
        const systemPrompt = `You are an organizational knowledge assistant. Your job is to answer questions based ONLY on the provided internal documents. If the documents don't contain enough information, say so explicitly.
    
    When answering:
    - Cite specific documents by title
    - Highlight trade-offs or caveats mentioned in the source material
    - Flag outdated assumptions (e.g., "This decision assumed Series B by Q3 2025")`;
    
        const contextBlock = documents.map(doc => 
          `[Document: ${doc.title}]\n${doc.content}`
        ).join('\n\n---\n\n');
    
        const userPrompt = `Context:\n${contextBlock}\n\nQuestion: ${question}`;
    
        const messages = [
          { role: 'system', content: systemPrompt },
          { role: 'user', content: userPrompt },
        ];
    
        // Step 3: Call Grok API
        const startTime = Date.now();
        const answer = await grok.complete(messages, { max_tokens });
        const latency = Date.now() - startTime;
    
        // Step 4: Estimate cost
        const inputTokens = processor.estimateTokens(systemPrompt + userPrompt);
        const outputTokens = processor.estimateTokens(answer);
        const cost = grok.calculateCost(inputTokens, outputTokens);
    
        res.json({
          answer,
          sources: documents.map(d => ({ id: d.id, title: d.title })),
          metadata: {
            latency_ms: latency,
            context_coverage: coverage,
            input_tokens: inputTokens,
            output_tokens: outputTokens,
            cost_usd: cost.totalCost,
          },
        });
    
      } catch (error) {
        console.error('Query failed:', error);
        res.status(500).json({ error: error.message });
      }
    });
    
    // Endpoint: Health check
    app.get('/health', (req, res) => {
      res.json({ status: 'ok', model: 'grok-4.3-2025-12' });
    });
    
    const PORT = process.env.PORT || 3000;
    app.listen(PORT, () => {
      console.log(`Knowledge API running on port ${PORT}`);
      console.log(`Test with: curl -X POST http://localhost:${PORT}/api/query -H "Content-Type: application/json" -d '{"question":"Why did we choose React over Vue?"}'`);
    });
    

    Step 5: Test the API

    Start the server:

    node server.js
    

    Test query:

    curl -X POST http://localhost:3000/api/query \
      -H "Content-Type: application/json" \
      -d '{
        "question": "What were the trade-offs when we chose React over Vue?"
      }'
    

    Expected response:

    {
      "answer": "According to the Tech Stack Decision Log from February 2025, the team chose React over Vue with the following rationale:\n\n**Primary factors:**\n- Team expertise: 4 out of 5 senior engineers had production React experience\n- Mobile strategy: Better ecosystem support via React Native\n- Hiring advantage: LinkedIn data showed the React talent pool was 2x larger\n\n**Trade-off acknowledged:**\nAlice dissented, noting that Vue offers better developer experience for junior engineers. However, the team concluded that training costs were lower than hiring risks in the current market.\n\n**Key assumption:**\nThis decision depended on the Series B closing in Q3 2025, which would approve headcount for 8 additional engineers. If funding changes, this calculus might need revisiting.",
      "sources": [
        { "id": "doc_002", "title": "Tech Stack Decision Log - Feb 2025" }
      ],
      "metadata": {
        "latency_ms": 1247,
        "context_coverage": "1/3 documents",
        "input_tokens": 487,
        "output_tokens": 183,
        "cost_usd": "0.0011"
      }
    }
    

    Step 6: Add Prompt Versioning (A/B Testing)

    Create lib/promptTemplates.js:

    const PROMPT_VERSIONS = {
      v1_standard: {
        system: `You are an organizational knowledge assistant. Answer questions based on provided documents.`,
        format: (context, question) => 
          `Context:\n${context}\n\nQuestion: ${question}`,
      },
      
      v2_detailed: {
        system: `You are an organizational knowledge assistant. Your job is to answer questions based ONLY on the provided internal documents. If the documents don't contain enough information, say so explicitly.
    
    When answering:
    - Cite specific documents by title
    - Highlight trade-offs or caveats mentioned in the source material
    - Flag outdated assumptions (e.g., "This decision assumed Series B by Q3 2025")`,
        format: (context, question) => 
          `Context:\n${context}\n\nQuestion: ${question}`,
      },
    
      v3_socratic: {
        system: `You are an organizational knowledge assistant trained to surface decision context. Don't just answer questions—explain the "why" behind decisions, identify assumptions, and flag risks.`,
        format: (context, question) =>
          `INTERNAL DOCUMENTS:\n${context}\n\n---\n\nQUESTION: ${question}\n\nProvide:\n1. Direct answer\n2. Key assumptions in source material\n3. Risks if assumptions changed`,
      },
    };
    
    function getPrompt(version, context, question) {
      const template = PROMPT_VERSIONS[version] || PROMPT_VERSIONS.v2_detailed;
      return {
        system: template.system,
        user: template.format(context, question),
      };
    }
    
    module.exports = { PROMPT_VERSIONS, getPrompt };
    

    Update server.js to support versioning:

    const { getPrompt } = require('./lib/promptTemplates');
    
    app.post('/api/query', async (req, res) => {
      const { 
        question, 
        max_tokens = 2000,
        prompt_version = 'v2_detailed', // Default to v2
      } = req.body;
    
      // ... (document retrieval stays the same)
    
      // Use versioned prompt
      const prompt = getPrompt(prompt_version, contextBlock, question);
      const messages = [
        { role: 'system', content: prompt.system },
        { role: 'user', content: prompt.user },
      ];
    
      // ... (rest of implementation)
    });
    

    Now test different prompts:

    # Test v3_socratic (surfaces assumptions)
    curl -X POST http://localhost:3000/api/query \
      -H "Content-Type: application/json" \
      -d '{
        "question": "Why did we choose React?",
        "prompt_version": "v3_socratic"
      }'
    

    Step 7: Add Rate Limiting and Monitoring

    Install dependencies:

    npm install express-rate-limit winston
    

    Create lib/logger.js:

    const winston = require('winston');
    
    const logger = winston.createLogger({
      level: 'info',
      format: winston.format.json(),
      transports: [
        new winston.transports.File({ filename: 'error.log', level: 'error' }),
        new winston.transports.File({ filename: 'combined.log' }),
      ],
    });
    
    if (process.env.NODE_ENV !== 'production') {
      logger.add(new winston.transports.Console({
        format: winston.format.simple(),
      }));
    }
    
    module.exports = logger;
    

    Update server.js:

    const rateLimit = require('express-rate-limit');
    const logger = require('./lib/logger');
    
    // Rate limiting: 10 requests per minute per IP
    const limiter = rateLimit({
      windowMs: 60 * 1000,
      max: 10,
      message: { error: 'Too many requests, please try again later.' },
    });
    
    app.use('/api/', limiter);
    
    // Update query endpoint to log metrics
    app.post('/api/query', async (req, res) => {
      const requestId = `req_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
      
      logger.info('Query received', {
        requestId,
        question: req.body.question,
        prompt_version: req.body.prompt_version || 'v2_detailed',
      });
    
      try {
        // ... (existing logic)
    
        logger.info('Query completed', {
          requestId,
          latency_ms: latency,
          input_tokens: inputTokens,
          output_tokens: outputTokens,
          cost_usd: cost.totalCost,
        });
    
        res.json({ answer, sources, metadata });
    
      } catch (error) {
        logger.error('Query failed', { requestId, error: error.message });
        res.status(500).json({ error: error.message });
      }
    });
    

    Step 8: Deploy Considerations

    Environment Variables (add to .env):

    NODE_ENV=production
    XAI_API_KEY=your_api_key
    PORT=3000
    RATE_LIMIT_WINDOW_MS=60000
    RATE_LIMIT_MAX_REQUESTS=10
    

    Production Checklist:

    1. Authentication: Add JWT/API key middleware:

       const authenticate = (req, res, next) => {
         const apiKey = req.headers['x-api-key'];
         if (apiKey !== process.env.INTERNAL_API_KEY) {
           return res.status(401).json({ error: 'Unauthorized' });
         }
         next();
       };

       app.use('/api/', authenticate);

    2. Vector Database: Replace mock knowledgeBase with Pinecone/Weaviate for semantic search
    3. Streaming: For long responses, use Grok’s streaming mode:

       const stream = await grok.complete(messages, { stream: true });
       stream.on('data', chunk => {
         res.write(`data: ${chunk}\n\n`);
       });
    4. Caching: Use Redis to cache frequent queries
    5. Monitoring: Integrate Datadog/Sentry for error tracking

    Deployment Example (Docker)

    Create Dockerfile:

    FROM node:18-alpine
    
    WORKDIR /app
    
    COPY package*.json ./
    RUN npm ci --only=production
    
    COPY . .
    
    EXPOSE 3000
    
    CMD ["node", "server.js"]
    

    Build and run:

    docker build -t grok-knowledge-api .
    docker run -p 3000:3000 --env-file .env grok-knowledge-api
    

    Key Takeaways

    1. APIs unlock composability: You can’t version-control chat conversations or A/B test prompts in a GUI
    2. Cost scales sub-linearly: Batch processing + caching means marginal cost per query drops as volume increases
    3. Observability is table stakes: Log every request/response pair for debugging, compliance, and model drift detection
    4. Prompt engineering is software engineering: Treat prompts as code—version them, test them, deploy them through CI/CD

    Grok 4.3’s aggressive pricing ($1.25/$2.50 per M tokens) makes API-first architectures economically viable for mid-market teams. The chat interface is where you prototype. The API is where you productionize cognition.


    Next Steps

    • Week 1: Deploy this knowledge API internally, seed with 10-20 key docs
    • Week 2: Instrument with analytics—track query patterns, identify gaps in knowledge base
    • Week 3: Integrate with Slack (answer questions inline) or email (automated weekly summaries)
    • Month 2: Expand to multi-modal (voice queries via speech-to-text, image-based documentation)

    The organizations that win in the cognition economy won’t be the ones with the best models—they’ll be the ones who operationalize intelligence fastest. Stop chatting with AI. Start building with it.


  • LangChain 2026 Lab

    From Prompt Wrapper to Agent Engineering Platform

    This lab demonstrates the core concepts of LangChain as an agent engineering platform, including model-agnostic orchestration, custom tool creation, memory management, and the agent execution loop.

    The research_agent.py code is available here: 
    https://github.com/computationalknowledge/langchain/blob/main/research_agent.py
    
    The full lab with work steps is available here: 
    https://docs.google.com/document/d/1Gves-hIagKFEgtEMjs2tzwMpMSOTUiD-thSkem4EnZ0/edit?usp=sharing

    What You’ll Learn:

    • How to initialize models with the unified LangChain interface
    • How to integrate external tool libraries
    • How to implement conversational memory
    • How to run the agent loop and inspect tool calls
    • The difference between declarative tool definition and imperative programming

    Prerequisites

    • Python 3.10 or higher
    • OpenAI API key (or Anthropic API key if using Claude)
    • Basic familiarity with Python and async/await

    LangChain 2026: From Prompt Wrapper to Agent Engineering Platform

    A Technical Deep-Dive with Hands-On Lab

    If you’re building with LangChain in 2026, you’re engineering production-grade autonomous systems that reason, recover from failures, manage state across database checkpoints, and execute multi-step workflows with the reliability of traditional software.

    This is not a minor version upgrade. This is a categorical shift.

    The framework has evolved from a convenience layer for LLM calls into a comprehensive agent orchestration platform with first-class support for:

    • Model-agnostic orchestration across 100+ providers
    • First-class tools and toolkits with parallel execution and retry logic
    • Dual-layer memory systems (short-term conversational context + long-term compressed knowledge stores)
    • Agentic RAG where the LLM decides when and how to retrieve, not just fetch-then-prompt
    • LangGraph for durable execution with checkpoint/resume semantics
    • Sandboxed agent deployment with pluggable remote/local/virtual execution backends
    • Production checklists for compliance, resilience, and observability

    If you’re still thinking of LangChain as “a library for calling OpenAI,” you’re operating with a 2023 mental model in a 2026 landscape. Let’s fix that.


    1. The Agent Loop: Call, Select, Execute

    At its heart, every LangChain agent implements a three-stage loop:

    1. CALL MODEL → Model reasons about the current state
    2. SELECT TOOL → Model decides which action to take  
    3. EXECUTE ACTION → Tool runs, result feeds back to step 1

    This loop continues until the objective is complete. The model isn’t just generating text—it’s making decisions about which tools to invoke, when to stop, and how to handle failures.
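    To make the loop concrete, here is a framework-free Python sketch of that control flow. The decision dictionary shape is illustrative only; LangChain’s AgentExecutor (used in the lab below) and LangGraph implement hardened, production versions of this loop for you.

    # Illustrative skeleton of the call → select → execute loop.
    # "model" is any callable that returns either a final answer or a tool request.
    def agent_loop(model, tools, objective, max_iterations=10):
        state = [{"role": "user", "content": objective}]
        for _ in range(max_iterations):
            decision = model(state)                   # 1. CALL MODEL
            if decision["type"] == "final_answer":
                return decision["content"]            # objective complete
            tool = tools[decision["tool_name"]]       # 2. SELECT TOOL
            result = tool(**decision["arguments"])    # 3. EXECUTE ACTION
            state.append({"role": "tool", "content": str(result)})  # feed back into step 1
        raise RuntimeError("Objective not completed within iteration budget")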

    2. Model-Agnostic Unification

    The langchain-core library provides a unified interface for:

    • OpenAI (GPT-4, GPT-3.5)
    • Anthropic (Claude Opus, Sonnet, Haiku)
    • Google (Gemini)
    • Open-source models (Llama, Mistral, etc.)

    Why this matters: Your agent logic doesn’t couple to a specific vendor. You define tools once, swap models with a configuration change. Same code, different inference backend.

    # Same interface, different providers
    from langchain_openai import ChatOpenAI
    from langchain_anthropic import ChatAnthropic
    
    model = ChatOpenAI(model="gpt-4")  # or
    model = ChatAnthropic(model="claude-opus-4")

    Dynamic routing and multi-provider failover become trivial when the abstraction is clean.

    3. Tools: The Hands of the Agent

    Tools are Python functions exposed to the LLM via structured schemas. The @tool decorator automatically:

    • Extracts argument metadata from type hints and docstrings
    • Generates JSON Schema for the model
    • Handles parallel tool calls with exponential backoff retries
    • Persists call history for debugging

    External capabilities library: LangChain maintains integrations for Wikipedia, SQL databases, code execution sandboxes, math engines, web search—over 160 pre-built tools.

    Key architectural decision: Tools are declared, not invoked. You hand the model a toolkit. The model decides what to call and when. This is the difference between RPA (you script the workflow) and agentic AI (the model scripts the workflow).

    4. Dual-Layer Memory: Short-Term and Long-Term

    Short-term memory: The MessagesState object maintains conversational continuity and active planning context. This is the chat history you see in the UI.

    Long-term memory: Postgres/embedding stores provide semantic search over compressed historical interactions. The agent can “remember” facts from 1,000 previous conversations without blowing up the context window.

    Production pattern: Use middleware (PIIMiddleware, SummarizationMiddleware) to cap LLM context costs while preserving coherent agent responses across long interactions.
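    As a minimal sketch of the short-term layer, here is one way to cap the rolling chat history before each model call, using langchain_core’s trim_messages utility. The budget and messages are arbitrary examples.

    # Cap short-term memory before each call (illustrative values).
    from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, trim_messages

    history = [
        SystemMessage(content="You are a research assistant."),
        HumanMessage(content="What did we decide about the Q2 roadmap?"),
        AIMessage(content="The Q2 roadmap prioritizes the mobile redesign."),
        # ...hundreds of further turns accumulate here...
    ]

    trimmed = trim_messages(
        history,
        max_tokens=20,        # with token_counter=len this means "keep the last 20 messages"
        strategy="last",      # keep the most recent context
        token_counter=len,    # crude: counts messages; pass a model/tokenizer for true token budgets
        include_system=True,  # never trim away the system prompt
    )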

    5. Agentic RAG: Dynamic Retrieval Under Agent Control

    Traditional RAG (Retrieval-Augmented Generation):

    Query → Retrieve Docs (k=3) → Prompt LLM → Answer

    Agentic RAG:

    Query → LLM decides:
      - Should I retrieve? (or answer from memory?)
      - Which retriever? (vector DB, SQL, web search?)
      - How many docs? (k=1 for precision, k=10 for coverage?)
      - Evaluate retrieved docs → refine query → retrieve again?

    The LLM orchestrates retrieval as a tool. Higher latency, deeper reasoning, better results for complex queries.
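    In code, “retrieval as a tool” can be as simple as wrapping each retriever with the @tool decorator and letting the model choose among them. In this sketch, vectorstore and run_sql_chain are hypothetical placeholders for components configured elsewhere.

    from langchain_core.tools import tool

    @tool
    def search_vector_db(query: str, k: int = 3) -> str:
        """Semantic search over internal docs. Use small k for precision, large k for coverage."""
        docs = vectorstore.similarity_search(query, k=k)  # assumes a configured vector store
        return "\n\n".join(d.page_content for d in docs)

    @tool
    def search_sql(question: str) -> str:
        """Answer questions about structured records by querying the warehouse."""
        return run_sql_chain(question)  # hypothetical helper

    # Hand both retrievers to the agent; the model now decides whether,
    # where, and how much to retrieve at each step.
    tools = [search_vector_db, search_sql]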

    6. LangGraph: Durable Execution with Checkpointing

    LangGraph wraps agent workflows in a state machine where every step is saved to a database. If the agent crashes at Node B:

    Execution: Node A → Node B → [CRASH]
    Recovery:  Skip to Node B (state restored from checkpoint) → Node C

    This is time travel for AI workflows. You can:

    • Resume from failures without re-running expensive LLM calls
    • Implement human-in-the-loop approval gates (execution pauses, waits for input, resumes)
    • Replay and debug agent decisions from production logs

    Production requirement: If you’re running code-executing agents or multi-hour research workflows, you MUST use LangGraph with PostgresSaver checkpoints. Non-negotiable.

    7. Sandboxed Execution: Deep Agents

    Never run code-executing agents on bare metal. The default in 2026 is Deep Agents remote sandboxing:

    • Pluggable backends (local Docker, remote VMs, virtual filesystems)
    • Per-user thread isolation with role-based access control (RBAC)
    • Network egress policies and syscall filtering

    Your agent can execute arbitrary Python, but it runs in a locked-down container with no access to your production database or internal networks.


    The 2026 Production Checklist

    If you’re deploying LangChain agents to production, validate these four pillars:

    1. Control Context
      Implement PIIMiddleware for compliance. Use summarization middleware to cap LLM context costs. Monitor token usage per session.
    2. Optimize Execution
      Use .stream(stream_mode='values') for zero-latency UX. Batch non-interactive queries to reduce compute.
    3. Sandbox Actions
      Never run code-executing agents on bare metal. Default to Deep Agents remote sandboxing with role-based access control.
    4. Ensure Resilience
      Wrap all critical workflows in LangGraph with PostgresSaver checkpoints. State recovery should be automatic, not manual.

    Hands-On Lab: Build Your First Production-Grade Agent

    Objective: Build a research assistant agent that can perform calculations, search Wikipedia, and maintain conversation memory. You’ll see model-agnostic initialization, tool declaration, memory integration, and the agent execution loop in action.

    Time: 30-45 minutes
    Prerequisites: Python 3.10+, basic familiarity with async/await
    What You’ll Learn:

    • How to initialize models with the unified interface
    • How to create custom tools with type-safe schemas
    • How to integrate external tool libraries (Wikipedia)
    • How to implement conversational memory
    • How to run the agent loop and inspect tool calls

    Step 1: Environment Setup

    Create a new directory and virtual environment:

    mkdir langchain_agent_lab
    cd langchain_agent_lab
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

    Install dependencies:

    pip install langchain langchain-core langchain-community langchain-openai wikipedia python-dotenv

    Note: We’re using langchain-openai, but you could swap to langchain-anthropic or langchain-google-genai with zero code changes to the agent logic. That’s the model-agnostic unification in action.


    Step 2: Configure API Keys

    Create a .env file in your project directory:

    # .env
    OPENAI_API_KEY=your_openai_key_here

    If you’re using Anthropic Claude instead:

    ANTHROPIC_API_KEY=your_anthropic_key_here

    Important: Never commit API keys to version control. Add .env to your .gitignore.


    Step 3: Build the Agent (Complete Code)

    Create research_agent.py:

    """
    LangChain 2026 Research Agent Lab
    Demonstrates: Model-agnostic init, custom tools, external tools, memory, agent loop
    """
    
    import os
    from typing import Annotated
    from dotenv import load_dotenv
    
    # Core LangChain imports
    from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
    from langchain_core.tools import tool
    from langchain_community.tools import WikipediaQueryRun
    from langchain_community.utilities import WikipediaAPIWrapper
    
    # Model provider (swap to langchain_anthropic for Claude)
    from langchain_openai import ChatOpenAI
    
    # Agent framework
    from langchain.agents import create_tool_calling_agent, AgentExecutor
    from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
    
    # Load environment variables
    load_dotenv()
    
    
    # =============================================================================
    # STEP 1: CUSTOM TOOLS
    # =============================================================================
    
    @tool
    def calculate(expression: str) -> str:
        """
        Evaluates a mathematical expression and returns the result.
    
        Use this for any arithmetic, algebra, or mathematical computations.
    
        Args:
            expression: A valid Python mathematical expression (e.g., "2**10", "sqrt(144)")
    
        Returns:
            The computed result as a string
    
        Examples:
            - "2 + 2" → "4"
            - "10 * 5" → "50"
            - "216" → "65536"
        """
        try:
            # Safe evaluation for math expressions
            import math
            # Add safe math functions to namespace
            safe_namespace = {
                "__builtins__": {},
                "abs": abs, "round": round, "pow": pow,
                "sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
                "log": math.log, "exp": math.exp, "pi": math.pi
            }
            result = eval(expression, safe_namespace)
            return str(result)
        except Exception as e:
            return f"Error evaluating expression: {str(e)}"
    
    
    @tool
    def create_research_note(topic: str, key_facts: str) -> str:
        """
        Saves a research note about a topic to the agent's long-term memory.
    
        Use this to persist important information discovered during research.
    
        Args:
            topic: The subject of the research note
            key_facts: The important facts to remember
    
        Returns:
            Confirmation message
        """
        # In production, this would write to a vector database
        # For the lab, we'll just acknowledge the save
        return f"✓ Research note saved for '{topic}'. Key facts stored in long-term memory."
    
    
    # =============================================================================
    # STEP 2: EXTERNAL TOOL INTEGRATION (Wikipedia)
    # =============================================================================
    
    # Initialize Wikipedia tool
    wikipedia_api = WikipediaAPIWrapper(
        top_k_results=2,  # Return top 2 search results
        doc_content_chars_max=500  # Limit content length
    )
    wikipedia_tool = WikipediaQueryRun(api_wrapper=wikipedia_api)
    
    # Assemble toolkit
    tools = [calculate, create_research_note, wikipedia_tool]
    
    
    # =============================================================================
    # STEP 3: MODEL INITIALIZATION (Model-Agnostic)
    # =============================================================================
    
    def initialize_model(provider="openai", model_name=None):
        """
        Initialize a model with the unified LangChain interface.
    
        This demonstrates model-agnostic orchestration. Same agent logic
        works with OpenAI, Anthropic, Google, or open-source models.
        """
        if provider == "openai":
            model_name = model_name or "gpt-4"
            return ChatOpenAI(
                model=model_name,
                temperature=0,  # Deterministic for consistent tool use
                streaming=True
            )
        elif provider == "anthropic":
            # Uncomment if using Anthropic:
            # from langchain_anthropic import ChatAnthropic
            # model_name = model_name or "claude-opus-4"
            # return ChatAnthropic(model=model_name, temperature=0)
            raise NotImplementedError("Install langchain-anthropic to use this provider")
        else:
            raise ValueError(f"Unknown provider: {provider}")
    
    
    # =============================================================================
    # STEP 4: AGENT CONFIGURATION WITH MEMORY
    # =============================================================================
    
    # System prompt defines agent behavior
    SYSTEM_PROMPT = """You are a research assistant with access to calculation tools and Wikipedia.
    
    Your capabilities:
    - Perform mathematical calculations using the 'calculate' tool
    - Search Wikipedia for factual information using the 'Wikipedia' tool  
    - Save important research findings using 'create_research_note'
    
    Guidelines:
    - Always think step-by-step before acting
    - Use tools when appropriate rather than guessing
    - Cite sources when referencing Wikipedia information
    - Be concise but thorough in your responses
    
    Remember: You're helping users learn and research. Be accurate, helpful, and educational."""
    
    # Create prompt template with memory placeholder
    prompt = ChatPromptTemplate.from_messages([
        ("system", SYSTEM_PROMPT),
        MessagesPlaceholder(variable_name="chat_history", optional=True),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ])
    
    
    # =============================================================================
    # STEP 5: AGENT CONSTRUCTION AND EXECUTION
    # =============================================================================
    
    def create_research_agent():
        """Build the agent with model, tools, and memory."""
    
        # Initialize model (swap provider here for different models)
        model = initialize_model(provider="openai", model_name="gpt-4")
    
        # Create the agent
        agent = create_tool_calling_agent(
            llm=model,
            tools=tools,
            prompt=prompt
        )
    
        # Wrap in executor for handling the execution loop
        agent_executor = AgentExecutor(
            agent=agent,
            tools=tools,
            verbose=True,  # Shows tool calls and reasoning
            handle_parsing_errors=True,
            max_iterations=10  # Prevent infinite loops
        )
    
        return agent_executor
    
    
    def run_interactive_session():
        """
        Run an interactive session with the agent.
        Demonstrates conversational memory and multi-turn interactions.
        """
        print("=" * 70)
        print("🤖 LangChain 2026 Research Agent Lab")
        print("=" * 70)
        print("\nInitializing agent with tools: calculate, Wikipedia, research notes")
        print("Type 'quit' to exit\n")
    
        agent = create_research_agent()
        chat_history = []  # Stores conversation history (memory)
    
        while True:
            user_input = input("\n👤 You: ").strip()
    
            if user_input.lower() in ['quit', 'exit', 'q']:
                print("\n✨ Session ended. Check the output above to see tool calls!\n")
                break
    
            if not user_input:
                continue
    
            print("\n🔧 Agent thinking...\n")
    
            try:
                # Invoke agent with input and memory
                response = agent.invoke({
                    "input": user_input,
                    "chat_history": chat_history
                })
    
                # Extract response
                output = response.get("output", "No response generated.")
    
                print(f"\n🤖 Agent: {output}\n")
                print("-" * 70)
    
                # Update memory
                chat_history.append(HumanMessage(content=user_input))
                chat_history.append(AIMessage(content=output))
    
            except Exception as e:
                print(f"\n❌ Error: {str(e)}\n")
    
    
    def run_example_queries():
        """
        Run pre-defined queries to demonstrate capabilities.
        Use this to see the agent in action without manual input.
        """
        print("=" * 70)
        print("🧪 Running Example Queries")
        print("=" * 70)
    
        agent = create_research_agent()
        chat_history = []
    
        # Example queries that demonstrate different tools
        queries = [
            "What is 2 to the power of 16?",
            "Who was Alan Turing and what did he contribute to computer science?",
            "Create a research note about Alan Turing: British mathematician, Turing machine, Enigma code-breaker, father of computer science",
            "What's the square root of 144 multiplied by 5?"
        ]
    
        for i, query in enumerate(queries, 1):
            print(f"\n\n{'='*70}")
            print(f"Query {i}: {query}")
            print('='*70)
    
            try:
                response = agent.invoke({
                    "input": query,
                    "chat_history": chat_history
                })
    
                output = response.get("output", "No response")
                print(f"\n🤖 Response: {output}\n")
    
                # Update memory
                chat_history.append(HumanMessage(content=query))
                chat_history.append(AIMessage(content=output))
    
            except Exception as e:
                print(f"\n❌ Error: {str(e)}")
    
    
    # =============================================================================
    # MAIN EXECUTION
    # =============================================================================
    
    if __name__ == "__main__":
        import sys
    
        # Check for API key
        if not os.getenv("OPENAI_API_KEY"):
            print("❌ Error: OPENAI_API_KEY not found in environment")
            print("Create a .env file with: OPENAI_API_KEY=your_key_here")
            sys.exit(1)
    
        # Choose mode
        print("\nSelect mode:")
        print("1. Interactive session (manual input)")
        print("2. Example queries (automated demo)")
    
        choice = input("\nEnter 1 or 2: ").strip()
    
        if choice == "1":
            run_interactive_session()
        elif choice == "2":
            run_example_queries()
        else:
            print("Invalid choice. Running example queries...")
            run_example_queries()

    Step 4: Run the Agent

    Execute the script:

    python research_agent.py

    You’ll be prompted to choose between interactive mode (where you type queries) or example mode (automated demo).


    Step 5: Understanding the Output

    When you run the agent, watch for these key events:

    1. Tool Selection
    > Entering new AgentExecutor chain...
    Invoking: `calculate` with `{'expression': '2**16'}`

    The model decided to use the calculate tool. You didn’t write an if-statement. The model chose the action.

    2. Tool Execution
    65536

    The tool executed and returned a result.

    3. Agent Reasoning
    The result of 2 to the power of 16 is 65536.

    The model incorporated the tool result into its response.

    4. Multi-Step Workflows
      For complex queries, you’ll see multiple tool calls in sequence:
    Invoking: `Wikipedia` with `{'query': 'Alan Turing'}`
    Invoking: `create_research_note` with `{'topic': 'Alan Turing', ...}`

    This is the agent loop in action: call model → select tool → execute → repeat until done.


    Once the basic agent works, try these modifications:

    Extension 1: Swap Models (Model-Agnostic Orchestration)

    Install Anthropic SDK:

    pip install langchain-anthropic

    Modify research_agent.py:

    # Change this line in create_research_agent():
    model = initialize_model(provider="anthropic", model_name="claude-sonnet-4")

    Same agent logic. Different model. Zero changes to tools or prompts. This is the power of unified interfaces.

    Extension 2: Add Custom Tools

    Create a new tool for web search:

    @tool
    def web_search(query: str) -> str:
        """
        Searches the web for current information.
        Use this when Wikipedia doesn't have the answer or you need recent data.
        """
        # Implementation: DuckDuckGo, Bing API, etc.
        return "Web search results would appear here"

    Add it to the tools list. The agent automatically learns how to use it from the docstring.

    Extension 3: Implement Long-Term Memory with Vector DB

    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings
    
    # Initialize vector store
    embeddings = OpenAIEmbeddings()
    vectorstore = Chroma(embedding_function=embeddings, persist_directory="./agent_memory")
    
    @tool
    def search_memory(query: str) -> str:
        """Search the agent's long-term memory for relevant past conversations."""
        docs = vectorstore.similarity_search(query, k=3)
        return "\n".join([doc.page_content for doc in docs])

    Now your agent has semantic recall over all previous conversations. This is how production agents handle context that spans weeks or months.

    Extension 4: Add LangGraph for Durable Execution

    from typing import Any, TypedDict

    from langgraph.graph import StateGraph
    from langgraph.checkpoint.postgres import PostgresSaver
    
    # Define state schema
    class ResearchState(TypedDict):
        input: str
        chat_history: list
        agent_outcome: Any
    
    # Build graph
    workflow = StateGraph(ResearchState)
    workflow.add_node("agent", run_agent_node)
    workflow.add_node("tools", run_tools_node)
    # ... add edges, checkpointing
    
    # Compile with checkpointing
    checkpointer = PostgresSaver.from_conn_string("postgresql://localhost/agent_checkpoints")
    app = workflow.compile(checkpointer=checkpointer)

    Now if your agent crashes during a multi-hour research task, it resumes from the last checkpoint. No wasted LLM calls. No lost progress.


    Key Takeaways: The Agent Engineering Mindset

    After completing this lab, you should understand:

    1. Declarative tool definition – You describe capabilities, the model figures out when to use them
    2. Model-agnostic architecture – Same code, different providers, zero refactoring
    3. Memory is dual-layer – Short-term (chat history) + long-term (vector stores)
    4. The agent loop is autonomous – The framework orchestrates, you don’t write the control flow
    5. Production means resilience – Checkpointing, sandboxing, and middleware aren’t optional

    The shift from 2023 to 2026: You’re not building prompt wrappers anymore. You’re engineering production-grade autonomous systems with reliability patterns borrowed from distributed systems (checkpointing, state machines, retries) and safety patterns borrowed from containerization (sandboxing, RBAC, network policies).

    If you treat LangChain as a library for calling OpenAI, you’re missing the entire architecture. It’s an agent orchestration platform. Master it, and you’re building the infrastructure for reliable AI-driven workflows at scale.


    Next Steps

    • Read the LangGraph documentation – This is where production-grade agentic workflows live
    • Explore LangSmith – Observability and debugging for agent workflows (trace every tool call, inspect model reasoning, replay production failures)
    • Study the agent framework ecosystem – Compare LangChain vs Anthropic Claude SDK vs OpenAI function calling. Each has strengths. Choose based on your constraints (model support, sandboxing requirements, license)
    • Build something real – The best learning is shipping. Take this lab agent and extend it into a research assistant for your domain (legal documents, medical literature, code documentation, etc.)

    About This Series
    AI with Peter explores the architecture of AI systems—not just what they do, but how they’re built, why they’re designed that way, and what it means for developers who need to ship reliable intelligent systems. Subscribe for weekly deep-dives on frameworks, patterns, and production war stories.

    We advocate for rigorous technical education that builds genuine competence, not credentialed compliance. When institutions fail to teach systems thinking, we teach the alternative path: learn by building, validate through shipping, and never mistake a certificate for understanding.


    Ready to go deeper? The full LangChain 2026 documentation and reference architecture diagrams are available at docs.langchain.com.

  • The Real Economics of Agentic AI: What I Learned About Search, Browser Automation, and Cost


    Most people still talk about AI as if the main question is:

    “Can it do the task?”

    That is no longer the most important question.

    A much more useful question is:

    What is the cost structure of asking AI to do the task this way?

    That is where serious AI work begins.

    I’ve been using AI agents to do competitive analysis of how colleges are integrating software engineering principles into building inference-oriented mobile and IoT applications. AI-assisted tooling such as Google NotebookLM gives learners their own experiential learning platforms, simulating on-the-job workflows in ways never before attainable. Learners can deliver real-world-level solutions with immediate feedback, with comprehension support tailored to their individual cognitive and learning styles. The confusion and cognitive drag of the textbook mediation layer has been flattened to zero.

    This has taught me something important:

    AI is not just an intelligence layer. It is an economic layer.

    And if you do not understand the economics, you do not really understand the workflow.

    The shift from “chatbot thinking” to “workflow thinking”

    A lot of people still approach AI tools the way they approached early chat systems:

    • ask a question
    • get an answer
    • move on

    But once you start using agents for real work — market scanning, research, comparison, structured analysis, workflow support — you are no longer just chatting.

    You are designing processes.

    And processes have:

    • cost
    • speed
    • failure modes
    • architecture decisions
    • tradeoffs

    This is especially true when you start using tools that can:

    • search the web
    • browse live sites
    • navigate pages
    • extract structured information
    • synthesize results

    At that point, the prompt is no longer just a prompt.

    It becomes a work instruction.

    And work instructions have economics.

    One of the biggest lessons: not all AI actions cost the same

    A very important distinction emerged in my experimentation:

    There is a major difference between:

    1. Search-index style retrieval

    This is when the system uses its own search infrastructure to discover material broadly.

    And:

    2. Direct browser navigation

    This is when the system goes directly to known websites, clicks, reads pages, and extracts information the way a human operator would.

    From a user perspective, both may feel like “the AI is researching.”

    But operationally, they are not the same thing.

    The lesson is:

    Broad search and direct site navigation are different economic behaviors.

    That means you should not casually design a workflow as if all “AI research” is equivalent.

    It is not.

    The hidden trap: asking one agent to do everything

    One of the easiest mistakes to make is to build a single giant agent workflow that tries to do all of the following at once:

    • find the sources
    • search the market
    • deduplicate
    • rank opportunities
    • research companies
    • discover contacts
    • draft messages
    • explain the results

    At first glance, that sounds efficient.

    In practice, it is often the exact opposite.

    Why?

    Because the broader and fuzzier the task becomes, the more the system is forced into expensive, multi-stage behavior.

    This is the point where many people accidentally turn AI from a useful assistant into an uncontrolled cost center.

    The better pattern is usually:

    Bound the search → shortlist → deepen only on the winners

    That is not just good prompt design.

    That is good economics.

    A curriculum-research example

    Let’s say you are doing a competitive analysis of how colleges are teaching:

    • web development
    • mobile development
    • AI-assisted software workflows
    • modern toolchains shaped by inference-capable systems

    There are at least two ways to ask for that.

    Expensive and naive

    “Find how colleges across Canada and the U.S. are teaching web and mobile development in the age of AI.”

    This sounds impressive.

    But it is vague, broad, and computationally undisciplined.

    It invites:

    • too many schools
    • too many irrelevant pages
    • too much crawling
    • too much synthesis too early

    Economically sane

    “Review a limited set of target institutions. Extract evidence of whether web and mobile curricula explicitly reflect AI-assisted workflows, toolchain modernization, or inference-era development changes. Return only the top findings.”

    Now the system has:

    • bounded sources
    • a narrower mission
    • a more disciplined output structure

    That is the difference between:

    • asking for “AI magic”
    • and designing a usable research workflow

    This is exactly the kind of thinking institutions, businesses, and training organizations need much more of.

    Why this matters for organizations

    A lot of leaders are still being sold a fantasy version of AI:

    • faster
    • cheaper
    • smarter
    • automatic

    But real organizational AI adoption has to deal with:

    • task architecture
    • cost discipline
    • governance
    • hidden dependencies
    • manual vs automated division of labor
    • process design

    That is why I increasingly believe one of the emerging professional skills is:

    agentic workflow cost design.

    By that I mean the ability to decide:

    • what should be searched broadly
    • what should be browsed directly
    • what should be staged
    • what should remain manual
    • where the expensive steps actually are
    • how to reduce waste without destroying value

    That is a real capability.

    And it will matter more and more as organizations move from “trying AI” to operating with AI.

    The important operational distinction

    One of the most practical lessons I’ve locked in is this:

    If an agent can work from known sites directly, that is often far more efficient than asking it to discover everything through broad search.

    In other words:

    Better for cost control

    • go directly to known job boards
    • go directly to known employer pages
    • go directly to known institutional sites
    • read and extract from those pages

    More expensive

    • open-ended “search the web for me” behavior
    • broad discovery across many unknown sources
    • forcing the system to do search, browse, compare, and synthesize all in one pass

    That does not mean search is bad.

    It means:

    search should be used deliberately, not lazily

    The strategic workflow pattern I now recommend

    For serious work, I recommend separating AI tasks into stages.

    Stage A — cheap discovery

    Use bounded prompts to gather:

    • top candidate sources
    • top current opportunities
    • top institutions to inspect
    • top documents worth deeper review

    Keep the output compact.

    Stage B — selective deeper work

    Only after you have a shortlist should you ask the system to:

    • compare
    • interpret
    • research context
    • identify implications
    • support messaging or recommendations

    This dramatically improves the economics of the workflow.

    It also improves quality.

    Why?

    Because most of the world is noise.

    The expensive part of AI should be reserved for the small slice that is actually worth serious attention.
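    As a sketch, the staged pattern is just ordinary control flow. Everything here is illustrative: cheap_scan, deep_analysis, and the limits are hypothetical placeholders, not a real API.

    def staged_research(query: str, sources: list[str]) -> list[dict]:
        # Stage A: bounded, cheap discovery over a fixed source list
        candidates = cheap_scan(query, sources=sources, max_results=10)  # hypothetical helper

        # Shortlist: keep only what deserves expensive attention
        shortlist = sorted(candidates, key=lambda c: c["score"], reverse=True)[:3]

        # Stage B: reserve deep synthesis for the winners
        return [deep_analysis(item) for item in shortlist]  # hypothetical helper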

    If you are involved in education, training, or curriculum design, this matters immediately.

    We are entering a period where the real differentiator is not simply:

    • using AI
    • mentioning AI
    • adding AI to a slide deck

    It is being able to design economically coherent AI-supported learning systems.

    That includes:

    • knowing which research tasks should be agent-assisted
    • knowing which content tasks should be staged
    • knowing when NotebookLM, retrieval systems, browser automation, or structured search should be used
    • knowing how to build learning workflows that are not just impressive, but sustainable

    That is part of the reason I have been doing this kind of competitive analysis work in the first place.

    The institutions that understand this early will not just save time.

    They will design better systems.

    The deeper lesson

    The deeper lesson here is that AI is beginning to look less like “software you use” and more like:

    an operational resource that must be budgeted, structured, and governed

    That changes the conversation.

    Now the real questions become:

    • What is the workflow?
    • What is the value of the workflow?
    • What is the cost structure of the workflow?
    • What part should be automated?
    • What part should remain human?
    • What part should be staged?

    That is a much more mature conversation than simple AI enthusiasm.

    The future will not be won by the people who shout “AI” the loudest.

    It will be won by the people who learn how to make AI workflows:

    • useful
    • bounded
    • economically sane
    • operationally durable

    That is where real competency begins.

    And that is also where the next generation of consulting, curriculum design, and institutional strategy work will increasingly live.

    Because the question is no longer:

    “Can AI do this?”

    The better question is:

    “Can we design the workflow so the value justifies the spend?”


  • What Perplexity Agent Credits Taught Me About the Real Economics of AI Work

    I recently ran the kind of college curriculum competitive analysis described above through Perplexity's agent features, where the work is metered in agent credits.

    At first glance, this sounds like a straightforward research task.

    It is not.

    It is actually a perfect example of a much bigger issue that organizations are only beginning to understand:

    AI value is not just about what a model can do. It is about what the workflow costs.

    That is where the real economics of agentic AI begin.

    The hidden lesson most people miss

    A lot of people still think of AI as a chatbot with a monthly subscription.

    That is already too simple.

    Once you begin using agents to research, browse, compare, summarize, and synthesize across many sources, you move into a different world. You are no longer just asking questions. You are designing workflows, and workflows have architecture costs.

    That means the real skill is no longer just prompt writing.

    It is:

    workflow cost design

    Why a college competitive analysis is a perfect example

    Suppose you want to study how schools across Canada and the United States are approaching:

    • web development curricula
    • mobile development curricula
    • AI-assisted coding
    • software engineering education
    • implications of AI inference hardware
    • changing employer expectations for student skills

    A human being can imagine this as one broad question:

    “Go find out how colleges are doing all of this.”

    But an AI agent experiences that as many different tasks:

    • discovering institutions
    • finding current program pages
    • comparing course outlines
    • looking for curriculum shifts
    • extracting themes
    • comparing countries
    • avoiding duplicates
    • identifying which findings actually matter

    That is not one task.

    That is a stack of expensive subtasks.

    The lesson is immediate:

    Broad, fuzzy, open-ended work is expensive work.

    The real operational insight

    If you ask an agent to do all of the following in one run:

    • scan widely
    • crawl deeply
    • deduplicate
    • compare
    • rank
    • summarize
    • produce executive insights

    you are effectively asking for a premium research pipeline, not a simple search.

    That is the first competency organizations need to build:
    they must learn to distinguish between:

    • a cheap discovery scan
    • and an expensive synthesis workflow

    Those are not the same thing.

    The economics of AI are architectural

    This is the point I think many business leaders, educators, and even technically sophisticated professionals are still underestimating.

    When using agent-based AI systems, cost is driven not just by “how much AI you use,” but by:

    • how many sources are searched
    • how broad the problem is framed
    • how much deep browsing is required
    • how many outputs are requested
    • whether the system is asked to compare, rank, or research recursively
    • whether the task is staged or all-in-one

    In other words:

    the cost of an agentic workflow is an architectural property of how the task is framed, not just a meter on usage.

    That is a very different mindset from casual chatbot usage.

    What the college-research example reveals

    Take the curriculum-analysis scenario again.

    There are at least three different ways to architect that research.

    Version 1: expensive and naive

    “Search colleges across Canada and the United States and tell me how they are teaching web development and mobile development in the age of AI inference chips.”

    This sounds intelligent.

    It is actually a cost bomb.

    Why? Because it invites:

    • open-ended crawling
    • vague inclusion criteria
    • lots of irrelevant institutions
    • lots of duplication
    • deep synthesis before the search has even been bounded

    Version 2: bounded and disciplined

    “Search only a selected list of colleges. Compare only current web development and mobile development programs. Return only whether AI-related tooling, deployment, or hardware-awareness appears in the curriculum.”

    This is much better.

    It introduces:

    • source limits
    • scope control
    • output discipline

    Version 3: staged and economically sane

    This is the best model.

    Stage A

    Find the top 10 institutions worth examining.

    Stage B

    Review only those top 10 for:

    • curriculum structure
    • evidence of AI-related adaptation
    • relevant toolchain shifts

    Stage C

    Do deep interpretation only on the top 3–5 most significant examples.

    That is the pattern I increasingly believe organizations need:

    Bound the search → shortlist → deepen only on winners

    That is not just good research design.

    That is good AI economics.

    In business settings, people often assume that once an agent exists, the smart move is to let it do everything.

    That is usually the wrong move.

    The better question is:

    Which parts of the work deserve premium agent execution, and which parts should remain bounded, staged, or even manual?

    In my own work, I have found this distinction extremely useful.

    For example:

    • broad market scanning should be kept bounded and cheap
    • deep analysis should happen only on shortlisted targets
    • manual intake channels should remain separate if the agent adds little value there

    This is a much more mature operating model than simply saying:
    “Use AI to research everything.”

    Why this matters for colleges, businesses, and consultants

    If you are:

    • a college administrator
    • a curriculum designer
    • a business leader
    • an L&D strategist
    • an AI consultant

    then this matters because agentic AI is beginning to behave like a real operating expense.

    And as soon as something becomes an operating expense, it requires:

    • budgeting
    • governance
    • architecture
    • policy
    • ROI discipline

    That is where the conversation changes.

    The question is no longer:

    “Can AI help us do this?”

    The better question is:

    “Can we structure the workflow so the value exceeds the cost?”

    That is a much more serious question.

    The new professional skill: agentic workflow cost design

    I think this is becoming a real capability area.

    Not just AI usage.
    Not just prompting.
    Not just tool familiarity.

    A more advanced and valuable skill is:

    Agentic workflow cost design

    That means knowing how to:

    • narrow scope
    • break a problem into stages
    • separate discovery from synthesis
    • avoid unnecessary browsing
    • reduce duplication
    • reserve deeper agent work for the highest-value subset
    • control spend without destroying value

    That is a real consulting and advisory competency.

    The practical framework I now recommend

    When approaching research-heavy AI work, I would suggest five design questions:

    1. What is the true decision goal?

    Are you trying to:

    • discover options
    • compare competitors
    • rank alternatives
    • produce executive recommendations

    Those are different tasks and should not be blended thoughtlessly.

    2. What can be narrowed?

    Can you reduce:

    • sources
    • time window
    • geography
    • institution list
    • title clusters
    • output count

    Every reduction helps.

    3. What can be staged?

    What belongs in:

    • Stage A: cheap scan
    • Stage B: shortlist review
    • Stage C: deep analysis

    This is usually the biggest cost lever.

    4. What should remain manual?

    Not everything needs agent work.

    Sometimes:

    • a known email feed
    • a saved search
    • a manually curated source list

    is cheaper and better.

    5. What is the maximum acceptable spend?

    This is the governance question.

    If you cannot answer this, you are not really managing AI operations.
    You are improvising.
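    Even a crude spend guard beats improvisation. A toy sketch (the cost figures are made up; substitute your provider's real per-call pricing):

    MAX_SPEND_USD = 5.00   # the governance answer, written down

    spent = 0.0

    def guarded_call(task, estimated_cost_usd):
        """Run a task only if it fits the remaining budget."""
        global spent
        if spent + estimated_cost_usd > MAX_SPEND_USD:
            raise RuntimeError(f"Budget exhausted at ${spent:.2f}; escalate to a human.")
        spent += estimated_cost_usd
        return task()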

    What I think organizations need to learn now

    The organizations that will use AI best are not the ones that shout the loudest about AI.

    They are the ones that learn to ask:

    • What is the workflow?
    • What is the value of this workflow?
    • What is the cost structure?
    • How do we redesign it so the economics make sense?

    That is the beginning of maturity.

    Final thought

    The most important realization from doing serious AI-assisted research is this:

    AI is not just intelligence. It is economics.

    If you are evaluating how colleges teach web and mobile development in the age of AI inference chips, or trying to redesign any other serious information workflow, that lesson matters immediately.

    Because once you understand the economics of agentic work, you stop being dazzled by what AI can do.

    And you start focusing on the thing that actually matters:

    whether the workflow is designed well enough to justify the spend

    That is where competency begins.

  • The Medium Is the Message: Writing Documents for the Age of AI Reasoning

    How Vectorless RAG Changes What Every Business Writer Needs to Know

    Published on AI with Peter · aiwithpeter.com


    “The medium is the message.”
    — Marshall McLuhan, Understanding Media: The Extensions of Man, 1964

    Marshall McLuhan said it in 1964, and it has never been more literally true than right now.

    McLuhan argued that the form of a medium shapes meaning more profoundly than its content — that the structure through which a message travels transforms the message itself. Television didn’t just carry news; it turned all news into entertainment. The telegraph didn’t just speed up messages; it redefined what counted as urgent. The medium is the message because the medium defines what gets noticed, what gets understood, and what gets ignored.

    In 2026, a new medium has quietly taken its place at the centre of professional communication: the AI reasoning engine. And it is reading your documents.

    The question is — does your document speak its language?


    A Lesson from Web Development

    Teachers of web application development have been making this argument for years about a different audience. When students first learn HTML, they naturally imagine a human reader sitting at a browser, scrolling through their page. But experienced instructors push back: the consumer of your web content is increasingly not a human — it is an API call driven by an AI agent.

    This changed everything about how developers write code. REST APIs, JSON-LD schema markup, semantic HTML, structured data endpoints — all of these exist because machines need to be able to read and extract the meaning of web content programmatically. A web developer who ignores this reality builds beautiful pages that AI agents cannot understand. Their content becomes invisible to the new infrastructure of the internet.

    The exact same paradigm shift is now happening to business documents. And most business educators, writing instructors, and professional communicators have not yet caught up.


    What Is RAG — and Why Should Business Writers Care?

    Before we can understand what has changed, we need a brief introduction to the technology reshaping how AI systems read documents.

    Retrieval-Augmented Generation (RAG) is the architectural approach that powers most AI document assistants today. Rather than relying solely on memorized training data, a RAG system connects a language model to an external knowledge source — your documents, your company’s reports, your policy manuals. When a question arrives, the system first retrieves relevant sections from those documents, then uses the LLM to generate an answer based on what it found.

    RAG is why AI tools can answer questions about your specific company’s data, your internal policies, your client contracts, or your quarterly filings. Without RAG, AI would have no access to private, current, or proprietary information.

    For years, the dominant approach was Vector RAG: documents were sliced into small chunks, each chunk was converted into a numerical vector (a kind of mathematical fingerprint), and retrieval was performed by finding the chunks whose fingerprints most closely resembled the query. This worked adequately for short, informal documents. For structured professional documents, it consistently failed — because similarity and relevance are not the same thing.
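    To make "mathematical fingerprint" concrete, here is a toy illustration with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

    import numpy as np

    def cosine(a, b):
        # Cosine similarity: the core retrieval operation of vector RAG
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    query = np.array([0.9, 0.1, 0.0])                       # "What was revenue?"
    chunks = {
        "Revenue grew 23% to $45.2M": np.array([0.8, 0.3, 0.1]),
        "Our offices are in Toronto": np.array([0.1, 0.9, 0.3]),
    }
    best = max(chunks, key=lambda text: cosine(query, chunks[text]))
    print(best)  # picks the chunk whose vector most resembles the query

    Notice that nothing in this computation reasons about the document; it only measures resemblance. That is exactly why similarity and relevance diverge on long, structured documents.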

    Then came a better approach.


    Vectorless RAG: The System That Reads Like a Human Expert

    Vectorless RAG — exemplified by the open-source framework PageIndex, developed by VectifyAI — abandons similarity search entirely. Instead of chunking and embedding, it does something far more intelligent: it reads the document’s structure, builds a reasoning tree from that structure, and navigates to the exact section that logically contains the answer.

    The benchmark that announced this shift came from FinanceBench, the industry standard for testing AI on complex financial documents such as SEC 10-K and 10-Q filings:

    Accuracy on FinanceBench, by approach:

    • Traditional Vector RAG: ~50%
    • Optimized Vector RAG (with reranking): ~91%
    • PageIndex Vectorless RAG: 98.7%

    The difference is not a better algorithm in the narrow sense — it is a fundamentally different philosophy of reading. Vector RAG asks: what text looks most like this query? Vectorless RAG asks: which section of this document logically contains the answer to this query?

    The vectorless pipeline works like this:

    1. Ingest — The document is analyzed for its structural hierarchy (sections, subsections, headings, logical divisions)
    2. Index — A tree of named, summarized nodes is built — essentially a deep, intelligent table of contents
    3. Reason — When a query arrives, an LLM examines the tree and identifies which nodes logically contain the answer
    4. Retrieve — Only those precise sections are extracted — no noisy chunks, no adjacent irrelevancies
    5. Generate — The LLM produces a cited, traceable answer from that precise context

    This process mimics exactly how a skilled human analyst reads a complex report: consult the table of contents, identify the relevant section, navigate there, read it, answer the question. The AI doesn’t scan the whole document looking for similar-sounding text. It reasons about where the answer lives.

    And this is where McLuhan’s insight hits with full force.


    The Medium Is the Message — Now More Than Ever

    When McLuhan wrote that “the content of any medium is always another medium,” he was pointing to the fact that form shapes interpretation at every level. Television’s message was not any particular broadcast — it was the transformation of time, attention, and the relationship between viewer and event. The internet’s message was not any particular website — it was the restructuring of human association and the collapse of distance.

    Vectorless RAG’s message is this: the structure of your document is now as semantically meaningful as its words.

    A document with a clear, logical heading hierarchy tells an AI reasoning system: “I have organized my ideas hierarchically. You can navigate me like a structured body of knowledge.” A document with no headings, buried topic sentences, and sprawling paragraphs tells the same system: “I am a wall of text. You will have to guess what I am about.”

    The AI reasoning system does not just read your document’s structure. It reasons from your document’s structure. The structure is not decoration. It is not formatting. It is the semantic map the AI uses to navigate your ideas.

    McLuhan saw this coming sixty years ago. He just didn’t know it would be called vectorless RAG.


    Five Questions That Guide You to the Core Insight

    Q1: What specifically does a vectorless RAG system “see” when it looks at a document?

    It sees the document’s organizational logic — the hierarchy of ideas as expressed through structural signals. Specifically, it identifies:

    • Headings and their levels (H1, H2, H3) — which ideas are primary, which are subordinate, and what the overall information architecture looks like
    • Section boundaries — where one idea ends and another begins
    • Topic sentences — the first sentence of each section or paragraph, which signals what follows
    • Explicit labels — “Executive Summary,” “Recommendations,” “Key Findings,” “Background,” “Appendix”
    • Hierarchical relationships — which sections are subsections of which parent sections

    From these signals, the system builds its reasoning tree. Each node in the tree gets a title and a brief summary derived from the section’s actual content. When a query arrives, the system asks: “Which node’s title and summary suggest this section contains the answer?” It is a reasoning process — not a search process.

    The practical implication: a document whose sections are clearly labelled, logically named, and properly nested gives the AI a rich, navigable map. A document that uses generic headings like “Introduction,” “Discussion,” and “Conclusion” gives the AI almost no navigational information — every document has those sections, so they distinguish nothing.


    Q2: How is this analogous to how web developers now write for AI agents?

    The parallel is almost exact. Web developers discovered years ago that Google’s crawlers, and later AI search engines like Perplexity, ChatGPT, and Gemini, do not read web pages the way humans do. They parse structure. They follow semantic HTML hierarchy. They extract meaning from <h1> and <h2> heading tags, schema markup, JSON-LD metadata, and OpenGraph tags.

    As a result, modern web development education teaches students to write for two audiences simultaneously: the human reader and the machine reader. A page that is beautiful but structurally opaque — heavy on images, light on semantic markup, with long unbroken text blocks — is effectively invisible to AI search infrastructure.

    Business documents face the same bifurcation. The human reader of a well-structured report benefits from clear headings and logical organization — this is not new. But now the machine reader — the AI agent parsing your document through a vectorless RAG pipeline — depends on that structure to find, extract, and reason about your content. A report written with disciplined heading hierarchies, descriptive section titles, and self-contained paragraphs becomes easily navigable by AI agents. A report written as flowing prose — however elegant — becomes, from the AI’s perspective, a nearly unreadable wall of text.

    The web developer’s lesson translates directly: structure your document so its logical architecture is explicit, not implied.


    Q3: What specific writing practices make a business document “AI-navigable” under vectorless RAG?

    There are six evidence-backed practices that dramatically improve how a vectorless RAG system can navigate and reason about a document:

    1. Use a genuine heading hierarchy — not decorative headings.
    Each heading level must represent a real semantic distinction: H1 for the document title, H2 for major sections, H3 for subsections within those sections. The headings must be descriptive and specific. “Q3 Revenue Performance” is navigable. “Financial Discussion” is not. AI reasoning systems use heading titles to build their tree nodes — a vague title produces a vague node, which the reasoning system cannot confidently navigate to.

    2. Front-load every section with a direct statement of its purpose.
    The first one or two sentences of any section should clearly state what that section covers and what its key conclusion or data point is. This is what vectorless RAG uses to generate each node’s “summary.” If the opening sentences of a section are contextual preamble rather than direct statements, the AI’s summary of that node will be weak — and it will be less likely to be selected during reasoning.

    3. Make each section self-contained.
    A well-structured document for AI navigation allows any section to be read in isolation and still make sense. References to earlier content (“as discussed above,” “building on the previous section”) are fine for human readers but reduce a section’s extractability for AI systems. Key facts, figures, and conclusions should be restated within the section where they are most relevant — not just in a preamble the AI may not retrieve.

    4. Use explicit structural markers and labelling.
    Terms like “Executive Summary,” “Key Findings,” “Recommendations,” “Background,” “Risk Factors,” and “Appendix” are high-value navigation signals. They tell the AI reasoning system what type of information a section contains, not just what it is about. This is especially powerful in standard business document formats (annual reports, proposals, policy documents, project reports) where these labels align with known section types the AI can reason about confidently.

    5. Use tables and lists for structured data — never embed them in paragraphs.
    Comparative data, key metrics, pros/cons, recommendations, and step-by-step processes should always be presented as tables or bulleted/numbered lists. When structured data is buried in paragraph prose, vectorless RAG may extract the paragraph but have difficulty parsing the data within it. A table gives the AI a clearly bounded, extractable unit of structured information.

    6. Write descriptive document metadata.
    Document titles, subtitles, date, author, version, and a brief abstract or executive summary at the top of the document serve as the root node of the tree index. A document that begins with a well-labelled executive summary is dramatically easier for a vectorless system to navigate, because the root summary tells the system what the entire document is about — giving context for all subsequent node reasoning.


    Q4: Does this mean business writing has to become mechanical or robotic?

    Absolutely not — and this is the most important nuance for writing instructors to communicate. The practices above are not new inventions created for AI. They are the foundational principles of good professional writing that have always existed: clarity, hierarchy, signposting, self-contained paragraphs, specific headings. Every business communication textbook teaches these principles.

    What has changed is the consequence of ignoring them. Previously, a poorly structured document frustrated human readers. Now, it also makes your document invisible or unreliable for AI systems operating through vectorless RAG pipelines. The cost of structural laziness has increased dramatically.

    Think of it this way: a skilled human analyst can read a rambling, poorly organized report and still extract meaning through patience, inference, and contextual knowledge. An AI vectorless RAG system cannot. If the heading doesn’t tell it where to look, it will either guess wrong or fail to retrieve the relevant section at all. The gap between what the document means and what the AI can find in it is entirely determined by structure.

    The positive framing for students: writing with disciplined structure is not a concession to machines. It is the mark of a professional communicator who respects their audience — whether that audience is a CFO scanning a 50-page report, or an AI agent ingesting 10,000 documents in a compliance pipeline. Good structure serves both.


    Q5: What does McLuhan’s insight tell us about the deeper cultural shift at work here?

    McLuhan argued that new media do not merely add new capabilities — they restructure how we think, communicate, and organize knowledge. The telegram didn’t just make communication faster; it changed what counted as information. Television didn’t just add pictures to radio; it changed the relationship between audience and event. Each new medium, by altering the form of communication, altered the substance of what gets communicated and what gets valued.

    Contemporary scholars applying McLuhan to AI argue that generative AI is not just a tool — it is a medium. Its message is not any particular output. Its message is the transformation of how humans and institutions communicate, organize, and transmit knowledge. As one 2026 analysis puts it: “The form of engagement — fluid, interactive, immediate — becomes the true message. McLuhan would have seen me not as a tool but as a shift in the ecology of consciousness.”

    Vectorless RAG is a particularly vivid instance of this shift. It doesn’t just change how AI reads documents — it changes what a well-written document means. Structure, always important, becomes architecturally decisive. A document that communicates clearly to an AI reasoning system is not just better formatted — it is a different kind of document, one that participates in a new communication ecology.

    For students of business communication, this is not a technical side note. It is the central insight of our moment: the format of your document is now part of its message. The medium, once again, is the message.


    The Practical Checklist: Writing AI-Navigable Business Documents

    Below is a ready-to-use checklist for business writing students, based on the principles above. This can be posted on Moodle, printed as a rubric, or included in any business communications curriculum.

    Document-Level Structure

    • Document begins with a labelled Executive Summary or Abstract that states the document’s purpose, key findings, and recommendations in 2–4 sentences
    • A single, descriptive H1 title that includes the subject, scope, and date (e.g., “Q3 2025 Revenue Analysis — Acme Corp — October 2025”)
    • Table of contents for documents longer than 5 pages, using the exact heading text from the document
    • Document metadata visible at the top: author, date, version, document type

    Heading Architecture

    • H2 headings are descriptive and specific — not generic labels like “Discussion” but specific labels like “Revenue Growth Drivers in Q3” or “Three Risk Factors Identified”
    • H3 headings signal content type — use labels like “Key Metrics,” “Recommendations,” “Background,” “Analysis” to tell the AI what kind of information follows
    • No orphan headings — every heading is followed by substantive content, not immediately by another heading
    • No more than 3 heading levels in most business documents

    Section-Level Writing

    • Every section opens with a direct statement of its topic and key point within the first two sentences
    • Sections are self-contained — key data, figures, and conclusions are stated within the relevant section, not only in a preamble
    • No section exceeds 400 words without a subheading to re-orient the reader (and the AI)
    • Cross-references are explicit — “See Section 4.2: Risk Factors” not “as discussed earlier”

    Data and Evidence Presentation

    • Comparative data is in a table, not embedded in paragraph prose
    • Sequential processes use numbered lists, not narrative prose
    • Each list item is parallel in structure — all starting with the same grammatical form
    • Tables have descriptive captions that state what the table shows

    Language and Clarity

    • Topic sentences use Subject-Verb-Object order for maximum extractability
    • Avoid pronoun-heavy references to previous sections (“this,” “the above,” “the latter”)
    • Define acronyms and technical terms on first use within each major section, not just once per document
    • Key metrics are stated with full context in the sentence where they appear (e.g., “Net revenue increased 23%, from $36.7M in FY2023 to $45.2M in FY2024”)

    Lab Workbook: Build a Proof of Concept Vectorless RAG System in Python

    This lab walks you through building a working Vectorless RAG MVP that you can deploy free on Hugging Face Spaces. You will see firsthand how document structure determines AI retrieval quality — making the abstract lesson concrete through hands-on experience.

    What you will build: An interactive Q&A app that:

    • Parses a pasted document into a structural section tree (the vectorless index)
    • Uses LLM reasoning to select which section logically contains the answer
    • Returns a cited answer showing exactly which section was used

    Tech stack: Python, Gradio, rank-bm25, Hugging Face InferenceClient — no GPU, no vector database, no embedding model required.


    Step 1: Create Your Hugging Face Space

    1. Go to huggingface.co and sign up for a free account
    2. Click “+ New Space”
    3. Name it vectorless-rag-lab
    4. Select Gradio as the SDK
    5. Set visibility to Public
    6. Click “Create Space”

    Step 2: Project Files

    You need three files:

    vectorless-rag-lab/
    ├── app.py              ← Main application
    ├── requirements.txt    ← Dependencies
    └── README.md           ← Space configuration

    Step 3: requirements.txt

    gradio>=4.0.0
    rank-bm25>=0.2.2
    huggingface_hub>=0.20.0

    Step 4: app.py — The Complete Application

    python"""
    Vectorless RAG Lab — AI with Peter (aiwithpeter.com)
    Demonstrates how document STRUCTURE determines AI retrieval quality.
    The clearer your structure, the better the AI navigates your document.
    
    This is the core lesson: the medium IS the message.
    """
    
    import gradio as gr
    import json
    import re
    from rank_bm25 import BM25Okapi
    from huggingface_hub import InferenceClient
    
    
    # ════════════════════════════════════════════════════════════
    # PHASE 1: DOCUMENT TREE BUILDER
    # This is what vectorless RAG does instead of chunking:
    # it reads the document's STRUCTURE and builds a navigation tree.
    # A well-structured document produces a rich, navigable tree.
    # A poorly structured document produces a flat, useless tree.
    # This is the McLuhan lesson made visible in code.
    # ════════════════════════════════════════════════════════════
    
    def build_section_tree(text: str) -> list[dict]:
        """
        Parse a document into a structured section tree.
        Detects headings by common patterns: markdown ##, numbered sections,
        ALL CAPS lines, and lines followed by blank lines.
        
        Key insight: the QUALITY of this tree directly determines
        retrieval accuracy. Good headings → good tree → good answers.
        Bad headings → flat tree → wrong answers.
        """
        lines = text.strip().split("\n")
        sections = []
        current_title = "Document Opening"
        current_content = []
        node_id = 0
    
        heading_pattern = re.compile(
            r'^(#{1,4}\s.+|[A-Z][A-Z\s\d:]{4,60}$|(\d+[\.\d]*)\s+[A-Z].+)',
            re.MULTILINE
        )
    
        for line in lines:
            stripped = line.strip()
            if not stripped:
                continue
    
            is_heading = (
                bool(heading_pattern.match(stripped)) and
                len(stripped) < 100 and
                not stripped.endswith('.')
            )
    
            if is_heading:
                # Close out the previous section before starting a new one.
                # Guarding on current_content means the document's very first
                # heading becomes a title rather than being absorbed as body text.
                if current_content:
                    body = " ".join(current_content)
                    sections.append({
                        "node_id": f"N{node_id:03d}",
                        "title": current_title,
                        "summary": body[:180] + "..." if len(body) > 180 else body,
                        "content": body,
                        "word_count": len(body.split())
                    })
                    node_id += 1
                current_title = stripped.lstrip('#').strip()
                current_content = []
            else:
                current_content.append(stripped)
    
        # Capture final section
        if current_content:
            body = " ".join(current_content)
            sections.append({
                "node_id": f"N{node_id:03d}",
                "title": current_title,
                "summary": body[:180] + "..." if len(body) > 180 else body,
                "content": body,
                "word_count": len(body.split())
            })
    
        return sections
    
    
    # ════════════════════════════════════════════════════════════
    # PHASE 2: REASONING-BASED NODE SELECTION
    # The AI reads the tree (titles + summaries) and REASONS
    # about which section logically contains the answer.
    # This is "retrieval as reasoning" — not similarity search.
    # Notice: it reads section TITLES and SUMMARIES, not full text.
    # Good titles are the difference between correct and wrong retrieval.
    # ════════════════════════════════════════════════════════════
    
    def select_nodes_by_reasoning(tree: list[dict], query: str, hf_token: str):
        """
        Pass the document tree to an LLM and ask it to REASON
        about which sections logically contain the answer.
        Returns: (list of node IDs, reasoning explanation)
        """
        # Build compact tree — only titles and summaries, not full content
        tree_map = "\n".join([
            f"[{n['node_id']}] SECTION: {n['title']}\n   CONTENT SUMMARY: {n['summary']}"
            for n in tree
        ])
    
        prompt = f"""You are a document navigation specialist.
    Given the document structure below, identify which sections 
    contain the answer to the user's question.
    
    Think step by step:
    1. What information type does the question seek?
    2. Which section TITLES suggest they contain that information?
    3. Which section SUMMARIES confirm relevant content?
    
    DOCUMENT STRUCTURE:
    {tree_map}
    
    USER QUESTION: {query}
    
    Respond ONLY with valid JSON in this exact format:
    {{"reasoning": "Your step-by-step logic here", "node_ids": ["N001", "N002"]}}"""
    
        try:
            client = InferenceClient(
                model="mistralai/Mistral-7B-Instruct-v0.3",
                token=hf_token
            )
            response = client.text_generation(
                prompt, max_new_tokens=250, temperature=0.1
            )
            json_match = re.search(r'\{.*\}', response, re.DOTALL)
            if json_match:
                result = json.loads(json_match.group())
                return result.get("node_ids", []), result.get("reasoning", "")
        except Exception as e:
            print(f"LLM call failed: {e}")
    
        return bm25_fallback(tree, query), "BM25 keyword fallback (no LLM token)"
    
    
    def bm25_fallback(tree: list[dict], query: str) -> list[str]:
        """BM25 fallback — still vectorless (no embeddings)."""
        corpus = [
            (n["title"] + " " + n["content"]).lower().split()
            for n in tree
        ]
        bm25 = BM25Okapi(corpus)
        scores = bm25.get_scores(query.lower().split())
        top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
        return [tree[i]["node_id"] for i in top_indices]
    
    
    # ════════════════════════════════════════════════════════════
    # PHASE 3: ANSWER GENERATION
    # Only the precisely retrieved sections are passed to the LLM.
    # No noisy chunks. No hallucination bait.
    # The answer includes which section it came from — full traceability.
    # ════════════════════════════════════════════════════════════
    
    def generate_answer(query: str, selected_nodes: list[dict], hf_token: str) -> str:
        context = "\n\n---\n".join([
            f"[SOURCE SECTION: {n['title']} | ID: {n['node_id']}]\n{n['content']}"
            for n in selected_nodes
        ])
    
        prompt = f"""You are a precise document analyst.
    Answer the question using ONLY the provided source sections.
    Cite which section contains your answer.
    If the answer is not present, say exactly that.
    
    SOURCE SECTIONS:
    {context}
    
    QUESTION: {query}
    
    ANSWER (with section citation):"""
    
        try:
            client = InferenceClient(
                model="mistralai/Mistral-7B-Instruct-v0.3",
                token=hf_token
            )
            return client.text_generation(prompt, max_new_tokens=400, temperature=0.2)
        except Exception:
            # Extractive fallback
            for node in selected_nodes:
                sentences = node["content"].split(".")
                for s in sentences:
                    if any(w in s.lower() for w in query.lower().split()[:3]):
                        return f"{s.strip()}.\n\n[Extracted from: **{node['title']}** | {node['node_id']}]"
            return f"[Extracted from: **{selected_nodes[0]['title']}**]\n\n{selected_nodes[0]['content'][:600]}..."
    
    
    # ════════════════════════════════════════════════════════════
    # COMPARISON DOCUMENTS
    # The pedagogical heart of the lab:
    # Two versions of the SAME business report —
    # one poorly structured, one well-structured.
    # Ask the same question to both. Compare the results.
    # This IS McLuhan's lesson: the medium IS the message.
    # ════════════════════════════════════════════════════════════
    
    POORLY_STRUCTURED_DOC = """
    Annual Business Report
    
    This document covers our company performance. We had a good year overall and 
    there were several things that happened. Revenue went up and we hired more people.
    The company was founded in 2015 and has grown a lot since then. 
    We have offices in Toronto, Vancouver, and Calgary. Our main products are software 
    and consulting services.
    
    There was also some discussion about risks. Competition is increasing. We have to 
    worry about other companies doing similar things. There are also regulatory issues 
    that could affect us. Currency risk is something we think about too because of 
    international sales.
    
    Numbers-wise, we made $45.2 million in revenue which was more than last year 
    which was $36.7 million. Net income was $8.1 million. Gross margin was 67%. 
    In the third quarter specifically the revenue was $12.3 million. We added 47 
    new enterprise customers during that same period. R&D spending was $4.2 million 
    in Q3.
    
    Going forward we think revenue will be between $54 million and $58 million 
    next year. We are planning to spend more on R&D, about 30% more. We will 
    also try to expand into Asia in the second quarter of next year.
    """
    
    WELL_STRUCTURED_DOC = """
    ## EXECUTIVE SUMMARY
    Acme Corporation delivered strong fiscal year 2024 performance, with revenue 
    growing 23% to $45.2M and net income rising to $8.1M. Gross margin expanded 
    to 67%. Full-year 2025 guidance is set at $54M–$58M revenue.
    
    ## COMPANY PROFILE
    Founded in 2015, Acme Corporation provides enterprise software and consulting 
    services. Headquarters: Toronto, Ontario. Regional offices in Vancouver and 
    Calgary. Current headcount: 287 employees.
    
    ## FULL YEAR 2024 FINANCIAL RESULTS
    Total revenue reached $45.2 million in FY2024, a 23% increase from $36.7 million 
    in FY2023. Net income was $8.1 million, up from $5.4 million. Gross margin 
    improved from 61% to 67% year over year.
    
    ## Q3 2024 QUARTERLY RESULTS
    Q3 2024 revenue: $12.3 million (+18% vs Q3 2023). Operating expenses: $9.1M, 
    including $4.2M in R&D investment. New enterprise customers added in Q3: 47.
    
    ## RISK FACTORS
    Three primary risks identified: (1) Competitive pressure from Microsoft, 
    Salesforce, and emerging startups; (2) Regulatory changes in data privacy 
    law affecting operations; (3) Currency fluctuation risk from European and 
    Asian sales exposure.
    
    ## 2025 FORWARD GUIDANCE
    Revenue guidance: $54M–$58M for fiscal year 2025. R&D investment to increase 
    30% year-over-year. Asia-Pacific market entry targeted for Q2 2025.
    """
    
    
    # ════════════════════════════════════════════════════════════
    # MAIN PIPELINE
    # ════════════════════════════════════════════════════════════
    
    def run_rag(document_text: str, question: str, hf_token: str):
        if not document_text.strip():
            return "⚠️ Please paste a document.", "", ""
        if not question.strip():
            return "⚠️ Please enter a question.", "", ""
    
        # Build tree
        tree = build_section_tree(document_text)
        if not tree:
            return "⚠️ No structure detected in document.", "", ""
    
        tree_display = "### 🌳 Document Tree Index\n\n"
        tree_display += "*This is what the AI uses to navigate your document.*\n\n"
        for n in tree:
            tree_display += f"📂 **[{n['node_id']}] {n['title']}** ({n['word_count']} words)\n"
            tree_display += f"   ↳ *{n['summary'][:120]}...*\n\n"
    
        # Node selection
        if hf_token.strip():
            selected_ids, reasoning = select_nodes_by_reasoning(tree, question, hf_token)
        else:
            selected_ids = bm25_fallback(tree, question)
            reasoning = "BM25 keyword retrieval (add HF token for LLM reasoning)"
    
        selected_nodes = [n for n in tree if n["node_id"] in selected_ids]
    
        retrieval_display = f"### 🔍 Retrieval Reasoning\n\n"
        retrieval_display += f"**Method:** {reasoning}\n\n"
        retrieval_display += f"**Selected sections:** {', '.join(selected_ids)}\n\n"
        for n in selected_nodes:
            retrieval_display += f"- ✅ **[{n['node_id']}] {n['title']}**\n"
    
        # Generate answer
        if hf_token.strip() and selected_nodes:
            answer = generate_answer(question, selected_nodes, hf_token)
        elif selected_nodes:
            answer = f"**[Add HF token for full LLM answers. Showing extracted text:]**\n\n"
            answer += f"**From '{selected_nodes[0]['title']}':**\n\n{selected_nodes[0]['content'][:600]}"
        else:
            answer = "❌ No relevant sections found. Try rephrasing your question."
    
        return answer, tree_display, retrieval_display
    
    
    # ════════════════════════════════════════════════════════════
    # GRADIO UI
    # ════════════════════════════════════════════════════════════
    
    with gr.Blocks(title="Vectorless RAG Lab — AI with Peter", theme=gr.themes.Soft()) as demo:
    
        gr.Markdown("""
        # 🧠 Vectorless RAG Lab — AI with Peter
        ### *"The Medium Is the Message"* — How Document Structure Determines AI Retrieval
        *From [aiwithpeter.com](https://aiwithpeter.com) · Educational Proof of Concept*
    
        ---
        
        **The core lesson:** Two documents contain identical facts. 
        One is poorly structured. One is well-structured.
        Ask the same question to both — and watch how structure changes everything.
        
        This is Marshall McLuhan's insight made visible in code: **the medium IS the message.**
        """)
    
        with gr.Row():
            load_poor = gr.Button("📄 Load Poorly Structured Document", variant="secondary")
            load_good = gr.Button("✅ Load Well-Structured Document", variant="primary")
    
        with gr.Row():
            with gr.Column(scale=2):
                document_input = gr.Textbox(
                    label="📄 Document (paste any business document)",
                    lines=14,
                    placeholder="Paste your document here, or click one of the load buttons above..."
                )
                question_input = gr.Textbox(
                    label="❓ Question",
                    placeholder='Try: "What was Q3 2024 revenue?" or "What are the main risk factors?"',
                    lines=2
                )
                hf_token_input = gr.Textbox(
                    label="🔑 Hugging Face Token (optional — enables LLM reasoning mode)",
                    placeholder="hf_... (free at huggingface.co/settings/tokens)",
                    type="password",
                    lines=1
                )
                run_btn = gr.Button("🚀 Run Vectorless RAG", variant="primary", size="lg")
    
            with gr.Column(scale=3):
                with gr.Tabs():
                    with gr.Tab("💬 Answer"):
                        answer_output = gr.Markdown()
                    with gr.Tab("🌳 Document Tree"):
                        tree_output = gr.Markdown()
                    with gr.Tab("🔍 Retrieval Reasoning"):
                        retrieval_output = gr.Markdown()
    
        load_poor.click(lambda: POORLY_STRUCTURED_DOC, outputs=document_input)
        load_good.click(lambda: WELL_STRUCTURED_DOC, outputs=document_input)
        run_btn.click(
            fn=run_rag,
            inputs=[document_input, question_input, hf_token_input],
            outputs=[answer_output, tree_output, retrieval_output]
        )
    
        gr.Markdown("""
        ---
        ### 🎓 How to Use This Lab for Learning
    
        **Exercise 1 — The Structure Comparison:**
        1. Load the **Poorly Structured Document**. Ask: *"What was Q3 revenue?"*
        2. Check the 🌳 Tree tab — notice how few, vague nodes were built
        3. Load the **Well-Structured Document**. Ask the same question
        4. Check the 🌳 Tree tab again — notice the rich, navigable node structure
        5. Compare the answers. Same facts. Totally different retrieval quality.
        
        **Exercise 2 — Test Your Own Documents:**
        Paste a business report you have written. Ask a question about it.
        Review the Tree tab. If the tree is flat or has generic node names,
        your document is not AI-navigable. Apply the writing checklist and try again.
        
        **Exercise 3 — The McLuhan Reflection:**
        The tree IS the message. The structure of your document IS the message
        the AI receives. A document with no structure tells the AI: 
        *"I have no navigable logic. Guess."*
        
        **Without HF token:** BM25 keyword retrieval (still vectorless — no embeddings)  
        **With HF token:** Full Mistral-7B reasoning-based navigation
        
        ---
        *Built for [aiwithpeter.com](https://aiwithpeter.com) · [PageIndex GitHub](https://github.com/VectifyAI/PageIndex) · 
        Peter Sigurdson · Toronto, Ontario*
        """)
    
    if __name__ == "__main__":
        demo.launch()

    Step 5: README.md

    ---
    title: Vectorless RAG Lab — AI with Peter
    emoji: 🧠
    colorFrom: blue
    colorTo: green
    sdk: gradio
    sdk_version: 4.36.0
    app_file: app.py
    pinned: false
    ---
    
    # Vectorless RAG Lab — AI with Peter
    
    "The Medium Is the Message" — demonstrated in code.
    
    Two documents, same facts, different structure. Ask the same question.
    Watch how document structure determines AI retrieval quality.
    
    Built for [aiwithpeter.com](https://aiwithpeter.com) by Peter Sigurdson.

    Step 6: Deploy and Run the Comparison Exercise

    1. Upload all three files to your Space
    2. Wait 2–3 minutes for the build to complete
    3. Click “Load Poorly Structured Document” and ask: “What was Q3 2024 revenue?”
    4. Open the 🌳 Document Tree tab — observe the sparse, vague tree
    5. Click “Load Well-Structured Document” and ask the same question
    6. Open the 🌳 Document Tree tab again — observe the rich, navigable tree
    7. Compare the two answers

    The difference is not the facts — they are identical in both documents. The difference is structure. The medium is the message.


    Step 7: The Writing Improvement Loop

    Use this lab as a revision tool for your own documents:

    1. Paste a document you have written
    2. Ask questions you would expect an AI to answer about it
    3. Check the Tree tab — are the node titles descriptive or generic?
    4. Apply the writing checklist from Part 3
    5. Re-paste the revised document and run again
    6. Compare the tree quality before and after

    This loop makes the abstract principle concrete: every revision that improves structure produces a measurably better tree, which produces measurably better answers.
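
    If you want step 3 to be less subjective, you can score a tree's node titles programmatically. The sketch below is a hypothetical heuristic (it is not part of PageIndex): it penalizes generic headings like "Overview" and rewards titles carrying dates, quarters, or figures, which is exactly the kind of signal reasoning-based retrieval navigates on:

      import re

      # Hypothetical heuristic (not part of PageIndex): estimate how
      # "AI-navigable" a set of node titles is. Generic titles score low;
      # titles with entities, dates, or figures score high.
      GENERIC = {"introduction", "overview", "update", "summary", "notes",
                 "miscellaneous", "other", "details", "section"}

      def title_score(title: str) -> float:
          words = title.lower().split()
          if not words:
              return 0.0
          score = 1.0
          if words[0] in GENERIC:
              score -= 0.5          # generic opener: weak navigation signal
          if re.search(r"\b(19|20)\d{2}\b|\bQ[1-4]\b|\d", title):
              score += 0.5          # dates, quarters, numbers aid retrieval
          if len(words) >= 4:
              score += 0.25         # longer titles tend to be more specific
          return max(score, 0.0)

      def tree_quality(titles: list[str]) -> float:
          """Average title score; higher suggests a more navigable tree."""
          return sum(title_score(t) for t in titles) / max(len(titles), 1)

      print(tree_quality(["Update", "Notes"]))                      # ~0.5: flat, vague
      print(tree_quality(["Q3 2024 Revenue: $4.2M",
                          "Risk Factors: Supply Chain Exposure"]))  # ~1.5: navigable

    Run the two sample documents from Exercise 1 through a scorer like this and the poorly structured one should score visibly lower, turning the before/after comparison into a number you can track across revisions.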


    The Takeaway for Educators

    Web developers learned this lesson years ago. They restructured their craft around the reality that their content has two audiences: humans and machines. The machine audience is now, in many contexts, the primary audience — the one that decides whether your content surfaces in an AI answer, an agent pipeline, or a RAG-powered knowledge system at all.

    Business writers, policy professionals, lawyers, analysts, and academics are at the same inflection point. The students who understand this now — who write with deliberate structural discipline, who treat headings as navigation systems, who make every section self-contained and explicitly labelled — will produce documents that perform extraordinarily well in the AI-augmented professional environments they are entering.

    Marshall McLuhan saw the pattern in every previous medium. Television. Radio. The printed book. Each one restructured not just communication, but cognition — how people organized, valued, and transmitted knowledge. AI reasoning systems are doing the same thing to the business document. The organizations and individuals who understand that the form of the document is its message to the machine will build knowledge systems that are radically more capable, more accurate, and more useful than those who do not.

    Stop chunking. Start structuring.

    The medium is the message.


    AI with Peter · aiwithpeter.com
    Peter Sigurdson · Toronto, Ontario, Canada

  • Today in AI: DeepSeek V4 and Moonshot Kimi K2.6 for Educators and Builders

    Let’s spotlight today’s big model stories: DeepSeek V4 and Moonshot Kimi K2.6, with concrete “how to access” steps and classroom‑ready activities. cometapi


    The last few days have felt like another “GPT‑4 moment” for the open and low‑cost side of AI. Two models in particular are reshaping what’s possible for educators, developers, and solo creators: DeepSeek V4 and Moonshot Kimi K2.6. latent

    In this post, I’ll unpack what they are, show how you can access them today, and give you concrete, classroom‑ready activities you can start using within the next hour. apidog


    DeepSeek V4: Million‑Token Context on a Budget

    DeepSeek V4 is the newest model family from DeepSeek, designed around two ideas: very long context (up to around a million tokens) and agent‑style workloads like multi‑step coding, research, and automation. The lineup typically includes DeepSeek‑V4‑Pro (bigger, more capable) and DeepSeek‑V4‑Flash (smaller, cheaper, faster), both tuned for reasoning over large document sets and powering tool‑using agents. api-docs.deepseek

    For teachers, data folks, and builders, three capabilities stand out. atlascloud

    • Million‑token context window
      DeepSeek V4 is engineered to handle around 1M tokens of context, which is on the frontier of what’s currently usable in production. That means you can drop in entire books, multi‑module course shells, long codebases, or large policy binders and still ask detailed questions without constant chunking. huggingface
    • Agent‑friendly design
      The model is post‑trained to coordinate tool calls and multi‑step reasoning, which makes it well‑suited for search agents, document‑analysis bots, coding assistants, and workflow automation. This is especially relevant if you are experimenting with orchestration tools or custom “AI TA” agents for your classes. cometapi
    • Two main flavors: Pro and Flash (see the routing sketch below)
      • V4‑Pro: higher quality reasoning, strong coding and analysis, the better choice when quality matters more than latency or cost. api-docs.deepseek
      • V4‑Flash: optimized for speed and low cost, good for interactive apps, chatbots, or classroom deployments where you need many students hitting the model at once. apidog
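
    Because the two variants trade quality against latency and cost, a common pattern is to route requests by task weight. Here is a minimal sketch assuming an OpenAI‑compatible provider that exposes both variants under the model ids mentioned in the API section below; your provider’s base URL, key name, and exact ids may differ:

      import os
      from openai import OpenAI

      # Assumed: an OpenAI-compatible endpoint serving both V4 variants.
      # Base URL, env var, and model ids are placeholders; check your provider.
      client = OpenAI(
          api_key=os.environ["PROVIDER_API_KEY"],
          base_url="https://api.your-provider.example/v1"
      )

      def ask_deepseek(prompt: str, deep: bool = False) -> str:
          # Heavyweight analysis goes to V4-Pro; quick interactive turns to V4-Flash.
          model = "deepseek-v4-pro" if deep else "deepseek-v4-flash"
          response = client.chat.completions.create(
              model=model,
              messages=[{"role": "user", "content": prompt}],
          )
          return response.choices[0].message.content

      print(ask_deepseek("Give me a one-line study tip."))                     # cheap, fast
      print(ask_deepseek("Map the gaps across these 15 papers...", deep=True)) # deeper reasoning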

    How to access DeepSeek V4 (chat and API)

    Most readers will care about three access paths: web chat, mobile app, and API. huggingface

    • 1) Web chat (zero‑code)
      • Go to DeepSeek’s official chat interface (commonly surfaced as a “chat.deepseek.com” style endpoint in docs and partner guides). iamdave
      • Choose Expert Mode (usually maps to V4‑Pro) for deep reasoning and longer tasks, or Instant/Flash mode for cheap, fast back‑and‑forth. cometapi
    • 2) Mobile app
      DeepSeek’s mobile app has quickly become popular in app stores, and the V4 line is being surfaced there alongside earlier models. Once installed and logged in, you can switch the model to V4‑Pro or V4‑Flash from the model selector in the chat UI. iamdave
    • 3) API (OpenAI‑style)
      DeepSeek V4 is exposed through OpenAI‑compatible APIs on several platforms, including aggregators that provide a model="deepseek-v4-pro" or model="deepseek-v4-flash" option. A typical Python call looks like this (from a recent quick‑start article): atlascloud
      import os
      from openai import OpenAI

      # Point an OpenAI-compatible client at the aggregator's endpoint.
      client = OpenAI(
          api_key=os.environ["COMETAPI_API_KEY"],
          base_url="https://api.cometapi.com"
      )

      response = client.chat.completions.create(
          model="deepseek-v4-pro",
          messages=[
              {"role": "system", "content": "You are a helpful assistant."},
              {"role": "user",
               "content": "Summarize the benefits of million-token context for educators."}
          ],
          extra_body={"thinking": {"type": "enabled"}},  # provider-specific reasoning toggle
          reasoning_effort="high"
      )

      # Print the first (and only) completion choice.
      print(response.choices[0].message.content)

    cometapi

    High‑value use cases for DeepSeek V4

    Here are concrete “winning” use cases that fit DeepSeek V4’s strengths. scmp

    | Scenario | Why V4 works well | Practical example |
    | --- | --- | --- |
    | Course‑scale syllabus & policy assistant | 1M context handles full LMS exports, syllabi, policies, rubrics in one prompt. api-docs.deepseek | Upload a full term’s syllabus pack and ask: “Generate a student‑friendly FAQ, plus a one‑page ‘How to succeed in this course’ guide.” cometapi |
    | Research copilot | Long context supports dozens of PDFs at once, plus multi‑step reasoning across them. cometapi | Paste sections of 10–20 papers and have V4 produce an RQ‑aligned literature map with gaps and proposed experiments. cometapi |
    | Codebase navigator & refactorer | Designed for tool‑using, multi‑file reasoning, and large contexts. cometapi | Feed an entire teaching repo (R/Python notebooks, scripts, data) and ask it to propose a refactor plan and generate tests. cometapi |
    | Assessment generator at scale | Context window lets you keep large banks of outcomes, exemplars, and constraints in‑prompt. cometapi | Provide your CLOs, sample questions, and marking rubrics; ask for 50 new variations tagged by Bloom’s level. cometapi |

    Here is a classroom‑ready activity you can run with DeepSeek V4 in 30 minutes. apidog

    1. Prepare the files
      Export your full course outline, syllabus, assignment instructions, and academic integrity policy as PDFs or docs. apidog
    2. Upload to DeepSeek V4‑Pro
      In the web chat, pick the model that maps to DeepSeek‑V4‑Pro (often “Expert Mode”). Upload your documents into a single conversation. huggingface
    3. Prompt template
      Paste this into the chat:

    “You are an AI teaching assistant helping college students skim a course fast.
    Using all documents I’ve uploaded, create:
    1) A 2‑page student‑friendly course overview
    2) A top‑10 FAQ
    3) A checklist of ‘First Week Actions’ for students.”

    4. Iterate with your students
      Ask students to critique the output: What’s missing? What’s unclear? Where could AI mislead newcomers? This turns DeepSeek into a live critical‑AI‑literacy exercise, not just a content factory. huggingface

    Moonshot Kimi K2.6: Open-Weight Coding, Multimodality, and Agent Swarms

    Moonshot AI’s Kimi K2.6 is the newest version of its flagship model line, building on K2.5 and pushing open‑weight coding and long‑horizon reasoning to a new level. It’s being positioned as a leading open model, with strong coding benchmarks, agent‑swarm capabilities, and native multimodality (text, image, video). platform.kimi

    Recent coverage describes K2.6 as a 1T‑parameter MoE (Mixture of Experts) model, with about 32B active parameters, 384 experts, and a 256K‑token context window, tuned for long sequences and efficient inference. It supports image understanding, video‑tool integration, and multi‑step “agent loops” for tasks like video analysis or long coding problems. platform.kimi

    From the latest docs and announcements, a few core traits stand out. apidog

    • Open‑weight, high‑end coding & reasoning
      K2.6 is open‑weight, with strong results on coding and agent benchmarks, including high scores on SWE‑Bench Verified and Terminal‑Bench 2.0, positioning it as a serious alternative to frontier proprietary models for software work. latent
    • Long context and multimodality
      It offers around 256K context tokens, plus support for text, images, and videos in one flow. Example docs show K2.6 taking an image plus text instructions and returning detailed descriptions or analyses. Similar patterns exist for video: the model can call tools that extract and analyze video clips. platform.kimi
    • Agent swarms and tool calling
      K2.6 is designed for tool calling and “agent swarm” patterns with hundreds of sub‑agents coordinating over thousands of steps, making it well‑suited for long‑running workflows, debugging, and research tasks. apidog

    How to access Moonshot Kimi K2.6

    You can experiment with K2.6 via chat, API, and even locally in some setups. reddit

    • 1) Kimi web app
      • Go to Moonshot’s Kimi web interface (Kimi is their ChatGPT‑style product). apidog
      • Sign in or create an account. apidog
      • Select the model option corresponding to Kimi K2.6; Moonshot is rolling it into the product as the high‑end option for complex tasks. latent
    • 2) Mobile apps
      Kimi is also available as a mobile app on major app stores; the same account gives you access to K2.6 in chat. This is helpful if you’re using your phone as a teaching or demo device. apidog
    • 3) API access
      Moonshot exposes K2.6 through an OpenAI‑compatible API under the model="kimi-k2.6" identifier. A quick‑start example for image understanding looks like: platform.kimi
      import os
      import base64
      from openai import OpenAI
    
      client = OpenAI(
          api_key=os.environ.get("MOONSHOT_API_KEY"),
          base_url="https://api.moonshot.ai/v1",
      )
    
      image_path = "kimi.png"
      with open(image_path, "rb") as f:
          image_data = f.read()
    
      image_url = (
          f"data:image/{os.path.splitext(image_path) [api-docs.deepseek](https://api-docs.deepseek.com/news/news260424).lstrip('.')};base64,"
          f"{base64.b64encode(image_data).decode('utf-8')}"
      )
    
      completion = client.chat.completions.create(
          model="kimi-k2.6",
          messages=[
              {"role": "system", "content": "You are Kimi."},
              {
                  "role": "user",
                  "content": [
                      {
                          "type": "image_url",
                          "image_url": {"url": image_url},
                      },
                      {
                          "type": "text",
                          "text": "Please describe the content of the image.",
                      },
                  ],
              },
          ],
      )
    
      print(completion.choices[0].message.content)


    platform.kimi

    • 4) Agents and video analysis
      The docs also show how to wire K2.6 into a simple “agent loop” that can call a custom tool (like watch_video_clip) to load specific segments of a video for analysis. This is powerful for media studies, sports analysis, or lecture‑recording review workflows. platform.kimi

    Interesting starter activity: “AI Video T.A.”

    Here’s an activity you can run with advanced students using Kimi K2.6’s video tooling. apidog

    1. Pick a short educational video
      Choose a 3–5 minute clip (a recorded mini‑lecture, a YouTube explainer, or a programming walkthrough) and save it locally as an MP4. platform.kimi
    2. Set up the tool function
      Use the watch_video_clip tool pattern from the K2.6 docs, which wraps a function that takes a path and (optionally) start/end times, then returns a base64‑encoded video snippet plus a descriptive text block. platform.kimi
    3. Wire Kimi K2.6 as an “analysis agent”
      Implement the simple agent loop from the docs (a minimal sketch follows this activity):
      • When Kimi calls watch_video_clip, your code extracts that segment with ffmpeg.
      • You feed the clip back as a multimodal tool result, and Kimi responds with an analysis. platform.kimi
    4. Prompt template for students
      Give students a prompt like the following (a sample; adapt the wording to your course):

    “Watch the clip I direct you to. Summarize the key concepts it teaches,
    flag any misconceptions a newcomer might form, and draft three quiz
    questions with answers.”

    The example in the docs uses an almost identical agent loop to analyze a segment of a local video file. platform.kimi

    5. Reflection & critique
      Ask students to compare their own notes with Kimi’s analysis:
    • Where did the AI over‑ or under‑emphasize content?
    • Did it miss any misconceptions?
    • How usable are its quiz questions “as‑is” in your LMS?

    This turns K2.6 into a hands‑on assistant for media‑rich course design and gives learners practice interrogating AI outputs, not just consuming them. apidog
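
    To make step 3 concrete, here is a minimal sketch of that agent loop. Treat it as an assumption‑laden illustration: the watch_video_clip tool name follows the docs’ pattern, but the tool schema, the ffmpeg helper, the file names, and the tool‑result shape are hypothetical stand‑ins; check the K2.6 docs for the exact multimodal result structure.

      import os, json, base64, subprocess
      from openai import OpenAI

      client = OpenAI(api_key=os.environ["MOONSHOT_API_KEY"],
                      base_url="https://api.moonshot.ai/v1")

      # Hypothetical tool schema following the watch_video_clip pattern.
      TOOLS = [{
          "type": "function",
          "function": {
              "name": "watch_video_clip",
              "description": "Load a segment of the local lecture video.",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "start": {"type": "number", "description": "Start time (s)"},
                      "end": {"type": "number", "description": "End time (s)"},
                  },
                  "required": ["start", "end"],
              },
          },
      }]

      def extract_clip(start: float, end: float) -> str:
          # Hypothetical helper: cut the segment with ffmpeg, return base64.
          subprocess.run(["ffmpeg", "-y", "-i", "lecture.mp4", "-ss", str(start),
                          "-to", str(end), "-c", "copy", "clip.mp4"], check=True)
          with open("clip.mp4", "rb") as f:
              return base64.b64encode(f.read()).decode("utf-8")

      messages = [{"role": "user", "content":
                   "Watch the first 60 seconds and summarize the key concepts, "
                   "likely misconceptions, and three quiz questions."}]

      # Simple agent loop: keep calling the model while it requests the tool.
      while True:
          resp = client.chat.completions.create(
              model="kimi-k2.6", messages=messages, tools=TOOLS)
          msg = resp.choices[0].message
          if not msg.tool_calls:
              print(msg.content)          # final analysis
              break
          messages.append(msg)            # keep the assistant's tool request
          for call in msg.tool_calls:
              args = json.loads(call.function.arguments)
              messages.append({
                  "role": "tool",
                  "tool_call_id": call.id,
                  # Hypothetical result shape; the real docs return a
                  # multimodal block containing the clip itself.
                  "content": json.dumps(
                      {"video_base64": extract_clip(args["start"], args["end"])}),
              })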


    DeepSeek V4 vs Kimi K2.6: When to Use Which?

    Both models are exciting, but they lean in slightly different directions. atlascloud

    | Aspect | DeepSeek V4 | Kimi K2.6 |
    | --- | --- | --- |
    | Context length | Around 1M tokens, designed for massive document and code contexts. api-docs.deepseek | Around 256K tokens, still very large for multimodal workflows. platform.kimi |
    | Model type | Open‑sourced model family with Pro and Flash variants for quality vs speed tradeoffs. api-docs.deepseek | Open‑weight MoE (~1T params, 32B active, 384 experts). latent |
    | Strengths | Long‑context document analysis, search agents, automation, codebase‑scale reasoning. cometapi | High‑end coding, multimodal (text + image + video), agent swarms, tool‑calling. platform.kimi |
    | Best entry path | Web chat + OpenAI‑compatible APIs through various platforms. cometapi | Kimi web/mobile apps and Moonshot’s OpenAI‑compatible API. platform.kimi |
    | Ideal for educators | Turning entire course shells, readings, and policies into interactive assistants; large‑scale assessment generation. cometapi | Turning videos and images into teaching assets; advanced coding and project‑style AI TAs. platform.kimi |