Malik Hamza Shabbir
Mobile Development · React Native · On-Device AI · Apple Foundation Models · Expo

On-Device AI in React Native: Apple Foundation Models

Malik Hamza Shabbir · 7 min read

In short

You can ship a working on-device AI feature in React Native today with no API key, no cloud account, and $0 per call. At WWDC 2026 this week, Apple expanded the Foundation Models framework with image input, server models, and a provider-agnostic model protocol, and the React Native bindings already exist. I spent the two days since the keynote building a private review summarizer in an Expo app, and this post walks through the exact stack, the working code, and the real numbers from my iPhone 16 Pro.


What did WWDC 2026 change for on-device AI?

WWDC 2026 (June 8-10) added image input, server models, and a provider-agnostic language-model protocol to Foundation Models, plus free Private Cloud Compute for developers under 2M downloads. The base framework shipped with iOS 26 last year; this release turns it from a single-model API into a routing layer you can build a real product on.

Here is the one-sentence definition I give clients: Apple Foundation Models is the iOS framework that gives any app direct, free access to the roughly 3-billion-parameter on-device language model behind Apple Intelligence, with guided generation, streaming, and tool calling built in.

The four changes, ranked by how much they matter to me as a working engineer:

  • Free Private Cloud Compute for small developers. If your app has under 2M downloads, you can overflow requests to Apple's server models at no cost. The strongest objection to on-device AI used to be "what happens when the small model is not enough", and the answer is now "Apple pays for the bigger one".

  • Server models behind the same API. You write one session; the framework routes between the device model and Private Cloud Compute.

  • A provider-agnostic language-model protocol. Any model that conforms to one Swift protocol can sit behind the same calling code. This mirrors what the Vercel AI SDK did for JavaScript on the server.

  • Image input. The on-device model now accepts images, which opens receipt scanning, photo tagging, and screenshot Q&A without a vision API bill.


The new capabilities ride the OS beta until fall 2026, but the existing framework runs in production on iOS 26 today, and that is what I built against this week.

What are your three options for on-device AI in React Native?

As of June 2026 there are three realistic paths: @react-native-ai/apple, which exposes Foundation Models through the Vercel AI SDK interface; react-native-executorch, which runs open models like Llama 3.2 on both platforms; and Expo native modules that wrap Foundation Models on iOS and Gemini Nano on Android. All current RN on-device AI libraries require RN 0.80+ and the New Architecture.







| | @react-native-ai/apple | react-native-executorch | Expo modules (FM / Gemini Nano) |
| --- | --- | --- | --- |
| API surface | Vercel AI SDK provider: generateText, streamText, generateObject | Own hooks API (useLLM) plus model files you bundle or download | Thin per-platform native module API |
| Platform coverage | iOS only (Apple Intelligence devices) | iOS and Android, any device with enough RAM | iOS (Foundation Models) plus Android (Gemini Nano on supported devices) |
| Models | Apple's system model, ~3B | Llama 3.2 1B/3B, Qwen 3, Phi, your own exports | System models on both platforms |
| App size cost | 0 bytes, the model ships with the OS | 1-2 GB per bundled model | 0 bytes |
| Requirements | RN 0.80+, New Architecture | RN 0.80+, New Architecture | RN 0.80+, New Architecture, dev build |

I picked @react-native-ai/apple for this build because the Vercel AI SDK interface means my mobile AI code looks identical to my server AI code. If you are still choosing the cross-platform framework itself, I covered this in React Native vs Flutter for startup MVPs, and on-device AI has become a point in React Native's favor: the JavaScript AI ecosystem transfers directly.

Illustration of a React Native phone app running Apple's on-device Foundation Model locally while a paid cloud API path stays unused

Build it: a private review summarizer, step by step

The feature: a screen that takes the last 30 customer reviews stored locally and produces a three-bullet digest plus the top two complaints, entirely on the phone. The stack is Expo SDK 56 with a development build, React Native 0.80+ with the New Architecture enabled, @react-native-ai/apple, the ai package, and zod. About 80 lines of feature code total.

  1. Create a development build. Foundation Models bindings are native code, so Expo Go cannot load them. I explained the distinction in why Expo Go is not what you ship; eas build --profile development gets you a client that can.

  2. Install the packages.


BASH
npx expo install @react-native-ai/apple ai zod

  3. Gate the feature on availability. Not every device has Apple Intelligence, and users can turn it off in Settings.


TS
import { apple } from '@react-native-ai/apple';

export async function canSummarizeOnDevice(): Promise<boolean> {
  // false on pre-iPhone 15 Pro hardware or when Apple Intelligence is disabled
  return apple.isAvailable();
}

  4. Stream the digest. Streaming matters even more on device than in the cloud, because the user watches every token arrive.


TS
import { apple } from '@react-native-ai/apple';
import { streamText } from 'ai';

export function streamReviewDigest(reviews: string[]) {
  return streamText({
    model: apple(),
    system: 'You summarize customer reviews. Be concrete. No filler.',
    prompt: [
      'Summarize these reviews as 3 bullets, then list the top 2 complaints:',
      ...reviews.map((r, i) => `Review ${i + 1}: ${r}`),
    ].join('\n'),
  });
}

  5. Add structured extraction. Foundation Models supports guided generation natively, so generateObject with a zod schema returns valid JSON instead of a parsing lottery.


TS
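import { apple } from '@react-native-ai/apple';
import { generateObject } from 'ai';
import { z } from 'zod';

// Shape the model must return; guided generation constrains the output to it.
const Digest = z.object({
  sentiment: z.enum(['positive', 'mixed', 'negative']),
  highlights: z.array(z.string()).max(3),
  complaints: z.array(z.string()).max(2),
});

// Assumed helper (its body is not part of the original build): numbers the reviews into one prompt.
const digestPrompt = (reviews: string[]) =>
  ['Extract a digest from these reviews:', ...reviews.map((r, i) => `Review ${i + 1}: ${r}`)].join('\n');

export async function extractDigest(reviews: string[]) {
  const { object } = await generateObject({
    model: apple(),
    schema: Digest,
    prompt: digestPrompt(reviews),
  });
  return object; // typed from the Digest schema
}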

That is the whole feature. No keys in the bundle, no proxy server, no consent dialog about sending customer data to a third party. The pattern, gate then stream then structure, is the same one I now reuse across client features in my mobile app development work.
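To show the digest while it is still being generated, the screen has to fold the streamed chunks into one growing string. Here is a minimal sketch of that step, assuming the object returned by streamText exposes a textStream that can be iterated with for await, as the AI SDK documents. The names accumulateText and fakeTextStream are hypothetical, and the fake generator stands in for the model so the sketch runs without a device.

```typescript
// Hypothetical helper: folds streamed text chunks into one growing string.
// `onUpdate` would typically be a React state setter.
async function accumulateText(
  chunks: AsyncIterable<string>,
  onUpdate: (textSoFar: string) => void,
): Promise<string> {
  let text = '';
  for await (const chunk of chunks) {
    text += chunk;
    onUpdate(text); // re-render with everything received so far
  }
  return text;
}

// Stand-in for the model's stream so the sketch runs anywhere.
async function* fakeTextStream(): AsyncGenerator<string> {
  yield '• Fast ';
  yield 'delivery\n';
  yield '• Friendly support';
}
```

In the app you would pass streamReviewDigest(reviews).textStream in place of fakeTextStream() and a state setter as onUpdate.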

What numbers can you expect on a real iPhone?

On my iPhone 16 Pro running iOS 26.5, the on-device model decodes at roughly 28 tokens per second, time to first token is about 600ms on short prompts and around 2.5 seconds with 2,000 prompt tokens, and my app's memory footprint grew by about 90MB during generation. The full 30-review digest, roughly 2,300 tokens in and 180 out, completes in about 9 seconds.

Three notes on those numbers. First, the memory figure is small because the weights live in the operating system, not in your process; your app pays only for the session and its KV cache. Second, older hardware is slower but usable: the same digest on an iPhone 15 Pro decoded at 20 to 22 tokens per second. Third, thermals are real: my third consecutive long generation ran about 15 percent slower once the phone was warm. For comparison, a small cloud model returns the same digest in about 3 seconds on good wifi, but the on-device version behaves identically on a plane, in a basement, or on hostile hotel internet.
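The 9-second figure can be sanity-checked from the other measurements with a rough model: total time is time to first token plus output tokens divided by decode rate. This is a back-of-the-envelope sketch, not a profiler; estimateSeconds is a hypothetical name, and reusing the 2.5-second time to first token for the iPhone 15 Pro is an assumption, since only its decode rate was measured.

```typescript
// Rough latency model: time to first token + output tokens / decode rate.
function estimateSeconds(
  ttftSeconds: number,
  outputTokens: number,
  tokensPerSecond: number,
): number {
  return ttftSeconds + outputTokens / tokensPerSecond;
}

// iPhone 16 Pro: ~2.5 s to first token at ~2,000 prompt tokens, 180 tokens out at ~28 tok/s.
const iphone16Pro = estimateSeconds(2.5, 180, 28);

// iPhone 15 Pro at ~21 tok/s (midpoint of 20 to 22); the 2.5 s TTFT is assumed, not measured.
const iphone15Pro = estimateSeconds(2.5, 180, 21);

console.log(iphone16Pro.toFixed(1), iphone15Pro.toFixed(1)); // prints "8.9 11.1"
```

The estimate lands slightly under the measured 9 seconds, which fits: the real prompt was about 2,300 tokens, so its time to first token sits a little above the 2,000-token figure.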

Where does on-device AI win, and where does it lose?

On-device wins on privacy, offline behavior, and cost: the data never leaves the phone, the feature works in airplane mode, and inference is free at any scale. It loses on raw model quality, context length, and Android fragmentation: a roughly 3B model sits clearly below frontier cloud models for nuanced writing, and Gemini Nano covers only a slice of Android devices.

Where I would use it without hesitation: summarize, classify, extract, rewrite, tag. The review digest works because the task is grounded in text the app already has. Where I would not: open-ended generation the user will publish verbatim, long-document work past the context window, or anything where a wrong answer is expensive, because small models hallucinate more once the prompt stops grounding them.
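One practical guard against running past the context window is to trim the input before prompting. The sketch below keeps the newest reviews that fit under a token budget. Both function names are hypothetical, the 4-characters-per-token estimate is a rule of thumb for English rather than a property of Apple's tokenizer, and the budget is a value you would pick with room to spare for the instructions and the output.

```typescript
// Hypothetical estimate: ~4 characters per token, a rough rule of thumb for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps the most recent reviews that fit under `maxPromptTokens`, preserving their order.
// Assumes `reviews` is ordered oldest to newest.
function fitReviewsToBudget(reviews: string[], maxPromptTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  // Walk from newest to oldest and stop at the first review that would overflow.
  for (const review of [...reviews].reverse()) {
    const cost = estimateTokens(review);
    if (used + cost > maxPromptTokens) break;
    kept.unshift(review); // restore oldest-to-newest order
    used += cost;
  }
  return kept;
}
```

Dropping the oldest reviews first is a design choice for a "latest feedback" digest; a feature that cares about coverage might sample across the whole set instead.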

The Android reality check: Gemini Nano through ML Kit GenAI runs on Pixel 9-class phones and recent Galaxy flagships, and its API differs enough that you write a separate adapter. On iOS the floor is at least clean: Apple Intelligence hardware means iPhone 15 Pro and newer.
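The separate-adapter point can be kept manageable by hiding each platform behind one small contract and picking the first backend that reports itself available. This is a sketch under assumptions: the Summarizer interface, pickSummarizer, and the stub factory are all hypothetical names, and the stubs stand in for real wrappers around Foundation Models, Gemini Nano, or a cloud endpoint.

```typescript
// Hypothetical adapter contract: each backend wraps one platform's API.
interface Summarizer {
  name: string;
  isAvailable(): Promise<boolean>;
  summarize(reviews: string[]): Promise<string>;
}

// Returns the first available backend, or null so the UI can hide the feature.
async function pickSummarizer(candidates: Summarizer[]): Promise<Summarizer | null> {
  for (const candidate of candidates) {
    if (await candidate.isAvailable()) return candidate;
  }
  return null;
}

// Stub backend so the sketch runs anywhere; a real one would call the native model.
const stub = (name: string, available: boolean): Summarizer => ({
  name,
  isAvailable: async () => available,
  summarize: async (reviews) => `${name}: ${reviews.length} reviews`,
});
```

Ordering the candidates as on-device first and cloud last keeps the private, free path as the default and makes the fallback explicit in one place.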

My verdict: for summarize, classify, and extract features in consumer apps, on-device is now the default starting point. Reach for the cloud only when quality demands it.

How does the cost compare to cloud inference?

On-device inference is $0 per call; the cloud-vs-device break-even for a summarization feature arrives at the first user.

I mean that literally. In my reputation SaaS, the AI auto-reply pipeline generates around 5,000 review replies a month and the model bill lands near $40 a month. That is fine for a server-side B2B product where I control volume. But a consumer mobile feature scales with users, not with my roadmap: the same per-call economics at 50,000 users means a real monthly bill plus a proxy server, key management, and rate limiting. On-device deletes the line item and the infrastructure around it.
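To put a number on "a real monthly bill", the figures above can be scaled linearly. This is a rough sketch, not a pricing model: it ignores pricing tiers and differences in prompt size, projectMonthlyBillUsd is a hypothetical name, and the 4 digests per user per month is an assumed usage rate chosen only for illustration.

```typescript
// Figures from my reputation SaaS: ~5,000 replies a month for ~$40 a month.
const baselineCalls = 5000;
const baselineBillUsd = 40;

// Scales the measured bill linearly with call volume.
function projectMonthlyBillUsd(users: number, callsPerUserPerMonth: number): number {
  const projectedCalls = users * callsPerUserPerMonth;
  return baselineBillUsd * (projectedCalls / baselineCalls);
}

const perCallUsd = baselineBillUsd / baselineCalls; // 0.008, under one cent per call

// Hypothetical usage: 50,000 users, 4 digests each per month.
const cloudBillUsd = projectMonthlyBillUsd(50000, 4); // 1600

const onDeviceBillUsd = 0; // inference runs on the user's hardware

console.log(perCallUsd, cloudBillUsd, onDeviceBillUsd);
```

Under those assumptions the cloud path costs about $1,600 a month before the proxy server and key management, against nothing for inference on device.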

The same logic applies one level up. When I priced cloud retrieval systems in how much it costs to build a RAG chatbot, the build ran $4k to $12k and the number that actually mattered long-term was monthly opex. An on-device feature has almost no opex; what you pay instead is a quality ceiling and an iOS-first skew. When clients in my AI solutions engagements ask device or cloud, I ask back: could a competent intern, given only the text on the screen, do this task well? If yes, start on-device. If it needs world knowledge or publishable prose, go cloud, and Apple's free Private Cloud Compute tier now sits conveniently in between.

Key takeaways

  • WWDC 2026 (June 8-10) added image input, server models, a provider-agnostic language-model protocol, and free Private Cloud Compute for developers under 2M downloads to Foundation Models.

  • React Native has three on-device paths: @react-native-ai/apple for the Vercel AI SDK interface on iOS, react-native-executorch for arbitrary models on both platforms, and Expo modules wrapping system models. All require RN 0.80+ and the New Architecture.

  • On my iPhone 16 Pro: about 28 tokens per second, 600ms time to first token on short prompts, and a 90MB memory bump; a 30-review digest takes about 9 seconds.

  • On-device inference costs $0 per call, so the break-even against cloud arrives at your first user.

  • Default to on-device for summarize, classify, and extract features in consumer apps; use cloud only when quality demands it.

FAQ

Can React Native use Apple Intelligence?

Yes. The @react-native-ai/apple package exposes the Foundation Models framework, the same on-device model that powers Apple Intelligence, through the Vercel AI SDK interface, including streaming and structured output. You need React Native 0.80+ with the New Architecture and an Apple Intelligence capable device, meaning iPhone 15 Pro or newer.

Does on-device AI work in Expo?

Yes, but only in a development build, not Expo Go, because the bindings are native code. With Expo SDK 56, install the package, run a development or EAS build, and the AI SDK calls work exactly as on bare React Native. Expo Go cannot load custom native modules, so it will never work there.

Which iPhones support Apple Foundation Models?

Apple Intelligence hardware: iPhone 15 Pro and every iPhone 16 and 17 model, plus M-series iPads and Macs. Always gate the feature behind a runtime availability check, because users can disable Apple Intelligence in Settings, and fall back to a cloud call or hide the feature entirely when it returns false.

Is the on-device model good enough for production?

For grounded tasks, yes. A roughly 3B model with guided generation handles summarize, classify, and extract reliably when the answer lives in the prompt. It is not good enough for open-ended prose users will publish, or for long documents. Test against your real data before committing, not against demo inputs.


Malik Hamza Shabbir · Full-Stack & AI Engineer

I build full-stack and AI products solo: a reputation SaaS in production, RAG pipelines, and React Native apps. I write from what I ship, not from documentation summaries.
