Why Use Apple Foundation Models?
Apple’s on-device models offer several key advantages:
- Privacy: All inference happens locally on the device, keeping user data private
- Cost Efficiency: No cloud API costs for AI inference
- Offline Capability: Works without internet connection
- Speed: Optimized for on-device performance with minimal battery drain
Performance Comparison
To put Apple’s Foundation Model performance in perspective, here are benchmark results from MMLU (Massive Multitask Language Understanding), a set of roughly 15,000 multiple-choice questions across various subjects:
- GPT-4o: 83.88% accuracy (but too large for on-device inference)
- Meta Llama 3.2 (3B params): 50.7% accuracy
- Microsoft Phi 3 Mini (4B params): 59.49% accuracy
- Google Gemma 2 (2B params): 55.99% accuracy
- Apple Foundation Model: 44.31% accuracy
Understanding Adapters
What Are Adapters?
One common solution for specialized tasks with smaller models is to fine-tune them entirely. However, loading a custom model for each app isn’t feasible - even smaller models can take up multiple gigabytes of space. Adapters offer a lightweight alternative: instead of training an entire model, you train just a few additional layers (an “adapter”) that load on top of the base model. This provides a “best of both worlds” solution:
- Quality: Adapters can improve performance enough to match much larger models for specific tasks
- Efficiency: Adapters are only about 160MB in size, making them practical to bundle with apps
- Flexibility: Complex apps can even load multiple adapters for different tasks
Data Collection
To train a custom adapter, you need to collect examples to train it with. If you are already using an LLM today and looking to replace it with on-device inference, a good starting point is the prompts and responses you already send to and receive from that LLM. If you are using platforms like Humanloop, Langfuse or LangSmith, you can easily export the LLM logs from these platforms and import them into Datawizz. Learn more about importing logs into Datawizz in our documentation on datasets. Alternatively, if you are calling LLMs like OpenAI or Anthropic directly, you can use Datawizz to record the requests and responses you exchange with these LLMs. Learn more about collecting LLM logs with Datawizz.
Amount of Samples Required
Apple’s guidelines suggest using at least 100-1,000 samples for basic tasks, and at least 5,000 for more complex tasks. The actual amount of data will depend greatly on the specific task you are trying to adapt the model for. Generally, the more data you have, the better your adapter will perform. However, there are a few things to keep in mind:
- Quality over Quantity: It’s better to have a smaller set of high-quality examples than a large set of low-quality ones. Make sure your examples are representative of the task you are adapting the model for.
- Diversity: Make sure your examples cover a wide range of scenarios and edge cases. This will help the model generalize better to new inputs.
- Relevance: Make sure your examples are relevant to the task you are trying to adapt the model for. If you are adapting the model for a specific domain, make sure your examples are from that domain.
Evaluating the Vanilla Model
Before training an adapter, it’s important to establish a baseline by testing the Apple Foundation Model on your specific task. This helps you understand:
- Whether the base model is already sufficient for your needs
- How much improvement an adapter might provide
- What specific areas need the most improvement
To test the base model manually:
- Deploy the Apple Foundation Model to the Datawizz Serverless provider in the providers screen
- Open it for manual comparison - you can test it alongside other models for side-by-side evaluation
- Try various prompts representative of your use case to get a feel for the baseline performance (see the on-device prompting sketch below)
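You can also probe the base model directly on a device running iOS 26 or macOS 26. The sketch below assumes Apple’s FoundationModels framework; the availability check, error type, and prompt handling are illustrative, and it is not part of the Datawizz workflow - just a quick way to sanity-check the vanilla model.

```swift
import FoundationModels

enum BaselineError: Error {
    case modelUnavailable
}

/// Sends one prompt to the on-device Apple Foundation Model and returns its reply.
func baselineResponse(to prompt: String) async throws -> String {
    // Confirm the on-device model is available before creating a session
    guard case .available = SystemLanguageModel.default.availability else {
        throw BaselineError.modelUnavailable
    }

    // A plain LanguageModelSession uses the base (non-adapted) system model
    let session = LanguageModelSession()
    let response = try await session.respond(to: prompt)
    return response.content
}
```

Running the same handful of prompts here and in Datawizz gives you a consistent picture of where the base model falls short.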
Running Automated Evaluations
For more comprehensive testing, you should run automated evaluations.
Prepare Your Data
- Go to the Dataset tab in Datawizz
- If you imported logs from another system, they’ll already appear as a dataset
- If you used Datawizz to record logs, create a dataset and import your logs
- Create an evaluation split by clicking “create split” - 20% is usually sufficient for evaluation
Configure the Evaluation
- Navigate to the Evaluation tab and click “New Evaluation”
- Select the Apple Foundation Model as the model to evaluate
- Choose your evaluation dataset
- Select appropriate evaluation functions:
- String equality for exact matches (see the sketch after this list)
- LLM-as-judge for more nuanced evaluation
- Custom metrics specific to your use case
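As a concrete illustration of the string-equality option, here is a hypothetical sketch of such a metric. It is not Datawizz’s implementation, just an example of comparing expected and generated outputs after light normalization.

```swift
import Foundation

/// A toy string-equality metric: the score is the fraction of examples whose
/// generated output exactly matches the expected output after normalization.
func stringEqualityScore(expected: [String], generated: [String]) -> Double {
    precondition(expected.count == generated.count, "Mismatched example counts")
    guard !expected.isEmpty else { return 0 }

    // Normalize by trimming whitespace and lowercasing so trivial formatting
    // differences don't count as failures
    func normalize(_ s: String) -> String {
        s.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    }

    let matches = zip(expected, generated)
        .filter { normalize($0.0) == normalize($0.1) }
        .count
    return Double(matches) / Double(expected.count)
}
```

LLM-as-judge metrics follow the same shape, except the comparison is delegated to another model with a grading prompt instead of an exact match.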
Training an Adapter
Once you’ve established your baseline performance, you can begin training a custom adapter to improve the model’s performance on your specific task.
Creating Training and Evaluation Datasets
Before training, ensure you have properly separated your data:
- Training Dataset: Used to train the adapter (typically 80% of your data)
- Evaluation Dataset: Used to test the adapter’s performance (typically 20% of your data)
Configuring the Training
- Navigate to the Models section and click “New Model”
- Choose the Apple Foundation Model as your base model
- Select your training dataset
- Configure training parameters:
Key Training Parameters
Epochs: Controls how many times the trainer runs over your dataset
- More epochs = more training, but a higher risk of overfitting
- Apple models typically perform best with 3-5 epochs
- Start with 3 epochs and adjust based on results
Learning Rate: Controls how large each weight update is during training
- Higher learning rate = faster learning but potentially less stable
- Lower learning rate = more stable but slower convergence
- Use the default setting initially, then experiment
Best Practices
- Run multiple training sessions with different parameters to find optimal settings
- Monitor training logs to watch for signs of overfitting or undertraining
- Start with defaults and iterate based on evaluation results
Evaluating the Adapter
After training your adapter, it’s crucial to evaluate its performance to ensure it actually improves on the base model. To ready the adapter for evaluation, open the model page once training has finished, click “Deploy Model”, and select “Datawizz Serverless” as the provider. This deploys your adapter to the Datawizz Serverless provider, making it available for evaluation.
Running Comparative Evaluations
- Return to the Evaluations tab
- Select your previous evaluation of the base model
- Click “Re-run” and add your newly trained adapter to the benchmark
- This will run the same evaluation on both models, allowing direct comparison
Analyzing Results
As results stream in, you should see:
- Improved accuracy on your specific task
- Better consistency in responses
- Enhanced performance on edge cases from your domain
Iterating on Training
If results aren’t satisfactory:
- Adjust training parameters (epochs, learning rate)
- Improve training data quality or add more examples
- Refine evaluation metrics to better capture your needs
- Consider training multiple specialized adapters for different aspects of your task
Using the Adapter
Once you have a well-performing adapter, the final step is integrating it into your iOS application. We’ll start with a simple example view that uses the model to generate content.
Downloading the Adapter
- Go to your trained model in Datawizz
- Download the .fmadapter file - this file contains your custom adapter weights
Integration in Swift
For Testing (Bundle with App)
To use the adapter in your Swift application, you can bundle the .fmadapter file with your app. Here’s how to do it:
- Drag the .fmadapter file into your Xcode project
- Ensure it’s included in the app bundle
- Use the following code to load the adapter and create a new session with it:
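A minimal sketch of what that might look like, assuming the FoundationModels adapter initializers available on iOS 26+; the file name “MyAdapter”, the error type, and the prompt handling are placeholders:

```swift
import Foundation
import FoundationModels

enum AdapterError: Error {
    case missingAdapterFile
}

/// Loads the bundled .fmadapter file and generates a response with the adapted model.
func generateWithAdapter(prompt: String) async throws -> String {
    // Locate the adapter file that was dragged into the app bundle
    // ("MyAdapter" is a placeholder for your adapter's file name)
    guard let adapterURL = Bundle.main.url(forResource: "MyAdapter", withExtension: "fmadapter") else {
        throw AdapterError.missingAdapterFile
    }

    // Load the adapter weights and attach them to the base system language model
    let adapter = try SystemLanguageModel.Adapter(fileURL: adapterURL)
    let adaptedModel = SystemLanguageModel(adapter: adapter)

    // Create a session backed by the adapted model and run the prompt
    let session = LanguageModelSession(model: adaptedModel)
    let response = try await session.respond(to: prompt)
    return response.content
}
```

In a SwiftUI view you would typically call a helper like this from a Task and render the returned string.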
Note that bundling is not recommended for production apps: it increases app size and makes updating the adapter more complex.