Train, evaluate, and deploy custom adapters for the Apple Foundation Model for on-device inference.
At WWDC 2025, Apple introduced the Foundation Models framework, which lets developers use the LLM built into iOS 26, iPadOS 26, and macOS 26 for fast, private, and efficient on-device inference.

The framework supports custom adapters that specialize the built-in model for specific tasks. This can make the model useful for a wider variety of tasks it may not handle well out of the box, given its limited size.

Datawizz makes it easy to train, evaluate, and deploy custom adapters for the Apple Foundation Model. This guide walks you through training and evaluating a custom adapter with Datawizz, and then using it in your Swift application.
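To give a sense of what the framework looks like in code, here is a minimal sketch of calling the built-in model from Swift, including the availability check (the model requires supported hardware with Apple Intelligence enabled). The `summarize` helper and its prompt are illustrative, not part of any Datawizz API:

```swift
import FoundationModels

// Minimal sketch: ask the built-in model for a one-sentence summary.
// Returns nil when on-device inference isn't available on this device.
func summarize(_ text: String) async throws -> String? {
    guard case .available = SystemLanguageModel.default.availability else {
        // Fall back here (for example, to a cloud model).
        return nil
    }
    let session = LanguageModelSession()
    let response = try await session.respond(to: "Summarize in one sentence: \(text)")
    return response.content
}
```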
Apple’s on-device models offer several key advantages:
Privacy: All inference happens locally on the device, keeping user data private
Cost Efficiency: No cloud API costs for AI inference
Offline Capability: Works without internet connection
Speed: Optimized for on-device performance with minimal battery drain
However, these models are optimized for efficiency over raw performance. Out of the box, they’re not as capable as larger cloud models like GPT-4o or Claude. This is where custom adapters become crucial.
To put Apple’s Foundation Model performance in perspective, here are benchmark results from MMLU (Massive Multitask Language Understanding), a benchmark of roughly 15,000 multiple-choice questions spanning 57 subjects:
GPT-4o: 83.88% accuracy (but too large for on-device inference)
Meta Llama 3.2 (3B params): 50.7% accuracy
Microsoft Phi 3 Mini (4B params): 59.49% accuracy
Google Gemma 2 (2B params): 55.99% accuracy
Apple Foundation Model: 44.31% accuracy
While the Apple model initially performs below other small models, custom adapters can dramatically improve its performance on specific tasks - often matching or exceeding much larger models.
One common solution for specialized tasks with smaller models is to fine-tune them entirely. However, shipping a full custom model with each app isn’t feasible - even smaller models can take up multiple gigabytes of space.

Adapters offer a lightweight alternative. Instead of training an entire model, you train just a few additional layers (an “adapter”) that load on top of the base model. This provides a “best of both worlds” solution:
Quality: Adapters can improve performance enough to match much larger models for specific tasks
Efficiency: Adapters are only about 160MB in size, making them practical to bundle with apps
Flexibility: Complex apps can even load multiple adapters for different tasks (see the sketch below)
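For example, an app might keep one adapter per task and build a session from whichever one fits the current request. A sketch using the SystemLanguageModel.Adapter API covered later in this guide; the task names and adapter file names here are hypothetical:

```swift
import FoundationModels

// Hypothetical adapter resources shipped with (or downloaded by) the app.
enum AppTask: String {
    case summarization = "summarizer-adapter"
    case tagging = "tagger-adapter"
}

// Create a session backed by the adapter for a given task,
// falling back to the base model if the adapter file is missing.
func makeSession(for task: AppTask) throws -> LanguageModelSession {
    guard let url = Bundle.main.url(
        forResource: task.rawValue, withExtension: "fmadapter") else {
        return LanguageModelSession()
    }
    let adapter = try SystemLanguageModel.Adapter(fileURL: url)
    return LanguageModelSession(model: SystemLanguageModel(adapter: adapter))
}
```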
To train a custom adapter, you need to collect examples to train it with. If you are already using an LLM today and looking to replace it with on-device inference, the prompts and responses you already send to and receive from that LLM are a good starting point. If you are using platforms like Humanloop, Langfuse, or LangSmith, you can easily export the LLM logs from these platforms and import them into Datawizz. Learn more about importing logs into Datawizz in our documentation on datasets.

Alternatively, if you are calling LLMs like OpenAI or Anthropic directly, you can use Datawizz to record the requests and responses you exchange with these LLMs. Learn more about collecting LLM logs with Datawizz.
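If you are capturing prompt/response pairs yourself in a Swift app, even a simple append-to-JSONL logger is enough to build up a dataset for later import. A minimal sketch; the chat-style record shape is an assumption, so match it to the schema your Datawizz dataset import expects:

```swift
import Foundation

// One training example in an assumed chat-style shape -- check the dataset
// import docs for the exact schema expected on the Datawizz side.
struct LogEntry: Codable {
    struct Message: Codable { let role: String; let content: String }
    let messages: [Message]
}

// Append a prompt/response pair to a JSONL file (one JSON object per line).
func logExample(prompt: String, response: String, to fileURL: URL) throws {
    let entry = LogEntry(messages: [
        .init(role: "user", content: prompt),
        .init(role: "assistant", content: response),
    ])
    var line = try JSONEncoder().encode(entry)
    line.append(0x0A) // trailing newline keeps the file valid JSONL
    if FileManager.default.fileExists(atPath: fileURL.path) {
        let handle = try FileHandle(forWritingTo: fileURL)
        defer { try? handle.close() }
        _ = try handle.seekToEnd()
        try handle.write(contentsOf: line)
    } else {
        try line.write(to: fileURL)
    }
}
```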
Apple’s guidelines suggest using at least 100-1,000 samples for basic tasks, and at least 5,000 for more complex tasks. The actual amount of data you need will depend greatly on the specific task you are adapting the model for.

Generally, the more data you have, the better your adapters will perform. However, there are a few things to keep in mind:
Quality over Quantity: It’s better to have a smaller set of high-quality examples than a large set of low-quality examples. Make sure your examples are representative of the task you are trying to adapt the model for.
Diversity: Make sure your examples cover a wide range of scenarios and edge cases. This will help the model generalize better to new inputs.
Relevance: Make sure your examples are relevant to the task you are trying to adapt the model for. If you are adapting the model for a specific domain, make sure your examples are from that domain.
For extremely complex or diverse use cases, it may make sense to train multiple adapters for different sub-tasks or domains. This can help the model perform better on each specific task, but will require more data and effort to maintain.
Before training an adapter, it’s important to establish a baseline by testing the Apple Foundation Model on your specific task. This helps you understand:
Whether the base model is already sufficient for your needs
How much improvement an adapter might provide
What specific areas need the most improvement
Deploy the Apple Foundation Model to the Datawizz Serverless provider in the providers screen
Open it for manual comparison - you can test it alongside other models for side-by-side evaluation
Try various prompts representative of your use case to get a feel for the baseline performance
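You can also sanity-check the base model directly on-device. A quick sketch that runs a handful of representative prompts through the un-adapted model so you can eyeball its answers; the prompts here are placeholders for your own:

```swift
import FoundationModels

// Print the base model's answers to a few representative prompts
// for manual inspection.
func runBaseline() async {
    let prompts = [
        "Classify the sentiment of: 'The update broke my workflow.'",
        "Extract the city from: 'Flight to Lisbon departs at 9am.'",
    ] // placeholders -- substitute prompts from your real use case
    for prompt in prompts {
        // Fresh session per prompt so earlier answers don't influence later ones.
        let session = LanguageModelSession()
        do {
            let response = try await session.respond(to: prompt)
            print("PROMPT: \(prompt)\nRESPONSE: \(response.content)\n")
        } catch {
            print("Failed on '\(prompt)': \(error)")
        }
    }
}
```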
Before training, ensure you have properly separated your data:
Training Dataset: Used to train the adapter (typically 80% of your data)
Evaluation Dataset: Used to test the adapter’s performance (typically 20% of your data)
This separation ensures you’re testing the adapter on data it hasn’t seen during training, giving you an accurate measure of its real-world performance.
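A split like this is easy to do yourself before uploading. A minimal sketch of a random 80/20 split; `Example` is a placeholder for whatever record type holds your prompt/response pairs:

```swift
// Randomly split examples into training and evaluation sets.
func split<Example>(
    _ examples: [Example],
    trainFraction: Double = 0.8
) -> (train: [Example], eval: [Example]) {
    let shuffled = examples.shuffled() // shuffle so the split is random
    let cut = Int(Double(shuffled.count) * trainFraction)
    return (Array(shuffled.prefix(cut)), Array(shuffled.suffix(from: cut)))
}
```

Shuffling before cutting matters: if your examples are ordered (say, by date or by source), a straight prefix split would give the evaluation set a different distribution than the training set.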
After training your adapter, it’s crucial to evaluate its performance to ensure it actually improves on the base model. To ready the adapter for evaluation, open the model page once training has finished, click “Deploy Model”, and select “Datawizz Serverless” as the provider. This deploys your adapter to the Datawizz Serverless provider, making it available for evaluation.
Once you have a well-performing adapter, the final step is integrating it into your iOS application.

We’ll start with a simple example view that uses the model to generate content:
```swift
import SwiftUI
import FoundationModels

struct ChatView: View {
    @State private var input: String = ""
    @State private var response: String?
    @State private var session = LanguageModelSession()

    func sendMessage(_ msg: String) {
        Task {
            do {
                let modelResponse = try await session.respond(to: msg)
                // View types are MainActor-isolated, so updating @State here is safe.
                response = modelResponse.content
            } catch {
                print("Error: \(error)")
            }
        }
    }

    var body: some View {
        VStack {
            if let response = response {
                Text(response)
            }
            HStack {
                TextField("Enter your message", text: $input)
                    .onSubmit { sendMessage(input) }
            }
        }
    }
}
```
This code sets up a simple chat interface where users can enter messages and receive responses from the Apple Foundation Model.
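For longer responses, you may prefer the text to appear incrementally. The framework also supports streaming via streamResponse(to:); a hedged sketch of a streaming variant of sendMessage for the view above - note that the element type the stream yields has shifted across SDK seeds (plain cumulative strings in early betas, snapshot values with a content property later), so adjust to whatever your SDK exposes:

```swift
func streamMessage(_ msg: String) {
    Task {
        do {
            // Each iteration yields a progressively longer snapshot of the answer.
            // Assumes the snapshot exposes the text so far as `content`; on older
            // SDK seeds the stream may yield the string itself instead.
            for try await partial in session.streamResponse(to: msg) {
                response = partial.content
            }
        } catch {
            print("Error: \(error)")
        }
    }
}
```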
To use the adapter in your Swift application, you can bundle the .fmadapter file with your app. Here’s how to do it:
Drag the .fmadapter file into your Xcode project
Ensure it’s included in the app bundle
Use the following code to load the adapter and create a new session with it:
```swift
.task {
    do {
        // Look up the adapter file bundled with the app.
        if let assetURL = Bundle.main.url(
            forResource: "my-adapter",
            withExtension: "fmadapter"
        ) {
            let adapter = try SystemLanguageModel.Adapter(fileURL: assetURL)
            let adaptedModel = SystemLanguageModel(adapter: adapter)
            // Replace the default session with one backed by the adapted model.
            session = LanguageModelSession(model: adaptedModel)
        } else {
            print("Asset not found in the main bundle.")
        }
    } catch {
        print("Error: \(error)")
    }
}
```
Bundling the adapter this way is not recommended for production apps: it increases the app size and makes updates more complex, since every adapter change requires shipping a new app build.
For production apps, it’s better to use Asset Packs to manage your adapters. This allows you to download the adapter at runtime, keeping your app size smaller and allowing for easier updates.

See the Apple documentation on Asset Packs for more details on how to implement this.
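As a simplified illustration of runtime loading (not a substitute for Asset Packs), the sketch below fetches the .fmadapter file from a server and caches it in Application Support before creating a session. The download URL is a hypothetical placeholder; in practice the server must provide an adapter compatible with the base model version installed on the device, and if your adapter is packaged as a directory bundle you would download an archive and unpack it instead:

```swift
import Foundation
import FoundationModels

// Download an adapter once, cache it locally, and build a session from it.
func loadRemoteAdapter() async throws -> LanguageModelSession {
    // Hypothetical URL -- point this at wherever you host your adapters.
    let remoteURL = URL(string: "https://example.com/adapters/my-adapter.fmadapter")!
    let cacheDir = try FileManager.default.url(
        for: .applicationSupportDirectory, in: .userDomainMask,
        appropriateFor: nil, create: true)
    let localURL = cacheDir.appendingPathComponent("my-adapter.fmadapter")

    if !FileManager.default.fileExists(atPath: localURL.path) {
        let (tempURL, _) = try await URLSession.shared.download(from: remoteURL)
        try FileManager.default.moveItem(at: tempURL, to: localURL)
    }

    let adapter = try SystemLanguageModel.Adapter(fileURL: localURL)
    return LanguageModelSession(model: SystemLanguageModel(adapter: adapter))
}
```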