I want a real local pipeline: image in, structured JSON out, no cloud dependency. Optimized for Metal, the ANE, or whatever Apple exposes.
My goal is to infer a JSON struct of variables from an image using a foundation model (FM). Sounds simple, but it ain't, as of May 2026.
And I really want it.
After doing a bit of research: llama.cpp provides the optimization and all the necessary low-level work. I just need to make Swift bindings that are worth the trouble...
This is a complete tutorial on how I did it. I will use something like a QuickBooks / wise.com receipt-capture example to make it real and safe.
Bon courage!
What We’re Building
A local inference stack with clear separation of concerns:
- llama.cpp as an iOS XCFramework (vendor/llama.cpp/build-apple/llama.xcframework)
- Objective-C++ bridge (Controllers/LlamaBridge.h, Controllers/LlamaBridge.mm)
- Swift-facing API in Controllers/LLMFunctionsController.swift
- Typed decode API:
let result: ReceiptResult = try await LLMFunctionsController.shared.predict(
image: receiptImage,
prompt: "Extract vendor and total.",
as: ReceiptResult.self
)
That is your chassis. Keep UI concerns away from inference state.
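Concretely, that separation can look like a thin view model that owns presentation state only. This is a hypothetical sketch (ReceiptScanViewModel is not part of the stack; ReceiptResult is defined in step 6):
import SwiftUI
import UIKit

// Hypothetical view model: holds presentation state only. All inference
// state (model, context, sampler) stays behind LLMFunctionsController.
@MainActor
final class ReceiptScanViewModel: ObservableObject {
    @Published var result: ReceiptResult?
    @Published var errorText: String?

    func scan(_ image: UIImage) async {
        do {
            result = try await LLMFunctionsController.shared.predict(
                image: image,
                prompt: "Extract vendor and total.",
                as: ReceiptResult.self
            )
        } catch {
            errorText = error.localizedDescription
        }
    }
}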
Prerequisites
- Latest Xcode. Always.
- A target iOS project.
- Models present in Models/. I am targeting an iPad Pro M5 with 12GB of onboard unified memory, so 4B in Q4 quantization should be snappy. But you can go find yourself a better model on https://huggingface.co/unsloth/Qwen3.5-4B-GGUF or another architecture as per your needs. For now, I believe Gemma 4 is lacking and Apple's Foundation Models are just non-functional in terms of image inference, so I go with Qwen. The Apache 2.0 license is very nice to have, and otherwise it scores well on open-world benchmarks. So here we go:
  - Qwen3.5-4B-Q4_K_M.gguf
  - mmproj-F16.gguf
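If you use the Hugging Face CLI, pulling both files into Models/ can look like this (file names assumed to match that repo's listing):
huggingface-cli download unsloth/Qwen3.5-4B-GGUF \
  Qwen3.5-4B-Q4_K_M.gguf mmproj-F16.gguf \
  --local-dir Models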
1) Add and Build llama.cpp for iOS Only
I am adding a git submodule under vendor/, since I want to stay on top of the releases llama.cpp makes in the future.
vendor/llama.cpp
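Assuming the upstream GitHub location, the wiring is:
git submodule add https://github.com/ggml-org/llama.cpp vendor/llama.cpp
git submodule update --init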
But I have to patch build-xcframework.sh to:
- build iOS device + iOS simulator only, and skip the tvOS compilations, etc.
- and, importantly, include mtmd in the final framework
The full patched script is reproduced at the end of this post.
Run:
cd vendor/llama.cpp
./build-xcframework.sh
You should see:
Creating iOS-only XCFramework...
xcframework successfully written out to: .../vendor/llama.cpp/build-apple/llama.xcframework
Confirm mtmd symbols (critical):
nm -gU build-apple/llama.xcframework/ios-arm64/llama.framework/llama | rg mtmd_
If mtmd_ symbols are missing, your bridge cannot do real multimodal eval.
2) Link XCFramework in Xcode
The project is wired to:
vendor/llama.cpp/build-apple/llama.xcframework
Configured in:
project.yml and yourproject.xcodeproj/project.pbxproj
For this build style, static-style link behavior is preferred (cleaner than dynamic embedding in many app setups).
3) Add ObjC++ Bridge
Files:
Controllers/LlamaBridge.h
Controllers/LlamaBridge.mm
Responsibilities:
- one-time backend init (llama_backend_init)
- one-time model + projector load:
  - text model (Qwen3.5-4B-Q4_K_M.gguf)
  - projector (mmproj-F16.gguf) via mtmd_init_from_file
- image conversion (UIImage -> RGB)
- multimodal tokenization (mtmd_tokenize)
- multimodal chunk evaluation (mtmd_helper_eval_chunks)
- token-by-token generation via llama_sampler_*
- JSON validation before returning
This is your engine room. Swift should not manage raw llama/mtmd internals directly.
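For orientation, the bridge's lifecycle from the caller's side is just two calls. A sketch (modelPath, mmprojPath, image, and schema supplied by the caller; full listings are in section 8):
// Configure once: loads the GGUF weights + vision projector, builds the context.
NSError *error = nil;
[[LlamaBridge shared] configureWithModelPath:modelPath mmprojPath:mmprojPath error:&error];
// Then predict as many times as you like against the warm context.
NSString *json = [[LlamaBridge shared] predictWithImage:image
                                                 prompt:@"Extract vendor and total."
                                             jsonSchema:schema
                                              maxTokens:512
                                           tokenHandler:nil
                                                  error:&error];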
4) Expose Bridge to Swift
Bridging header:
Bridging-Header.h
With:
#import "Controllers/LlamaBridge.h"
Build setting added:
SWIFT_OBJC_BRIDGING_HEADER = Bridging-Header.h
in:
project.yml and yourproject.xcodeproj/project.pbxproj
5) Swift API in LLMFunctionsController.swift
Everything Swift-side for prediction stays in:
Controllers/LLMFunctionsController.swift
Implemented pieces:
- StructuredOutput protocol (Codable-based)
- predict<T: StructuredOutput>(image:prompt:as:) async throws -> T
- lightweight JSON schema generation
- model path resolution (Bundle first, cwd fallback)
- bridge invocation with streaming token callback
- final decode to typed model
This gives you one clean call-site and typed results.
6) Use It
struct ReceiptResult: Codable, StructuredOutput {
let vendor: String
let total: Double
}
let parsed: ReceiptResult = try await LLMFunctionsController.shared.predict(
image: receiptImage,
prompt: "Extract receipt vendor and total as JSON.",
as: ReceiptResult.self
)
If the output is malformed JSON, the controller throws with the captured raw output as context. Neat.
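When that happens, you can catch the typed error and inspect what the model actually emitted (assuming the FunctionsError enum shown at the end of section 8):
do {
    let parsed: ReceiptResult = try await LLMFunctionsController.shared.predict(
        image: receiptImage,
        prompt: "Extract receipt vendor and total as JSON.",
        as: ReceiptResult.self
    )
    print(parsed.vendor, parsed.total)
} catch FunctionsError.invalidModelOutput(let raw) {
    // The raw model text: useful for logging and prompt iteration.
    print("Non-JSON output:", raw)
} catch {
    print("Other failure:", error)
}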
7) Troubleshooting
mtmd header import fails
Verify the framework headers include:
- mtmd.h
- mtmd-helper.h
Then re-check that the symbols made it into the binary:
nm -gU .../llama.framework/llama | rg mtmd_
8) Complete File Contents
Below are the full file contents used in this implementation, so you can copy-paste end-to-end (or ask your agent to do so).
Bridging-Header.h
#import "Controllers/LlamaBridge.h"
Controllers/LlamaBridge.h
#import <Foundation/Foundation.h>
#import <UIKit/UIKit.h>
NS_ASSUME_NONNULL_BEGIN
typedef void (^LlamaTokenHandler)(NSString *token);
@interface LlamaBridge : NSObject
+ (instancetype)shared;
- (BOOL)configureWithModelPath:(NSString *)modelPath
mmprojPath:(NSString *)mmprojPath
error:(NSError * _Nullable * _Nullable)error;
- (NSString * _Nullable)predictWithImage:(UIImage *)image
prompt:(NSString *)prompt
jsonSchema:(NSString *)jsonSchema
maxTokens:(NSInteger)maxTokens
tokenHandler:(LlamaTokenHandler _Nullable)tokenHandler
error:(NSError * _Nullable * _Nullable)error;
@end
NS_ASSUME_NONNULL_END
Controllers/LlamaBridge.mm
#import "LlamaBridge.h"
#import <vector>
#import <mutex>
#import <llama/llama.h>
#import <llama/mtmd.h>
#import <llama/mtmd-helper.h>
static NSString * const LlamaBridgeErrorDomain = @"LlamaBridgeErrorDomain";
namespace {
struct RGBTensor {
int width = 0;
int height = 0;
std::vector<uint8_t> rgb;
};
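// Draws the UIImage into a fixed-size RGBA bitmap, then repacks it as
// tightly packed RGB, the layout mtmd_bitmap_init expects.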
static RGBTensor rgbTensorFromImage(UIImage *image, int targetWidth, int targetHeight) {
RGBTensor tensor;
tensor.width = targetWidth;
tensor.height = targetHeight;
tensor.rgb.resize(static_cast<size_t>(3 * targetWidth * targetHeight), 0);
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
std::vector<uint8_t> pixels(static_cast<size_t>(targetWidth * targetHeight * 4), 0);
CGContextRef context = CGBitmapContextCreate(
pixels.data(),
targetWidth,
targetHeight,
8,
targetWidth * 4,
colorSpace,
kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big
);
CGColorSpaceRelease(colorSpace);
if (!context) {
return tensor;
}
CGContextSetInterpolationQuality(context, kCGInterpolationHigh);
CGContextDrawImage(context, CGRectMake(0, 0, targetWidth, targetHeight), image.CGImage);
CGContextRelease(context);
for (int y = 0; y < targetHeight; ++y) {
for (int x = 0; x < targetWidth; ++x) {
const size_t pixelIndex = static_cast<size_t>((y * targetWidth + x) * 4);
const size_t out = static_cast<size_t>((y * targetWidth + x) * 3);
tensor.rgb[out + 0] = pixels[pixelIndex + 0];
tensor.rgb[out + 1] = pixels[pixelIndex + 1];
tensor.rgb[out + 2] = pixels[pixelIndex + 2];
}
}
return tensor;
}
}
@interface LlamaBridge () {
std::mutex _lock;
struct llama_model *_model;
struct llama_context *_context;
struct mtmd_context *_mctx;
BOOL _isConfigured;
}
@end
@implementation LlamaBridge
+ (instancetype)shared {
static LlamaBridge *instance;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
instance = [[LlamaBridge alloc] init];
});
return instance;
}
- (instancetype)init {
self = [super init];
if (self) {
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
llama_backend_init();
});
_model = nullptr;
_context = nullptr;
_mctx = nullptr;
_isConfigured = NO;
}
return self;
}
- (void)dealloc {
// Free the projector context first; it was created against _model.
if (_mctx != nullptr) {
mtmd_free(_mctx);
_mctx = nullptr;
}
if (_context != nullptr) {
llama_free(_context);
_context = nullptr;
}
if (_model != nullptr) {
llama_model_free(_model);
_model = nullptr;
}
}
- (BOOL)configureWithModelPath:(NSString *)modelPath
mmprojPath:(NSString *)mmprojPath
error:(NSError * _Nullable * _Nullable)error {
std::lock_guard<std::mutex> guard(_lock);
if (_isConfigured) {
return YES;
}
if (![[NSFileManager defaultManager] fileExistsAtPath:modelPath] ||
![[NSFileManager defaultManager] fileExistsAtPath:mmprojPath]) {
if (error) {
*error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:1 userInfo:@{
NSLocalizedDescriptionKey: @"Model or projector file not found."
}];
}
return NO;
}
struct llama_model_params modelParams = llama_model_default_params();
modelParams.n_gpu_layers = 999;
_model = llama_model_load_from_file(modelPath.UTF8String, modelParams);
if (_model == nullptr) {
if (error) {
*error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:2 userInfo:@{
NSLocalizedDescriptionKey: @"Failed to load main model GGUF."
}];
}
return NO;
}
struct mtmd_context_params mtmdParams = mtmd_context_params_default();
mtmdParams.use_gpu = true;
mtmdParams.n_threads = MAX(1, (int32_t)[NSProcessInfo processInfo].activeProcessorCount - 1);
_mctx = mtmd_init_from_file(mmprojPath.UTF8String, _model, mtmdParams);
if (_mctx == nullptr) {
if (error) {
*error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:3 userInfo:@{
NSLocalizedDescriptionKey: @"Failed to load vision projector GGUF."
}];
}
return NO;
}
struct llama_context_params contextParams = llama_context_default_params();
contextParams.n_ctx = 8192;
contextParams.n_threads = MAX(1, (int32_t)[NSProcessInfo processInfo].activeProcessorCount - 1);
contextParams.n_threads_batch = contextParams.n_threads;
_context = llama_init_from_model(_model, contextParams);
if (_context == nullptr) {
if (error) {
*error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:4 userInfo:@{
NSLocalizedDescriptionKey: @"Failed to initialize llama context."
}];
}
return NO;
}
_isConfigured = YES;
return YES;
}
- (NSString * _Nullable)predictWithImage:(UIImage *)image
prompt:(NSString *)prompt
jsonSchema:(NSString *)jsonSchema
maxTokens:(NSInteger)maxTokens
tokenHandler:(LlamaTokenHandler _Nullable)tokenHandler
error:(NSError * _Nullable * _Nullable)error {
std::lock_guard<std::mutex> guard(_lock);
if (!_isConfigured || _model == nullptr || _context == nullptr || _mctx == nullptr) {
if (error) {
*error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:5 userInfo:@{
NSLocalizedDescriptionKey: @"LlamaBridge is not configured."
}];
}
return nil;
}
RGBTensor rgb = rgbTensorFromImage(image, 448, 448);
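// Build the prompt around the media marker. mtmd_tokenize later replaces the
// marker with the image's embedding chunks, so its position in the prompt matters.
// (mtmd still applies the projector's own preprocessing to the bitmap.)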
std::string systemPrompt = "You are a strict JSON extractor. Return ONLY valid JSON matching schema.";
const char *marker = mtmd_default_marker();
std::string fullPrompt = systemPrompt + "\nSCHEMA:\n" + jsonSchema.UTF8String +
"\nUSER_PROMPT:\n" + prompt.UTF8String +
"\nIMAGE:\n" + std::string(marker) +
"\nJSON:";
llama_memory_clear(llama_get_memory(_context), true);
mtmd_bitmap *bitmap = mtmd_bitmap_init((uint32_t)rgb.width, (uint32_t)rgb.height, rgb.rgb.data());
if (bitmap == nullptr) {
if (error) {
*error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:6 userInfo:@{
NSLocalizedDescriptionKey: @"Failed to create mtmd bitmap."
}];
}
return nil;
}
const mtmd_bitmap *bitmaps[1] = { bitmap };
mtmd_input_text textInput = { fullPrompt.c_str(), true, true };
mtmd_input_chunks *chunks = mtmd_input_chunks_init();
int32_t tokenized = mtmd_tokenize(_mctx, chunks, &textInput, bitmaps, 1);
if (tokenized != 0) {
mtmd_input_chunks_free(chunks);
mtmd_bitmap_free(bitmap);
if (error) {
*error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:7 userInfo:@{
NSLocalizedDescriptionKey: @"mtmd_tokenize failed."
}];
}
return nil;
}
llama_pos nPast = 0;
int32_t evalStatus = mtmd_helper_eval_chunks(
_mctx,
_context,
chunks,
0,
0,
(int32_t)llama_n_batch(_context),
true,
&nPast
);
mtmd_input_chunks_free(chunks);
mtmd_bitmap_free(bitmap);
if (evalStatus != 0) {
if (error) {
*error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:8 userInfo:@{
NSLocalizedDescriptionKey: @"mtmd prompt evaluation failed."
}];
}
return nil;
}
const struct llama_vocab *vocab = llama_model_get_vocab(_model);
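// Sampler chain: top-k -> top-p -> low temperature -> seeded draw.
// Temperature 0.2 keeps the JSON output near-greedy but not brittle.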
struct llama_sampler *sampler = llama_sampler_chain_init(llama_sampler_chain_default_params());
llama_sampler_chain_add(sampler, llama_sampler_init_top_k(40));
llama_sampler_chain_add(sampler, llama_sampler_init_top_p(0.95f, 1));
llama_sampler_chain_add(sampler, llama_sampler_init_temp(0.2f));
llama_sampler_chain_add(sampler, llama_sampler_init_dist(1337));
std::string generated;
generated.reserve(2048);
const int tokenLimit = (int)MAX(64, maxTokens);
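// Decode loop: sample from the latest logits, stream the piece out,
// then feed the token back through llama_decode to get the next logits.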
for (int i = 0; i < tokenLimit; ++i) {
llama_token token = llama_sampler_sample(sampler, _context, -1);
if (llama_vocab_is_eog(vocab, token)) {
break;
}
llama_sampler_accept(sampler, token);
char pieceBuf[256];
int pieceLen = llama_token_to_piece(vocab, token, pieceBuf, sizeof(pieceBuf), 0, true);
if (pieceLen > 0) {
generated.append(pieceBuf, pieceLen);
if (tokenHandler) {
tokenHandler([[NSString alloc] initWithBytes:pieceBuf length:(NSUInteger)pieceLen encoding:NSUTF8StringEncoding] ?: @"");
}
}
struct llama_batch nextBatch = llama_batch_get_one(&token, 1);
// llama_decode returns 0 on success; treat warnings (e.g. no free KV slot) as fatal too.
if (llama_decode(_context, nextBatch) != 0) {
break;
}
}
llama_sampler_free(sampler);
NSString *raw = [NSString stringWithUTF8String:generated.c_str()] ?: @"";
NSData *rawData = [raw dataUsingEncoding:NSUTF8StringEncoding];
if (rawData != nil && [NSJSONSerialization JSONObjectWithData:rawData options:0 error:nil] != nil) {
return raw;
}
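// Fallback: models often wrap JSON in prose. Try the outermost {...} span
// before giving up.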
NSRange start = [raw rangeOfString:@"{"];
NSRange end = [raw rangeOfString:@"}" options:NSBackwardsSearch];
if (start.location != NSNotFound && end.location != NSNotFound && end.location > start.location) {
NSRange span = NSMakeRange(start.location, end.location - start.location + 1);
NSString *candidate = [raw substringWithRange:span];
NSData *candidateData = [candidate dataUsingEncoding:NSUTF8StringEncoding];
if (candidateData != nil && [NSJSONSerialization JSONObjectWithData:candidateData options:0 error:nil] != nil) {
return candidate;
}
}
if (error) {
*error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:9 userInfo:@{
NSLocalizedDescriptionKey: @"Model output is not valid JSON.",
@"rawOutput": raw
}];
}
return nil;
}
@end
Controllers/LLMFunctionsController.swift
import Foundation
import UIKit
public protocol StructuredOutput: Codable {}
@MainActor
final class LLMFunctionsController {
static let shared = LLMFunctionsController()
private let encoder: JSONEncoder
private let decoder: JSONDecoder
private let llamaBridge = LlamaBridge.shared()
private init() {
encoder = JSONEncoder()
decoder = JSONDecoder()
}
func predict<T: StructuredOutput>(
image: UIImage,
prompt: String,
as type: T.Type
) async throws -> T {
let schema = try JSONSchemaGenerator.schema(for: type)
let json = try await predict(image: image, prompt: prompt, schema: schema)
let data = Data(json.utf8)
return try decoder.decode(T.self, from: data)
}
private func predict(
image: UIImage,
prompt: String,
schema: String
) async throws -> String {
let modelPath = try resolveModelPath(fileName: "Qwen3.5-4B-Q4_K_M.gguf")
let mmprojPath = try resolveModelPath(fileName: "mmproj-F16.gguf")
return try await Task.detached(priority: .userInitiated) { [llamaBridge] in
// The bridge methods take a trailing NSError**, so Swift imports them as throwing.
do {
try llamaBridge.configure(withModelPath: modelPath, mmprojPath: mmprojPath)
} catch {
throw FunctionsError.modelNotConfigured(error.localizedDescription)
}
var streamBuffer = ""
do {
return try llamaBridge.predict(
with: image,
prompt: prompt,
jsonSchema: schema,
maxTokens: 512,
tokenHandler: { token in
streamBuffer += token
}
)
} catch let inferenceError as NSError {
let raw = (inferenceError.userInfo["rawOutput"] as? String) ?? streamBuffer
throw FunctionsError.invalidModelOutput(raw.isEmpty ? inferenceError.localizedDescription : raw)
}
}.value
}
private func resolveModelPath(fileName: String) throws -> String {
if let bundled = Bundle.main.url(forResource: fileName, withExtension: nil, subdirectory: "Models") {
return bundled.path
}
let cwdPath = URL(fileURLWithPath: FileManager.default.currentDirectoryPath)
.appendingPathComponent("Models", isDirectory: true)
.appendingPathComponent(fileName)
.path
if FileManager.default.fileExists(atPath: cwdPath) {
return cwdPath
}
throw FunctionsError.modelNotConfigured("Missing \(fileName) in app bundle `Models/` and working directory.")
}
}
private extension JSONEncoder {
func encodeToDict<T: Encodable>(_ value: T) throws -> [String: Any] {
let data = try self.encode(value)
guard let obj = try JSONSerialization.jsonObject(with: data) as? [String: Any] else {
throw FunctionsError.decoding(NSError(domain: "Encoding", code: -1, userInfo: nil))
}
return obj
}
}
private enum JSONSchemaGenerator {
static func schema<T: StructuredOutput>(for type: T.Type) throws -> String {
let typeName = String(describing: type)
let jsonSchema: [String: Any] = [
"$schema": "http://json-schema.org/draft-07/schema#",
"title": typeName,
"type": "object",
"additionalProperties": true,
"description": "Return JSON strictly decodable to \(typeName)."
]
let data = try JSONSerialization.data(withJSONObject: jsonSchema, options: [.sortedKeys])
guard let schema = String(data: data, encoding: .utf8) else {
throw FunctionsError.decoding(NSError(domain: "SchemaEncoding", code: -1, userInfo: nil))
}
return schema
}
}
vendor/llama.cpp/build-xcframework.sh (patched, iOS-only)
#!/usr/bin/env bash
#
# Options
IOS_MIN_OS_VERSION=16.4
MACOS_MIN_OS_VERSION=13.3
VISIONOS_MIN_OS_VERSION=1.0
TVOS_MIN_OS_VERSION=16.4
BUILD_SHARED_LIBS=OFF
LLAMA_BUILD_EXAMPLES=OFF
LLAMA_BUILD_TOOLS=ON
LLAMA_BUILD_TESTS=OFF
LLAMA_BUILD_SERVER=OFF
GGML_METAL=ON
GGML_METAL_EMBED_LIBRARY=ON
GGML_BLAS_DEFAULT=ON
GGML_METAL_USE_BF16=ON
GGML_OPENMP=OFF
COMMON_C_FLAGS="-Wno-macro-redefined -Wno-shorten-64-to-32 -Wno-unused-command-line-argument -g"
COMMON_CXX_FLAGS="-Wno-macro-redefined -Wno-shorten-64-to-32 -Wno-unused-command-line-argument -g"
# Common options for all builds
COMMON_CMAKE_ARGS=(
-DCMAKE_XCODE_ATTRIBUTE_CODE_SIGNING_REQUIRED=NO
-DCMAKE_XCODE_ATTRIBUTE_CODE_SIGN_IDENTITY=""
-DCMAKE_XCODE_ATTRIBUTE_CODE_SIGNING_ALLOWED=NO
-DCMAKE_XCODE_ATTRIBUTE_DEBUG_INFORMATION_FORMAT="dwarf-with-dsym"
-DCMAKE_XCODE_ATTRIBUTE_GCC_GENERATE_DEBUGGING_SYMBOLS=YES
-DCMAKE_XCODE_ATTRIBUTE_COPY_PHASE_STRIP=NO
-DCMAKE_XCODE_ATTRIBUTE_STRIP_INSTALLED_PRODUCT=NO
-DCMAKE_XCODE_ATTRIBUTE_DEVELOPMENT_TEAM=ggml
-DBUILD_SHARED_LIBS=${BUILD_SHARED_LIBS}
-DLLAMA_BUILD_EXAMPLES=${LLAMA_BUILD_EXAMPLES}
-DLLAMA_BUILD_TOOLS=${LLAMA_BUILD_TOOLS}
-DLLAMA_BUILD_TESTS=${LLAMA_BUILD_TESTS}
-DLLAMA_BUILD_SERVER=${LLAMA_BUILD_SERVER}
-DGGML_METAL_EMBED_LIBRARY=${GGML_METAL_EMBED_LIBRARY}
-DGGML_BLAS_DEFAULT=${GGML_BLAS_DEFAULT}
-DGGML_METAL=${GGML_METAL}
-DGGML_METAL_USE_BF16=${GGML_METAL_USE_BF16}
-DGGML_NATIVE=OFF
-DGGML_OPENMP=${GGML_OPENMP}
)
check_required_tool() {
local tool=$1
local install_message=$2
if ! command -v $tool &> /dev/null; then
echo "Error: $tool is required but not found."
echo "$install_message"
exit 1
fi
}
echo "Checking for required tools..."
check_required_tool "cmake" "Please install CMake 3.28.0 or later (brew install cmake)"
check_required_tool "xcrun" "Please install Xcode and Xcode Command Line Tools (xcode-select --install)"
XCODE_VERSION=$(xcrun xcodebuild -version 2>/dev/null | head -n1 | awk '{ print $2 }')
MAJOR_VERSION=$(echo $XCODE_VERSION | cut -d. -f1)
MINOR_VERSION=$(echo $XCODE_VERSION | cut -d. -f2)
echo "Detected Xcode version: $XCODE_VERSION"
set -e
## Clean up previous builds
rm -rf build-apple
rm -rf build-ios-sim
rm -rf build-ios-device
rm -rf build-macos
rm -rf build-visionos
rm -rf build-visionos-sim
rm -rf build-tvos-sim
rm -rf build-tvos-device
# Setup the xcframework build directory structure
setup_framework_structure() {
local build_dir=$1
local min_os_version=$2
local platform=$3 # "ios", "macos", "visionos", or "tvos"
local framework_name="llama"
echo "Creating ${platform}-style framework structure for ${build_dir}"
if [[ "$platform" == "macos" ]]; then
# macOS versioned structure uses versioned directories
mkdir -p ${build_dir}/framework/${framework_name}.framework/Versions/A/Headers
mkdir -p ${build_dir}/framework/${framework_name}.framework/Versions/A/Modules
mkdir -p ${build_dir}/framework/${framework_name}.framework/Versions/A/Resources
# Create symbolic links
ln -sf A ${build_dir}/framework/${framework_name}.framework/Versions/Current
ln -sf Versions/Current/Headers ${build_dir}/framework/${framework_name}.framework/Headers
ln -sf Versions/Current/Modules ${build_dir}/framework/${framework_name}.framework/Modules
ln -sf Versions/Current/Resources ${build_dir}/framework/${framework_name}.framework/Resources
ln -sf Versions/Current/${framework_name} ${build_dir}/framework/${framework_name}.framework/${framework_name}
# Set header and module paths
local header_path=${build_dir}/framework/${framework_name}.framework/Versions/A/Headers/
local module_path=${build_dir}/framework/${framework_name}.framework/Versions/A/Modules/
else
# iOS/VisionOS/tvOS use a flat structure
mkdir -p ${build_dir}/framework/${framework_name}.framework/Headers
mkdir -p ${build_dir}/framework/${framework_name}.framework/Modules
# Remove any existing structure to ensure clean build
rm -rf ${build_dir}/framework/${framework_name}.framework/Versions
# Set header and module paths
local header_path=${build_dir}/framework/${framework_name}.framework/Headers/
local module_path=${build_dir}/framework/${framework_name}.framework/Modules/
fi
# Copy all required headers (common for all platforms)
cp include/llama.h ${header_path}
cp ggml/include/ggml.h ${header_path}
cp ggml/include/ggml-opt.h ${header_path}
cp ggml/include/ggml-alloc.h ${header_path}
cp ggml/include/ggml-backend.h ${header_path}
cp ggml/include/ggml-metal.h ${header_path}
cp ggml/include/ggml-cpu.h ${header_path}
cp ggml/include/ggml-blas.h ${header_path}
cp ggml/include/gguf.h ${header_path}
cp tools/mtmd/mtmd.h ${header_path}
cp tools/mtmd/mtmd-helper.h ${header_path}
# Create module map (common for all platforms)
cat > ${module_path}module.modulemap << EOF
framework module llama {
header "llama.h"
header "ggml.h"
header "ggml-alloc.h"
header "ggml-backend.h"
header "ggml-metal.h"
header "ggml-cpu.h"
header "ggml-blas.h"
header "gguf.h"
header "mtmd.h"
header "mtmd-helper.h"
link "c++"
link framework "Accelerate"
link framework "Metal"
link framework "Foundation"
export *
}
EOF
# Platform-specific settings for Info.plist
local platform_name=""
local sdk_name=""
local supported_platform=""
case "$platform" in
"ios")
platform_name="iphoneos"
sdk_name="iphoneos${min_os_version}"
supported_platform="iPhoneOS"
local plist_path="${build_dir}/framework/${framework_name}.framework/Info.plist"
local device_family=' <key>UIDeviceFamily</key>
<array>
<integer>1</integer>
<integer>2</integer>
</array>'
;;
"macos")
platform_name="macosx"
sdk_name="macosx${min_os_version}"
supported_platform="MacOSX"
local plist_path="${build_dir}/framework/${framework_name}.framework/Versions/A/Resources/Info.plist"
local device_family=""
;;
"visionos")
platform_name="xros"
sdk_name="xros${min_os_version}"
supported_platform="XRPlatform"
local plist_path="${build_dir}/framework/${framework_name}.framework/Info.plist"
local device_family=""
;;
"tvos")
platform_name="appletvos"
sdk_name="appletvos${min_os_version}"
supported_platform="AppleTVOS"
local plist_path="${build_dir}/framework/${framework_name}.framework/Info.plist"
local device_family=' <key>UIDeviceFamily</key>
<array>
<integer>3</integer>
</array>'
;;
esac
# Create Info.plist
cat > ${plist_path} << EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleDevelopmentRegion</key>
<string>en</string>
<key>CFBundleExecutable</key>
<string>llama</string>
<key>CFBundleIdentifier</key>
<string>org.ggml.llama</string>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundleName</key>
<string>llama</string>
<key>CFBundlePackageType</key>
<string>FMWK</string>
<key>CFBundleShortVersionString</key>
<string>1.0</string>
<key>CFBundleVersion</key>
<string>1</string>
<key>MinimumOSVersion</key>
<string>${min_os_version}</string>
<key>CFBundleSupportedPlatforms</key>
<array>
<string>${supported_platform}</string>
</array>${device_family}
<key>DTPlatformName</key>
<string>${platform_name}</string>
<key>DTSDKName</key>
<string>${sdk_name}</string>
</dict>
</plist>
EOF
}
# Create dynamic libraries from static libraries.
combine_static_libraries() {
local build_dir="$1"
local release_dir="$2"
local platform="$3" # "ios", "macos", "visionos", or "tvos"
local is_simulator="$4"
local base_dir="$(pwd)"
local framework_name="llama"
# Determine output path based on platform
local output_lib=""
if [[ "$platform" == "macos" ]]; then
# macOS uses versioned structure
output_lib="${build_dir}/framework/${framework_name}.framework/Versions/A/${framework_name}"
else
# iOS, visionOS, and tvOS use a directory flat structure
output_lib="${build_dir}/framework/${framework_name}.framework/${framework_name}"
fi
local libs=(
"${base_dir}/${build_dir}/src/${release_dir}/libllama.a"
"${base_dir}/${build_dir}/ggml/src/${release_dir}/libggml.a"
"${base_dir}/${build_dir}/ggml/src/${release_dir}/libggml-base.a"
"${base_dir}/${build_dir}/ggml/src/${release_dir}/libggml-cpu.a"
"${base_dir}/${build_dir}/ggml/src/ggml-metal/${release_dir}/libggml-metal.a"
"${base_dir}/${build_dir}/ggml/src/ggml-blas/${release_dir}/libggml-blas.a"
"${base_dir}/${build_dir}/tools/mtmd/${release_dir}/libmtmd.a"
)
# Create temporary directory for processing
local temp_dir="${base_dir}/${build_dir}/temp"
mkdir -p "${temp_dir}"
# Since we have multiple architectures libtool will find object files that do not
# match the target architecture. We suppress these warnings.
xcrun libtool -static -o "${temp_dir}/combined.a" "${libs[@]}" 2> /dev/null
# Determine SDK, architectures, and install_name based on platform and simulator flag.
local sdk=""
local archs=""
local min_version_flag=""
local install_name=""
case "$platform" in
"ios")
if [[ "$is_simulator" == "true" ]]; then
sdk="iphonesimulator"
archs="arm64 x86_64"
min_version_flag="-mios-simulator-version-min=${IOS_MIN_OS_VERSION}"
else
sdk="iphoneos"
archs="arm64"
min_version_flag="-mios-version-min=${IOS_MIN_OS_VERSION}"
fi
install_name="@rpath/llama.framework/llama"
;;
"macos")
sdk="macosx"
archs="arm64 x86_64"
min_version_flag="-mmacosx-version-min=${MACOS_MIN_OS_VERSION}"
install_name="@rpath/llama.framework/Versions/Current/llama"
;;
"visionos")
if [[ "$is_simulator" == "true" ]]; then
sdk="xrsimulator"
archs="arm64 x86_64"
min_version_flag="-mtargetos=xros${VISIONOS_MIN_OS_VERSION}-simulator"
else
sdk="xros"
archs="arm64"
min_version_flag="-mtargetos=xros${VISIONOS_MIN_OS_VERSION}"
fi
# Use flat structure for visionOS, same as iOS
install_name="@rpath/llama.framework/llama"
;;
"tvos")
if [[ "$is_simulator" == "true" ]]; then
sdk="appletvsimulator"
archs="arm64 x86_64"
min_version_flag="-mtvos-simulator-version-min=${TVOS_MIN_OS_VERSION}"
else
sdk="appletvos"
archs="arm64"
min_version_flag="-mtvos-version-min=${TVOS_MIN_OS_VERSION}"
fi
install_name="@rpath/llama.framework/llama"
;;
esac
# Build architecture flags
local arch_flags=""
for arch in $archs; do
arch_flags+=" -arch $arch"
done
# Create dynamic library
echo "Creating dynamic library for ${platform}."
xcrun -sdk $sdk clang++ -dynamiclib \
-isysroot $(xcrun --sdk $sdk --show-sdk-path) \
$arch_flags \
$min_version_flag \
-Wl,-force_load,"${temp_dir}/combined.a" \
-framework Foundation -framework Metal -framework Accelerate \
-install_name "$install_name" \
-o "${base_dir}/${output_lib}"
# Platform-specific post-processing for device builds
if [[ "$is_simulator" == "false" ]]; then
if xcrun -f vtool &>/dev/null; then
case "$platform" in
"ios")
echo "Marking binary as a framework binary for iOS..."
xcrun vtool -set-build-version ios ${IOS_MIN_OS_VERSION} ${IOS_MIN_OS_VERSION} -replace \
-output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
;;
"visionos")
echo "Marking binary as a framework binary for visionOS..."
if [[ "$MAJOR_VERSION" -gt 16 ]] || [[ "$MAJOR_VERSION" -eq 16 && "$MINOR_VERSION" -gt 2 ]]; then
echo "Xcode version greater than 16.2, using visionOS."
VISION_OS_BUILD_VERSION="visionos"
else
echo "Xcode version less than or equal to 16.2, using xros."
VISION_OS_BUILD_VERSION="xros"
fi
xcrun vtool -set-build-version ${VISION_OS_BUILD_VERSION} ${VISIONOS_MIN_OS_VERSION} ${VISIONOS_MIN_OS_VERSION} -replace \
-output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
;;
"tvos")
echo "Marking binary as a framework binary for tvOS..."
xcrun vtool -set-build-version tvos ${TVOS_MIN_OS_VERSION} ${TVOS_MIN_OS_VERSION} -replace \
-output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
;;
esac
else
echo "Warning: vtool not found. Binary may not pass App Store validation."
fi
fi
echo "Creating properly formatted dSYM..."
# Create a separate directory for dSYMs for all platforms
mkdir -p "${base_dir}/${build_dir}/dSYMs"
# iOS and visionOS style dSYM (flat structure)
if [[ "$platform" == "ios" || "$platform" == "visionos" || "$platform" == "tvos" ]]; then
# Generate dSYM in the dSYMs directory
xcrun dsymutil "${base_dir}/${output_lib}" -o "${base_dir}/${build_dir}/dSYMs/llama.dSYM"
# Create a copy of the binary that will be stripped
cp "${base_dir}/${output_lib}" "${temp_dir}/binary_to_strip"
# Strip debug symbols from the copy
xcrun strip -S "${temp_dir}/binary_to_strip" -o "${temp_dir}/stripped_lib"
# Replace the original with the stripped version
mv "${temp_dir}/stripped_lib" "${base_dir}/${output_lib}"
else
# macOS style dSYM
# First strip debug info to a separate file
xcrun strip -S "${base_dir}/${output_lib}" -o "${temp_dir}/stripped_lib"
# Generate dSYM in the dSYMs directory
xcrun dsymutil "${base_dir}/${output_lib}" -o "${base_dir}/${build_dir}/dSYMs/llama.dSYM"
# Replace original binary with stripped version
mv "${temp_dir}/stripped_lib" "${base_dir}/${output_lib}"
fi
# Remove any automatically generated dSYM files in the framework structure as they will
# otherwise case Invalid Bundle Structure validation errors.
if [ -d "${base_dir}/${output_lib}.dSYM" ]; then
echo "Removing generated dSYM file in framework structure: ${base_dir}/${output_lib}.dSYM"
rm -rf "${base_dir}/${output_lib}.dSYM"
fi
# Clean up
rm -rf "${temp_dir}"
}
echo "Building for iOS simulator..."
cmake -B build-ios-sim -G Xcode \
"${COMMON_CMAKE_ARGS[@]}" \
-DCMAKE_OSX_DEPLOYMENT_TARGET=${IOS_MIN_OS_VERSION} \
-DIOS=ON \
-DCMAKE_SYSTEM_NAME=iOS \
-DCMAKE_OSX_SYSROOT=iphonesimulator \
-DCMAKE_OSX_ARCHITECTURES="arm64;x86_64" \
-DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=iphonesimulator \
-DCMAKE_C_FLAGS="${COMMON_C_FLAGS}" \
-DCMAKE_CXX_FLAGS="${COMMON_CXX_FLAGS}" \
-DLLAMA_OPENSSL=OFF \
-S .
cmake --build build-ios-sim --config Release -- -quiet
echo "Building for iOS devices..."
cmake -B build-ios-device -G Xcode \
"${COMMON_CMAKE_ARGS[@]}" \
-DCMAKE_OSX_DEPLOYMENT_TARGET=${IOS_MIN_OS_VERSION} \
-DCMAKE_SYSTEM_NAME=iOS \
-DCMAKE_OSX_SYSROOT=iphoneos \
-DCMAKE_OSX_ARCHITECTURES="arm64" \
-DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=iphoneos \
-DCMAKE_C_FLAGS="${COMMON_C_FLAGS}" \
-DCMAKE_CXX_FLAGS="${COMMON_CXX_FLAGS}" \
-DLLAMA_OPENSSL=OFF \
-S .
cmake --build build-ios-device --config Release -- -quiet
# Setup frameworks and copy binaries and headers
echo "Setting up iOS framework structures..."
setup_framework_structure "build-ios-sim" ${IOS_MIN_OS_VERSION} "ios"
setup_framework_structure "build-ios-device" ${IOS_MIN_OS_VERSION} "ios"
# Create dynamic libraries from static libraries
echo "Creating iOS dynamic libraries from static libraries..."
combine_static_libraries "build-ios-sim" "Release-iphonesimulator" "ios" "true"
combine_static_libraries "build-ios-device" "Release-iphoneos" "ios" "false"
# Create iOS-only XCFramework with correct debug symbols paths
echo "Creating iOS-only XCFramework..."
xcrun xcodebuild -create-xcframework \
-framework $(pwd)/build-ios-sim/framework/llama.framework \
-debug-symbols $(pwd)/build-ios-sim/dSYMs/llama.dSYM \
-framework $(pwd)/build-ios-device/framework/llama.framework \
-debug-symbols $(pwd)/build-ios-device/dSYMs/llama.dSYM \
-output $(pwd)/build-apple/llama.xcframework
Et voilà!