
Local Multimodal LLM on iOS with `llama.cpp` (Swift + ObjC++)

Originally published on Dev.to.

I want a real local pipeline: image in, structured JSON out, no cloud dependency. Optimized to run on Metal / ANE, or whatever Apple exposes.
My goal is to infer a JSON struct of variables from an image using a foundation model. Sounds simple, but it isn't as of May 2026.
And I really want it.

After a bit of research: llama.cpp provides the optimization and all the necessary low-level work. I just need to write Swift bindings that are worth the trouble...

This is a complete tutorial on how I did it. I will use something like a QuickBooks / wise.com receipt-capture flow as the running example to make it real and safe.

Bon courage!

What We’re Building

A local inference stack with clear separation of concerns:

  • llama.cpp as an iOS XCFramework (vendor/llama.cpp/build-apple/llama.xcframework)
  • Objective-C++ bridge (Controllers/LlamaBridge.h, Controllers/LlamaBridge.mm)
  • Swift-facing API in Controllers/LLMFunctionsController.swift
  • Typed decode API:
let result: ReceiptResult = try await LLMFunctionsController.shared.predict(
    image: receiptImage,
    prompt: "Extract vendor and total.",
    as: ReceiptResult.self
)

That is your chassis. Keep UI concerns away from inference state.

Prerequisites

  • Latest Xcode. Always.
  • A target iOS project.
  • Models present in Models/. I am targeting an iPad Pro M5 with 12 GB of onboard unified memory, so a 4B model at Q4 quantization should be snappy (see the memory sanity check right after this list). You can go find yourself a better model at https://huggingface.co/unsloth/Qwen3.5-4B-GGUF or pick another architecture as your needs dictate. For now, I believe Gemma 4 is lacking and Apple's foundation models are just non-functional for image inference, so I go with Qwen. The Apache 2.0 license is very nice to have, and it otherwise scores well on open-world benchmarks. So here we go:
    • Qwen3.5-4B-Q4_K_M.gguf
    • mmproj-F16.gguf
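A minimal runtime sanity check before loading anything this big (the ~4 GB budget is my own rough assumption covering weights + projector + KV cache, not a measured figure):

import os

// Q4_K_M weights for a 4B model are roughly 2.5 GB on disk; leave headroom
// for the mmproj, the KV cache, and activations.
let availableBytes = os_proc_available_memory()   // bytes the app may still allocate
let budgetBytes = 4 * 1024 * 1024 * 1024          // ~4 GB working budget (assumption)
if availableBytes < budgetBytes {
    print("Warning: only \(availableBytes / 1_048_576) MB available; the model may not fit.")
}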

1) Add and Build llama.cpp for iOS Only

I am adding a git submodule under vendor/, as I want to stay on top of llama.cpp's future releases.

  • vendor/llama.cpp
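From the repo root, that is something like (assuming the upstream GitHub URL):

git submodule add https://github.com/ggml-org/llama.cpp vendor/llama.cpp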

But I had to patch build-xcframework.sh to:

  • build iOS device + iOS simulator only, skipping the tvOS / macOS / visionOS compilations
  • and, importantly, include mtmd in the final framework (the full patched script is at the end of this post)

Run:

cd vendor/llama.cpp
./build-xcframework.sh

You should see:

Creating iOS-only XCFramework...
xcframework successfully written out to: .../vendor/llama.cpp/build-apple/llama.xcframework

Confirm mtmd symbols (critical):

nm -gU build-apple/llama.xcframework/ios-arm64/llama.framework/llama | rg mtmd_

If mtmd_ symbols are missing, your bridge cannot do real multimodal eval.

2) Link XCFramework in Xcode

The project is wired to:

  • vendor/llama.cpp/build-apple/llama.xcframework

Configured in:

  • project.yml
  • yourproject.xcodeproj/project.pbxproj

For this build style, static-style link behavior is preferred (cleaner than dynamic embedding in many app setups).

3) Add ObjC++ Bridge

Files:

  • Controllers/LlamaBridge.h
  • Controllers/LlamaBridge.mm

Responsibilities:

  • one-time backend init (llama_backend_init)
  • one-time model + projector load:
    • text model (Qwen3.5-4B-Q4_K_M.gguf)
    • projector (mmproj-F16.gguf) via mtmd_init_from_file
  • image conversion (UIImage -> RGB)
  • multimodal tokenization (mtmd_tokenize)
  • multimodal chunk evaluation (mtmd_helper_eval_chunks)
  • token-by-token generation via llama_sampler_*
  • JSON validation before returning

This is your engine room. Swift should not manage raw llama/mtmd internals directly.

4) Expose Bridge to Swift

Bridging header:

  • Bridging-Header.h

With:

#import "Controllers/LlamaBridge.h"

Build setting added:

  • SWIFT_OBJC_BRIDGING_HEADER = Bridging-Header.h

in:

  • project.yml
  • yourproject.xcodeproj/project.pbxproj
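One thing worth knowing before writing the Swift side: Swift imports Objective-C methods whose final parameter is an NSError ** as throwing functions. So the bridge surfaces in Swift roughly like this (a sketch of the generated interface, with modelPath / mmprojPath / schema as assumed locals, not extra code to write):

let bridge = LlamaBridge.shared()
try bridge.configure(withModelPath: modelPath, mmprojPath: mmprojPath)
let json = try bridge.predict(with: image,
                              prompt: prompt,
                              jsonSchema: schema,
                              maxTokens: 512,
                              tokenHandler: { piece in print(piece, terminator: "") })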

5) Swift API in LLMFunctionsController.swift

Everything Swift-side for prediction stays in:

  • Controllers/LLMFunctionsController.swift

Implemented pieces:

  • StructuredOutput protocol (Codable-based)
  • predict<T: StructuredOutput>(image:prompt:as:) async throws -> T
  • lightweight JSON schema generation
  • model path resolution (Bundle first, cwd fallback)
  • bridge invocation with streaming token callback
  • final decode to typed model

This gives you one clean call-site and typed results.
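For the ReceiptResult type defined in the next step, the generated schema is deliberately permissive; per the JSONSchemaGenerator below (with .sortedKeys), it comes out as:

{"$schema":"http://json-schema.org/draft-07/schema#","additionalProperties":true,"description":"Return JSON strictly decodable to ReceiptResult.","title":"ReceiptResult","type":"object"}

The model is steered mainly by the title and description; strict field-level enforcement happens at Codable decode time.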

6) Use It

struct ReceiptResult: Codable, StructuredOutput {
    let vendor: String
    let total: Double
}

let parsed: ReceiptResult = try await LLMFunctionsController.shared.predict(
    image: receiptImage,
    prompt: "Extract receipt vendor and total as JSON.",
    as: ReceiptResult.self
)

If output is malformed JSON, the controller throws with captured raw output context. Neat.
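To get at that raw output for debugging, catch the typed error (FunctionsError is defined in the controller file below); a sketch:

do {
    let parsed: ReceiptResult = try await LLMFunctionsController.shared.predict(
        image: receiptImage,
        prompt: "Extract receipt vendor and total as JSON.",
        as: ReceiptResult.self
    )
    print(parsed.vendor, parsed.total)
} catch FunctionsError.invalidModelOutput(let raw) {
    print("Model emitted non-JSON output:", raw)
}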

7) Troubleshooting

mtmd header import fails

Verify the framework headers include:

  • mtmd.h
  • mtmd-helper.h

and confirm the symbols made it into the binary:

nm -gU .../llama.framework/llama | rg mtmd_
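You can also list the packaged headers directly (flat iOS framework layout, per the build script):

ls build-apple/llama.xcframework/ios-arm64/llama.framework/Headers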

8) Complete File Contents

Below are the full file contents used in this implementation, so you can copy-paste end-to-end (or ask your agent to do so).

Bridging-Header.h

#import "Controllers/LlamaBridge.h"

Controllers/LlamaBridge.h

#import <Foundation/Foundation.h>
#import <UIKit/UIKit.h>

NS_ASSUME_NONNULL_BEGIN

typedef void (^LlamaTokenHandler)(NSString *token);

@interface LlamaBridge : NSObject

+ (instancetype)shared;

- (BOOL)configureWithModelPath:(NSString *)modelPath
                    mmprojPath:(NSString *)mmprojPath
                         error:(NSError * _Nullable * _Nullable)error;

- (NSString * _Nullable)predictWithImage:(UIImage *)image
                                  prompt:(NSString *)prompt
                              jsonSchema:(NSString *)jsonSchema
                               maxTokens:(NSInteger)maxTokens
                            tokenHandler:(LlamaTokenHandler _Nullable)tokenHandler
                                   error:(NSError * _Nullable * _Nullable)error;

@end

NS_ASSUME_NONNULL_END

Controllers/LlamaBridge.mm

#import "LlamaBridge.h"

#import <vector>
#import <mutex>

#import <llama/llama.h>
#import <llama/mtmd.h>
#import <llama/mtmd-helper.h>

static NSString * const LlamaBridgeErrorDomain = @"LlamaBridgeErrorDomain";

namespace {
struct RGBTensor {
    int width = 0;
    int height = 0;
    std::vector<uint8_t> rgb;
};

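// Renders the UIImage into a fixed-size RGBA bitmap (stretched; aspect ratio and
// EXIF orientation are not preserved), then repacks pixels as tightly packed RGB.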
static RGBTensor rgbTensorFromImage(UIImage *image, int targetWidth, int targetHeight) {
    RGBTensor tensor;
    tensor.width = targetWidth;
    tensor.height = targetHeight;
    tensor.rgb.resize(static_cast<size_t>(3 * targetWidth * targetHeight), 0);

    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    std::vector<uint8_t> pixels(static_cast<size_t>(targetWidth * targetHeight * 4), 0);

    CGContextRef context = CGBitmapContextCreate(
        pixels.data(),
        targetWidth,
        targetHeight,
        8,
        targetWidth * 4,
        colorSpace,
        kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big
    );
    CGColorSpaceRelease(colorSpace);

    if (!context) {
        return tensor;
    }

    CGContextSetInterpolationQuality(context, kCGInterpolationHigh);
    CGContextDrawImage(context, CGRectMake(0, 0, targetWidth, targetHeight), image.CGImage);
    CGContextRelease(context);

    for (int y = 0; y < targetHeight; ++y) {
        for (int x = 0; x < targetWidth; ++x) {
            const size_t pixelIndex = static_cast<size_t>((y * targetWidth + x) * 4);
            const size_t out = static_cast<size_t>((y * targetWidth + x) * 3);
            tensor.rgb[out + 0] = pixels[pixelIndex + 0];
            tensor.rgb[out + 1] = pixels[pixelIndex + 1];
            tensor.rgb[out + 2] = pixels[pixelIndex + 2];
        }
    }

    return tensor;
}

}

@interface LlamaBridge () {
    std::mutex _lock;
    struct llama_model *_model;
    struct llama_context *_context;
    struct mtmd_context *_mctx;
    BOOL _isConfigured;
}
@end

@implementation LlamaBridge

+ (instancetype)shared {
    static LlamaBridge *instance;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        instance = [[LlamaBridge alloc] init];
    });
    return instance;
}

- (instancetype)init {
    self = [super init];
    if (self) {
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            llama_backend_init();
        });
        _model = nullptr;
        _context = nullptr;
        _mctx = nullptr;
        _isConfigured = NO;
    }
    return self;
}

- (void)dealloc {
    // Free in reverse order of creation: the projector and context both
    // reference the model, so release them before the model itself.
    if (_mctx != nullptr) {
        mtmd_free(_mctx);
        _mctx = nullptr;
    }
    if (_context != nullptr) {
        llama_free(_context);
        _context = nullptr;
    }
    if (_model != nullptr) {
        llama_model_free(_model);
        _model = nullptr;
    }
}

- (BOOL)configureWithModelPath:(NSString *)modelPath
                    mmprojPath:(NSString *)mmprojPath
                         error:(NSError * _Nullable * _Nullable)error {
    std::lock_guard<std::mutex> guard(_lock);

    if (_isConfigured) {
        return YES;
    }

    if (![[NSFileManager defaultManager] fileExistsAtPath:modelPath] ||
        ![[NSFileManager defaultManager] fileExistsAtPath:mmprojPath]) {
        if (error) {
            *error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:1 userInfo:@{
                NSLocalizedDescriptionKey: @"Model or projector file not found."
            }];
        }
        return NO;
    }

    struct llama_model_params modelParams = llama_model_default_params();
    modelParams.n_gpu_layers = 999;

    _model = llama_model_load_from_file(modelPath.UTF8String, modelParams);
    if (_model == nullptr) {
        if (error) {
            *error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:2 userInfo:@{
                NSLocalizedDescriptionKey: @"Failed to load main model GGUF."
            }];
        }
        return NO;
    }

    struct mtmd_context_params mtmdParams = mtmd_context_params_default();
    mtmdParams.use_gpu = true;
    mtmdParams.n_threads = MAX(1, (int32_t)[NSProcessInfo processInfo].activeProcessorCount - 1);
    _mctx = mtmd_init_from_file(mmprojPath.UTF8String, _model, mtmdParams);
    if (_mctx == nullptr) {
        // Don't leak the text model if the projector fails to load.
        llama_model_free(_model);
        _model = nullptr;
        if (error) {
            *error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:3 userInfo:@{
                NSLocalizedDescriptionKey: @"Failed to load vision projector GGUF."
            }];
        }
        return NO;
    }

    struct llama_context_params contextParams = llama_context_default_params();
    contextParams.n_ctx = 8192;
    contextParams.n_threads = MAX(1, (int32_t)[NSProcessInfo processInfo].activeProcessorCount - 1);
    contextParams.n_threads_batch = contextParams.n_threads;

    _context = llama_init_from_model(_model, contextParams);
    if (_context == nullptr) {
        // Roll back the projector and model so a retry starts clean.
        mtmd_free(_mctx);
        _mctx = nullptr;
        llama_model_free(_model);
        _model = nullptr;
        if (error) {
            *error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:4 userInfo:@{
                NSLocalizedDescriptionKey: @"Failed to initialize llama context."
            }];
        }
        return NO;
    }

    _isConfigured = YES;
    return YES;
}

- (NSString * _Nullable)predictWithImage:(UIImage *)image
                                  prompt:(NSString *)prompt
                              jsonSchema:(NSString *)jsonSchema
                               maxTokens:(NSInteger)maxTokens
                            tokenHandler:(LlamaTokenHandler _Nullable)tokenHandler
                                   error:(NSError * _Nullable * _Nullable)error {
    std::lock_guard<std::mutex> guard(_lock);

    if (!_isConfigured || _model == nullptr || _context == nullptr || _mctx == nullptr) {
        if (error) {
            *error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:5 userInfo:@{
                NSLocalizedDescriptionKey: @"LlamaBridge is not configured."
            }];
        }
        return nil;
    }

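    // 448x448 is a pragmatic fixed input size; mtmd's own preprocessing resizes
    // again per the projector's configuration, so exact dimensions are not critical.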
    RGBTensor rgb = rgbTensorFromImage(image, 448, 448);
    std::string systemPrompt = "You are a strict JSON extractor. Return ONLY valid JSON matching schema.";
    const char *marker = mtmd_default_marker();
    std::string fullPrompt = systemPrompt + "\nSCHEMA:\n" + jsonSchema.UTF8String +
        "\nUSER_PROMPT:\n" + prompt.UTF8String +
        "\nIMAGE:\n" + std::string(marker) +
        "\nJSON:";

    llama_memory_clear(llama_get_memory(_context), true);
    mtmd_bitmap *bitmap = mtmd_bitmap_init((uint32_t)rgb.width, (uint32_t)rgb.height, rgb.rgb.data());
    if (bitmap == nullptr) {
        if (error) {
            *error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:6 userInfo:@{
                NSLocalizedDescriptionKey: @"Failed to create mtmd bitmap."
            }];
        }
        return nil;
    }
    const mtmd_bitmap *bitmaps[1] = { bitmap };
    mtmd_input_text textInput = { fullPrompt.c_str(), true, true };
    mtmd_input_chunks *chunks = mtmd_input_chunks_init();
    int32_t tokenized = mtmd_tokenize(_mctx, chunks, &textInput, bitmaps, 1);
    if (tokenized != 0) {
        mtmd_input_chunks_free(chunks);
        mtmd_bitmap_free(bitmap);
        if (error) {
            *error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:7 userInfo:@{
                NSLocalizedDescriptionKey: @"mtmd_tokenize failed."
            }];
        }
        return nil;
    }

    llama_pos nPast = 0;
    int32_t evalStatus = mtmd_helper_eval_chunks(
        _mctx,
        _context,
        chunks,
        0,
        0,
        (int32_t)llama_n_batch(_context),
        true,
        &nPast
    );
    mtmd_input_chunks_free(chunks);
    mtmd_bitmap_free(bitmap);
    if (evalStatus != 0) {
        if (error) {
            *error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:8 userInfo:@{
                NSLocalizedDescriptionKey: @"mtmd prompt evaluation failed."
            }];
        }
        return nil;
    }

    const struct llama_vocab *vocab = llama_model_get_vocab(_model);
    struct llama_sampler *sampler = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(sampler, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(sampler, llama_sampler_init_top_p(0.95f, 1));
    llama_sampler_chain_add(sampler, llama_sampler_init_temp(0.2f));
    llama_sampler_chain_add(sampler, llama_sampler_init_dist(1337));

    std::string generated;
    generated.reserve(2048);
    const int tokenLimit = (int)MAX(64, maxTokens);

    for (int i = 0; i < tokenLimit; ++i) {
        // llama_sampler_sample applies the chain and accepts the sampled token
        // internally, so a separate llama_sampler_accept call is not needed.
        llama_token token = llama_sampler_sample(sampler, _context, -1);
        if (llama_vocab_is_eog(vocab, token)) {
            break;
        }

        char pieceBuf[256];
        int pieceLen = llama_token_to_piece(vocab, token, pieceBuf, sizeof(pieceBuf), 0, true);
        if (pieceLen > 0) {
            generated.append(pieceBuf, pieceLen);
            if (tokenHandler) {
                tokenHandler([[NSString alloc] initWithBytes:pieceBuf length:(NSUInteger)pieceLen encoding:NSUTF8StringEncoding] ?: @"");
            }
        }

        struct llama_batch nextBatch = llama_batch_get_one(&token, 1);
        // llama_decode returns 0 on success, 1 when no KV slot was found, and
        // negative values on fatal errors; stop generating on anything non-zero.
        if (llama_decode(_context, nextBatch) != 0) {
            break;
        }
    }

    llama_sampler_free(sampler);

    NSString *raw = [NSString stringWithUTF8String:generated.c_str()] ?: @"";
    NSData *rawData = [raw dataUsingEncoding:NSUTF8StringEncoding];
    if (rawData != nil && [NSJSONSerialization JSONObjectWithData:rawData options:0 error:nil] != nil) {
        return raw;
    }

    NSRange start = [raw rangeOfString:@"{"];
    NSRange end = [raw rangeOfString:@"}" options:NSBackwardsSearch];
    if (start.location != NSNotFound && end.location != NSNotFound && end.location > start.location) {
        NSRange span = NSMakeRange(start.location, end.location - start.location + 1);
        NSString *candidate = [raw substringWithRange:span];
        NSData *candidateData = [candidate dataUsingEncoding:NSUTF8StringEncoding];
        if (candidateData != nil && [NSJSONSerialization JSONObjectWithData:candidateData options:0 error:nil] != nil) {
            return candidate;
        }
    }

    if (error) {
        *error = [NSError errorWithDomain:LlamaBridgeErrorDomain code:9 userInfo:@{
            NSLocalizedDescriptionKey: @"Model output is not valid JSON.",
            @"rawOutput": raw
        }];
    }
    return nil;
}

@end

Controllers/LLMFunctionsController.swift

import Foundation
import UIKit

public protocol StructuredOutput: Codable {}

enum FunctionsError: Error {
    case modelNotConfigured(String)
    case invalidModelOutput(String)
    case decoding(Error)
}

@MainActor
final class LLMFunctionsController {
    static let shared = LLMFunctionsController()

    private let encoder: JSONEncoder
    private let decoder: JSONDecoder
    private let llamaBridge = LlamaBridge.shared()

    private init() {
        encoder = JSONEncoder()
        decoder = JSONDecoder()
    }

    func predict<T: StructuredOutput>(
        image: UIImage,
        prompt: String,
        as type: T.Type
    ) async throws -> T {
        let schema = try JSONSchemaGenerator.schema(for: type)
        let json = try await predict(image: image, prompt: prompt, schema: schema)
        let data = Data(json.utf8)
        return try decoder.decode(T.self, from: data)
    }

    private func predict(
        image: UIImage,
        prompt: String,
        schema: String
    ) async throws -> String {
        let modelPath = try resolveModelPath(fileName: "Qwen3.5-4B-Q4_K_M.gguf")
        let mmprojPath = try resolveModelPath(fileName: "mmproj-F16.gguf")

        return try await Task.detached(priority: .userInitiated) { [llamaBridge] in
            // ObjC methods with a trailing NSError ** import into Swift as throwing,
            // so configure/predict are called with try rather than an error pointer.
            do {
                try llamaBridge.configure(withModelPath: modelPath, mmprojPath: mmprojPath)
            } catch {
                throw FunctionsError.modelNotConfigured(error.localizedDescription)
            }

            var streamBuffer = ""
            do {
                return try llamaBridge.predict(
                    with: image,
                    prompt: prompt,
                    jsonSchema: schema,
                    maxTokens: 512,
                    tokenHandler: { token in
                        streamBuffer += token
                    }
                )
            } catch let inferenceError as NSError {
                let raw = (inferenceError.userInfo["rawOutput"] as? String) ?? streamBuffer
                throw FunctionsError.invalidModelOutput(raw.isEmpty ? inferenceError.localizedDescription : raw)
            }
        }.value
    }

    private func resolveModelPath(fileName: String) throws -> String {
        if let bundled = Bundle.main.url(forResource: fileName, withExtension: nil, subdirectory: "Models") {
            return bundled.path
        }
        let cwdPath = URL(fileURLWithPath: FileManager.default.currentDirectoryPath)
            .appendingPathComponent("Models", isDirectory: true)
            .appendingPathComponent(fileName)
            .path
        if FileManager.default.fileExists(atPath: cwdPath) {
            return cwdPath
        }
        throw FunctionsError.modelNotConfigured("Missing \(fileName) in app bundle `Models/` and working directory.")
    }
}

private extension JSONEncoder {
    func encodeToDict<T: Encodable>(_ value: T) throws -> [String: Any] {
        let data = try self.encode(value)
        guard let obj = try JSONSerialization.jsonObject(with: data) as? [String: Any] else {
            throw FunctionsError.decoding(NSError(domain: "Encoding", code: -1, userInfo: nil))
        }
        return obj
    }
}

private enum JSONSchemaGenerator {
    static func schema<T: StructuredOutput>(for type: T.Type) throws -> String {
        let typeName = String(describing: type)
        let jsonSchema: [String: Any] = [
            "$schema": "http://json-schema.org/draft-07/schema#",
            "title": typeName,
            "type": "object",
            "additionalProperties": true,
            "description": "Return JSON strictly decodable to \(typeName)."
        ]
        let data = try JSONSerialization.data(withJSONObject: jsonSchema, options: [.sortedKeys])
        guard let schema = String(data: data, encoding: .utf8) else {
            throw FunctionsError.decoding(NSError(domain: "SchemaEncoding", code: -1, userInfo: nil))
        }
        return schema
    }
}
vendor/llama.cpp/build-xcframework.sh (patched)

#!/usr/bin/env bash
#
# Options
IOS_MIN_OS_VERSION=16.4
MACOS_MIN_OS_VERSION=13.3
VISIONOS_MIN_OS_VERSION=1.0
TVOS_MIN_OS_VERSION=16.4

BUILD_SHARED_LIBS=OFF
LLAMA_BUILD_EXAMPLES=OFF
LLAMA_BUILD_TOOLS=ON
LLAMA_BUILD_TESTS=OFF
LLAMA_BUILD_SERVER=OFF
GGML_METAL=ON
GGML_METAL_EMBED_LIBRARY=ON
GGML_BLAS_DEFAULT=ON
GGML_METAL_USE_BF16=ON
GGML_OPENMP=OFF

COMMON_C_FLAGS="-Wno-macro-redefined -Wno-shorten-64-to-32 -Wno-unused-command-line-argument -g"
COMMON_CXX_FLAGS="-Wno-macro-redefined -Wno-shorten-64-to-32 -Wno-unused-command-line-argument -g"

# Common options for all builds
COMMON_CMAKE_ARGS=(
    -DCMAKE_XCODE_ATTRIBUTE_CODE_SIGNING_REQUIRED=NO
    -DCMAKE_XCODE_ATTRIBUTE_CODE_SIGN_IDENTITY=""
    -DCMAKE_XCODE_ATTRIBUTE_CODE_SIGNING_ALLOWED=NO
    -DCMAKE_XCODE_ATTRIBUTE_DEBUG_INFORMATION_FORMAT="dwarf-with-dsym"
    -DCMAKE_XCODE_ATTRIBUTE_GCC_GENERATE_DEBUGGING_SYMBOLS=YES
    -DCMAKE_XCODE_ATTRIBUTE_COPY_PHASE_STRIP=NO
    -DCMAKE_XCODE_ATTRIBUTE_STRIP_INSTALLED_PRODUCT=NO
    -DCMAKE_XCODE_ATTRIBUTE_DEVELOPMENT_TEAM=ggml
    -DBUILD_SHARED_LIBS=${BUILD_SHARED_LIBS}
    -DLLAMA_BUILD_EXAMPLES=${LLAMA_BUILD_EXAMPLES}
    -DLLAMA_BUILD_TOOLS=${LLAMA_BUILD_TOOLS}
    -DLLAMA_BUILD_TESTS=${LLAMA_BUILD_TESTS}
    -DLLAMA_BUILD_SERVER=${LLAMA_BUILD_SERVER}
    -DGGML_METAL_EMBED_LIBRARY=${GGML_METAL_EMBED_LIBRARY}
    -DGGML_BLAS_DEFAULT=${GGML_BLAS_DEFAULT}
    -DGGML_METAL=${GGML_METAL}
    -DGGML_METAL_USE_BF16=${GGML_METAL_USE_BF16}
    -DGGML_NATIVE=OFF
    -DGGML_OPENMP=${GGML_OPENMP}
)

check_required_tool() {
    local tool=$1
    local install_message=$2

    if ! command -v $tool &> /dev/null; then
        echo "Error: $tool is required but not found."
        echo "$install_message"
        exit 1
    fi
}
echo "Checking for required tools..."
check_required_tool "cmake" "Please install CMake 3.28.0 or later (brew install cmake)"
check_required_tool "xcrun" "Please install Xcode and Xcode Command Line Tools (xcode-select --install)"

XCODE_VERSION=$(xcrun xcodebuild -version 2>/dev/null | head -n1 | awk '{ print $2 }')
MAJOR_VERSION=$(echo $XCODE_VERSION | cut -d. -f1)
MINOR_VERSION=$(echo $XCODE_VERSION | cut -d. -f2)
echo "Detected Xcode version: $XCODE_VERSION"

set -e

## Clean up previous builds
rm -rf build-apple
rm -rf build-ios-sim
rm -rf build-ios-device
rm -rf build-macos
rm -rf build-visionos
rm -rf build-visionos-sim
rm -rf build-tvos-sim
rm -rf build-tvos-device

# Setup the xcframework build directory structure
setup_framework_structure() {
    local build_dir=$1
    local min_os_version=$2
    local platform=$3  # "ios", "macos", "visionos", or "tvos"
    local framework_name="llama"

    echo "Creating ${platform}-style framework structure for ${build_dir}"

    if [[ "$platform" == "macos" ]]; then
        # macOS versioned structure uses versioned directories
        mkdir -p ${build_dir}/framework/${framework_name}.framework/Versions/A/Headers
        mkdir -p ${build_dir}/framework/${framework_name}.framework/Versions/A/Modules
        mkdir -p ${build_dir}/framework/${framework_name}.framework/Versions/A/Resources

        # Create symbolic links
        ln -sf A ${build_dir}/framework/${framework_name}.framework/Versions/Current
        ln -sf Versions/Current/Headers ${build_dir}/framework/${framework_name}.framework/Headers
        ln -sf Versions/Current/Modules ${build_dir}/framework/${framework_name}.framework/Modules
        ln -sf Versions/Current/Resources ${build_dir}/framework/${framework_name}.framework/Resources
        ln -sf Versions/Current/${framework_name} ${build_dir}/framework/${framework_name}.framework/${framework_name}

        # Set header and module paths
        local header_path=${build_dir}/framework/${framework_name}.framework/Versions/A/Headers/
        local module_path=${build_dir}/framework/${framework_name}.framework/Versions/A/Modules/
    else
        # iOS/VisionOS/tvOS use a flat structure
        mkdir -p ${build_dir}/framework/${framework_name}.framework/Headers
        mkdir -p ${build_dir}/framework/${framework_name}.framework/Modules

        # Remove any existing structure to ensure clean build
        rm -rf ${build_dir}/framework/${framework_name}.framework/Versions

        # Set header and module paths
        local header_path=${build_dir}/framework/${framework_name}.framework/Headers/
        local module_path=${build_dir}/framework/${framework_name}.framework/Modules/
    fi

    # Copy all required headers (common for all platforms)
    cp include/llama.h             ${header_path}
    cp ggml/include/ggml.h         ${header_path}
    cp ggml/include/ggml-opt.h     ${header_path}
    cp ggml/include/ggml-alloc.h   ${header_path}
    cp ggml/include/ggml-backend.h ${header_path}
    cp ggml/include/ggml-metal.h   ${header_path}
    cp ggml/include/ggml-cpu.h     ${header_path}
    cp ggml/include/ggml-blas.h    ${header_path}
    cp ggml/include/gguf.h         ${header_path}
    cp tools/mtmd/mtmd.h           ${header_path}
    cp tools/mtmd/mtmd-helper.h    ${header_path}

    # Create module map (common for all platforms)
    cat > ${module_path}module.modulemap << EOF
framework module llama {
    header "llama.h"
    header "ggml.h"
    header "ggml-alloc.h"
    header "ggml-backend.h"
    header "ggml-metal.h"
    header "ggml-cpu.h"
    header "ggml-blas.h"
    header "gguf.h"
    header "mtmd.h"
    header "mtmd-helper.h"

    link "c++"
    link framework "Accelerate"
    link framework "Metal"
    link framework "Foundation"

    export *
}
EOF

    # Platform-specific settings for Info.plist
    local platform_name=""
    local sdk_name=""
    local supported_platform=""

    case "$platform" in
        "ios")
            platform_name="iphoneos"
            sdk_name="iphoneos${min_os_version}"
            supported_platform="iPhoneOS"
            local plist_path="${build_dir}/framework/${framework_name}.framework/Info.plist"
            local device_family='    <key>UIDeviceFamily</key>
    <array>
        <integer>1</integer>
        <integer>2</integer>
    </array>'
            ;;
        "macos")
            platform_name="macosx"
            sdk_name="macosx${min_os_version}"
            supported_platform="MacOSX"
            local plist_path="${build_dir}/framework/${framework_name}.framework/Versions/A/Resources/Info.plist"
            local device_family=""
            ;;
        "visionos")
            platform_name="xros"
            sdk_name="xros${min_os_version}"
            supported_platform="XRPlatform"
            local plist_path="${build_dir}/framework/${framework_name}.framework/Info.plist"
            local device_family=""
            ;;
        "tvos")
            platform_name="appletvos"
            sdk_name="appletvos${min_os_version}"
            supported_platform="AppleTVOS"
            local plist_path="${build_dir}/framework/${framework_name}.framework/Info.plist"
            local device_family='    <key>UIDeviceFamily</key>
    <array>
        <integer>3</integer>
    </array>'
            ;;
    esac

    # Create Info.plist
    cat > ${plist_path} << EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>CFBundleDevelopmentRegion</key>
    <string>en</string>
    <key>CFBundleExecutable</key>
    <string>llama</string>
    <key>CFBundleIdentifier</key>
    <string>org.ggml.llama</string>
    <key>CFBundleInfoDictionaryVersion</key>
    <string>6.0</string>
    <key>CFBundleName</key>
    <string>llama</string>
    <key>CFBundlePackageType</key>
    <string>FMWK</string>
    <key>CFBundleShortVersionString</key>
    <string>1.0</string>
    <key>CFBundleVersion</key>
    <string>1</string>
    <key>MinimumOSVersion</key>
    <string>${min_os_version}</string>
    <key>CFBundleSupportedPlatforms</key>
    <array>
        <string>${supported_platform}</string>
    </array>${device_family}
    <key>DTPlatformName</key>
    <string>${platform_name}</string>
    <key>DTSDKName</key>
    <string>${sdk_name}</string>
</dict>
</plist>
EOF
}

# Create dynamic libraries from static libraries.
combine_static_libraries() {
    local build_dir="$1"
    local release_dir="$2"
    local platform="$3"  # "ios", "macos", "visionos", or "tvos"
    local is_simulator="$4"
    local base_dir="$(pwd)"
    local framework_name="llama"

    # Determine output path based on platform
    local output_lib=""
    if [[ "$platform" == "macos" ]]; then
        # macOS uses versioned structure
        output_lib="${build_dir}/framework/${framework_name}.framework/Versions/A/${framework_name}"
    else
        # iOS, visionOS, and tvOS use a directory flat structure
        output_lib="${build_dir}/framework/${framework_name}.framework/${framework_name}"
    fi

    local libs=(
        "${base_dir}/${build_dir}/src/${release_dir}/libllama.a"
        "${base_dir}/${build_dir}/ggml/src/${release_dir}/libggml.a"
        "${base_dir}/${build_dir}/ggml/src/${release_dir}/libggml-base.a"
        "${base_dir}/${build_dir}/ggml/src/${release_dir}/libggml-cpu.a"
        "${base_dir}/${build_dir}/ggml/src/ggml-metal/${release_dir}/libggml-metal.a"
        "${base_dir}/${build_dir}/ggml/src/ggml-blas/${release_dir}/libggml-blas.a"
        "${base_dir}/${build_dir}/tools/mtmd/${release_dir}/libmtmd.a"
    )

    # Create temporary directory for processing
    local temp_dir="${base_dir}/${build_dir}/temp"
    mkdir -p "${temp_dir}"

    # Since we have multiple architectures libtool will find object files that do not
    # match the target architecture. We suppress these warnings.
    xcrun libtool -static -o "${temp_dir}/combined.a" "${libs[@]}" 2> /dev/null

    # Determine SDK, architectures, and install_name based on platform and simulator flag.
    local sdk=""
    local archs=""
    local min_version_flag=""
    local install_name=""

    case "$platform" in
        "ios")
            if [[ "$is_simulator" == "true" ]]; then
                sdk="iphonesimulator"
                archs="arm64 x86_64"
                min_version_flag="-mios-simulator-version-min=${IOS_MIN_OS_VERSION}"
            else
                sdk="iphoneos"
                archs="arm64"
                min_version_flag="-mios-version-min=${IOS_MIN_OS_VERSION}"
            fi
            install_name="@rpath/llama.framework/llama"
            ;;
        "macos")
            sdk="macosx"
            archs="arm64 x86_64"
            min_version_flag="-mmacosx-version-min=${MACOS_MIN_OS_VERSION}"
            install_name="@rpath/llama.framework/Versions/Current/llama"
            ;;
        "visionos")
            if [[ "$is_simulator" == "true" ]]; then
                sdk="xrsimulator"
                archs="arm64 x86_64"
                min_version_flag="-mtargetos=xros${VISIONOS_MIN_OS_VERSION}-simulator"
            else
                sdk="xros"
                archs="arm64"
                min_version_flag="-mtargetos=xros${VISIONOS_MIN_OS_VERSION}"
            fi
            # Use flat structure for visionOS, same as iOS
            install_name="@rpath/llama.framework/llama"
            ;;
        "tvos")
            if [[ "$is_simulator" == "true" ]]; then
                sdk="appletvsimulator"
                archs="arm64 x86_64"
                min_version_flag="-mtvos-simulator-version-min=${TVOS_MIN_OS_VERSION}"
            else
                sdk="appletvos"
                archs="arm64"
                min_version_flag="-mtvos-version-min=${TVOS_MIN_OS_VERSION}"
            fi
            install_name="@rpath/llama.framework/llama"
            ;;
    esac

    # Build architecture flags
    local arch_flags=""
    for arch in $archs; do
        arch_flags+=" -arch $arch"
    done

    # Create dynamic library
    echo "Creating dynamic library for ${platform}."
    xcrun -sdk $sdk clang++ -dynamiclib \
        -isysroot $(xcrun --sdk $sdk --show-sdk-path) \
        $arch_flags \
        $min_version_flag \
        -Wl,-force_load,"${temp_dir}/combined.a" \
        -framework Foundation -framework Metal -framework Accelerate \
        -install_name "$install_name" \
        -o "${base_dir}/${output_lib}"

    # Platform-specific post-processing for device builds
    if [[ "$is_simulator" == "false" ]]; then
        if xcrun -f vtool &>/dev/null; then
            case "$platform" in
                "ios")
                    echo "Marking binary as a framework binary for iOS..."
                    xcrun vtool -set-build-version ios ${IOS_MIN_OS_VERSION} ${IOS_MIN_OS_VERSION} -replace \
                        -output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
                    ;;
                "visionos")
                    echo "Marking binary as a framework binary for visionOS..."
                    if [[ "$MAJOR_VERSION" -gt 16 ]] || [[ "$MAJOR_VERSION" -eq 16 && "$MINOR_VERSION" -gt 2 ]]; then
                        echo "Xcode version greater than 16.2, using visionOS."
                        VISION_OS_BUILD_VERSION="visionos"
                    else
                        echo "Xcode version less than or equal to 16.2, using xros."
                        VISION_OS_BUILD_VERSION="xros"
                    fi
                    xcrun vtool -set-build-version ${VISION_OS_BUILD_VERSION} ${VISIONOS_MIN_OS_VERSION} ${VISIONOS_MIN_OS_VERSION} -replace \
                        -output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
                    ;;
                "tvos")
                    echo "Marking binary as a framework binary for tvOS..."
                    xcrun vtool -set-build-version tvos ${TVOS_MIN_OS_VERSION} ${TVOS_MIN_OS_VERSION} -replace \
                        -output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
                    ;;
            esac
        else
            echo "Warning: vtool not found. Binary may not pass App Store validation."
        fi
    fi

    echo "Creating properly formatted dSYM..."
    # Create a separate directory for dSYMs for all platforms
    mkdir -p "${base_dir}/${build_dir}/dSYMs"

    # iOS and visionOS style dSYM (flat structure)
    if [[ "$platform" == "ios" || "$platform" == "visionos" || "$platform" == "tvos" ]]; then
        # Generate dSYM in the dSYMs directory
        xcrun dsymutil "${base_dir}/${output_lib}" -o "${base_dir}/${build_dir}/dSYMs/llama.dSYM"

        # Create a copy of the binary that will be stripped
        cp "${base_dir}/${output_lib}" "${temp_dir}/binary_to_strip"

        # Strip debug symbols from the copy
        xcrun strip -S "${temp_dir}/binary_to_strip" -o "${temp_dir}/stripped_lib"

        # Replace the original with the stripped version
        mv "${temp_dir}/stripped_lib" "${base_dir}/${output_lib}"
    else
        # macOS style dSYM
        # First strip debug info to a separate file
        xcrun strip -S "${base_dir}/${output_lib}" -o "${temp_dir}/stripped_lib"

        # Generate dSYM in the dSYMs directory
        xcrun dsymutil "${base_dir}/${output_lib}" -o "${base_dir}/${build_dir}/dSYMs/llama.dSYM"

        # Replace original binary with stripped version
        mv "${temp_dir}/stripped_lib" "${base_dir}/${output_lib}"
    fi

    # Remove any automatically generated dSYM files in the framework structure as they will
    # otherwise cause Invalid Bundle Structure validation errors.
    if [ -d "${base_dir}/${output_lib}.dSYM" ]; then
        echo "Removing generated dSYM file in framework structure: ${base_dir}/${output_lib}.dSYM"
        rm -rf "${base_dir}/${output_lib}.dSYM"
    fi

    # Clean up
    rm -rf "${temp_dir}"
}

echo "Building for iOS simulator..."
cmake -B build-ios-sim -G Xcode \
    "${COMMON_CMAKE_ARGS[@]}" \
    -DCMAKE_OSX_DEPLOYMENT_TARGET=${IOS_MIN_OS_VERSION} \
    -DIOS=ON \
    -DCMAKE_SYSTEM_NAME=iOS \
    -DCMAKE_OSX_SYSROOT=iphonesimulator \
    -DCMAKE_OSX_ARCHITECTURES="arm64;x86_64" \
    -DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=iphonesimulator \
    -DCMAKE_C_FLAGS="${COMMON_C_FLAGS}" \
    -DCMAKE_CXX_FLAGS="${COMMON_CXX_FLAGS}" \
    -DLLAMA_OPENSSL=OFF \
    -S .
cmake --build build-ios-sim --config Release -- -quiet

echo "Building for iOS devices..."
cmake -B build-ios-device -G Xcode \
    "${COMMON_CMAKE_ARGS[@]}" \
    -DCMAKE_OSX_DEPLOYMENT_TARGET=${IOS_MIN_OS_VERSION} \
    -DCMAKE_SYSTEM_NAME=iOS \
    -DCMAKE_OSX_SYSROOT=iphoneos \
    -DCMAKE_OSX_ARCHITECTURES="arm64" \
    -DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=iphoneos \
    -DCMAKE_C_FLAGS="${COMMON_C_FLAGS}" \
    -DCMAKE_CXX_FLAGS="${COMMON_CXX_FLAGS}" \
    -DLLAMA_OPENSSL=OFF \
    -S .
cmake --build build-ios-device --config Release -- -quiet

# Setup frameworks and copy binaries and headers
echo "Setting up iOS framework structures..."
setup_framework_structure "build-ios-sim" ${IOS_MIN_OS_VERSION} "ios"
setup_framework_structure "build-ios-device" ${IOS_MIN_OS_VERSION} "ios"

# Create dynamic libraries from static libraries
echo "Creating iOS dynamic libraries from static libraries..."
combine_static_libraries "build-ios-sim" "Release-iphonesimulator" "ios" "true"
combine_static_libraries "build-ios-device" "Release-iphoneos" "ios" "false"

# Create iOS-only XCFramework with correct debug symbols paths
echo "Creating iOS-only XCFramework..."
xcrun xcodebuild -create-xcframework \
    -framework $(pwd)/build-ios-sim/framework/llama.framework \
    -debug-symbols $(pwd)/build-ios-sim/dSYMs/llama.dSYM \
    -framework $(pwd)/build-ios-device/framework/llama.framework \
    -debug-symbols $(pwd)/build-ios-device/dSYMs/llama.dSYM \
    -output $(pwd)/build-apple/llama.xcframework

Et voilà!
