Porting inswapper_128 to Kotlin: A Practical Guide (Python to JavaCV/ONNX)

[Introductory Note / Disclaimer]

Before we dive into the technical details, I want to be transparent: while I am a professional web developer, I am not an expert in the inner workings of inswapper_128 or deep learning models. This implementation and the documentation below were developed in collaboration with ChatGPT-5.2 (Agent), which assisted with the code conversion and technical analysis. My goal here is to share the practical findings from this AI-driven porting process.


If you have ever tried to move a deep learning project from the Python ecosystem to a JVM-based environment like Kotlin, you know the “last mile” is often the hardest. While Python’s insightface makes face swapping look easy, implementing the same logic in Kotlin requires a deep dive into manual pre-processing and tensor manipulation.

This post summarizes my findings and “gotchas” encountered while porting the inswapper_128 model to Kotlin using JavaCV and ONNX Runtime.


✅ The Objective

  • Real-time face swapping via USB camera feed.
  • Executing inswapper_128.onnx directly from Kotlin.
  • Ensuring 1:1 parity with Python’s pre-processing and post-processing results.

✅ The Required Stack

The inswapper_128 model does not work in isolation. To achieve a successful swap, you need a full pipeline (a structural sketch in Kotlin follows the list):

  • Face Detection: YuNet (face_detection_yunet_2023mar.onnx)
  • Face Recognition (Embedding): ArcFace (w600k_r50.onnx)
  • Face Swapping: inswapper_128.onnx
  • Emap Data: Extracted from the inswapper_128 model weights
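
To make the data flow concrete, here is a minimal structural sketch of how these stages connect. All names (Detector, Embedder, Swapper, FaceBox, processFrame) are illustrative placeholders, not identifiers from the actual repository:

import org.bytedeco.opencv.opencv_core.Mat

// Hypothetical stage interfaces, one per model in the stack.
interface Detector { fun detect(frame: Mat): List<FaceBox> }                        // YuNet
interface Embedder { fun embed(frame: Mat, face: FaceBox): FloatArray }             // ArcFace, 512 floats
interface Swapper { fun swap(frame: Mat, face: FaceBox, latent: FloatArray): Mat }  // inswapper_128

// A detection result: bounding box plus the 5-point landmarks as 10 floats.
class FaceBox(val box: FloatArray, val landmarks: FloatArray)

// Per frame: detect faces, then swap each one using a precomputed source latent
// (the source face's embedding is run through emap once, up front).
fun processFrame(frame: Mat, detector: Detector, swapper: Swapper, sourceLatent: FloatArray): Mat =
    detector.detect(frame).fold(frame) { acc, face -> swapper.swap(acc, face, sourceLatent) }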

✅ Workflow Mapping: Python vs. Kotlin

To get identical results, the Kotlin implementation must mirror these Python steps:

Process           | Python (insightface)   | Kotlin Implementation
Detection         | FaceAnalysis.get()     | YuNet + 5-point Landmarks
Recognition       | ArcFaceONNX.get()      | Execute w600k_r50.onnx directly
Swap Input        | blobFromImage          | img / 255 + swapRB=True
Embedding         | embedding              | L2 Normalization + Matrix dot product with emap
Output Formatting | transpose + clip + BGR | NCHW to NHWC + BGR Conversion

✅ Critical Implementation Pitfalls

Through the porting process, I identified five specific areas where implementations usually fail:

1. ArcFace Input Names

The w600k_r50.onnx model expects the input name to be input.1. In Kotlin/ONNX Runtime, if you pass your data using a generic name like “data” or “input”, the model will fail to execute.
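
To illustrate, here is a minimal Kotlin sketch of running the recognizer with the correct input name (the file path and buffer contents are placeholders):

import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import java.nio.FloatBuffer

fun main() {
    val env = OrtEnvironment.getEnvironment()
    env.createSession("w600k_r50.onnx").use { session ->
        // Read the declared input names from the model instead of guessing.
        println(session.inputNames) // prints [input.1]
        // Dummy 1x3x112x112 blob; in practice this holds the normalized face crop.
        val blob = FloatBuffer.allocate(3 * 112 * 112)
        OnnxTensor.createTensor(env, blob, longArrayOf(1, 3, 112, 112)).use { tensor ->
            session.run(mapOf("input.1" to tensor)).use { result ->
                val embedding = (result.get(0) as OnnxTensor).floatBuffer // 512 floats
                println("Embedding size: ${embedding.remaining()}")
            }
        }
    }
}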

2. Precise Normalization

Python’s normalization typically follows (img - 127.5) / 127.5 with a BGR-to-RGB swap (swapRB=True). Your Kotlin code must match this math exactly to avoid distorted embeddings.
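
With JavaCV, the cleanest way I found to reproduce this is blobFromImage with an explicit scale and mean, mirroring what insightface does under the hood (alignedFace is a hypothetical 112x112 BGR Mat from the alignment step):

import org.bytedeco.opencv.global.opencv_core.CV_32F
import org.bytedeco.opencv.global.opencv_dnn.blobFromImage
import org.bytedeco.opencv.opencv_core.Mat
import org.bytedeco.opencv.opencv_core.Scalar
import org.bytedeco.opencv.opencv_core.Size

fun toArcFaceBlob(alignedFace: Mat): Mat =
    blobFromImage(
        alignedFace,
        1.0 / 127.5,                      // scale factor, applied after mean subtraction
        Size(112, 112),
        Scalar(127.5, 127.5, 127.5, 0.0), // mean, subtracted first -> (img - 127.5) / 127.5
        true,                             // swapRB = BGR -> RGB, matching Python
        false,                            // no center crop
        CV_32F
    )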

3. The “emap” Requirement

This is the biggest hurdle. In Python, the latent vector fed to the swapper is not the raw ArcFace embedding: it is transformed by a 512x512 matrix called emap, which insightface reads from the model's own weights. If you skip this step, your output will likely be a solid black or grey box.

Python Snippet to Extract emap:

import onnx
from onnx import numpy_helper

model = onnx.load("inswapper_128.onnx")
# The emap matrix (512x512) is stored as the last initializer in the graph
emap = numpy_helper.to_array(model.graph.initializer[-1])
# numpy writes raw little-endian float32, which Kotlin can read back directly
emap.astype("float32").tofile("inswapper_emap.bin")
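
On the Kotlin side, the .bin file can be read back and applied to the ArcFace embedding. This is a minimal sketch of the logic insightface performs (normalize, multiply by emap, normalize again); the function names are my own:

import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder
import kotlin.math.sqrt

const val DIM = 512

// Reads the raw little-endian float32 data written by numpy's tofile().
fun loadEmap(path: String): FloatArray {
    val bytes = File(path).readBytes()
    val floats = FloatArray(bytes.size / 4)
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(floats)
    return floats // row-major, DIM x DIM
}

fun l2Normalize(v: FloatArray): FloatArray {
    val norm = sqrt(v.fold(0f) { acc, x -> acc + x * x })
    return FloatArray(v.size) { v[it] / norm }
}

// latent = l2norm(embedding) . emap, then l2-normalized again.
fun toLatent(embedding: FloatArray, emap: FloatArray): FloatArray {
    val normed = l2Normalize(embedding)
    val latent = FloatArray(DIM)
    for (j in 0 until DIM) {
        var sum = 0f
        for (i in 0 until DIM) sum += normed[i] * emap[i * DIM + j]
        latent[j] = sum
    }
    return l2Normalize(latent)
}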

4. NCHW Output Handling

The model outputs a tensor in NCHW format. To display it as an image in Kotlin, you must manually transpose it to NHWC and convert it back to BGR (a code sketch follows the list):

  1. Transpose: (0, 2, 3, 1)
  2. Scale: value * 255
  3. Clip: 0.0 to 255.0
  4. Cast: UInt8
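
A plain-Kotlin sketch of those four steps (assuming the model's usual 1x3x128x128 float output with RGB channel order):

// Converts a 1x3x128x128 NCHW float array (RGB, 0..1) into packed BGR bytes.
fun nchwToBgrBytes(output: FloatArray, h: Int = 128, w: Int = 128): ByteArray {
    val plane = h * w
    val bytes = ByteArray(plane * 3)
    for (i in 0 until plane) {
        for (c in 0 until 3) {
            // Transpose (0, 2, 3, 1), scale, clip, and swap RGB -> BGR in one pass.
            val v = (output[c * plane + i] * 255f).coerceIn(0f, 255f)
            bytes[i * 3 + (2 - c)] = v.toInt().toByte()
        }
    }
    return bytes
}

The resulting array can then be copied into a 128x128 CV_8UC3 Mat (for example, via mat.data().put(bytes)) for blending back into the frame.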

5. Masking Issues (Black Borders)

Generating a mask based solely on image intensity (e.g., thresholding the swapped patch) often leaves ugly black borders. For Kotlin, I found that a geometric mask, made by warping an all-white image with the same affine transform used for the face patch, gives much more stable and cleaner results.
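
As a sketch of the idea (invAffine is assumed to be the 2x3 matrix that maps the 128x128 patch back into frame coordinates):

import org.bytedeco.opencv.global.opencv_core.CV_8UC1
import org.bytedeco.opencv.global.opencv_imgproc.warpAffine
import org.bytedeco.opencv.opencv_core.Mat
import org.bytedeco.opencv.opencv_core.Scalar
import org.bytedeco.opencv.opencv_core.Size

// Warp an all-white patch with the same transform used to paste the swapped
// face; everything the patch covers becomes the mask, with no intensity guesswork.
fun geometricMask(invAffine: Mat, frameSize: Size): Mat {
    val white = Mat(128, 128, CV_8UC1, Scalar(255.0))
    val mask = Mat()
    warpAffine(white, mask, invAffine, frameSize)
    return mask
}

Eroding or blurring this mask slightly before blending feathers the seam further.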


✅ Development Environment

  • IDE: IntelliJ IDEA 2025.3.2
  • Runtime: JDK 21 / Kotlin 1.9.22
  • Libraries: JavaCV 1.5.10 / ONNX Runtime 1.17.0 (GPU supported)
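
For reference, a minimal build.gradle.kts dependency block matching these versions might look like this (swap in onnxruntime_gpu for CUDA execution):

dependencies {
    implementation("org.bytedeco:javacv-platform:1.5.10")
    implementation("com.microsoft.onnxruntime:onnxruntime:1.17.0")
    // For GPU: implementation("com.microsoft.onnxruntime:onnxruntime_gpu:1.17.0")
}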

✅ Explore the Code

The full source code for this Kotlin implementation is now available on GitHub. Feel free to explore, star, or contribute to the project!

🚀 GitHub Repository: Kotlin-Face-Swapper


✅ Summary

Porting inswapper_128 to Kotlin is a tedious process because the “magic” handled by Python libraries must be manually reconstructed. However, ensuring that your pre-processing, embeddings, emap, and output transformations align perfectly is the only way to achieve high-quality results.

It’s a challenge, but building it “by hand” gives you a much deeper understanding of how these models actually function!