✅ Introductory Note / Disclaimer
Before we dive into the technical details, I want to be transparent: while I am a professional web developer, I am not an expert in the inner workings of inswapper_128 or deep learning models. This implementation and the documentation below were developed in collaboration with ChatGPT-5.2 (Agent), which assisted with the code conversion and technical analysis. My goal here is to share the practical findings from this AI-assisted porting process.
If you have ever tried to move a deep learning project from the Python ecosystem to a JVM-based environment like Kotlin, you know the “last mile” is often the hardest. While Python’s insightface makes face swapping look easy, implementing the same logic in Kotlin requires a deep dive into manual pre-processing and tensor manipulation.
This post summarizes my findings and “gotchas” encountered while porting the inswapper_128 model to Kotlin using JavaCV and ONNX Runtime.
✅ The Objective
- Real-time face swapping via USB camera feed.
- Executing `inswapper_128.onnx` directly from Kotlin.
- Ensuring 1:1 parity with Python's pre-processing and post-processing results.
✅ The Required Stack
The inswapper_128 model does not work in isolation. To achieve a successful swap, you need a full pipeline:
- Face Detection: YuNet (`face_detection_yunet_2023mar.onnx`)
- Face Recognition (Embedding): ArcFace (`w600k_r50.onnx`)
- Face Swapping: `inswapper_128.onnx`
- Emap Data: Extracted from the `inswapper_128` model weights
✅ Workflow Mapping: Python vs. Kotlin
To get identical results, the Kotlin implementation must mirror these Python steps:
| Process | Python (insightface) | Kotlin Implementation |
|---|---|---|
| Detection | FaceAnalysis.get() | YuNet + 5-point Landmarks |
| Recognition | ArcFaceONNX.get() | Execute w600k_r50.onnx directly |
| Swap Input | blobFromImage | img / 255 + swapRB=True |
| Embedding | embedding | L2 Normalization + Matrix dot product with emap |
| Output Formatting | transpose + clip + BGR | NCHW to NHWC + BGR Conversion |
✅ Critical Implementation Pitfalls
Through the porting process, I identified five specific areas where implementations usually fail:
1. ArcFace Input Names
The w600k_r50.onnx model expects the input name to be input.1. In Kotlin/ONNX Runtime, if you pass your data using a generic name like “data” or “input”, the model will fail to execute.
2. Precise Normalization
Python’s normalization typically follows (img - 127.5) / 127.5 with a BGR-to-RGB swap (swapRB=True). Your Kotlin code must match this math exactly to avoid distorted embeddings.
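The normalization above can be sketched as a small NumPy reference function; this is the exact math a Kotlin port needs to reproduce element-for-element (the function name and the tiny synthetic image are my own, for illustration):

```python
import numpy as np

def arcface_preprocess(bgr: np.ndarray) -> np.ndarray:
    """Reference for the ArcFace input math: BGR-to-RGB swap,
    (img - 127.5) / 127.5 normalization, then HWC -> NCHW layout."""
    rgb = bgr[:, :, ::-1].astype(np.float32)  # swapRB=True
    norm = (rgb - 127.5) / 127.5              # maps [0, 255] -> [-1, 1]
    return np.transpose(norm, (2, 0, 1))[np.newaxis, ...]  # 1 x 3 x H x W

# Sanity check with a tiny synthetic image: pure blue in BGR
img = np.zeros((112, 112, 3), dtype=np.uint8)
img[:, :, 0] = 255
blob = arcface_preprocess(img)
print(blob.shape)        # (1, 3, 112, 112)
print(blob[0, 2, 0, 0])  # 1.0  -- blue lands in the last RGB channel
print(blob[0, 0, 0, 0])  # -1.0
```

Comparing a few pixel values like this against your Kotlin `FloatBuffer` is the quickest way to catch a channel-order or scaling mismatch.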
3. The “emap” Requirement
This is the biggest hurdle. In Python, the latent vector is transformed using a matrix called emap found inside the model. If you skip this, your output will likely be a solid black or grey box.
Python Snippet to Extract emap:
```python
import onnx
from onnx import numpy_helper

model = onnx.load("inswapper_128.onnx")
# The emap matrix is stored as the last initializer in the graph
emap = numpy_helper.to_array(model.graph.initializer[-1])
emap.astype("float32").tofile("inswapper_emap.bin")
```
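Once extracted, the emap projection itself is a short piece of linear algebra. The sketch below mirrors how I understand insightface builds the swapper's latent input: L2-normalize the 512-d ArcFace embedding, multiply by the 512×512 emap, and re-normalize (the function name is mine; the demo uses random data in place of a real embedding and emap):

```python
import numpy as np

def compute_latent(embedding: np.ndarray, emap: np.ndarray) -> np.ndarray:
    """L2-normalize the embedding, project it through emap, then
    re-normalize. Skipping this projection is what produces the
    solid grey/black output box."""
    latent = embedding.reshape(1, -1)
    latent = latent / np.linalg.norm(latent)  # normed embedding
    latent = latent @ emap                    # project through emap
    return latent / np.linalg.norm(latent)    # final unit-length latent

# Demo with random stand-ins for the embedding and emap
rng = np.random.default_rng(0)
latent = compute_latent(rng.standard_normal(512).astype(np.float32),
                        rng.standard_normal((512, 512)).astype(np.float32))
print(latent.shape)  # (1, 512)
```

In Kotlin this becomes a plain loop over a 512×512 `FloatArray` loaded from `inswapper_emap.bin`, plus two norm computations.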
4. NCHW Output Handling
The model outputs a tensor in NCHW format. To display this as an image in Kotlin, you must manually transpose it to NHWC and convert it back to BGR:
- Transpose: `(0, 2, 3, 1)`
- Scale: `value * 255`
- Clip: `0.0 to 255.0`
- Cast: `UInt8`
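The four steps above can be condensed into a reference post-processing function (a sketch with a fake model output; a Kotlin port does the same transpose and clamp over a flat `FloatArray`):

```python
import numpy as np

def postprocess(output: np.ndarray) -> np.ndarray:
    """NCHW float output -> displayable image: transpose to NHWC,
    scale to [0, 255], clip, cast to uint8, swap channels for BGR."""
    img = np.transpose(output, (0, 2, 3, 1))  # NCHW -> NHWC
    img = np.clip(img * 255.0, 0.0, 255.0)    # scale + clip
    return img.astype(np.uint8)[0][:, :, ::-1]  # cast, drop batch, RGB -> BGR

# Demo with a fake 1 x 3 x 128 x 128 model output filled with 0.5
fake = np.full((1, 3, 128, 128), 0.5, dtype=np.float32)
img = postprocess(fake)
print(img.shape)  # (128, 128, 3)
```

Note that the clip must happen before the uint8 cast; casting first silently wraps out-of-range values and produces speckled artifacts.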
5. Masking Issues (Black Borders)
Generating a mask based solely on “image intensity” often leaves ugly black borders. For Kotlin, I found that using a Geometric Mask (warping a white image) provides much more stable and cleaner results.
✅ Development Environment
- IDE: IntelliJ IDEA 2025.3.2
- Runtime: JDK 21 / Kotlin 1.9.22
- Libraries: JavaCV 1.5.10 / ONNX Runtime 1.17.0 (GPU supported)
✅ Explore the Code
The full source code for this Kotlin implementation is now available on GitHub. Feel free to explore, star, or contribute to the project!
🚀 GitHub Repository: Kotlin-Face-Swapper
✅ Summary
Porting inswapper_128 to Kotlin is a tedious process because the “magic” handled by Python libraries must be manually reconstructed. However, ensuring that your pre-processing, embeddings, emap, and output transformations align perfectly is the only way to achieve high-quality results.
It’s a challenge, but building it “by hand” gives you a much deeper understanding of how these models actually function!