Tech Blog — Dec 16, 2025
Behind every smooth on-device AI experience lies a hidden process that makes massive, server-trained models run seamlessly on a smartphone chip.
In general, customers’ AI models trained in the cloud or on servers are very large and optimized for GPU execution. To run these models on the Exynos NPU, it is essential to convert them into on-device AI models through processes such as graph optimization, quantization, and compilation. The On-device AI SDK Toolchain converts the customer’s original AI model into an on-device AI model capable of running in the on-device NPU environment through a lowering process. Ultimately, the AI SDK Toolchain is indispensable for supporting customers’ AI models. However, several technical challenges must be overcome to achieve this:

[1] Support for Various AI Model IRs
As the number and complexity of supported AI models grow rapidly each year, the on-device AI SDK Toolchain must support a variety of response scenarios. By supporting a wide range of AI model IRs such as PyTorch²⁾, ONNX³⁾, TensorFlow⁴⁾, and TFLite⁵⁾, our SDK empowers developers to iterate faster and adapt flexibly. This is what makes for truly agile AI development.

[2] Verification Methods for Each Toolchain Stage
During the lowering process, the original model is progressively transformed into a hardware-executable model through graph optimization and quantization. Verification must be strengthened at each stage to ensure that the accuracy and performance of the original AI model are preserved as much as possible.

[3] Advancement of Graph Optimization and Quantization Algorithms
To maximize the performance of on-device AI models, it is also necessary to continuously enhance graph optimization techniques and quantization algorithms tailored for highly complex models such as LLMs.

To this end, Exynos AI Studio, Samsung’s on-device AI SDK, addresses these key technical challenges and offers robust solutions to customers.
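To make the notion of a framework IR concrete, the sketch below exports a toy PyTorch model to ONNX using PyTorch's public export API. This is only an illustration of how a cloud-trained model ends up in one of the IR formats a toolchain like this ingests; it does not use the Exynos AI Studio API itself, and the model and file names are placeholders.

```python
import torch
import torch.nn as nn

# A toy model standing in for a customer's cloud-trained network.
class SmallClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv(x)))
        return self.fc(x.flatten(1))

model = SmallClassifier().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX, one of the framework IRs that an on-device toolchain consumes.
torch.onnx.export(
    model,
    dummy_input,
    "small_classifier.onnx",
    input_names=["input"],
    output_names=["logits"],
    opset_version=17,
)
```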
The Advancement Strategy for Exynos AI Studio, Exynos’ On-Device SDK

Samsung has developed and distributed the Exynos AI Studio SDK to customers to become a global leader in the field of on-device AI, and is preparing for the future with a variety of advancement strategies.
Exynos AI Studio is largely composed of the Exynos AI Studio High Level Toolchain (EHT) and the Exynos AI Studio Low Level Toolchain (ELT). Respectively, these perform advanced graph optimization and quantization at the model level, and SoC-specialized algorithms and compilation.

EHT takes open-source framework IRs such as ONNX and TFLite as inputs, converts them into an internal IR through the IR Converter, and then modifies the model structure via Graph Optimization to make it suitable for execution on the NPU. Through quantization, it reduces the model size to a level that can run efficiently on-device.

ELT carries out lowering operations optimized for each NPU generation, converting the model into a form that is executable on hardware. Finally, the model passes through the Compiler, generating an on-device AI model that can run on the NPU.
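The following sketch restates that stage ordering in code. Every function and type here is hypothetical and exists only to mirror the EHT-to-ELT flow described above; the real Exynos AI Studio interfaces are not public and will differ.

```python
from dataclasses import dataclass

# Hypothetical stage names for illustration only; they mirror the described
# flow (IR conversion -> graph optimization -> quantization -> lowering -> compile).

@dataclass
class Model:
    ir: str           # e.g. "onnx", "internal", "npu-lowered", "npu-binary"
    history: str

def convert_to_internal_ir(m: Model) -> Model:              # EHT: IR Converter
    return Model("internal", m.history + " -> internal IR")

def optimize_graph(m: Model) -> Model:                      # EHT: graph optimization
    return Model(m.ir, m.history + " -> graph-optimized")

def quantize(m: Model) -> Model:                            # EHT: quantization
    return Model(m.ir, m.history + " -> quantized")

def lower_for_npu(m: Model, npu_generation: str) -> Model:  # ELT: lowering
    return Model("npu-lowered", m.history + f" -> lowered ({npu_generation})")

def compile_binary(m: Model) -> Model:                      # ELT: compiler
    return Model("npu-binary", m.history + " -> compiled")

model = Model("onnx", "customer model")
for stage in (convert_to_internal_ir, optimize_graph, quantize):  # high-level toolchain
    model = stage(model)
model = lower_for_npu(model, "gen-n")                             # low-level toolchain
model = compile_binary(model)
print(model.history)
```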
Designing SDK Features To Handle Various AI Model IRs

To enhance the scalability of the SDK, it is essential to support multiple AI model IR formats. Samsung’s SDK currently supports open-source framework IRs such as ONNX and TFLite, and it is developing a strategy to strengthen PyTorch support. In particular, for generative AI models, performing graph optimization and quantization within the PyTorch development environment can minimize unnecessary conversions during model lowering, which enables the delivery of a more stable and efficient SDK.
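As a minimal sketch of what "quantizing inside the PyTorch environment" can look like, the example below applies PyTorch's public dynamic int8 quantization to a toy feed-forward block before any hand-off to a vendor toolchain. It uses standard PyTorch APIs only and is not the Exynos AI Studio workflow; the layer shapes are arbitrary.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# A toy feed-forward block standing in for a generative-model layer.
layer = nn.Sequential(
    nn.Linear(512, 2048),
    nn.GELU(),
    nn.Linear(2048, 512),
).eval()

# Dynamic int8 quantization of the Linear weights, done entirely in PyTorch.
quantized_layer = quantize_dynamic(layer, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 16, 512)
with torch.no_grad():
    out_fp32 = layer(x)
    out_int8 = quantized_layer(x)

# Inspect how far the quantized outputs drift from the float baseline.
print((out_fp32 - out_int8).abs().max())
```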
Verification Methods for Each Toolchain Stage

The output of the EHT module in Exynos AI Studio can be compared with the original model on an operator basis using the Signal-to-Noise Ratio (SNR) metric through the simulation function. In the simulator, quantization information is processed by wrapping specific operators with de-quantize and quantize operations before and after inference, enabling computation through fake quantization. The results of the ELT module are validated for accuracy using the emulation function, in a manner similar to EHT verification. Since the emulator performs computations through emulation code that replicates the NPU hardware, it enables more precise validation.
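The two ideas named here, fake quantization and per-operator SNR, can be sketched in a few lines of NumPy. This is a generic illustration of the math, not the SDK's own simulator code; the bit width, rounding scheme, and tensor shapes are assumptions.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Symmetric per-tensor fake quantization: quantize to integers, then
    immediately de-quantize, so the quantization error appears in float math."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def snr_db(reference, test):
    """Signal-to-noise ratio in dB between a reference tensor and a test tensor."""
    noise = reference - test
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

# Compare one operator's float output against its fake-quantized counterpart.
activations = np.random.randn(1, 64, 32, 32).astype(np.float32)
quantized = fake_quantize(activations, num_bits=8)
print(f"per-operator SNR: {snr_db(activations, quantized):.1f} dB")
```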
Strategies for Advanced Graph Optimization and Quantization Algorithms

As AI models become more complex and larger in size, advancing the graph optimization and quantization algorithms supported by the SDK becomes even more essential.
In the graph optimization stage, optimizations can be classified as hardware-agnostic or hardware-specific. After applying optimizations suitable for general computing devices, algorithms tailored to the characteristics of the NPU hardware accelerator are executed. The quantization algorithm reduces an AI model trained on servers in fp32 precision to lower bit widths, such as int8, int16, or fp16, that can run on NPU devices. Through advanced graph optimization and quantization algorithms, it becomes possible to optimize for the NPU while preserving the original model’s accuracy as much as possible.
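A classic hardware-agnostic optimization is operator fusion. The sketch below fuses a MatMul followed by an Add into a single node over a toy graph representation; the IR, node names, and pass structure are invented for illustration and do not reflect the toolchain's actual passes.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    inputs: list = field(default_factory=list)

def fuse_matmul_add(nodes):
    """Single forward pass that merges a MatMul followed by an Add
    into one FusedLinear node (a hardware-agnostic fusion)."""
    fused, skip_next = [], False
    for i, node in enumerate(nodes):
        if skip_next:
            skip_next = False
            continue
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if node.op == "MatMul" and nxt is not None and nxt.op == "Add":
            fused.append(Node("FusedLinear", node.inputs + nxt.inputs))
            skip_next = True
        else:
            fused.append(node)
    return fused

graph = [Node("MatMul", ["x", "W"]), Node("Add", ["b"]), Node("Relu")]
print([n.op for n in fuse_matmul_add(graph)])  # ['FusedLinear', 'Relu']
```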
Driving the Future of On-Device Intelligence

On-device AI has moved beyond its technical limitations and is now becoming a practical reality. With its Exynos AI Studio SDK, Samsung is delivering the speed, accuracy, and scalability that tomorrow’s AI demands. This ensures intelligence truly lives where people need it most: in their hands.

On a technical level, Samsung’s Exynos AI Studio SDK adopts the structure of an on-device SDK toolchain, performing optimization, quantization, and compilation so that customers’ AI models run effectively on NPU hardware. Going forward, through the execution of comprehensive design and development strategies, the company will continue to uphold its reputation as a global leader in on-device AI technology.