DEEPX NPU SDK – DXNN is a comprehensive software framework that simplifies the development and deployment of deep
learning models for mass-market products. DXNN provides a compiler for DNN inference models (code generation for the DEEPX NPU)
and runtime software (code generation for the CPU). It includes software tools for DNN model conversion,
model optimization, quantization, compilation, simulation, and deployment.
DXNN is built for seamless integration
of DNN models created in AI software platforms such as TensorFlow, PyTorch, and Caffe. Its high-level software abstraction lets users
run deep learning algorithms efficiently on the NPU. The SDK
accommodates models built on popular frameworks and includes a model zoo with over 100 models.
FOR DNN MODELS
The DEEPX compiler compiles trained DNN inference models into
binaries for the DEEPX NPU. The result is execution code optimized
for accuracy, latency, throughput, and efficiency. The execution binary
uses the NPU's compute resources efficiently to optimize
power consumption, processing performance, memory bandwidth, and memory
footprint. The compiler explores many candidate schedules of
NPU operations and picks the best one to generate an optimized runtime.
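To make the idea of schedule exploration concrete, the following is a toy sketch, not the DEEPX compiler's actual algorithm. It assumes a single matrix multiply, a hypothetical on-chip buffer size, and a simple cost model that counts elements moved to and from external memory. The names estimate_traffic and pick_schedule and the candidate tile sizes are illustrative assumptions.

```python
import math
from itertools import product


def estimate_traffic(m, n, k, tm, tn):
    """Estimate elements moved for an (m x k) @ (k x n) matmul tiled tm x tn.

    Assumed cost model: each output tile loads a tm x k slice of A and a
    k x tn slice of B once, then writes tm x tn outputs.
    """
    tiles = math.ceil(m / tm) * math.ceil(n / tn)
    return tiles * (tm * k + k * tn + tm * tn)


def pick_schedule(m, n, k, buffer_elems, candidates=(8, 16, 32, 64, 128)):
    """Search all candidate tile shapes and return (cost, tm, tn) for the
    cheapest one that fits in the buffer, or None if nothing fits."""
    best = None
    for tm, tn in product(candidates, repeat=2):
        footprint = tm * k + k * tn + tm * tn
        if footprint > buffer_elems:
            continue  # this tile shape does not fit on chip
        cost = estimate_traffic(m, n, k, tm, tn)
        if best is None or cost < best[0]:
            best = (cost, tm, tn)
    return best


if __name__ == "__main__":
    print(pick_schedule(128, 128, 64, buffer_elems=12288))
```

A production compiler would search a far larger space (operator ordering, memory placement, parallelism) with a more detailed cost model; the sketch only shows the explore-and-select pattern described above.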
DXNN supports automatic quantization of DNN models trained in floating-point format. It receives a model description and representative inputs and automatically quantizes the model to fixed-point data types, greatly reducing execution time and increasing power efficiency. The SDK’s quantizer converts trained models from FP32 to INT8 or lower-bit integer representations. The DXNN quantizer is designed to preserve accuracy on the NPU: the accuracy of a DXNN-quantized model is close to that of the same model running in FP32 on a GPU, and in some cases even higher.
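The general idea of calibrating on representative inputs and mapping FP32 values to INT8 can be sketched as follows. This is a minimal illustration of symmetric per-tensor post-training quantization, not the DXNN quantizer itself. The function names and the max-absolute-value calibration rule are assumptions chosen for clarity.

```python
def calibrate_scale(samples, num_bits=8):
    """Derive a symmetric per-tensor scale from representative values."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for sample in samples for v in sample)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Map floats to signed integers, clipping to the representable range."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in qvalues]


if __name__ == "__main__":
    representative = [[-1.0, 0.5], [0.25, 2.0]]  # hypothetical calibration data
    scale = calibrate_scale(representative)
    x = [2.0, -0.7, 0.0, 0.33]
    q = quantize(x, scale)
    print(q, dequantize(q, scale))
```

For values inside the calibrated range, the round-trip error is bounded by half the scale, which is why the choice of representative inputs matters: they determine the scale and therefore the resolution.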
WORLD’S TOP OPTIMIZER STREAMLINES THE
MODEL INFERENCE PROCESS
The DXNN optimizer is responsible for optimizing user DNN models. It combines traditional optimization techniques with emerging graph-level optimization techniques, and can substantially reduce the amount of computation without loss of accuracy.
The optimizer aggressively replaces sub-graphs with optimized equivalents, for example by fusing operators or reordering them.
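One widely used instance of operator fusion is folding a batch-normalization layer into the linear (or convolution) layer that precedes it, so that two operators become one with identical output. The sketch below illustrates that general technique on a small fully connected layer; it is not taken from the DXNN optimizer, and all function names are illustrative.

```python
import math


def linear(x, weights, bias):
    """y[i] = sum_j weights[i][j] * x[j] + bias[i]"""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Per-channel inference-time batch normalization."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (weights, bias) of a single linear layer equal to linear + BN."""
    fused_w, fused_b = [], []
    for row, b, g, bt, m, s in zip(weights, bias, gamma, beta, mean, var):
        k = g / math.sqrt(s + eps)  # per-channel rescaling factor
        fused_w.append([w * k for w in row])
        fused_b.append((b - m) * k + bt)
    return fused_w, fused_b


if __name__ == "__main__":
    W, b = [[1.0, -2.0], [0.5, 0.25]], [0.1, -0.3]
    gamma, beta, mean, var = [1.5, 0.8], [0.2, -0.1], [0.4, -0.2], [2.0, 0.5]
    x = [3.0, -1.0]
    reference = batch_norm(linear(x, W, b), gamma, beta, mean, var)
    fused = linear(x, *fold_batch_norm(W, b, gamma, beta, mean, var))
    print(reference, fused)
```

The fused layer performs one multiply-accumulate pass instead of a multiply-accumulate pass followed by a normalization pass, which is the kind of computation saving the paragraph above describes.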
USER-FRIENDLY HOST COMMUNICATION
The SDK includes Linux (x86/Arm) and Windows (x86) drivers that support
communication between the host and the DEEPX NPU. DEEPX’s runtime API
provides commands for model loading, inference execution, passing model
inputs, and receiving inference results, along with a set of functions to manage the devices.