TECHNOLOGY

Core competencies
Core
competencies
그래프1
Support
State-of-the-Art
DNN Algorithms
그래프2
Best-In-Class
NPU in
AI Accuracy
그래프2
Innovative
Technology for Minimal
Memory Usage
그래프3
World’s Top Power/
Performance
Efficiency in NPU
그래프4
Easy, Fast Edge
Deployment
of Your AI Model
PURSUING TECHNOLOGIES
PURSUING
TECHNOLOGIES

We saw fast-growing number of loT devices and believed the era of Artificial Intelligence is something inecitable. To contribute our talents to the world,
we are challenging the mose power-efficient and advanced NPC (Neural Processing Unit) for loT devices.


DEEPX KEY DIFFERENTIATORS

The realistic demand for Edge AI SoCs is to deliver GPU-level AI recognition accuracy while handling the latest,
more advanced and complex deep learning algorithms at several watts or even less. DEEPX NPU SoC achieves more than 10 TOPS/W computational
efficiency with SOC power consumption of less than 3 watts while providing higher AI recognition accuracy than Nvidia Jetson.

The realistic demand for Edge AI SoCs is to deliver
GPU-level AI recognition accuracy while handling the
latest, more advanced and complex deep learning
algorithms at several watts or even less.
DEEPX NPU SoC achieves more than 10 TOPS/W
computational efficiency with SOC power consumption
of less than 3 watts while providing higher
AI recognition accuracy than Nvidia Jetson.

1. SUPPORT S-O-T-A DNN ALGORITHMS

Since AI algorithms developed, the algorithms have been getting smarter, lighter and more complex.
In spite of complexity and more computing power required, the developers want to apply the newest algorithms to their applications.
To satisfy the needs, DEEPX optimizes and supports the latest algorithms ever faster than any competitors in the market.

1.
SUPPORT S-O-T-A DNN
ALGORITHMS

Since AI algorithms developed, the algorithms have been getting smarter, lighter and more complex.
In spite of complexity and more computing power required, the developers want to apply the newest algorithms to their applications.
To satisfy the needs, DEEPX optimizes and supports the latest algorithms ever faster than any competitors in the market.

SOTA DNN Algorithms
  • densenet121
  • densenet169
  • densenet201
  • googlenet
  • mnasnet
  • mobilenet
  • mobilenetv2
  • resnet101
  • resnet152
  • resnet18
  • resnet50
  • etc
  • resnext50
  • squeezenet
  • SSD
  • YOLOv3
  • YOLOv4
  • YOLOv5
  • YOLOv7
  • YOLOx
  • EfficientNet
  • EfficientDet
  • resnet34

2. TRULY STUNNING AI ACCURACY BEYOND THEORETICAL LIMITS2.
TRULY STUNNING AI ACCURACY
BEYOND THEORETICAL LIMITS

  • These days many AI application DNN models are produced by GPUs. However, GPUs consume orders of magnitude power consumption that leads to overheating and reliability issues. To open the door of AI, we need cost-effective Hardware solutions when it comes to mass production and DEEPX will provide the most cost-effective and powerful AI chip for the key in the AI era.
  • Many AI processors compress FP32 into Int8 in order to run AI algorithms with high power-efficiency. But they usually lose the accuracy around
    5-10% in comparison with the accuracy of GPU solution using FP32 bit. Unlike other AI processors on the market, DEEPX compresses FP32 into Int8 with
    the AC accuracy level of GPUs. This innovative AI technology will catalyze the pervasive AI era.
DEEPX NPU PERFORMANCE – INFERENCE ACCURACY(mAP)
DEEPX NPU PERFORMANCE –
INFERENCE ACCURACY(mAP)

3. Minimal DRAM access and footprints of NPUs
3.
Minimal DRAM access
and footprints of NPUs

Most NPUs require a lot of DRAM access for AI processing. DRAM access consumes high power and takes a substantially long time to move data.
Therefore, reducing DRAM access greatly helps to improve the energy consumption and performance of NPUs. DEEPX has developed an innovative
technology that dramatically lowers DRAM access for AI processing. Not only GPU, but also have dramatically lower DRAM access than global
competitors.Additionally, this further helps to lower the BOM of application systems by reducing the number of DRAM devices in real applications.

4. The most power-efficient and powerful performing NPU
4.
The most power-efficient and
powerful performing NPU

  • DEEPX’s NPU technology outperforms all global top tier competitors in terms of power consumption-to performance ratio and effective performance efficiency.
  • Most NPUs require a lot of DRAM access. DRAM access consumes more power and takes a substantially long time to move data.
    Therefore, reducing DRAM access greatly helps to improve the energy consumption and performance of NPUs.
    DEEPX has developed an innovative technology that dramatically lowers DRAM access for AI processing. Not only GPU, but also have much lower DRAM
    access than global competitors. This further helps to lower the BOM of application systems by reducing the number of DRAM devices in real applications.
  • For power consumption-to performance ratio, we now have comparison data based on EDA tool based estimation results.
    We will share details of comparison results based on the measurement of fabricated SoCs in early next year.
THROUGHPUT(FPS)/TOPS

그래프5

60 FPS/TOPS
TOPS/W

그래프6

2023. 2Q Ready
DEEPX NPU PERFORMANCE – RESNET50
그래프7
그래프7
5. Easy and Fast NPU Software Development Kit (DXNN)
with Integrated Development Environment

The HW architecture is not only important to run an AI Algorithm efficiently but also SW stack shall come along.
DEEPX’s DXNN compiler supports various soft framework PyTorch, Tensorflow, Caffe, like other competitors.
And the most important thing is that it allows developers to port AI models to the DEEPX NPU easy and fast.
DXNN with IDE will automatically facilitate model conversion, quantization, model optimization, compilation, debugging, and profiling analysis of the model operation in NPUs.

그래프10
그래프11
그래프12

MLPerf
Benchmark Result

Learn more

MLPerf
Benchmark Result

Learn more

MLPerf
Benchmark Result

Learn more

APPLICATION TESTThe Video illustrates how fast 50,000 pictures from the Imagenet 2012 dataset are inferred
by the DEEPX NPU in an FPGA device.
The demo system is composed of a Windows PC and Xilinx Alveo U250 FPGA board.
The FPGA-based implementation running at 320Mhz demonstrates 600 IPS
(1.6 ms per inference) for the MobileNet Version 1 in the MLPerf AI benchmark category.
The performance implemented in an ASIC will become over three times higher without any
NPU design change because of simple clock frequency improvement
(top-ranked in the MLPerf AI benchmark).
Applications
01Consumer
Electronics
Robot
Vacuums
Smart TV
Smart
Air conditioner
02Smart
Mobility
Self-driving car
Drone
AMR
(Autonomous
Mobile Robot)
03Automotive
ADAS
Driver Monitoring
System
Infotainment
04VR/AR
Entertainment
Device
Enterprise
Device
Persona
Assistant
05AI SERVER
Data Center
Edge Server
On-Premise