More Throughput on Tough Models, Less $, Less Watts


See how reconfigurability enables high efficiency inference acceleration

Read Cheng Wang’s AI Hardware Summit talk HERE

InferX Software makes AI Inference easy!

See Jeremy Roberson’s Linley presentation HERE. Watch the video of the presentation HERE

X1M allows you to put high performance AI inference anywhere

See Cheng Wang’s Linley presentation HERE. Watch the video of the presentation HERE

InferX X1 Delivers More Throughput/$ Than Tesla T4, Xavier NX and Jetson TX2

InferX X1 is running neural network model layers. We are bringing up YOLOv3 as the first demonstration – stay tuned.

On October 28th at the Linley Fall Processor Conference, Cheng Wang disclosed pricing, availability, and the roadmap for PCIe and M.2 boards based on the InferX X1 chip, which was discussed in Cheng’s October 20th talk. The X1P1 PCIe board is running now, will be sampling in Q1, and will be in production in Q2. Pricing is $499 for 800MHz, down to $399. See Cheng’s presentation and a board product brief below.

On October 20th at the Linley Fall Processor Conference, Cheng Wang disclosed benchmarks showing that InferX X1 outperforms Nvidia Xavier NX on YOLOv3 and on real customer models. We are working with lead customers now, will sample to new volume customers in Q1, and will be in production in Q2.

The InferX Reconfigurable Tensor Processor architecture is 3 to 18 times more efficient in throughput/mm² than Nvidia GPUs.

InferX X1 1KU pricing ranges from $99 to $199 depending on speed grade (Nvidia Xavier NX is $399). We want to expand the market and bring high-performance inference to higher-volume, lower-price-point systems, so we are announcing 1MU (one million unit) pricing from $34 to $69 depending on speed grade.

See the details in Cheng’s October 20th Linley presentation below, along with an X1 chip product brief and an InferX software product brief.

Cheng will present our PCIe/M.2 board products/roadmaps, availability and pricing on October 28th at the Linley Fall Processor Conference.

InferX/nnMAX Performance Estimation Demo

Click HERE to watch the video.  If you have a volume application and a neural network model in TensorFlow Lite or ONNX, we can benchmark your model on InferX X1. Contact our Sales VP, Andy Jaros, at

nnMAX also excels at key DSP functions

See the April 7, 2020 press release here.

View the April 7th Linley Processor Conference presentation on nnMAX for DSP here

Read Microprocessor Report’s article here


April 2021: at the Linley Spring Processor Conference we presented updates on our easy-to-use Inference Compiler Software and our low-power M.2 Inference Board. See the “buttons” to the left for the slides.

The InferX X1 Edge Inference Co-Processor is optimized for large models, including large models at batch=1. Its price/performance is 10-100x better than existing edge inference solutions. InferX X1 is programmed using TensorFlow Lite and ONNX, and our software is easy to use.

nnMAX™ Reconfigurable Tensor Processor

High Precision, Modular & Scalable

The nnMAX Reconfigurable Tensor Processor has 64 one-dimensional tensor processors, which are reconfigured layer by layer (in 4 microseconds) for high utilization and high throughput/$. nnMAX is programmed with TensorFlow Lite and ONNX. The supported numerics are INT8, INT16 and BFloat16, which can be mixed layer by layer to maximize prediction accuracy. INT8/16 activations are processed at full rate; BFloat16 at half rate. Hardware converts between INT and BFloat as needed, layer by layer. 3×3 convolutions of stride 1 are accelerated by Winograd hardware at full precision: YOLOv3 runs 1.7x faster.
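To illustrate why Winograd helps on stride-1 convolutions, here is a minimal Python sketch of the textbook one-dimensional Winograd F(2,3) transform. It is not a description of the nnMAX hardware, whose tile size and internal arithmetic are not stated here. The function names are hypothetical. The sketch computes two outputs of a 3-tap filter with 4 multiplications instead of 6; nesting the same idea in two dimensions for a 3×3 kernel needs 16 multiplications instead of 36 per 2×2 output tile.

```python
def direct_conv_1d(d, g):
    """Two outputs of a 3-tap, stride-1 filter: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """The same two outputs with 4 multiplications (Winograd F(2,3))."""
    # Filter transform: depends only on the weights, so it can be
    # computed once per filter and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]
    # The four multiplications.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]   # hypothetical input tile
    taps = [0.5, -1.0, 2.0]       # hypothetical filter weights
    print(direct_conv_1d(data, taps))
    print(winograd_f23(data, taps))
```

Both functions return the same values for the same inputs; only the number of multiplications differs. The whole-model speedup is smaller than the per-tile saving because only some layers are 3×3, stride-1 convolutions.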

nnMAX has excellent Inference Efficiency, delivering more throughput on tough models for less $, less watts.

nnMAX is also excellent for DSP.

TOPS is a misleading marketing metric. It is the number of MACs times the frequency: it is a peak number. Having a lot of MACs increases cost but only delivers throughput if the rest of the architecture is right.
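As a rough sketch of the arithmetic, peak TOPS can be computed as below. The function name and the example numbers are hypothetical, not figures for any specific chip. The `ops_per_mac=2` default is an assumption that follows the common convention of counting the multiply and the add as separate operations; pass `ops_per_mac=1` to match the simpler "MACs times frequency" definition given above.

```python
def peak_tops(num_macs, freq_hz, ops_per_mac=2):
    """Peak tera-operations per second for a MAC array.

    This is an upper bound: it assumes every MAC does useful work on
    every clock cycle, which real models rarely achieve.
    """
    return num_macs * freq_hz * ops_per_mac / 1e12


def delivered_tops(peak, utilization):
    """Useful operations per second given the fraction of MAC cycles in use."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak * utilization


if __name__ == "__main__":
    # Hypothetical accelerator: 4096 MACs clocked at 800 MHz.
    peak = peak_tops(num_macs=4096, freq_hz=800e6)
    print(f"peak: {peak:.2f} TOPS")
    # The same silicon at 30% versus 80% MAC utilization.
    for util in (0.3, 0.8):
        print(f"utilization {util:.0%}: {delivered_tops(peak, util):.2f} TOPS delivered")
```

Two chips with the same peak number can deliver very different throughput, which is why the peak figure alone says little about a real model.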

The right metric to focus on is Throughput: for your model, your image size, your batch size. Even ResNet-50 is a better indicator of throughput than TOPS, although ResNet-50 is not the best benchmark because of its small image size: real applications process megapixel images. Inference Efficiency is achieved by getting the most throughput for the least cost (and power).

In the absence of cost information, we can get a sense of throughput/$ by plotting throughput/TOPS, throughput/number of DRAMs and throughput/MB of SRAM: the most efficient architecture will need to get good throughput from each of these major cost factors. See our Inference Efficiency slides for more information.
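The three ratios can be computed with a few lines of Python. This is only a sketch: the `Accelerator` class, the `efficiency` function, and all the numbers are hypothetical placeholders, not measurements of any real product. The throughput value should come from running your own model at your image size and batch size.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    name: str
    throughput_fps: float  # measured frames/second on your model
    peak_tops: float       # vendor peak TOPS
    num_drams: int         # external DRAM chips required
    sram_mb: float         # on-chip SRAM in megabytes


def efficiency(acc):
    """Throughput normalized by each major cost factor."""
    return {
        "fps_per_tops": acc.throughput_fps / acc.peak_tops,
        "fps_per_dram": acc.throughput_fps / acc.num_drams,
        "fps_per_mb_sram": acc.throughput_fps / acc.sram_mb,
    }


if __name__ == "__main__":
    # Made-up figures for two imaginary chips, for illustration only.
    candidates = [
        Accelerator("chip_a", throughput_fps=100.0, peak_tops=8.0, num_drams=1, sram_mb=8.0),
        Accelerator("chip_b", throughput_fps=200.0, peak_tops=100.0, num_drams=8, sram_mb=4.0),
    ]
    for acc in candidates:
        ratios = efficiency(acc)
        summary = ", ".join(f"{key}={value:.1f}" for key, value in ratios.items())
        print(f"{acc.name}: {summary}")
```

In this made-up comparison, chip_b has higher raw throughput but needs far more TOPS and DRAM to get it, so it leads on only one of the three ratios. An efficient architecture should score well on all three.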