More Throughput on Tough Models
InferX X1 Delivers More Throughput/$ Than Tesla T4, Xavier NX and Jetson TX2
InferX X1 is running neural network model layers. We are bringing up YOLOv3 as the first demonstration. Stay tuned.
On October 28th at the Linley Fall Processor Conference, Cheng Wang disclosed pricing, availability and the roadmap for PCIe and M.2 boards based on the InferX X1 chip, which he introduced in his October 20th talk. The X1P1 PCIe board is running now; it will sample in Q1 and enter production in Q2. Pricing ranges from $499 (800MHz) down to $399. See Cheng's presentation and the board product brief below.
On October 20th at the Linley Fall Processor Conference, Cheng Wang disclosed benchmarks showing that InferX X1 outperforms Nvidia Xavier NX on YOLOv3 and on real customer models. We are working with lead customers now, will sample new volume customers in Q1, and will enter production in Q2.
The InferX Reconfigurable Tensor Processor architecture is 3 to 18 times more efficient in throughput/mm² than Nvidia GPUs.
InferX X1 1KU (1,000-unit) pricing ranges from $99 to $199 depending on speed grade (Nvidia Xavier NX is $399). We want to expand the market and bring high-performance inference to higher-volume, lower-price-point systems, so we are announcing 1MU (one million unit) pricing from $34 to $69 depending on speed grade.
See the details in Cheng's October 20th Linley presentation below, along with the X1 chip product brief and the InferX software product brief.
Cheng presented our PCIe/M.2 board products, roadmaps, availability and pricing on October 28th at the Linley Fall Processor Conference.
InferX/nnMAX Performance Estimation Demo
Watch the demo video. If you have a volume application and a neural network model in TensorFlow Lite or ONNX, we can benchmark your model on InferX X1. Contact our Sales VP, Andy Jaros, at firstname.lastname@example.org.
nnMAX also excels at key DSP functions
InferX™ X1 Edge Inference Co-Processor
High Throughput, Low Cost, Low Power
On October 20th at the Linley Fall Processor Conference, Cheng Wang introduced InferX X1, the fastest and most efficient edge AI inference processor. X1 is up and running neural network model layers. InferX X1 is faster than Nvidia Xavier NX at much lower pricing. See benchmarks, pricing and availability in the slides.
On October 28th, also at the Linley Conference, Cheng presented the pricing and availability of our PCIe and M.2 boards, as well as our software. See the presentation and the board product brief.
The InferX X1 Edge Inference Co-Processor is optimized for large models, including at batch=1. Its price/performance is 10-100x better than existing edge inference solutions. InferX X1 is programmed using TensorFlow Lite and ONNX; we can benchmark your neural network models now.
nnMAX™ Reconfigurable Tensor Processor
High Precision, Modular & Scalable
nnMAX Reconfigurable Tensor Processor has 64 one-dimensional tensor processors that are reconfigured layer by layer (in 4 millionths of a second) for high utilization and high throughput/$. nnMAX is programmed with TensorFlow Lite and ONNX. Supported numerics are INT8, INT16 and BFloat16, which can be mixed layer by layer to maximize prediction accuracy. INT8/16 activations are processed at full rate and BFloat16 at half rate; hardware converts between INT and BFloat formats as needed from layer to layer. 3×3 convolutions with stride 1 are accelerated by Winograd hardware, which makes YOLOv3 run 1.7x faster, at full precision.
nnMAX has excellent inference efficiency, delivering more throughput on tough models for less cost and less power.
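Winograd's minimal filtering algorithm reduces the number of multiplications a small convolution needs. The sketch below is a generic textbook illustration, not a description of the nnMAX hardware (its tile size and internal number format are not described here): the 1D F(2,3) case computes two outputs of a 3-tap filter with 4 multiplications instead of 6. The 2D F(2×2, 3×3) form nests the same transforms and needs 16 multiplications per 2×2 output tile instead of 36. Python's `Fraction` keeps the division by 2 in the filter transform exact, so the result can be checked against direct convolution exactly. The helper names are hypothetical.

```python
from fractions import Fraction


def direct_conv_2_outputs(d, g):
    """Direct 1D correlation: 2 outputs of a 3-tap filter over 4 inputs (6 multiplies)."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using only 4 multiplies."""
    # Filter transform (depends only on the weights, so it can be precomputed per layer)
    g0, g1, g2 = (Fraction(x) for x in g)
    u = [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]
    # Input transform: additions and subtractions only
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # The only 4 multiplications
    m = [u[i] * v[i] for i in range(4)]
    # Output transform: additions and subtractions only
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def conv1d_winograd(signal, g):
    """Valid 1D correlation with a 3-tap filter, computed 2 outputs per Winograd tile."""
    n_out = len(signal) - 2
    out = []
    # Stride-1 tiles: each 4-input tile overlaps the next by 2 inputs
    for i in range(0, n_out - 1, 2):
        out.extend(winograd_f23(signal[i:i + 4], g))
    if n_out % 2:  # odd leftover output: compute it directly
        i = n_out - 1
        out.append(sum(signal[i + k] * g[k] for k in range(3)))
    return out


if __name__ == "__main__":
    d, g = [1, 2, 3, 4], [1, 0, -1]
    assert winograd_f23(d, g) == direct_conv_2_outputs(d, g)
    print(conv1d_winograd([1, 2, 3, 4, 5, 6, 7], [2, -1, 3]))
```

Adjacent tiles overlap by two inputs, which is why this trick maps naturally onto stride-1 convolutions. Since the filter transform is paid once per set of weights, the per-tile cost is the cheap add/subtract transforms plus the 4 multiplies.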
nnMAX is also excellent for DSP.
Think Inference Throughput/$, not TOPS
TOPS is a misleading marketing metric. It is the number of MACs times the clock frequency (times two, since each MAC is conventionally counted as two operations): a peak number that assumes every MAC is busy on every cycle. Having a lot of MACs increases cost, but it only delivers throughput if the rest of the architecture is right.
The right metric to focus on is throughput: for your model, your image size and your batch size. Even ResNet-50 is a better indicator of throughput than TOPS, although it is not the best benchmark because of its small image size; real applications process megapixel images. Inference efficiency is achieved by getting the most throughput for the least cost (and power).
In the absence of cost information, we can get a sense of throughput/$ by plotting throughput/TOPS, throughput/number of DRAMs and throughput/MB of SRAM: the most efficient architecture will need to get good throughput from each of these major cost factors. See our Inference Efficiency slides for more information.
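The gap between peak TOPS and delivered throughput is simple arithmetic. The sketch below assumes the common convention of 2 operations per MAC and uses hypothetical figures throughout (a 4096-MAC chip at 1 GHz, a model needing 100 GOPs per frame, and two made-up utilization levels). It shows that two designs with identical TOPS can deliver very different frames per second, depending on how much of the peak the rest of the architecture keeps busy.

```python
def peak_tops(num_macs: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak TOPS: every MAC busy on every cycle, each MAC counted as 2 ops."""
    return num_macs * clock_hz * ops_per_mac / 1e12


def frames_per_second(tops: float, utilization: float, gops_per_frame: float) -> float:
    """Delivered throughput: only the utilized fraction of peak does useful work."""
    useful_ops_per_s = tops * 1e12 * utilization
    return useful_ops_per_s / (gops_per_frame * 1e9)


if __name__ == "__main__":
    # Hypothetical chip: 4096 MACs at 1 GHz gives 8.192 peak TOPS
    tops = peak_tops(4096, 1e9)
    # Same peak TOPS, different (hypothetical) utilization on the same model
    for name, util in [("architecture A", 0.25), ("architecture B", 0.70)]:
        fps = frames_per_second(tops, util, gops_per_frame=100.0)
        print(f"{name}: {tops:.2f} TOPS peak, {util:.0%} utilized -> {fps:.1f} frames/s")
```

In practice utilization is not a free parameter: it depends on the model, image size, batch size, memory bandwidth and on-chip SRAM, which is why measured throughput on your own model is the number to compare.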
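The proxy ratios above are easy to tabulate once you have a measured throughput for your own model. The sketch below is a minimal illustration; the `Accelerator` class, its field names and both chips' figures are hypothetical, not data for any real product. Each ratio is "throughput per unit of a cost factor", so higher is better, and a price-based throughput/$ is added only when a price is known.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Accelerator:
    name: str
    throughput_fps: float  # measured on your model, image size and batch size
    peak_tops: float
    num_drams: int
    sram_mb: float
    price_usd: Optional[float] = None  # often unknown; the ratios below act as proxies

    def efficiency(self) -> Dict[str, float]:
        """Throughput per major cost factor; higher is better for every ratio."""
        ratios = {
            "fps_per_tops": self.throughput_fps / self.peak_tops,
            # A design with no external DRAM gets an unbounded ratio here
            "fps_per_dram": (self.throughput_fps / self.num_drams
                             if self.num_drams else float("inf")),
            "fps_per_mb_sram": self.throughput_fps / self.sram_mb,
        }
        if self.price_usd:
            ratios["fps_per_dollar"] = self.throughput_fps / self.price_usd
        return ratios


if __name__ == "__main__":
    # Entirely hypothetical figures, for illustration only
    chips = [
        Accelerator("chip A", throughput_fps=30, peak_tops=10, num_drams=1, sram_mb=10, price_usd=100),
        Accelerator("chip B", throughput_fps=60, peak_tops=40, num_drams=4, sram_mb=30),
    ]
    for chip in chips:
        print(chip.name, {k: round(v, 2) for k, v in chip.efficiency().items()})
```

With these made-up numbers, chip B has twice the absolute throughput but lower throughput per TOPS, per DRAM and per MB of SRAM than chip A, which is the pattern this kind of plot is meant to expose.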