Gateworks joins forces with NXP to unveil new USA-made M.2 AI acceleration card, featuring decoupled AI architecture.
(Source: Gateworks)
In collaboration with NXP, Gateworks is releasing a new M.2 AI acceleration card, the GW16168, built around NXP's passively cooled Discrete NPU (DNPU), the Ara240. Designed, tested and assembled in the USA, the GW16168 is built to industrial-grade standards with an emphasis on the company's "Decoupled AI Architecture" philosophy.
At Embedded World, Gateworks and NXP announced the GW16168 AI acceleration card. Deploying your own AI is a costly process, whether measured in time, manpower or money. Incorporating newer AI acceleration hardware often means rethinking your entire hardware stack, from Single Board Computers (SBCs) to custom cooling systems. With current market options this is typically a costly and complex process, one exacerbated by the frequent need to replace or update hardware. Gateworks has spotted this gap in the market and introduced the GW16168 to remove some of the hurdles that engineers and businesses face when deciding to run in-house AI.
"Decoupled AI Architecture" based design philosophy
"We are ending the era where you must choose your entire compute platform based on the AI chip." It is a powerful sentiment from the team at Gateworks, and the engineers' design decisions reflect it: the M.2 card is decoupled from specific hardware and environmental constraints, from its power profile and M.2 2280 M-Key form factor to the passively cooled Ara240 DNPU.
So what are these changes, and why do they matter?
Until now, there have been few hardware options for high-performance AI. Developers were forced to choose between repurposed GPUs that required a full system redesign or running inference directly on embedded CPUs and NPUs at the cost of severe thermal limits and high latency. Earlier USB and M.2 accelerators offered a more modular path, but at the large cost of limited compute and memory capacity. This left developers with an expensive balancing act between performance, power consumption and flexibility, often sacrificing one or more in the process.
Gateworks' new M.2 card revives the modularity of earlier M.2 accelerators while significantly advancing the underlying technology. Future upgrades and revisions no longer require replacing otherwise capable industrial SBCs. For example, dedicated AI acceleration can be added directly to platforms such as the i.MX 8M Plus or i.MX 95 applications processors via the M.2 interface. Typically, these SBC systems would reach 100 % CPU utilization when running inference workloads, but not with the GW16168. With its 16 GB of LPDDR4 memory, the GW16168 allows these tasks to be offloaded to the card, freeing the host CPU to focus on system logic and I/O. As an added benefit, the out-of-memory errors common when trying to run vision transformers or LLMs on standard edge modules are no longer an issue.
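To see why 16 GB of on-card memory matters, a back-of-the-envelope calculation helps. The model sizes below are illustrative assumptions, not Gateworks figures; they show how quickly model weights alone consume memory on an edge device:

```python
def model_memory_gb(params_billions, bytes_per_param):
    """Approximate memory needed for model weights alone
    (ignores activations and KV cache, which add more)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# A hypothetical 7B-parameter LLM:
fp16_gb = model_memory_gb(7, 2)  # 14.0 GB -- fits within 16 GB LPDDR4
int8_gb = model_memory_gb(7, 1)  # 7.0 GB  -- headroom after quantization
```

A typical edge module with 2 to 4 GB of shared RAM cannot hold either variant, which is where the dedicated 16 GB on the card makes the difference.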
Through collaboration with NXP, the GW16168 is backed by the mature Ara240 SDK ecosystem, offering a full compiler toolchain, support for TensorFlow, PyTorch and ONNX, and integrated model-conversion utilities that simplify the transition from existing AI models to edge deployment. "The ARA SDK & Compiler could be looked at as an abstraction layer", says Gateworks' CTO. He continues, "It acts as the middleware between high-level AI frameworks like PyTorch or TensorFlow and the proprietary NXP Ara hardware. The SDK handles model conversion, quantization, graph optimization, etc., to simplify software development". This is the key behind the GW16168's modularity. Even with a working full stack, however, there might be other reasons to consider the GW16168.
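The SDK's actual quantization pipeline is proprietary, but the general idea behind the post-training quantization it performs can be sketched in a few lines of plain Python. This is a deliberately minimal symmetric int8 mapping, not the SDK's algorithm:

```python
def quantize_int8(values):
    """Symmetric post-training quantization: map floats into the
    int8 range [-127, 127] using a single per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [q * scale for q in quantized]

weights = [0.52, -1.27, 0.08, 0.99]
q, s = quantize_int8(weights)        # q == [52, -127, 8, 99]
approx = dequantize(q, s)            # close to the original weights
```

Halving or quartering the bytes per weight this way is what lets large models fit into an accelerator's memory with only a small accuracy trade-off.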
"Gateworks' GW16168 illustrates exactly why decoupled AI architectures are the future of edge computing. By combining NXP's Ara240 DNPU with Gateworks' industrial-grade design, customers can scale AI performance without redesigning their entire hardware platform. This brings flexibility, longevity and cost efficiency to real-world AI deployments", said Ravi Annavajjhala, Vice President and General Manager, Neural Processing Units, NXP Semiconductors.
Thanks to the GW16168, you might just lose your best fans
One of the biggest challenges in AI deployment is thermal management. High-performance AI systems can draw significant power, with demand often spiking during complex tensor operations. As a result, thermals frequently become the limiting factor, especially in space-constrained industrial designs where advanced cooling solutions can quickly become costly and impractical. Gateworks has designed its M.2 card with this in mind, pairing the passively cooled Ara240 DNPU with carefully engineered power circuitry to achieve a typical power consumption of 6.6 W. This lower power envelope reduces heat build-up, enabling reliable operation in sealed, fanless environments while maintaining thermal characteristics aligned with industrial-grade AI hardware. Gateworks also reports a decade-long lifespan for the GW16168 modules, with advanced thermal management reducing wear on the modules.
Date: 08.12.2025
The Performance
The GW16168 is designed to enhance overall capability rather than simply rebalance it. Delivering up to 40 eTOPS, the module reaches what can reasonably be described as "GPU-class" AI performance within a far smaller power envelope. Rather than inheriting the limitations of current or legacy edge accelerators, the design focuses on sustained throughput, supported by ruggedized power delivery that maintains stability even during peak inference loads approaching 40 TOPS.
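Taken together with the 6.6 W typical power draw quoted earlier, the headline throughput implies roughly 6 TOPS per watt, a quick efficiency sanity check worth running for any accelerator under evaluation:

```python
def tops_per_watt(peak_tops, typical_watts):
    """Efficiency metric: AI throughput per watt of power draw."""
    return peak_tops / typical_watts

# Figures from the article: 40 TOPS peak, 6.6 W typical consumption.
efficiency = tops_per_watt(40, 6.6)  # roughly 6.1 TOPS/W
```

Note that peak throughput and typical power are measured under different conditions, so this ratio is an optimistic upper bound rather than a sustained figure.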