External GPU with Thunderbolt 3

The AKiTiO Node Thunderbolt 3 eGFX Box is a 7kg black metal powered external enclosure for connecting a full-size graphics processing unit (GPU) to any computer with a Thunderbolt 3 port. The front 12cm fan can be removed to make room for a water cooling radiator.

It arrived in a large box. Dell XPS 13 for scale.
Unboxed.
Includes cables, cable ties, screws, and stickers.
The included Thunderbolt 3 cable is very short.
Two 8-pin power cables and plenty of room inside.
NVIDIA GTX 680

The mesh side window is for cold air intake, not visibility.
According to the Windows 10 Thunderbolt driver software, this is not supported.
NVIDIA helpfully tells me I have a GPU, but it isn’t doing anything when I run programs on the laptop’s inbuilt display.

Thunderbolt 3

Unlike other external GPU (eGPU) solutions which provide conveniences like additional USB ports and Ethernet, the Node provides no extra features. This may actually be good for performance by eliminating competition for bandwidth on the Thunderbolt 3 cable, which is limited to 40 gigabits per second (Gbps) in total. Of that, either 2 or 4 lanes of PCI Express 3.0 carry the PCIe traffic (approximately 985MB/s per lane, or up to 32Gbps for 4 lanes).
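The per-lane figure quoted above can be derived from the PCIe line rate and its encoding overhead. The following is a minimal sketch, assuming the standard parameters of 8 GT/s with 128b/130b encoding for PCIe 3.0 and 5 GT/s with 8b/10b encoding for PCIe 2.0. The function names are illustrative and do not come from any library.

```python
# Per-lane PCIe throughput derived from line rate and encoding overhead.
# Each entry: (transfer rate in GT/s, payload bits, total bits on the wire)
PCIE_GENERATIONS = {
    "2.0": (5.0, 8, 10),     # 8b/10b encoding
    "3.0": (8.0, 128, 130),  # 128b/130b encoding
}


def lane_mb_per_s(generation: str) -> float:
    """Usable one-direction bandwidth of a single lane, in MB/s (10^6 bytes)."""
    gt_per_s, payload_bits, wire_bits = PCIE_GENERATIONS[generation]
    usable_bits_per_s = gt_per_s * 1e9 * payload_bits / wire_bits
    return usable_bits_per_s / 8 / 1e6


def link_gbps(generation: str, lanes: int) -> float:
    """Usable one-direction bandwidth of a multi-lane link, in Gbps."""
    return lane_mb_per_s(generation) * lanes * 8 / 1000


if __name__ == "__main__":
    print(f"PCIe 3.0 per lane: {lane_mb_per_s('3.0'):.1f} MB/s")
    for lanes in (2, 4):
        print(f"PCIe 3.0 x{lanes}: {link_gbps('3.0', lanes):.2f} Gbps")
```

Under these assumptions a lane carries roughly 985MB/s, and a 4-lane link comes out slightly below 32Gbps, which suggests that 32Gbps is the raw rate before encoding overhead is subtracted.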

I will be testing the Node connected to a Dell XPS 13 9350 with Intel Core i7-6560U CPU, Intel Iris Graphics 540 GPU, and 8GB DDR3 RAM. The Thunderbolt 3 driver software says my system does not support external GPU.

The GPU is an NVIDIA GeForce GTX 680 4GB. For comparison I have also borrowed a GTX Titan 6GB. Both of these are officially unsupported for use in an external configuration according to NVIDIA and AKiTiO.

GTX Titan

But why?

Once I was a PC gamer. I played Deus Ex: Human Revolution on three 1080p monitors with an overclocked water-cooled AMD Phenom II CPU, a pair of AMD Radeon GPUs, and two WD Velociraptors in RAID. My current laptop exceeds the performance of that system in nearly every way (except the GPU), in a much smaller package, while consuming significantly less electricity. Something about Moore’s law. Today’s large power-hungry desktop PCs will be surpassed by tomorrow’s laptops and next year’s smartphones.

One good reason to buy a top-of-the-line gaming PC is new display technology like 4K, HDR, high frame rate, and of course Virtual Reality. None of these interest me at this time – I do most of my gaming on a large 1080p 60Hz LCD TV (set to ‘game mode’ to reduce latency) while I sit on the couch with a Steam Controller. When I’m not gaming, I need a slim laptop with plenty of battery life. The Node allows me to harness more gaming power when I’m at home, without the extra weight of a specialised gaming laptop.

The Node also provides a nice upgrade path – I can easily replace the GPU later, and it can be used with any other Thunderbolt 3 computer such as Intel’s NUC Skull Canyon.

Starting up

After updating Windows 10 and the laptop’s drivers and BIOS, then updating the Node’s firmware, then installing the latest NVIDIA driver, the Node just works! I was surprised that no other steps were required. However, booting (or rebooting) the laptop while the Node is connected and switched on nearly always causes the laptop’s charging light to flash a white and orange error code before the laptop shuts down. I have to unplug the Node until after the laptop has begun booting. This seems like a problem with the laptop rather than the Node; maybe a future BIOS update will fix it?
Update: This is fixed in the 1.4.17 laptop BIOS.

Although the NVIDIA driver software can see the GPU when I plug it in, it won’t actually do anything without a display connected to it. Plugging in an HDMI dummy device unlocked the full suite of settings in the NVIDIA control panel and made the GPU available to other applications for use as a co-processor with OpenCL or CUDA.

In all of these tests, higher scores are better.

I have previously tested the GTX 680 and Titan in an AMD PC, connected to a 16-lane PCIe 2.0 slot on the motherboard. The older PCIe 2.0 standard provides 500MB/s per lane for a maximum transfer rate of 7.8GB/s, while 4-lane PCIe 3.0 Thunderbolt 3 provides up to 3.8GB/s. OpenCL performance in LuxMark was exactly the same as before because LuxMark compares processing performance after the data has been copied into GPU RAM, and is not restricted by PCIe speed.
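The two link totals above can be reproduced from the per-lane rates. This is a small sketch that assumes the GB/s figures were obtained by dividing MB/s by 1024; that divisor is an inference from the numbers rather than something stated in the text, and the helper name is hypothetical.

```python
def link_gb_per_s(lane_mb_per_s: float, lanes: int, divisor: int = 1024) -> float:
    """Total link bandwidth in GB/s, given a per-lane rate in MB/s.

    The divisor of 1024 is an assumption chosen to match the quoted figures.
    """
    return lane_mb_per_s * lanes / divisor


pcie2_x16 = link_gb_per_s(500, 16)  # desktop motherboard slot
tb3_x4 = link_gb_per_s(985, 4)      # full-width Thunderbolt 3 PCIe link

if __name__ == "__main__":
    print(f"PCIe 2.0 x16: {pcie2_x16:.1f} GB/s")
    print(f"PCIe 3.0 x4:  {tb3_x4:.1f} GB/s")
    print(f"Motherboard slot is {pcie2_x16 / tb3_x4:.1f}x wider")
```

With a divisor of 1000 instead, the same links would come out at 8.0GB/s and about 3.9GB/s, so the comparison between them does not depend on which convention is used.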

In these next tests I have also compared the Titan in Double Precision mode (DP) which increases its 64-bit floating point operations per second (FLOPS) at the expense of lower 32-bit FLOPS.

More than one teraflop of FP64 performance on the Titan.

While the computation performance is similar, transferring data to and from the GPU over Thunderbolt 3 is significantly slower than when it’s connected to a PCIe slot on a motherboard. Hopefully this won’t have too much impact on real gaming performance.

Dell lied?

Let’s take a look at that transfer speed. Both GPUs reported a rate of about 1.4GB/s, lower than the 5.8GB/s reported when connected to the motherboard, but also much lower than the 3.8GB/s I expected from Thunderbolt 3.

CUDA-Z confirms it. 1.4GB/s is less than half the speed I expected from Thunderbolt 3.

Other owners of the XPS 13 9350 have alleged that Dell lied about the Thunderbolt 3 port being capable of 40Gbps, and instead used a half-speed implementation with only 2 lanes of PCIe 3.0 – a maximum speed of 1.9GB/s (about 16Gbps). That is consistent with my results.
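To see how the measured rate sits against each possible link width, here is a rough sketch. It assumes the article's approximate 985MB/s per lane and the same divide-by-1024 convention that appears to underlie the 1.9GB/s and 3.8GB/s figures; the function names are illustrative only.

```python
LANE_MB_PER_S = 985       # approximate PCIe 3.0 per-lane rate quoted earlier
MEASURED_GB_PER_S = 1.4   # transfer rate reported for both GPUs in the Node


def link_limit_gb_per_s(lanes: int) -> float:
    """Theoretical link limit in GB/s (MB/s divided by 1024, an assumption)."""
    return LANE_MB_PER_S * lanes / 1024


def narrowest_link_that_fits(measured: float, widths=(1, 2, 4)):
    """Smallest lane count whose theoretical limit covers the measured rate."""
    for lanes in widths:
        if measured <= link_limit_gb_per_s(lanes):
            return lanes
    return None


if __name__ == "__main__":
    for lanes in (1, 2, 4):
        limit = link_limit_gb_per_s(lanes)
        share = MEASURED_GB_PER_S / limit
        print(f"x{lanes}: limit {limit:.2f} GB/s, measured is {share:.0%} of it")
    print("Narrowest link that fits:", narrowest_link_that_fits(MEASURED_GB_PER_S))
```

The measured rate exceeds what a single lane could carry and reaches a bit over 70% of the 2-lane limit, but only about 36% of the 4-lane limit. This does not prove the link width on its own; it only shows the result is plausible for an x2 link once some protocol overhead is allowed for.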

Did Dell mislead customers with their up to 40Gbps claim? Well, not really…

Lies?

HWiNFO shows that the Intel DSL6340 Thunderbolt Bridge is capable of 4 lanes of PCIe 3.0 (x4) but it is connected to the Skylake-U CPU by only 2 lanes. That explains the speed problem.

The x4 bus is connected via the x2 bus.

Taking a look at Intel’s spec sheet for this CPU we can see why. The supported PCIe lane configurations are: four x1; or two x2; or two x1, one x2 and one x4. In this case the third option has been used. The two x1 lanes connect to WiFi and the SD card reader, the x2 lanes connect the Thunderbolt Bridge, and the x4 lanes connect the NVMe controller for the internal Solid State Drive (SSD).

According to the Thunderbolt 3 Technology Brief, calculating the total bandwidth of the interface can be complicated. PCIe is just one of the things this magical port can do. For example:

Two 4K 60Hz displays and one USB 3.1 device will use about 40Gbps of bandwidth.

The Intel Iris 540 GPU in this laptop is capable of running two displays at 4K 30Hz or one at 4K 60Hz, which requires about 15Gbps of interface bandwidth on the Thunderbolt 3 cable.

Let’s do the math:

USB 3.1 = 10Gbps
One 4K 60Hz display = 15Gbps
PCIe 3.0 x2 = 16Gbps
Total possible bandwidth = 41Gbps
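The same tally can be written as a short script, which also makes it easy to try other combinations. This is only a sketch using the round figures from the list above; the 32Gbps value for a 4-lane link is the figure quoted earlier, and the dictionary and function names are made up for illustration.

```python
LINK_CAPACITY_GBPS = 40  # advertised Thunderbolt 3 maximum


def total_gbps(streams: dict) -> int:
    """Sum the nominal bandwidth of every stream sharing the cable."""
    return sum(streams.values())


# Round figures taken from the list above.
as_shipped = {"USB 3.1": 10, "4K 60Hz display": 15, "PCIe 3.0 x2": 16}

# Hypothetical variant: the same laptop with a 4-lane PCIe link instead.
with_x4 = {"USB 3.1": 10, "4K 60Hz display": 15, "PCIe 3.0 x4": 32}

if __name__ == "__main__":
    for name, streams in (("x2 as shipped", as_shipped), ("x4 hypothetical", with_x4)):
        total = total_gbps(streams)
        status = "exceeds" if total > LINK_CAPACITY_GBPS else "fits within"
        print(f"{name}: {total}Gbps, {status} the {LINK_CAPACITY_GBPS}Gbps link")
```

Both combinations add up to more than 40Gbps on paper, so the cable is the shared limit in either case; the lane count only changes how much of it PCIe traffic alone could claim.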

So Dell’s decision to use only 2 PCIe lanes instead of 4 for their Thunderbolt 3 Bridge does not violate their claim that the Thunderbolt 3 interface is capable of “up to 40Gbps”. You just need to drive a 4K display from the inbuilt GPU, and also use a Thunderbolt PCIe device, and also a high-end USB 3.1 storage drive all from the one cable.

That PCIe bandwidth…

Dell’s decision to allocate 4 lanes of PCIe to the SSD allows upgradeability that many users would appreciate more than extra Thunderbolt 3 bandwidth. At least I can have high-speed storage, right?

The fastest NVMe SSD today is capable of up to 3,500MB/s read speed, which is pretty close to the limit of PCIe 3.0 x4. Unfortunately the Samsung PM951 NVMe SSD that came with this laptop reports a read speed below even the PCIe 3.0 x2 limit according to CrystalDiskMark:

This SSD is connected to PCIe x4 but runs at x2 speed.

Replacing that with the faster Samsung 960 Evo – which is supposedly capable of 3200 MB/s read and 1900 MB/s write – revealed something I cannot explain: The NVMe slot is running at the speed of 2 PCIe lanes instead of 4, despite HWiNFO indicating that it has 4 lanes available.

At least the write speed is improved.

So Thunderbolt 3 PCIe devices and NVMe drives are both running at half their potential speed in the XPS 13 9350. What the hell, Dell?

Optimus Prime

Graphics benchmarks from Unigine and 3DMark claimed to be using the external GPU; however, the results show the GTX 680 performing slightly worse than the internal Iris 540 in every test.

Slightly worse in every way.

So what’s going on here? It is common to find two GPUs inside a laptop, for example a low-power Intel GPU plus a high-performance NVIDIA or AMD GPU also on the motherboard but not directly connected to the internal display. The more powerful GPU is automatically called when needed by the driver software thanks to Intel Switchable Graphics working with NVIDIA Optimus or AMD XConnect. This is clearly not working for me, probably because the GTX 680 is not on the list of GPUs that work with NVIDIA Optimus.

External display required

Connecting an external display to the eGPU – and using that instead of the internal display – allowed applications to use the eGPU as the primary display device. This would not be necessary with a newer GPU.

These tests are not a direct comparison because the AMD FX-8350 CPU in the desktop PC was more powerful than the Intel i7-6560U CPU in this laptop, so the desktop scores should be a bit higher.

The Titan was rendering more than 1000 frames per second in this simple graphics test.
The Titan in the Node surpassed the GTX 680 on the motherboard, proving that Thunderbolt is not limiting performance in this test.
Unigine Heaven benchmark

These results indicate that despite the restricted interface bandwidth, both GPUs performed about as well in the Node as they did on a motherboard, and much better than the laptop’s inbuilt GPU on benchmark tests simulating game graphics and physics.

Finally, I’ll compare the GTX 680 with the laptop’s inbuilt Iris 540 to see how much it improves performance (frames per second) in real games.

Batman: Arkham Knight would not run on the Iris GPU.
The GTX 680 is too old to run Deus Ex: Mankind Divided at a high frame rate.

30 frames per second (fps) is considered the minimum ‘playable’ rate for gaming, and 60 fps is preferred.

The AKiTiO Node and GTX 680 provide enough extra processing power to play many modern high-end games at an acceptable visual quality on the XPS 13. Great success!

Mankind Divided, 1080p Low quality