Tuesday, March 27, 2012

Intel's IGP evolution: Ticking and tocking its way to the mainstream



I can still remember the day I came across the term "bottleneck" for the first time! It was not that long ago: I was stuck with my Pentium 4 powered PC, which wouldn't run most of my favourite games of the time. Back then I was just a clueless student and thought my CPU was the culprit. But that was not the case, as one of my better 'informed' friends rightly pointed toward my iGPU (Integrated Graphics Processing Unit), which happened to be the lackluster (even by the standards of that time) Intel Extreme Graphics. Even though my CPU wasn't best in class, it could still run those games; my built-in graphics was the limiting factor, or "bottleneck", as the term implies. Of course I could have bypassed the whole scenario by simply switching to a separate video card or dGPU (Discrete Graphics Processing Unit), but that would require more space, more power, more hassle and surely more spending, which I couldn't afford. This was pretty much the story with most systems back then: while the integrated graphics parts were sufficient to drive 2D desktop components and images, 3D workloads were beyond both their purpose and capacity. Since then a lot of things have changed in the world of personal computing, but it is only recently that the IGP performance of mainstream PCs has reached a level that is more or less acceptable not only for general computing but also for gaming. Today, we'll take a quick look at how Intel's graphics solutions have evolved over time.

Intel and Graphics


Intel has been in a very interesting position in the PC industry for quite some time now when it comes to computer graphics. While the company doesn't offer any discrete solution (although it once did), it enjoys the largest share of the total computer graphics market, almost 60% according to reports from Jon Peddie Research. This is because most of the PCs shipped and sold all over the world feature an Intel CPU, and only a small portion of those make use of a discrete graphics card. AMD and Nvidia fight over the remaining 40%, with the latter relying only on its discrete solutions (GeForce video cards). AMD currently remains the only company offering both integrated (in APUs) and discrete graphics (Radeons). Coming back to Intel, the irony is that despite being the largest graphics vendor (at least in units if not in revenue), Intel's graphics has always been considered the weakest of the lot! The company has designed some of the best x86 microprocessors in history and has arguably the best manufacturing technology, yet for most of that time Intel failed to deliver acceptable graphics performance. This apparent lack of IGP performance is well known in enthusiast circles and even made its way into Steve Jobs' biography! My experience was not an exception; it was the norm.

Intel's problems with graphics can be attributed to its own attitude and culture. With Intel, everything happens around the CPU. The CPU is always king and everything else comes second, including graphics. Intel's CPU-centric mentality can be somewhat compared with IBM's system-centric attitude. While this has helped Intel excel in pure x86 CPU performance, the graphics subsystem remained neglected and as a result couldn't keep pace with the CPU. Also, over time computer graphics has become an order of magnitude more complex and vastly more advanced than it was just a decade ago; it's not easy to design a competitive GPU nowadays without investing a significant amount of resources and effort. On the other hand, the importance of the graphics subsystem has grown tremendously in recent years, as many workloads in a modern PC depend heavily on graphics processing capability. Today, having a fast CPU won't guarantee a fast system overall unless it is paired with a fast enough GPU. AMD realized this and got hold of the necessary graphics expertise when it acquired ATI. Intel, however, has been rather conservative in its approach and relied mostly on its "Tick-Tock" development model.

The Tick-Tock


Intel Tick-Tock

Intel's Tick-Tock cadence has been one of the highlights of the computer industry since its inception. Derived from the principles of Moore's law, Tick-Tock is a relentless development cycle in which the company alternates, roughly every year, between shrinking the existing micro-architecture to a new process node and delivering a new micro-architecture on that node. Every shrink in process technology is represented by a Tick, and a new architecture is represented by a Tock. Intel has been keeping up with this model for quite some time now, and that's not an easy feat given the pace at which it has to operate. But here's the interesting thing: Tick-Tock not only helped Intel widen the CPU performance gap with its competitor, it also played an equally important role in improving its graphics performance. Note that when we mention graphics, we're not talking about Larrabee, Intel's projected general-purpose graphics chip which never quite materialized. For the purpose of this article, we'll focus strictly on Intel's IGP.

In CPUs, higher performance can be achieved by switching to a new and more efficient architecture, tweaking the existing one for more throughput, adding new feature sets, or simply increasing the number of hardware resources when moving to a smaller process node. All of these hold true for the GPU as well, though its mechanisms and algorithms differ vastly from those of a CPU. The building blocks of Intel's GPU are called EUs (Execution Units). A single EU can be compared to a Stream Processor (SP) found in ATI Radeon GPUs (and similarly to a CUDA core in Nvidia GeForce), but only as a rough reference. These programmable shader-like units, along with other fixed-function logic, form the very core of Intel's iGPU.


Intel IGP evolution

The overall performance of Intel GPUs has improved with almost every Tick or Tock thus far, though the improvement has been rather moderate at times. Now let's have a quick run-down.

The GMA era


With the introduction of its famous Core architecture back in 2006, Intel reclaimed (and has since retained) the CPU performance crown from AMD. Conroe based Core2Duo and Core2Quad CPUs were great products, but the performance of Intel's GMA (Graphics Media Accelerator) IGP, which at that time was integrated into the motherboard chipset, was far from acceptable. The move to 45nm (the 'Penryn' Tick) didn't help much either, as the GPU in the motherboard chipset was still being manufactured on an older 65nm process. At that time, AMD 780G/790GX or Nvidia nForce based motherboards were considered the better IGP/HTPC platforms and used to offer much better graphics performance than Intel's G41/G45 series. For mainstream gaming, though, one had to look for a dGPU. With the 45nm 'Tock', Intel launched the Nehalem micro-architecture, and this was a very significant move from a technical point of view. With the IMC (Integrated Memory Controller) built into the CPU die, Nehalem not only introduced technologies like QPI and NUMA and brought back HT (absent since the P4/NetBurst days), it also set the bar for CPU performance. But all this architectural improvement did little for graphics, as the Nehalem based i7 processors lacked any IGP. Up until this point it was the CPU reaping most of the benefits of Tick-Tock, but that trend was about to change.

Clarkdale 

 

Clarkdale was the product of Intel's 32nm Westmere Tick, a die shrink of the existing Nehalem micro-architecture. While it brought some of the Nehalem goodness to the mainstream dual-cores (1st generation i3, i5), there were other significant changes made to the graphics subsystem. The most important of those was the integration of the GPU into the CPU package instead of the motherboard, where it used to reside. This was a major departure from the traditional layout, and one that many thought would come from AMD first, since it was the first company to talk about CPU-GPU integration. But AMD's repeated delays in delivering its 'Fusion' line-up made sure Intel was the first to achieve the feat. Interestingly, while integrated into the CPU package, Clarkdale's GPU was not part of the CPU die itself; technically it was not full integration but rather a multi-chip arrangement. The GPU used QPI (QuickPath Interconnect) to connect to the CPU. And while Clarkdale's CPU portion was made on the newer 32nm process node, the integrated GPU was still built on 45nm (an upgrade from the 65nm G4x series) - another example of Intel's CPU-centric mentality! The underlying architecture of Clarkdale's GPU, which Intel gave a new name - HD Graphics - wasn't much different from the older GMA X4500, but it added a couple of extra EUs along with some tweaks. It also had improved vertex processing capacity and HDMI support. This new HD Graphics didn't bring a radical performance improvement, however, as it would lose to AMD's 790/890GX in most games, especially at higher resolutions.

Here is one very interesting observation about Intel's IGP performance. Intel's iGPUs tend to be heavily dependent on the CPU, which makes it very difficult to isolate pure graphics performance in any workload that is affected by the CPU. Take low resolution gaming, where Intel's superior CPU performance helps its GPU achieve a better overall frame rate. But as you scale the resolution and quality settings higher, the game becomes less CPU-bound, and consequently the performance graph of Intel's iGPU heads downward. This is a well-known phenomenon among gamers and hardware enthusiasts; the little model below illustrates the idea. Anyway, despite not being the best solution for IGP gaming, Clarkdale could easily handle Windows Vista or 7 and was a decent choice for an HTPC.
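
To see why resolution changes the picture, think of each frame as gated by whichever of the CPU or GPU finishes last. Here's a minimal sketch of that idea in C; all the millisecond figures are made-up illustrative numbers, not measurements of any real chip.

/* Toy bottleneck model: a frame is finished only when both the CPU and the
   GPU are done with their share of the work, so the slower side sets the pace.
   All numbers below are made up for illustration. */
#include <stdio.h>

static double fps(double cpu_ms, double gpu_ms) {
    /* The slower component dictates the frame time. */
    double frame_ms = (cpu_ms > gpu_ms) ? cpu_ms : gpu_ms;
    return 1000.0 / frame_ms;
}

int main(void) {
    double cpu_ms = 8.0;            /* a fast CPU: 8 ms of game logic per frame */
    double gpu_low_res_ms = 6.0;    /* weak IGP at low resolution */
    double gpu_high_res_ms = 25.0;  /* same IGP at high resolution */

    printf("Low res:  %.1f fps (CPU-bound; the strong CPU masks the IGP)\n",
           fps(cpu_ms, gpu_low_res_ms));
    printf("High res: %.1f fps (GPU-bound; the weak IGP is exposed)\n",
           fps(cpu_ms, gpu_high_res_ms));
    return 0;
}

Raise the resolution and only the GPU's share of the work grows, so a CPU advantage stops mattering - exactly the pattern reviewers see with Intel IGPs.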

SandyBridge 


Next stop was SandyBridge (SNB), possibly the most significant and interesting design from Intel in recent years, especially on the graphics front. SNB was an architectural overhaul and hence a 'Tock' in Intel's book. SNB based CPUs came out early last year and are still reigning supreme as far as CPU performance goes, but for our purpose we'll stick to the graphics part. SandyBridge is arguably the first x86 CPU with the graphics subsystem (GPU) totally integrated into the processor die, unlike the multi-chip package found in Clarkdale. We say arguably because AMD also started shipping its Brazos line of APUs, which likewise had an on-die GPU, around the same time-frame. Whatever the case, SNB marked a shift of focus in Intel's design approach, as the integrated GPU not only got its fair share of silicon but also shared the same advanced 32nm process node with the CPU. The number of Execution Units in high-end SandyBridge parts (i7 2600K, i5 2500K) remained the same as Clarkdale's (12 EUs), but thanks to a more efficient graphics architecture and improved IPC (Instructions Per Cycle), the performance delivered was much better than Clarkdale's. Intel also deployed a hardware video encode/decode block, better known as "Quick Sync", along with other improved fixed-function resources.

But the most interesting aspect of the SandyBridge graphics architecture has to be its ability to access the LLC (Last Level Cache, Level 3 in this case). In other words, the iGPU in SNB, along with the decode/encode block and the System Agent (previously known as the 'Uncore'), has free access to the L3 cache just like the CPU cores. This was made possible by Intel's unique 'Ring Bus' interconnect, which is low latency, high bandwidth and highly coherent. Thus the GPU became a true core component, and in a rather novel way at that. This matters because, unlike its discrete counterparts, an integrated GPU doesn't have the luxury of dedicated memory - a problem AMD used to deal with by incorporating dedicated graphics memory (SidePort memory) into some of its older motherboards, as some of you might remember.

Now combine all of this and you have what you can call truly acceptable performance, for the first time from an Intel IGP, may we add! Performance-wise, SNB's IGP is at least 50% faster than Clarkdale/Arrandale, in some cases even more. While not quite mainstream, it delivers much better gaming performance too, good enough to challenge the likes of the Radeon HD 5450, an entry-level discrete graphics card! The main competition for SandyBridge graphics, however, is not discrete cards but AMD's Llano APUs. The Llano line-up consists of dual- and quad-core CPUs for both desktops and laptops and, just like SNB, features an on-die GPU. AMD's approach towards integration is much more GPU-centric than Intel's CPU-centric one, and that has a clear impact at the application level. While Llano can't compete with SNB in pure x86 single-threaded workloads, it sets the bar for integrated GPU performance thanks to its discrete-class Radeon HD graphics with full DX11 support. Especially in games, Llano consistently delivers much higher (2x-3x at times) performance. Still, SNB's IGP is quite an impressive performer in its own right, and a precursor of things to come.

IvyBridge  

 

It's no longer a secret that the world of personal computing and communication is undergoing a phase of transformation. While the concepts and principles remain the same, we're witnessing a major shift in areas like usability, functionality and form factor. The recent trend is to deliver smaller and thinner devices without compromising performance, which in turn calls for smaller, more efficient components. Intel's next-gen IvyBridge CPUs are born of this very notion. IvyBridge (IVB) is the 'Tick' to SandyBridge (SNB), which means it's a die shrink, to 22nm in this case, of the SNB architecture, and it is slated for an early April 2012 release. But there is more to it! IVB is not only going to be the first high-performance 22nm microprocessor but also the first to be manufactured with tri-gate transistors, or '3-D Tri-Gate' architecture as Intel calls it. These cutting-edge 3-D transistors are much more advanced than traditional planar (2-D) ones and ensure less energy leakage and faster switching between on/off states. Combined with the usual benefits of a smaller process node, this will enable IVB to perform at a higher level while power consumption drops significantly. That's why Intel is calling IvyBridge a "Tick+" instead of just a regular Tick! Quite fittingly, we believe.

With IVB, though, it's not CPU performance that Intel is going after, which is understandable considering there isn't much competition at the high end right now and SNB is doing admirably well. The integrated GPU, on the other hand, is getting a significant boost in both features and functionality. Intel is increasing the number of EUs in IVB to a total of 16, whereas SNB had 12. That's for the high-end parts bearing the HD 4000 name, though; the rest will make do with 6 EUs. The graphics architecture is also optimized for higher frequencies and turbo boost. Not all the details are in as of this writing, so we'll update once the reviews start to fly.
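
As a rough way to gauge what the extra EUs could mean on paper, peak shader throughput scales with EU count x clock x FLOPs per EU per clock. The sketch below runs those numbers; the turbo clocks and per-EU rates are our assumptions for illustration, not confirmed specifications.

/* Back-of-the-envelope peak shader throughput. The per-EU throughput and
   turbo clocks used here are assumptions, not confirmed specs. */
#include <stdio.h>

static double peak_gflops(int eus, double clock_mhz, int flops_per_eu_clk) {
    return eus * clock_mhz * flops_per_eu_clk / 1000.0;
}

int main(void) {
    /* SNB HD 3000: 12 EUs, ~1350 MHz turbo, assuming 8 FLOPs/EU/clock */
    double snb = peak_gflops(12, 1350.0, 8);
    /* IVB HD 4000: 16 EUs; the clock and per-EU width are guesses here */
    double ivb = peak_gflops(16, 1150.0, 16);

    printf("HD 3000 (assumed): %.1f GFLOPS\n", snb);   /* ~129.6 */
    printf("HD 4000 (assumed): %.1f GFLOPS\n", ivb);   /* ~294.4 */
    printf("Paper ratio: %.2fx\n", ivb / snb);
    return 0;
}

Paper FLOPS never translate directly into frame rates, of course - memory bandwidth and drivers usually get in the way first, which is why previews show a far smaller gap.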

The graphics performance of IvyBridge should be higher and much more polished than its predecessor's, especially in games. Full DX11 (after some initial confusion) and OpenGL 3.1 support is there, for the first time in an Intel GPU. According to a preview done by AnandTech, the new HD 4000 is about 30% faster than SNB's HD 3000 on average. That's a good jump considering it's not an architectural overhaul, but still not enough to beat Llano. Surely, though, Intel is closing the gap with AMD as far as graphics performance is concerned. Whether AMD can restore (or even extend) its lead, possibly with the arrival of Trinity, remains to be seen. We also shouldn't forget that IVB's IGP is an important piece of Intel's Ultrabook puzzle, as a discrete graphics card isn't the most elegant or efficient solution for the ultra-portable form factor. And for that purpose IVB seems quite potent.

Haswell  


Haswell is the codename of Intel's future micro-architecture, which is supposed to succeed and replace the SandyBridge architecture at some point in 2013. This is going to be the next 'Tock' in the cadence, as it'll incorporate a new design. At this point there isn't much information available on architectural tidbits, particularly because Intel has remained quite tight-lipped about Haswell. From what we know, it'll be built on the same 22nm 3-D Tri-Gate process tech as IVB but with even more emphasis on power/thermal efficiency and optimization. Haswell will take integration to a whole new level by moving the PCH (Platform Controller Hub, similar to Llano's FCH or Fusion Controller Hub) into the CPU itself, making it a true single-chip solution (the north bridge being already integrated). Intel has also recently confirmed that Haswell will feature transactional memory, or TSX (Transactional Synchronization Extensions) as the chip-maker calls it; see the sketch after this paragraph. Shifting focus to the graphics subsystem, it looks like there will be three derivatives of the new iGPU - GT1, GT2 and GT3 - with the last one being the highest-end part. Haswell's IGP will not only support DX11.1 and OpenGL 3.2 but will also feature enhanced and refined execution units. Additionally, it seems almost certain that Intel will increase the number of those EUs in Haswell, but to what extent remains to be seen. SemiAccurate believes the high-end GT3 will be loaded with no fewer than 40 EUs, a huge step-up if it turns out to be true! Sources at VR-Zone, on the other hand, say it'll be 20 EUs, a more moderate yet still impressive increase.
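
For the curious, here is a minimal sketch of what programming against TSX's RTM interface might look like, based on Intel's published extension specification; Haswell silicon obviously isn't out yet, so treat this as illustrative. The counter and the simple spinlock are our own hypothetical names.

/* Hypothetical TSX/RTM usage per Intel's published spec: try the work as a
   hardware transaction, and fall back to a conventional lock if it aborts.
   Needs an RTM-aware compiler (e.g. gcc -mrtm) and, eventually, Haswell. */
#include <immintrin.h>
#include <stdatomic.h>

static long counter = 0;
static atomic_int fallback_lock = 0;   /* simple test-and-set spinlock */

static void lock(void)   { while (atomic_exchange(&fallback_lock, 1)) { } }
static void unlock(void) { atomic_store(&fallback_lock, 0); }

void increment(void) {
    unsigned status = _xbegin();            /* try to start a transaction */
    if (status == _XBEGIN_STARTED) {
        if (atomic_load(&fallback_lock))    /* pull the lock into our read set */
            _xabort(0xff);                  /* someone is in the fallback path */
        counter++;                          /* becomes visible atomically on commit */
        _xend();
    } else {
        lock();                             /* aborted: take the boring old lock */
        counter++;
        unlock();
    }
}

The fallback path is the important part: transactions can abort for many reasons (conflicts, capacity, interrupts), so RTM code always needs a non-transactional route to the same result.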

However, we should remember that designing a competitive GPU is something of a balancing act, and increasing the number of shaders alone won't guarantee success. There are other notable factors to take into account, like performance scaling, memory bandwidth, geometry and texture processing and, more importantly, compatibility and driver support. Even a good hardware design can under-perform severely without proper, optimized software support! Intel's track record in providing graphics drivers and support has always been less than stellar. It will be interesting to see how much things improve with Ivy and Haswell.


Intel's long journey toward graphics performance parity with its competitors has not always been smooth. It has had its ups and downs, but most of the time Intel fell short of delivering the expected performance. Experts believe this has more to do with the company's attitude and culture than with any lack of depth in resources and expertise. It's an undeniable fact that Intel has historically overlooked the importance of computer graphics for most of that time. But thankfully, that is changing! As we've shown in our analysis, Intel has awakened to the necessity of having better graphics in its products. Whether this realization came from self-assessment or from the advent of modern graphics technology is a debatable subject, but rest assured Intel will keep putting more and more resources into designing and manufacturing better GPUs in the future. And if Intel is to popularize the ultra-portable (read: Ultrabook) segment, it needs good enough graphics performance without depending on discrete solutions from AMD or Nvidia. Is it good enough? Well, that depends on how you see things. While Intel's IGP performance has evolved and matured over time, the improvement has been incremental. Right now it's quite acceptable and capable, but far from being regarded as the gold standard. What started with SandyBridge will continue into IvyBridge and might finally reach its full potential with Haswell. Whatever the results, this is going to be a very interesting development, no doubt!


(Lots of info and stats for this analysis have been taken from various sources on the web, like AnandTech, Wikipedia, TechReport and many others.)
