Slide 15 of 30
Notes:
However, stride prefetchers have not typically been evaluated in the context of a modern superscalar processor that can issue several instructions per cycle. Pinter and Yoaz have done exactly this and proposed Tango, an improved stride prefetching scheme targeted at superscalar processors.
The first problem is that superscalar processors can execute multiple instructions per cycle. A lookahead PC that is incremented by only one instruction per cycle may therefore not be able to stay far enough ahead of the PC to be of much benefit.
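To make the first problem concrete, here is a minimal sketch of the lead a lookahead PC keeps over the real PC. It is a toy model, not anything from the Tango paper: it assumes straight-line code with no branches or stalls, and the function name, the initial lead of 8 instructions, and the rule that a lookahead PC overtaken by the PC simply restarts at the PC are all illustrative assumptions.

```python
def lookahead_lead(issue_width, la_rate=1, initial_lead=8, cycles=10):
    """Return the lookahead PC's lead over the PC, in instructions, after each cycle.

    Hypothetical toy model: the PC advances by issue_width instructions per
    cycle and the lookahead PC advances by la_rate instructions per cycle.
    """
    pc = 0
    la_pc = initial_lead
    leads = []
    for _ in range(cycles):
        pc += issue_width
        la_pc += la_rate
        # Assumption: a lookahead PC that has been overtaken restarts at the PC,
        # so the lead never goes negative.
        la_pc = max(la_pc, pc)
        leads.append(la_pc - pc)
    return leads


if __name__ == "__main__":
    print("scalar (1 instr/cycle):", lookahead_lead(issue_width=1, cycles=4))
    print("4-wide superscalar:    ", lookahead_lead(issue_width=4, cycles=4))
```

In this model a scalar processor keeps its full lead indefinitely, while a 4-wide processor loses three instructions of lead every cycle and has none left after three cycles.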
The second problem is that a higher execution rate also increases the number of memory instructions and prefetches issued every cycle. Since both memory instructions and prefetches must check the cache tags for a hit or miss, the tags may become a bottleneck.
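The tag bottleneck can be illustrated with some back-of-the-envelope arithmetic. The sketch below is only an illustration under assumed numbers: the function name, the fraction of instructions that access memory, the number of prefetches per memory instruction, and the single tag port are hypothetical parameters, not figures from the paper.

```python
def tag_port_utilization(issue_width, mem_fraction=0.25,
                         prefetches_per_mem_op=1.0, tag_ports=1):
    """Return average tag lookups requested per cycle, divided by available ports.

    Hypothetical model: every memory instruction needs one tag lookup, and each
    one also triggers prefetches_per_mem_op prefetches that need a lookup too.
    A result above 1.0 means the tag array is oversubscribed.
    """
    mem_ops_per_cycle = issue_width * mem_fraction
    lookups_per_cycle = mem_ops_per_cycle * (1.0 + prefetches_per_mem_op)
    return lookups_per_cycle / tag_ports


if __name__ == "__main__":
    print("scalar:", tag_port_utilization(issue_width=1))
    print("4-wide:", tag_port_utilization(issue_width=4))
```

With these assumed parameters, the scalar case asks for half a tag lookup per cycle, while the 4-wide case asks for two lookups per cycle from a single port, which is the kind of oversubscription the notes describe.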
Tango addresses both of these issues.