Superscalar Challenges
- Lookahead PC (LA PC) is not fast enough when several instructions are issued every cycle
- Cache tags can become a bottleneck because prefetches and memory references are issued at a higher rate
Notes:
However, stride prefetchers have not typically been evaluated in the context of a modern superscalar processor that can issue several instructions per cycle. Pinter and Yoaz did exactly this and proposed Tango, an improved stride prefetching scheme targeted at superscalar processors.
The first problem is that superscalar processors can execute multiple instructions per cycle. A lookahead PC that advances by only one instruction per cycle may therefore not get far enough ahead of the PC to be of much benefit.
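A toy model can make this concrete. The sketch below is not Tango or any published design; it assumes a hypothetical machine in which the PC advances by `issue_width` instructions on every non-stalled cycle, the LA PC advances by `la_step` instructions on every cycle, and the LA PC is snapped forward to the PC whenever it would fall behind. Branches and branch prediction are ignored, and all names and numbers are illustrative.

```python
def lookahead_lead(issue_width, cycles, la_step=1, stall_cycles=frozenset()):
    """Return the LA PC's lead over the PC (in instructions) after each cycle.

    Hypothetical model: the PC advances by issue_width on non-stalled cycles,
    the LA PC advances by la_step on every cycle.
    """
    pc = 0
    la_pc = 0
    leads = []
    for cycle in range(cycles):
        if cycle not in stall_cycles:
            pc += issue_width      # the processor issues a full group
        la_pc += la_step           # the LA PC keeps moving during stalls
        la_pc = max(la_pc, pc)     # assumption: the LA PC never trails the PC
        leads.append(la_pc - pc)
    return leads


stalls = {1, 2, 3}  # assumed: one cache miss stalls the PC for three cycles
print(lookahead_lead(issue_width=1, cycles=6, stall_cycles=stalls))
# prints [0, 1, 2, 3, 3, 3]
print(lookahead_lead(issue_width=4, cycles=6, stall_cycles=stalls))
# prints [0, 1, 2, 3, 0, 0]
```

In this model the single-issue machine keeps the lead it built up during the stall, while the four-wide machine catches up with the LA PC in the first cycle after the stall ends.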
The second problem is that a higher execution rate also increases the number of memory instructions and prefetches issued every cycle. Since both memory instructions and prefetches must check the cache tags for a hit or miss, the tags may become a bottleneck.
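A back-of-the-envelope estimate shows how quickly tag traffic grows with issue width. The sketch below is a rough model only; the instruction mix, the prefetch rate, and the single tag port are assumed values for illustration, not measurements from any processor or paper.

```python
def tag_lookups_per_cycle(issue_width, mem_fraction, prefetches_per_mem_op):
    """Estimate the average number of cache tag lookups requested per cycle.

    mem_fraction: assumed fraction of issued instructions that access memory.
    prefetches_per_mem_op: assumed prefetches generated per memory instruction.
    """
    demand_lookups = issue_width * mem_fraction
    prefetch_lookups = demand_lookups * prefetches_per_mem_op
    return demand_lookups + prefetch_lookups


TAG_PORTS = 1  # assumed: the tag array serves one lookup per cycle

for width in (1, 4):
    lookups = tag_lookups_per_cycle(width, mem_fraction=0.25,
                                    prefetches_per_mem_op=1.0)
    print(width, lookups, lookups > TAG_PORTS)
# prints:
# 1 0.5 False
# 4 2.0 True
```

Under these assumptions a single-issue processor uses half of the tag bandwidth, whereas a four-wide processor asks for twice what one tag port can supply.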
Tango addresses both of these issues.