Slide 18 of 30
Notes:
The second major innovation of Tango is its ability to reduce tag memory lookups. Ordinarily, each prefetch is looked up in the tag memory of the L1 cache. If it hits, the prefetch can be discarded, because the data is already in the cache. On superscalar processors, which can issue several memory operations per cycle, these lookups compete with ordinary loads and stores for the tag memory, and that pressure can become a bottleneck.
Pinter and Yoaz found that their technique cuts the number of cache tag lookups approximately in half.
To achieve this, they keep a FIFO of the last few tag lookups that were found to hit in the cache. If a later prefetch matches an entry in the FIFO, it can be discarded immediately, without requiring a tag lookup.
This appears to be targeted particularly at memory reference patterns with very small strides, where many prefetches for the same cache line may be generated in sequence.
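
To make the filtering idea concrete, here is a minimal Python sketch of a FIFO that remembers recent tag-lookup hits. It is an illustration only, not Pinter and Yoaz's hardware design: the names PrefetchFilter, handle_prefetch, and simulate, the 32-byte line size, and the FIFO depth of 4 are all assumptions made for this example. The cache is modelled as a simple set of line addresses that is filled instantly on a miss, and evictions (which could leave stale FIFO entries) are ignored.

```python
from collections import deque

LINE_SIZE = 32  # bytes per cache line (assumed value, not from the paper)


class PrefetchFilter:
    """Hypothetical model: a small FIFO of lines whose tag lookups recently hit."""

    def __init__(self, depth=4):
        # deque with maxlen drops the oldest entry automatically when full
        self.recent_hits = deque(maxlen=depth)
        self.tag_lookups = 0  # prefetches that had to consult the tag memory
        self.filtered = 0     # prefetches discarded by the FIFO alone

    def handle_prefetch(self, addr, cache_lines):
        line = addr // LINE_SIZE
        if line in self.recent_hits:
            # Line is known to be cached: discard without a tag lookup.
            self.filtered += 1
            return "filtered"
        self.tag_lookups += 1
        if line in cache_lines:
            # Tag lookup hit: remember the line so later prefetches can be filtered.
            self.recent_hits.append(line)
            return "hit"
        # Tag lookup missed: the prefetch is issued. Simplification: the line
        # is treated as filled immediately, with no latency and no eviction.
        cache_lines.add(line)
        return "miss"


def simulate(addresses, depth=4):
    """Run a sequence of prefetch addresses through the filter; return the filter."""
    cache_lines = set()
    prefetch_filter = PrefetchFilter(depth)
    for addr in addresses:
        prefetch_filter.handle_prefetch(addr, cache_lines)
    return prefetch_filter
```

For example, simulate(range(0, 128, 4)) models a 4-byte stride over four 32-byte lines: of the 32 prefetches, 8 reach the tag memory (one miss and one hit per line) and 24 are discarded by the FIFO. These figures belong to this toy model only and should not be read as the result reported in the paper.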