Slide 23 of 30
Notes:
Let us briefly review this process. There are three steps that lead to prefetching in this hardware, recursive prefetcher.
First, an execution of a memory instruction will be recorded as a potential producer. Second, a later execution of a memory instruction will match a consumer to this producer and their relationship is entered in the correlation table. Third, when the producer is executed again, a prefetch is issued for the consumer.
Since we are talking about linked data structures, a consumer is frequently also another producer. That begs the question if a prefetch itself should be looked up in the correlation table to issue yet another prefetch. Roth, Moshovos, and Sohi deliberately dont do this in their hardware scheme.
They argue that there is no substantial benefit when traversing a linked list. Consider this: If the execution time from producer to consumer is larger than the memory latency, then we can fully hide the memory latency. Prefetching further ahead will provide no benefit. If, however, execution time from producer to consumer is less than the memory latency, then we cannot fully hide the memory latency. When the prefetch completes, the consumer is already waiting and it makes no difference if the next prefetch is issued by the consumer or the prefetch.
There is nothing wrong with this argument, however it does not hold for tree traversals. We will get back to this in just a minute.