Slide 25 of 30
Notes:
The hardware and software schemes prefetch all consumers which means that some overlap of prefetches is possible in data structures like trees. This overlap is important to better hide memory latencies. Let us take a look at how this works in the context of a pre-order tree traversal.
Our first reference, to node 1 will incur a cache miss. Assuming the hardware tables were setup in a previous traversal, prefetches will be issued for nodes 2 and 3. Because node 2 will be immediately referenced, it will probably incur a partial cache miss. Node 3, however, will not be visited until much later and is therefore likely to be a cache hit. This pattern repeats itself, where right children are expected to be hit on and and left children will be missed on.
Let us now again think about the impact of having a prefetch itself be treated as a producer to cause subsequent prefetches. For example, if the prefetch of node 3, where to cause subsequent prefetches of nodes 6 and 7, and their children, then all cache misses on the right half of the tree would be eliminated.
This effect has not been studied by Roth, Moshovos, and Sohi, although their prefetch technique has a potential of taking advantage of it. In contrast greedy prefetching cannot do this. The challenge, however, is that of proper timing of prefetches. With a very large tree, aggressively prefetching the right half can cause problems due to early prefetching. More work is certainly needed to exploit this.