Title: Future-pointing vectors

Opinion, published in the March 1996 issue of "@CSC@" (a journal published at that time by the Centre for Scientific Computing, Finland)

by Pekka Janhunen
FMI/GEO, P.O. Box 503, FIN-00101 Helsinki
Pekka.Janhunen@fmi.fi

---

In this text I try to shed some light on the ongoing debate and competition between vector supercomputers and the so-called massively parallel (MPP) machines. A typical situation is that the user community wants vector machines while computing centers would be eager to move to MPPs.

The vector machine was originally designed for scientific computation, whereas the microprocessor (usually RISC but also CISC; the difference between the two has all but disappeared) was designed to run popular applications such as word processing quickly. This implies that the market for the microprocessor is orders of magnitude larger. But it also suggests that microprocessors are not necessarily very good at scientific computation, because that was never a design goal.

Nowadays the most important distinction between vector and scalar processors is the way memory references are handled. Scalar machines are based on a memory hierarchy with one or more levels of cache memory. This is efficient for all small problems and for some types of large problems. Vector machines have interleaved memory banks. They are fast for moderate and large problems whenever bank conflicts can be avoided; usually this means that powers of two should be avoided as the stride when traversing an array. It just happens to be the case that most scientific problems can avoid significant bank conflicts, but many problems are essentially impossible to program in such a way that a memory hierarchy would be efficient. The notable exception is, of course, dense linear algebra, which can be programmed efficiently on any relevant machine.

The performance loss due to overflowing the cache can be quite dramatic on RISC machines. When this happens, computing speed usually drops below ten megaflops. Performance when processing large arrays need not correlate with the theoretical peak performance; see http://perelandra.cms.udel.edu:80/~mccalpin/hpc/stream/index.html for performance data. A small timing sketch illustrating these memory-system effects is given below.

One of the ingenious things that happens inside a vector machine is the nearly independent processing of the scalar and vector instruction streams. In practice this means that the bookkeeping scalar instructions in many cases execute with zero overhead. Indeed, the scalar execution unit need not be very fast and still the vector pipelines can work at full steam. This situation is to be contrasted with RISC processors, where the overlapping of instructions is very limited and every extra instruction increases the execution time a little.

A vector processor is, essentially, an easy-to-program parallel SIMD computer. Memory references and computations are overlapped to bring about a roughly tenfold speed increase. This performance boost is of the same order of magnitude as what can be achieved in typical MPP applications: codes that are more than 90 percent parallelizable are not very common. It is also possible to increase vector processor performance further by adding more execution units or by increasing the vector length (the pipelining depth). For instance, some versions of NEC vector processors have as many execution units as four C90 processors. These machines are efficient for regular problems where long vectors can be used.
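As an aside, here is a minimal C sketch (not part of the original article) of how the memory-system effects described above can be observed: it sums the same large array with different strides, so the arithmetic work is identical and only the access pattern changes. The array size and the list of strides are arbitrary choices for illustration. On a cached scalar machine, large strides defeat the cache; on a banked vector machine, it is the power-of-two strides in particular that tend to cause bank conflicts.

    /* Sketch: time strided traversals of one large array.
       N and the strides below are arbitrary illustrative choices. */
    #include <stdio.h>
    #include <time.h>

    #define N (1 << 22)   /* 4M doubles, about 32 MB: larger than most caches */

    static double a[N];

    int main(void)
    {
        const int strides[] = { 1, 2, 7, 8, 64, 512 };
        const int nstrides = sizeof strides / sizeof strides[0];

        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        for (int s = 0; s < nstrides; s++) {
            int stride = strides[s];
            double sum = 0.0;
            clock_t t0 = clock();

            /* Touch all N elements exactly once regardless of stride,
               so only the memory access pattern differs between runs. */
            for (int pass = 0; pass < stride; pass++)
                for (int i = pass; i < N; i += stride)
                    sum += a[i];

            double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
            printf("stride %4d: %.3f s (sum = %.1f)\n", stride, secs, sum);
        }
        return 0;
    }

Printing the sum keeps the compiler from optimizing the loops away; the interesting output is how much the timings spread between unit stride and the larger, power-of-two strides.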
Correspondingly, one can add more processors to MPPs, but only a few applications can benefit from this.

The purpose of these considerations was to point out that there is no natural law that limits the power of a single processing unit, even though the speed of light and other factors may limit the clock frequency. If maximum performance is wanted, parallelism is the only way up, but the parallelism must be exploited at every level. This includes the pipelining of memory references and calculations, as well as parallel execution units within one processor. Exploiting only the last and most trivial stage of parallelism by adding more processors can obviously not yield the best result, because Amdahl's law is at work at every stage (a small numerical sketch of this is given at the end of the text).

Based on this view we can conclude that vector processors have a bright future. Some day they will be mass products.
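As a closing numerical note on the Amdahl's law argument (not part of the original article), the standard formula speedup = 1 / ((1 - p) + p/N) can be evaluated for a parallel fraction p = 0.9, the figure quoted earlier; the choice of processor counts below is arbitrary.

    /* Sketch: Amdahl's law for a 90 percent parallelizable code. */
    #include <stdio.h>

    int main(void)
    {
        const double p = 0.9;   /* parallelizable fraction, as quoted in the text */
        const int nprocs[] = { 1, 4, 16, 64, 256, 1024 };
        const int ncases = sizeof nprocs / sizeof nprocs[0];

        for (int i = 0; i < ncases; i++) {
            int n = nprocs[i];
            double speedup = 1.0 / ((1.0 - p) + p / n);
            printf("%5d processors: speedup %6.2f\n", n, speedup);
        }
        /* The speedup approaches but never exceeds 1/(1-p) = 10,
           no matter how many processors are added. */
        return 0;
    }

The printed table climbs quickly at first and then flattens near 10, which is the point made above: adding processors at only one level of parallelism runs into the serial fraction.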