This paper presents a new approach to implementing fast broadcast and multicast operations in bidirectional wormhole Multistage Interconnection Networks (MINs) with loopback, as used in the IBM SP1/SP2 network. The novelty lies in using a multidestination message-passing mechanism instead of single-destination (unicast) messages. For a broadcast or multicast operation, it is shown that a single worm with multiple destinations is sufficient to allow pipelined replication of flits at appropriate intermediate switches and to deliver copies to the required destinations. For high communication start-up time (t_s) on an n-processor system, this new approach leads to an asymptotic improvement by a factor of ⌈log₂ n⌉ compared to unicast-based message passing. Two schemes for broadcast and multicast are presented, together with the necessary architectural support at the switch level. Storage requirements at a switch to ensure deadlock freedom are also derived. These schemes are evaluated and compared with unicast-based message passing for different values of communication start-up time, link propagation time, switch delay, system size, and switch size. It is shown that the multidestination approach demonstrates superiority and scalability for higher t_s and smaller switch sizes (smaller than 16 × 16). For example, with t_s = 10.0 microseconds, a broadcast of a 256-flit message on a 4K-processor MIN with 4 × 4 switches can be implemented in just 15.8 microseconds using the multidestination approach, compared to 138.4 microseconds with unicast message passing. Similarly, for a 1K-processor MIN, a multicast operation with an arbitrary number of destinations takes a constant time of 20 microseconds for a 256-flit message. These results indicate that the proposed scheme can be easily applied to current and future generation multistage systems like the SP1/SP2 to provide fast and scalable collective communication operations, as defined by the Message Passing Interface (MPI) standard.
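
The logarithmic improvement factor claimed above can be illustrated with a minimal sketch. It assumes that the unicast baseline is a recursive-doubling (binomial-tree) broadcast, which needs ⌈log₂ n⌉ communication steps, and that a multidestination broadcast needs a single step. The function names and the lumped per-step network term `t_net` are hypothetical and are not taken from the paper; in particular, the model charges the same network cost per step to both schemes, so it is not expected to reproduce the timings quoted in the abstract.

```python
def unicast_broadcast_steps(n: int) -> int:
    """Steps for a recursive-doubling unicast broadcast to n processors.

    Assumption: the set of informed processors doubles in every step,
    so ceil(log2(n)) steps are needed.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) exactly,
    # without any floating-point rounding.
    return (n - 1).bit_length()


def broadcast_latency(n: int, t_s: float, t_net: float,
                      multidestination: bool) -> float:
    """Toy latency model: every step costs start-up time plus network time.

    t_s   : communication start-up time per message
    t_net : hypothetical lumped term for switch, link, and flit
            pipelining delays (not a parameter from the paper)
    """
    steps = unicast_broadcast_steps(n)
    if multidestination:
        # A single multidestination worm reaches all destinations in one step.
        steps = min(steps, 1)
    return steps * (t_s + t_net)


def speedup(n: int, t_s: float, t_net: float) -> float:
    """Ratio of unicast latency to multidestination latency in the toy model."""
    return (broadcast_latency(n, t_s, t_net, multidestination=False)
            / broadcast_latency(n, t_s, t_net, multidestination=True))


if __name__ == "__main__":
    for n in (16, 1024, 4096):
        print(n, unicast_broadcast_steps(n), speedup(n, t_s=10.0, t_net=2.0))
```

In this simplified model the speedup equals the step count exactly. In a real network the multidestination worm incurs extra replication and pipelining delay, so the ratio only approaches ⌈log₂ n⌉ as t_s comes to dominate the other terms.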