|
|||||
Genuine low latency pipeline parallel algorithm for numerical modelling of 1D evolution equations.
Abstract:
We present a novel algorithm for numerical integration of 1D evolutional Partial Differential Equations (PDE). This work is primarily motivated by applications in nonlinear optical fiber communications where generalized Non-Linear Schr\"odinger Equation (NLSE) is used for modelling. A conventional Split-Step Fourier Method (SSFM) is a proven and most efficient numerical technique widely adopted in numerical modelling. Recently, SSFM has been used as an impressive facilitator of digital reconstruction of initial signal by virtue of a Digital Back Propagation (DBP) from a receiver to a er end of a communication line as the most natural and comprehensive compensation of adverse dispersion and nonlinear effects. However, inherent complexity of Discrete Fourier Transform (DFT) for extremely large volumes of sampled data where the number of floating point operations scales as $N \log(N)$ and, most importantly,
the global coupling of meshpoints in DFT algorithm which dictates a relatively large frame data buffers hindered prevented the SSFM-based DBP from becoming a practical real time reconstruction technique with low latency. Our approach is essentially based on an explicit finite difference discretization of the PDE in question, where an algorithm is distilled to an explicit calculation of the unknown field(s) on the next evolution level (tier). Traditionally, evolution tiers are evaluated sequentially. Explicit methods are perfectly suitable for parallel implementation since a meshpoint in the next tier is based on a finite number of neighbors of its predecessor at previous evolution tier(s). Naturally, parallel implementation is trivial when the complete dataset is treated by a parallel unit, e.g. GPU, and a complete evolution tier can be treated simultaneously. Alternatively, a domain decomposition is required and it is often a bottleneck since usually data exchange between specialist units and main computer memory is limited. Our innovation is in reorganizing tier-to-tier explicit calculations in such a way that the way up is made as steep as possible and the evolution advance is performed not in a complete tier-to-tier fashion but in a slanted ant hill fashion when the top meshpoints at higher tiers are built on the narrowest possible base on the ancestor tiers. This approach reduces the need of data exchange between the storage and processing units to barely a few numbers per calculation cycle. Hence, the proposed algorithm requires no intrinsic latency in dealing with frames or data domains. Authors
(no additional information)(no additional information) (no additional information) |
|||||
© 2012, Landau Institute for Theoretical Physics RAS www.itp.ac.ru
Contact webmaster |