Research Papers: Techniques and Procedures

Computational Fluid Dynamics Computations Using a Preconditioned Krylov Solver on Graphical Processing Units

Author and Article Information
Amit Amritkar

Department of Mechanical Engineering;
Department of Mathematics,
Virginia Tech,
226 SEB,
635 Prices Fork Road,
Blacksburg, VA 24061
e-mail: amritkar@vt.edu

Danesh Tafti

Department of Mechanical Engineering,
Virginia Tech,
213E SEB,
635 Prices Fork Road,
Blacksburg, VA 24061
e-mail: dtafti@exchange.vt.edu

Corresponding author.

Contributed by the Fluids Engineering Division of ASME for publication in the JOURNAL OF FLUIDS ENGINEERING. Manuscript received August 25, 2014; final manuscript received July 22, 2015; published online August 21, 2015. Assoc. Editor: Zhongquan Charlie Zheng.

J. Fluids Eng. 138(1), 011402 (Aug 21, 2015) (6 pages). Paper No: FE-14-1469; doi: 10.1115/1.4031159. History: Received August 25, 2014; Revised July 22, 2015

Graphical processing unit (GPU) computing has grown rapidly in recent years owing to advances in both hardware and the software stack, leading to increased use of GPUs as accelerators across a broad spectrum of applications. This work deals with the use of general-purpose GPUs for computational fluid dynamics (CFD) computations. The paper discusses strategies and findings on porting a large multifunctional CFD code to the GPU architecture. Within this framework, the most compute-intensive segment of the software, the BiCGStab linear solver with additive Schwarz block preconditioners and point-Jacobi iterative smoothing, is optimized for the GPU platform using various techniques in CUDA Fortran. Representative turbulent channel and pipe flows are investigated for validation and benchmarking. Both single- and double-precision calculations are highlighted. For a modest single-block grid of 64 × 64 × 64, the turbulent channel flow computations showed a speedup of about eightfold in double precision and more than 13-fold in single precision on the NVIDIA Tesla GPU over a serial run on an Intel central processing unit (CPU). For the pipe flow, consisting of 1.78 × 10^6 grid cells distributed over 36 mesh blocks, the gains were more modest at 4.5 and 6.5 for double and single precision, respectively.
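The solver combination named in the abstract can be sketched in miniature. The following is a pure-Python illustration, not the paper's CUDA Fortran implementation: it applies right-preconditioned BiCGStab to a 1D Poisson (tridiagonal) system, with the preconditioner built from independent per-block point-Jacobi sweeps summed in the additive Schwarz fashion. The matrix, block count, and sweep count are illustrative assumptions, chosen only to make the structure of the method concrete.

```python
def matvec(x):
    """A*x for the model 1D Laplacian A = tridiag(-1, 2, -1)."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def schwarz_jacobi(r, n_blocks=4, sweeps=5):
    """Approximate A^{-1} r: each block runs point-Jacobi sweeps on its own
    unknowns (neighboring blocks frozen at zero), and the independent block
    corrections are summed -- the additive Schwarz pattern, whose
    block-independent work maps naturally onto GPU thread blocks."""
    n = len(r)
    z = [0.0] * n
    step = (n + n_blocks - 1) // n_blocks
    for k in range(n_blocks):
        lo, hi = k * step, min((k + 1) * step, n)
        x = [0.0] * n
        for _ in range(sweeps):
            x_new = x[:]
            for i in range(lo, hi):
                # Jacobi update for 2*x[i] - x[i-1] - x[i+1] = r[i]
                s = r[i]
                if i > 0:
                    s += x[i - 1]
                if i < n - 1:
                    s += x[i + 1]
                x_new[i] = s / 2.0
            x = x_new
        for i in range(lo, hi):
            z[i] += x[i]
    return z

def bicgstab(b, tol=1e-10, max_iter=200):
    """Right-preconditioned BiCGStab (van der Vorst) for A x = b."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual for zero initial guess
    r_hat = r[:]                  # fixed shadow residual
    rho = alpha = omega = 1.0
    v = p = [0.0] * n
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [r[i] + beta * (p[i] - omega * v[i]) for i in range(n)]
        p_hat = schwarz_jacobi(p)         # first preconditioner application
        v = matvec(p_hat)
        alpha = rho_new / dot(r_hat, v)
        s = [r[i] - alpha * v[i] for i in range(n)]
        if dot(s, s) < tol * tol:
            return [x[i] + alpha * p_hat[i] for i in range(n)]
        s_hat = schwarz_jacobi(s)         # second preconditioner application
        t = matvec(s_hat)
        omega = dot(t, s) / dot(t, t)
        x = [x[i] + alpha * p_hat[i] + omega * s_hat[i] for i in range(n)]
        r = [s[i] - omega * t[i] for i in range(n)]
        if dot(r, r) < tol * tol:
            return x
        rho = rho_new
    return x
```

Because each Schwarz block touches only its own unknowns during the Jacobi sweeps, the preconditioner requires no communication between blocks within an application, which is what makes this class of preconditioner attractive on GPUs.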

Copyright © 2016 by ASME




Fig. 1: Data distribution for multilevel parallelism used in GenIDLEST

Fig. 2: RMS flow velocities along and perpendicular to the flow direction plotted against nondimensional channel half height, starting from the center of the channel

Fig. 3: Radial variation (y+ starting from the pipe wall) in time-averaged velocity along the flow direction

Fig. 4: RMS flow velocity variation along the radial direction (y+), starting from the pipe wall



