Abstract
Development and application of the open-source GPU-based fluid-thermal simulation code, NekRS, are described. Time advancement is based on an efficient kth-order accurate timesplit formulation coupled with scalable iterative solvers. Spatial discretization is based on the high-order spectral element method (SEM), which affords the use of fast, low-memory, matrix-free operator evaluation. Recent developments include support for nonconforming meshes using overset grids and for GPU-based Lagrangian particle tracking. Results of large-eddy simulations of atmospheric boundary layers for wind-energy applications as well as extensive nuclear energy applications are presented.
1 Introduction
NekRS [1] is a new exascale-ready thermal-fluids solver for incompressible and low-Mach number flows. Like its predecessor, Nek5000 [2], it is based on the spectral element method (SEM, Ref. [3]) and is scalable to 10^5 processors and beyond [4–6]. The principal differences between Nek5000 and NekRS are that the former is based on F77/C, coupled with MPI and fast matrix–matrix product routines for performance, while the latter is based on C++ coupled with MPI and open concurrent compute abstraction (OCCA) kernels for GPU support. OCCA [7,8] provides backends for CUDA, HIP, OpenCL, and DPC++, for performance portability across all the major GPU vendors. As we detail later, the key NekRS kernels realize anywhere from 2 to 12 TFLOPS per accelerator, with sustained performance >0.9 TFLOPS (FP64) on the NVIDIA A100-based Polaris platform at the Argonne Leadership Computing Facility (ALCF). For the pressure Poisson solver, which is critical to the performance of incompressible flow simulations, NekRS supports a wide variety of preconditioners tailored to the computational characteristics of GPU-based supercomputers [9]. While the feature list of NekRS is not as long as that of the 35+ year-old Nek5000 code, the number of GPU-supported features is rapidly increasing, including: moving meshes through an arbitrary Lagrangian–Eulerian formulation and an overset-grid capability; large eddy simulation (LES) subgrid-scale models [10]; unsteady Reynolds-averaged Navier–Stokes (RANS) based on a k–τ formulation [11]; and Lagrangian/Stokes particle tracking [12]. We note that, at 80% parallel efficiency, which corresponds to 2–3 × 10^6 points per MPI rank on NVIDIA V100- and AMD MI250X-based hardware, NekRS typically realizes 0.1–0.3 s of wall-clock time per time-step. These runtimes are nearly a 3× improvement over production runs of Nek5000 at a similar 80% strong-scale limit on the IBM BG/Q Mira, which was the state-of-the-art for scalability at the start of the NekRS development [5,13,14].
In the following, we review some performance-critical features of NekRS and describe recent large-scale applications. The paper is organized as follows. Section 2 describes the governing equations and discretization formulations in space and time. Section 3 discusses the simulation capabilities. Section 4 demonstrates nuclear energy applications. Section 5 discusses wind energy applications. We then provide some conclusions in Sec. 6.
2 Formulation
2.1 Temporal Discretization.
For uniform Δt, the extrapolation coefficients are [1] for k = 1, [2, −1] for k = 2, and [3, −3, 1] for k = 3. For the semidiscretization in time, we drop the residual terms and use Eqs. (3) and (4) in Eq. (1). All other terms are evaluated implicitly at time t^m.
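For reference, a minimal sketch of the resulting BDF3/EXT3 update for a generic evolution equation du/dt = −N(u) + L(u), with N treated explicitly by extrapolation and L implicitly, is shown below; the actual Eqs. (1)–(4) involve the full Navier–Stokes operators and boundary terms.

$$
\frac{1}{\Delta t}\left(\tfrac{11}{6}\,u^{m} - 3\,u^{m-1} + \tfrac{3}{2}\,u^{m-2} - \tfrac{1}{3}\,u^{m-3}\right)
= -\left[\,3\,N(u^{m-1}) - 3\,N(u^{m-2}) + N(u^{m-3})\,\right] + L(u^{m}).
$$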
Equation (9) is solved subject to appropriate boundary conditions on each component of the velocity. In the case of mixed boundary conditions or variable viscosity, we replace Eq. (9) by a system in which the velocity components are coupled. In either case, the viscous system is diagonally dominant for modest- to high-Reynolds-number applications because the diagonal mass-matrix term, scaled by 1/Δt, dominates the viscous contribution. For these cases, Jacobi-preconditioned conjugate gradient iteration is an efficient solution strategy for the fully discretized form of Eq. (9).
2.2 Spatial Discretization.
With nodal points ξj taken to be the Gauss–Lobatto–Legendre (GLL) quadrature points, this basis is stable and does not exhibit the Runge phenomenon often associated with high-order polynomials (e.g., Ref. [18]). Production values of N for NekRS are typically in the range 5–10, but we have run single-element cases with Nek5000 up to N = 100. As noted in the early work of Orszag [19], restriction to tensor-product forms permits the use of fast matrix-free formulations in which operator evaluation is effected in O(EN^4) operations and O(EN^3) memory references for an E-element mesh, even in the case of deformed geometries. The number of grid points is n ≈ EN^3, so the storage is only O(n) and the work only O(Nn), which is in sharp contrast to the standard FEM, where storage and work complexities typically compel a restriction to N < 4.
where u is the vector of unknown basis coefficients, A is the stiffness matrix, and B is the mass matrix, which is diagonal because of the use of GLL quadrature.
where A^e = D^T G^e D with, for d = 2, D^T = [D_1^T D_2^T], D_1 = Î ⊗ D̂, and D_2 = D̂ ⊗ Î, where Î is the (N+1) × (N+1) identity and D̂ is the dense one-dimensional differentiation matrix having entries D̂_ij = dπ_j/dξ evaluated at ξ_i. Each G^e_ij is a diagonal matrix comprising a combination of quadrature weights, Jacobians, and metric terms from the map to the reference element [20]. Defining A_L = block-diag(A^e), we can formally write A = Q^T A_L Q. Note that application of the D_i and G^e_ij amounts to efficient tensor contractions that are simultaneously applicable to all elements. For parallel efficiency, all variables are stored in local form so that they can readily be differentiated, integrated, etc., which leads to the local-only form w_L = A_L u_L. This approach requires redundant storage of shared interface values but allows Q and Q^T to be combined into a single gather-scatter (exchange-and-sum) operation when computing the matrix–vector product w = Au.
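To make the tensor-contraction structure concrete, the following serial C++ sketch applies the reference-space derivatives (Î ⊗ D̂ and D̂ ⊗ Î) to the nodal values of a single two-dimensional element. It is illustrative only and is not the NekRS/OCCA kernel, which additionally fuses the geometric factors G^e and processes all elements on the GPU.

```cpp
#include <vector>

// Apply the dense 1-D differentiation matrix Dhat ((N+1)x(N+1), row-major) along
// each tensor direction of a 2-D spectral element whose (N+1)^2 nodal values are
// stored lexicographically: u[j*(N+1)+i] = u(xi_i, xi_j).  Illustrative sketch only.
void local_grad2(const std::vector<double>& u,     // element nodal values
                 const std::vector<double>& Dhat,  // 1-D differentiation matrix
                 int N,
                 std::vector<double>& ur,          // du/dr at the nodes
                 std::vector<double>& us)          // du/ds at the nodes
{
    const int n1 = N + 1;
    ur.assign(n1 * n1, 0.0);
    us.assign(n1 * n1, 0.0);
    for (int j = 0; j < n1; ++j)
        for (int i = 0; i < n1; ++i)
            for (int k = 0; k < n1; ++k) {
                ur[j * n1 + i] += Dhat[i * n1 + k] * u[j * n1 + k]; // (I x Dhat): d/dr
                us[j * n1 + i] += Dhat[j * n1 + k] * u[k * n1 + i]; // (Dhat x I): d/ds
            }
}
```

In three dimensions the same pattern yields contractions costing O(N^4) work per element against O(N^3) stored values, which is the source of the operation and storage counts quoted above.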
Two-dimensional illustration of fundamental SEM mesh configurations for E = 2, N = 4: (left) global mesh, reflecting degrees-of-freedom u¯ = [u_1 … u_n]^T; (center) collection of spectral elements, Ω^e, each with local degrees-of-freedom, u¯^e = u^e_ij; and (right) reference domain for local coordinates (r,s) ∈ Ω̂ := [−1,1]^2. In this example, u¯_L = Qu¯ would copy the values on the shared interface in Ω to the corresponding local edges on Ω^1 and Ω^2 (e.g., u_5 would be copied to u^1_{4,0} and u^2_{0,0}). Application of Q^T would yield the sum of the values on the shared interface (e.g., w¯ = Q^T u¯_L would yield w_5 = u^1_{4,0} + u^2_{0,0}).
2.3 Performance Considerations.
Each Navier–Stokes time-step requires three principal substeps: advection; pressure-correction; and viscous solve.
and approximating the material derivative on the left using kth-order backward differencing along the characteristics that are incident to each nodal point, the characteristics scheme is effectively implicit. The requisite terms in the BDFk expansion can be found by tracing backward along the characteristics [21] or by solving a sequence of linear hyperbolic advection problems using (say) fourth-order Runge–Kutta, which permits Courant numbers larger than unity [22,23]. For stability reasons [24], the advection terms are de-aliased by over-integration with Gauss–Legendre quadrature points in each direction, which can make the hyperbolic subproblems relatively expensive. Thus, an increase in Δt might yield only a modest reduction in Navier–Stokes solution time, with some of the cost coming from an increase in the number of advection substeps and some coming from an increase in the number of pressure iterations on each step.
The viscous problems (9) can be efficiently solved with diagonally preconditioned conjugate-gradient iterations and are generally solved in block form, which allows QQ^T to be applied to the full velocity field as a single vector rather than to three separate scalar fields, reducing message-latency overhead by a factor of three.
The pressure Poisson solve (6) is intrinsically the stiffest substep as it represents the action of (infinitely) fast acoustic waves. NekRS uses GMRES with a variety of pressure preconditioners. The default is to use p-multigrid with local Schwarz based smoothing [9,25] coupled with fourth-kind Chebyshev acceleration [26]. It is sometimes effective to precondition the SEM operators with spectrally equivalent low-order FEM meshes [27–29]. This idea, which dates back to the seminal work of Orszag [19], is also supported in NekRS and is useful for cases with ill-conditioned meshes [9].
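As an illustration of the smoothing step, the sketch below is a textbook (first-kind) Chebyshev-accelerated smoother for a symmetric positive-definite operator A with an approximate inverse Minv and eigenvalue bounds [lmin, lmax] of Minv·A. It is a simplified stand-in with illustrative names: in NekRS the role of Minv is played by the local overlapping-Schwarz solves, and the production smoother uses the fourth-kind Chebyshev variant of Ref. [26] rather than this classical form.

```cpp
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Op  = std::function<Vec(const Vec&)>;  // action of A or of the preconditioner Minv

// Classical preconditioned Chebyshev smoothing: apply 'degree' corrections to x
// so that error components with eigenvalues in [lmin, lmax] are damped.
void chebyshev_smooth(const Op& A, const Op& Minv, const Vec& b, Vec& x,
                      double lmin, double lmax, int degree)
{
    const double theta = 0.5 * (lmax + lmin);
    const double delta = 0.5 * (lmax - lmin);
    const double sigma = theta / delta;
    double rho = 1.0 / sigma;

    Vec Ax = A(x);
    Vec r(b.size());
    for (size_t i = 0; i < b.size(); ++i) r[i] = b[i] - Ax[i];  // initial residual

    Vec z = Minv(r);
    Vec d(b.size());
    for (size_t i = 0; i < b.size(); ++i) d[i] = z[i] / theta;  // first correction

    for (int k = 0; k < degree; ++k) {
        for (size_t i = 0; i < b.size(); ++i) x[i] += d[i];     // apply correction
        if (k == degree - 1) break;
        Vec Ad = A(d);
        for (size_t i = 0; i < b.size(); ++i) r[i] -= Ad[i];    // update residual
        z = Minv(r);
        const double rho_next = 1.0 / (2.0 * sigma - rho);
        for (size_t i = 0; i < b.size(); ++i)
            d[i] = rho_next * rho * d[i] + (2.0 * rho_next / delta) * z[i];
        rho = rho_next;
    }
}
```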
All of the key kernels come in multiple OCCA versions that are just-in-time compiled with fixed array dimensions, which allows the compiler to generate significantly faster code. The kernels are timed during the run-time setup phase and NekRS selects the fastest version; different versions may be used for different data sizes (e.g., the p-multigrid solver operates on elements of varying order, with a potentially different kernel selected for each). Figure 2 shows typical entries from a NekRS logfile.
Similarly, there are several ways to effect the nearest-neighbor exchanges for the gather-scatter operation, QQ^T. Data can be packed on the GPU and exchanged using either GPU-direct communication or via the host, or it might be packed and exchanged solely through the host. Again, NekRS times the options and selects the fastest during the gather-scatter setup for each operation. An additional option that is autoselected at setup is whether to overlap communication and computation by first applying operators to elements that are on the subdomain surface (i.e., those that require nonlocal data), initiating the exchange with neighboring processes, completing operator evaluation for interior elements, and finally completing the gather-scatter for the surface elements upon receipt of nonlocal data. Covering communication in this way typically results in a 10–15% savings in runtime. Another 15% gain is typically realized by using reduced (FP32) precision in the pressure preconditioning steps, an idea that has been pursued in other high-order codes as well (e.g., Ref. [30]).
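The communication/computation overlap can be sketched as follows; the routine names and data layout are illustrative placeholders, not the NekRS gather-scatter interface.

```cpp
#include <mpi.h>
#include <vector>

struct Neighbor { int rank; int count; };  // neighboring rank and shared-value count

// Schematic overlapped matrix-vector product: operate on surface elements first,
// exchange interface values with nonblocking MPI while interior elements are
// processed, then complete the gather-scatter (sum of shared values) on receipt.
void matvec_overlapped(const std::vector<Neighbor>& nbr,
                       std::vector<std::vector<double>>& sendbuf,
                       std::vector<std::vector<double>>& recvbuf,
                       MPI_Comm comm)
{
    std::vector<MPI_Request> req(2 * nbr.size());

    // 1. Apply the local operator on surface elements and pack shared values.
    //    apply_local_operator(surface_elements);  pack_surface(sendbuf);

    // 2. Post nonblocking receives and sends for the interface exchange.
    for (size_t i = 0; i < nbr.size(); ++i) {
        MPI_Irecv(recvbuf[i].data(), nbr[i].count, MPI_DOUBLE,
                  nbr[i].rank, 0, comm, &req[2 * i]);
        MPI_Isend(sendbuf[i].data(), nbr[i].count, MPI_DOUBLE,
                  nbr[i].rank, 0, comm, &req[2 * i + 1]);
    }

    // 3. Apply the local operator on interior elements while messages are in flight.
    //    apply_local_operator(interior_elements);

    // 4. Complete the exchange and sum the received interface contributions.
    MPI_Waitall(static_cast<int>(req.size()), req.data(), MPI_STATUSES_IGNORE);
    //    unpack_and_sum(recvbuf);
}
```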
We remark that all of the optimizations are designed to be effective in the strong-scale limit, meaning that the simulation is using as many GPUs as are effective. Through numerous tests, we have found that NekRS requires roughly 2–4 × 10^6 points per GPU to realize 80% parallel efficiency; using fewer points per rank (i.e., more GPUs) leads to efficiencies that are generally deemed unacceptable. At the 80% operating point, NekRS typically requires only 0.1–0.4 s per time-step, regardless of the overall problem size [13,14]. We demonstrate strong-scaling results to illustrate these points using the atmospheric boundary-layer test case on the V100-based Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) (see Fig. 19 in Sec. 5).
Figure 2 shows some of the logfile entries for a typical NekRS run on the NVIDIA A100-based Polaris at the ALCF. The case is a 17 × 17 rod-bundle configuration with 97 M gridpoints (E = 277,000, N = 7) running on P = 32 GPUs (approximately 3 M points per rank). The table indicates that kernel version (kv) 2 is selected for the FP64 operator kernel when N = 7, but kv = 5 for N = 3. For FP32, the fastest kernel versions are, respectively, kv = 4 and 2, each of which sustains the expected speedup over its FP64 counterpart. The fdm kernels implement the fast-diagonalization method [9,20], which is used to solve the local problems in the overlapping Schwarz preconditioner. The sustained FLOPS rate, reported over all 32 ranks, corresponds to 0.816 TFLOPS per rank (roughly 26 TFLOPS in aggregate). This value is computed by counting each FP32 floating point operation as half a flop, so that the aggregate result provides a reasonable estimate of sustained FP64 performance. Further details regarding NekRS performance optimization can be found in Refs. [5] and [9].
In addition to performance gains, NekRS preserves the high-order convergence properties exhibited by Nek5000, which are intrinsic to the rapidly convergent SEM discretization. Nek5000 has had extensive verification studies for problems with analytical solutions (e.g., Refs. [20], [31], and [32]) and validation for more challenging turbulent flows (e.g., Refs. [33] and [34]). NekRS has been thoroughly tested to match the baseline Nek5000 results on all of its features, including the RANS models.
2.4 Mesh Restrictions.
NekRS, like its predecessor Nek5000, requires unstructured, conforming, high-order hexahedral meshes. That is, the domain is discretized with high-order hexahedra whose edges can employ a variety of curvature descriptions. The most general description of the edge curvature is a HEX20 representation in which, in addition to the vertices of the hexahedron, the midpoints of each edge are provided. Only conforming meshes are supported: the face of each hexahedron can be shared by at most two elements, and no hanging nodes are allowed. For complex geometries where a block-structured approach is difficult to design, two strategies are typically applied; the overset-grid approach described in Sec. 3 is one of them.
3 NekRS Simulation Capabilities
An important utility in Nek5000 that has recently been ported to NekRS is the findpts() library, which allows for fast, robust, general-purpose, and scalable interpolation of data fields. The user interface for this library entails specification of one or more query points on each MPI rank, where the number of query points is rank dependent, followed by a call to findpts(), which returns the list of MPI ranks, element numbers, and local (reference-element) coordinates that correspond to the query point set. Subsequent calls to findpts_eval(u) will return to the querying processor(s) the values of a prescribed scalar or vector field at the query points. Knowledge about intermediate data (e.g., the processor or element number that contains a given query point) is not required by either the user or the developer to retrieve a set of values. findpts() uses hash tables to rapidly identify candidate elements/processors, followed by a Newton iteration to find the closest point in Ω for each query point. All interpolation is fully Nth-order using the underlying SEM basis, and all communication requires only O(log P) time, where P is the number of ranks. findpts() is central to several developments in Nek5000/RS, including Lagrangian particle tracking and support for overset grids.
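For illustration, the Newton step at the heart of the point-location procedure is sketched below for the simplest case of a bilinear quadrilateral; the actual findpts() routines operate on the full Nth-order SEM geometry in three dimensions and combine this step with the hash-based candidate search described above.

```cpp
#include <array>
#include <cmath>

// Recover local (reference) coordinates (r,s) of a query point (xq,yq) inside a
// bilinear quadrilateral with vertices xv/yv ordered counterclockwise starting at
// (r,s) = (-1,-1).  Illustrative stand-in for the SEM point-location Newton step.
std::array<double, 2> invert_bilinear_map(const std::array<double, 4>& xv,
                                          const std::array<double, 4>& yv,
                                          double xq, double yq,
                                          int max_iters = 20, double tol = 1e-12)
{
    double r = 0.0, s = 0.0;  // initial guess: element center
    for (int it = 0; it < max_iters; ++it) {
        const double phi[4]    = { 0.25 * (1 - r) * (1 - s), 0.25 * (1 + r) * (1 - s),
                                   0.25 * (1 + r) * (1 + s), 0.25 * (1 - r) * (1 + s) };
        const double dphidr[4] = { -0.25 * (1 - s),  0.25 * (1 - s),
                                    0.25 * (1 + s), -0.25 * (1 + s) };
        const double dphids[4] = { -0.25 * (1 - r), -0.25 * (1 + r),
                                    0.25 * (1 + r),  0.25 * (1 - r) };
        double x = 0, y = 0, xr = 0, xs = 0, yr = 0, ys = 0;
        for (int i = 0; i < 4; ++i) {
            x  += phi[i] * xv[i];     y  += phi[i] * yv[i];
            xr += dphidr[i] * xv[i];  yr += dphidr[i] * yv[i];
            xs += dphids[i] * xv[i];  ys += dphids[i] * yv[i];
        }
        const double fx = xq - x, fy = yq - y;          // residual in physical space
        const double det = xr * ys - xs * yr;           // Jacobian determinant
        const double dr = ( ys * fx - xs * fy) / det;   // solve J [dr,ds]^T = f
        const double ds = (-yr * fx + xr * fy) / det;
        r += dr; s += ds;
        if (std::fabs(dr) + std::fabs(ds) < tol) break;
    }
    return {r, s};
}
```

Each iteration solves a small Jacobian system and converges rapidly for well-shaped elements; if the converged (r,s) lies outside the reference domain, the query point is not in that element.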
Particle Tracking. NekRS currently supports one-way Lagrangian particle tracking (e.g., as in Fig. 17) to update particle positions given an equation of the form dx_i/dt = f(x_i, t), where the user provides f, which is typically a function of the fluid velocity at x_i and thus requires off-grid interpolation. Updates to the position and velocity are based on third-order Adams–Bashforth, with third-order Runge–Kutta used to bootstrap the process in the first few timesteps. For large (load-balanced) particle counts, it was shown in Ref. [12] that the particle update time on 60 NVIDIA V100s of OLCF's Summit is equal to the Navier–Stokes solution time when the number of particles is a significant fraction of the number of grid points in the flow simulation. Realization of this result requires the use of particle migration, where each particle is moved to the processor on which the element containing the particle resides, so that interprocessor communication is not needed for (many) subsequent particle updates. Without migration, the update cost is substantially larger. Through the ppiclf() library [36], Nek5000 supports two-way particle-fluid interaction and even four-way coupling that accounts for particle–particle interactions; a NekRS port of these features is underway.
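A minimal sketch of the third-order Adams–Bashforth particle update is shown below; the fluid-velocity interpolation is abstracted into the user-supplied right-hand side, and the types and names are illustrative rather than the NekRS interface.

```cpp
#include <array>
#include <functional>
#include <vector>

// One-way particle advance with third-order Adams-Bashforth (AB3).  The
// interpolation of the fluid velocity to particle positions, which in NekRS
// relies on findpts-type routines, is abstracted as the 'rhs' callback.
struct Particle {
    std::array<double, 3> x;                 // position
    std::array<std::array<double, 3>, 3> f;  // rhs history: f[0]=f^n, f[1]=f^{n-1}, f[2]=f^{n-2}
};

using Rhs = std::function<std::array<double, 3>(const std::array<double, 3>&, double)>;

void ab3_step(std::vector<Particle>& particles, const Rhs& rhs, double t, double dt)
{
    constexpr double a0 = 23.0 / 12.0, a1 = -16.0 / 12.0, a2 = 5.0 / 12.0;
    for (auto& p : particles) {
        // Shift the history and evaluate the new right-hand side (fluid velocity at x).
        p.f[2] = p.f[1];
        p.f[1] = p.f[0];
        p.f[0] = rhs(p.x, t);
        for (int d = 0; d < 3; ++d)
            p.x[d] += dt * (a0 * p.f[0][d] + a1 * p.f[1][d] + a2 * p.f[2][d]);
    }
    // The first two steps would be bootstrapped with Runge-Kutta, as described in
    // the text, to populate the AB3 history before this update is used.
}
```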
Nonconforming Overset Grids. One approach to simplifying the meshing process for complex geometries is to use overset grids that divide the domain into multiple, easier-to-mesh regions [37–40]. Dividing the domain into multiple meshes can also simplify simulations that have moving components, by meshing individual parts that move relative to the other subdomains. Overset grids can be viewed as a Schwarz overlapping method applied to a single step of the full Navier–Stokes equations, in which the solution in each subdomain is advanced independently, with boundary conditions in the overlapping regions interpolated from the solution in the overlapping neighbors at the previous time-step. This straightforward approach provides accuracy that can be improved to higher order using predictor–corrector iterations [41].
Implementation of an overlapped mesh scheme requires interpolating the fluid velocity and any scalars, such as temperature, at each time-step, and multiple times per step if correction iterations are added. Additionally, changes to the mesh during the simulation, such as for moving machinery, require recomputing the interpolation points at every time-step. Thus, using overlapped meshes requires scalable interpolation routines.
The overset grid support requires resolution of several other technical issues relating to mass conservation, timestepping, and parallelism which have been addressed in Nek5000/RS [41,42] and in earlier efforts by Merrill et al. [40,43]. For a GPU-based NS solver, however, it is also critical to have fast interpolation on the GPU, such as provided by the routines described in Ref. [12].
3.1 Example Validation Study for Turbulent Mixing.
NekRS has been extensively validated for a variety of flows. The present section focuses on turbulent mixing in a large enclosure. We perform LES with NekRS of the Michigan Multi-Jet Gas-Mixture Dome (MiGaDome) [44], a scaled high-temperature gas reactor experiment developed at the University of Michigan. Here we report a validation study for a single-jet injection [45]. Once the numerical model is shown to be accurate, we briefly investigate discharge configurations that lead to jet instabilities. The simulation campaign thus provides valuable physical insight into the mixing behavior of this type of configuration.
3.1.1 Numerical Model and Flow Conditions.
The MiGaDome facility operates with gas and aims to investigate the mixing of turbulent jets in an enclosed dome. The experimental data are obtained with time-averaged particle image velocimetry; to this end, the test section is manufactured as a single body of clear cast acrylic, which provides optical access to the interior volume of the fluid.
Figure 3 provides an overview of the setup. Figure 3(a) presents the geometry of the test section, which includes six inlet pipes and 12 peripheral pipes that serve as outlets. Figure 3(b) shows the computational mesh, which contains approximately 435,000 hexahedral elements generated with the open-source mesh code Gmsh [46]. Because of the high resolution of the mesh, the GLL point distribution is detailed only in the jet region to keep the image legible.

The geometry and the mesh considered in the LES of MiGaDome: (a) geometry of the MiGaDome experiment (the facility can operate with multiple jet configurations and flow conditions) and (b) the mesh employed for simulating MiGaDome experiments
Nondimensional LES were performed on the OLCF Summit supercomputer for different jet configurations. The jets are defined by Reynolds numbers based on their bulk velocity and the inlet pipe diameter. The simulations ensure a fully developed turbulent incoming jet by employing a recycling boundary condition that maps the velocity field from a plane 10D downstream of the inlet-pipe start back to the inlet.
The model has been validated for a single-jet case at Re = 4097 [45]. Simulations were performed for this case with the mesh from Fig. 3(b) by employing polynomial orders up to N = 7, for a total of approximately 150 × 10^6 grid points. This configuration ensures at least one grid point within y+ ≈ 1 near walls and five points within y+ ≈ 10 [47]. Beyond that, it resolves the Taylor microscale everywhere in the domain. At this scale, viscosity significantly affects the dynamics of turbulent eddies in the flow; for this reason, this criterion is often recommended for designing meshes for LES [48].
Further, the developed model is employed to test multiple jet configurations in order to scrutinize jet instabilities. While different discharge schemes have been explored, the configuration of three jets evenly distributed in the azimuthal direction is reported here as a representative case. In this setup, each jet forms an angle of 120 deg with its neighbors, at a fixed Reynolds number of Re = 4097.
3.1.2 Results.
Figure 4 presents the instantaneous velocity field as well as select comparisons with experimental data for the single jet injection case [44]. In Fig. 4(a), one can visualize the shear layer as the jet evolves upward. Further, the vectors shown in this figure stem from the mean velocity field and allow for identifying flow recirculation patterns. As will be discussed later, identifying these flow patterns is paramount to explaining the stability of jets when dealing with multiple discharge configurations.
Simulations of the single-jet experiment in MiGaDome [45]: (a) instantaneous velocity magnitude and (b) averages ⟨v⟩ and root mean squares (RMS) v_rms of the upward velocity component v, as well as the shear component ⟨uv⟩. The maximum velocity of the first profile, V_m, is employed to normalize the statistical quantities.
As part of the validation study, first- and second-order statistics are reported in Fig. 4(b) for both NekRS predictions and experimental measurements in the single-jet case. Line profiles at different heights are compared along the x direction, with the origin at the jet centerline (i.e., x = 0). Comparisons are provided for only half of the jet since the flow is statistically symmetric in the x direction; spatial averages were computed accordingly.
The reported results show that the jet profile development generally agrees between experimental measurements and LES results. Different polynomial orders were considered when performing the LES to demonstrate mesh convergence. It should be noted that Ref. [45] presents additional comparisons with experiments as part of another validation study of NekRS for turbulent mixing. Besides MiGaDome, that work also compares LES predictions for a different scaled-down high-temperature gas reactor facility at Texas A&M [49].
As a complementary study, the triple-jet injection case shows that the flow patterns discussed in the validation study (Fig. 4(a)) can lead to the formation of jet instabilities. Figure 5 illustrates this by presenting the velocity magnitude field at two different instants.

Triple jet injection: instabilities occur due to flow pattern interactions in the enclosure: (a) instantaneous velocity magnitude at instant t0 and (b) instantaneous velocity magnitude 20 convective units after t0
One clearly observes that the turbulent jets in Fig. 5(b) are suppressed compared to their previous state in Fig. 5(a). This behavior is largely explained by the interactions among the recirculation zones induced by each of the jets. Beyond that, the same complex flow interactions also contribute to improved mixing across the hemisphere. Such an enhancement is evident when comparing the central region of the dome between Fig. 4(a) (single-jet case) and the x = 0 planes shown in Fig. 5.
Interestingly, the above-mentioned effects are even more pronounced when all six inlet pipes are employed, i.e., in a full injection scheme. Ultimately, these analyses show that LES can provide valuable insights into the physical processes in parallel with experiments. As part of a continuing validation effort, future work aims at characterizing the flow instabilities predicted by the numerical model and comparing them with experiments.
4 Nuclear Energy Applications
This section discusses collaborative efforts between the ECP's Exascale Small Modular Reactor (ExaSMR) and Co-design Center for Efficient Exascale Discretizations teams. We present simulation capabilities for ExaSMR geometries and performance analysis for neutronics and thermal hydraulics on pre-exascale and exascale systems using ExaSMR's Exascale Nuclear Reactor Investigative COde (ENRICO) platform, which comprises OpenMC [50], Shift [51], and NekRS. Notably, both OpenMC and Shift now offer support for GPU-based calculations. The coupled capabilities established in ENRICO are strategically aligned with the evolving landscape of GPU-accelerated supercomputer platforms.
4.1 Modeling Effects of Mixing Vanes Through Momentum Sources.
In light water nuclear reactors, fuel rods are generally arranged in triangular or square patterns and supported by spacer grids to prevent vibrations and maintain stability. Mixing vanes, specifically split-type vanes, are employed to enhance turbulence mixing, thereby improving convective heat transfer from the fuel rod surface to the coolant. The complex flow structures generated by these spacer grids and mixing vanes have a profound impact on the performance of the coolant flow. This research aims to develop a reduced-order modeling approach that can effectively capture the macroscopic effects of spacer grid and mixing vanes on the coolant flow. By implementing novel three-dimensional momentum sources, the approach simulates flow swirling, intersubchannel mixing, and pressure loss, while significantly reducing computational costs and enabling larger domain sizes to be studied. The accuracy of the reduced-order model is evaluated by comparing its results with those obtained from reference LESs for a specific pin bundle geometry. The outcome of this research will facilitate the utilization of advanced computational fluid dynamics (CFD) techniques and supercomputing capabilities to tackle grand challenges in nuclear engineering, such as studying the behavior of neutronics and thermal-hydraulics in large-scale systems.
The LES simulations were conducted for a 5 × 5 fuel rod bundle geometry to investigate the effect of spacer grid and mixing vanes on the coolant flow. The explicit representation of detailed geometric structures, including the spacer grid and mixing vanes, was incorporated in the LES model (as shown in Fig. 6). Four Reynolds numbers were considered: 10,000, 20,000, 40,000, and 80,000. It is noteworthy that the target Reynolds number for the small modular reactor (SMR) core is approximately 78,300, based on the NuScale SMR design specifications [52]. The obtained LES results provide valuable information on the instantaneous velocity fields, showcasing the significant flow mixing induced by the presence of the mixing vanes. Time-averaged first- and second-order flow quantities were extracted from all cases. The swirling and mixing factors, along with the pressure drop, were derived from the LES solutions to serve as reference data for calibrating the corresponding RANS momentum sources.

A 5 × 5 fuel rod bundle and the LES velocity field solutions showing strong flow mixing induced by the mixing-vane grid
The efficacy of the momentum sources was first evaluated through their implementation and testing in a 2 × 2 bundle configuration, where they successfully replicated the time-averaged macroscale flow characteristics, including the lateral flow mixing and streamwise pressure drop induced by the spacer grid and mixing vanes. The application of momentum sources near the inlet induced a shift in the streamline pattern from parallel to swirling, as depicted in Fig. 7. Quantitative measures such as the swirling and mixing factors, as well as the pressure drop, exhibited excellent agreement with the reference data. The momentum sources were then implemented in pin-resolved CFD simulations of a full SMR core (shown in Fig. 8). In this larger-scale simulation, the momentum sources induced significant cross flow in the core, consistent with the findings from the 2 × 2 bundle simulations. The magnitudes of the cross flows in both directions were substantial, representing a high percentage of the bulk flow and playing a crucial role in heat transfer and mixing processes. A detailed view of the cross flow pattern at five hydraulic diameters downstream is illustrated in Fig. 9. For more comprehensive details regarding the implementation and rigorous verification of the momentum source model, interested readers are referred to the work by Fang et al. [53], which provides in-depth insight into the effectiveness of the momentum source model and its ability to accurately capture the complex flow behavior in the presence of spacer grids and mixing vanes.

Streamlines from the RANS simulations revealing the lateral flow motions induced by prescribed momentum sources

Detail of cross flows in the full-core simulation: (left) velocity in the y direction and (right) velocity in the x direction
4.2 Coupling With Monte Carlo Neutronics Code OpenMC Within Exascale Nuclear Reactor Investigative Code.
ENRICO is designed to drive automated multiphysics simulations of reactors. It achieves this by coupling a neutronics code with a thermal-hydraulics (TH) code [52]. The neutronics code solves the steady-state neutron transport equation and determines the energy deposition distribution, which serves as the source term for the TH code. The TH code then solves the equations for heat transfer and fluid dynamics, determining the temperature and density distributions, which are fed back into the neutronics code to update the model, since neutron cross sections depend on temperature and density. To solve for the different physics fields, ENRICO employs a nonlinear fixed-point iteration, also known as Picard iteration. This approach has been demonstrated to be effective in previous investigations [54], especially when under-relaxation is used. The individual physics solvers are run sequentially, and convergence is reached when the L1, L2, or L∞ norm of the temperature change between two consecutive iterations is less than the prescribed tolerance. ENRICO has been designed to run efficiently on distributed-memory clusters: by running each physics solver on independent nodes, ENRICO eliminates memory constraints and ensures efficient performance. The solvers that have been integrated into ENRICO include the OpenMC and Shift MC codes for neutronics simulations and the spectral element CFD codes Nek5000 and GPU-oriented NekRS for heat transfer and fluid dynamics simulations. ENRICO was developed to be agnostic to the underlying physics codes and can be expanded to support other neutronics/TH software.
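The coupling logic can be sketched as a simple fixed-point loop with under-relaxation and an L∞ convergence check; the solver callbacks below are placeholders and do not reflect ENRICO's actual interfaces.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

using Field = std::vector<double>;

// Schematic Picard (fixed-point) iteration for neutronics/TH coupling.
// 'neutronics' maps a temperature field to an energy-deposition field, and
// 'thermal_hydraulics' maps that source back to an updated temperature field.
bool picard_coupling(Field& T,
                     const std::function<Field(const Field&)>& neutronics,
                     const std::function<Field(const Field&)>& thermal_hydraulics,
                     double omega,      // under-relaxation factor in (0, 1]
                     double tol,        // convergence tolerance on the temperature change
                     int max_iters)
{
    for (int it = 0; it < max_iters; ++it) {
        Field q     = neutronics(T);           // energy deposition from current T
        Field T_new = thermal_hydraulics(q);   // updated temperature from q

        // Under-relaxed update and L-infinity convergence check.
        double diff = 0.0;
        for (size_t i = 0; i < T.size(); ++i) {
            const double Ti = (1.0 - omega) * T[i] + omega * T_new[i];
            diff = std::max(diff, std::fabs(Ti - T[i]));
            T[i] = Ti;
        }
        if (diff < tol) return true;           // converged
    }
    return false;                              // not converged within max_iters
}
```

Under-relaxation (ω < 1) damps the oscillations that can otherwise arise from the strong temperature dependence of the neutron cross sections.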
ENRICO was originally developed using Nek5000 as the TH solver. The integration of the NekRS CFD solver into the ENRICO framework represents a significant advancement in thermal-fluid calculations, particularly in terms of accelerating simulations on GPU-based leadership-class supercomputers such as Summit and Aurora. To fully capitalize on the potential performance boost, a thorough benchmark study is essential to ensure that the new OpenMC-NekRS solver delivers results consistent with the original OpenMC-Nek5000 solver. To this end, a benchmark study was conducted using test cases comprising a single-rod model and an assembly model. The study compared key quantities of interest (QoI) from both solvers, including the coolant temperature distributions and the effective multiplication factor (keff). The long-rod case is based on a model previously reported in a milestone report [55]. To ensure a fair comparison, an identical set of input parameters was used for both OpenMC-NekRS and OpenMC-Nek5000. The benchmarking analysis focused on the ENRICO results after ten Picard iterations, enabling a comprehensive evaluation of the solver performance and its consistency with respect to the key thermal-fluid parameters.
Figure 10 presents the comparison of temperature distributions obtained after ten Picard iterations between the new OpenMC-NekRS solver and the original OpenMC-Nek5000 solver. Specifically, Fig. 10(a) focuses on the axial temperature profile, where the results from both solvers exhibit an extraordinary level of agreement, with a maximum deviation of just 0.02%. To further validate the accuracy of the simulations, radial temperature profiles were extracted at a height of z = 100 cm. The comparison reveals excellent conformity, with maximum deviations of 0.50% inside the fuel pin and a mere 0.01% in the coolant flow. These results serve as compelling evidence that the new ENRICO solver, based on NekRS, consistently produces simulation outcomes that are in close agreement with the original Nek5000-based version. The benchmark study successfully confirms the reliability and accuracy of the enhanced OpenMC-NekRS solver within the ENRICO framework.

Comparison of temperature distributions from the long rod case with both OpenMC–NekRS and OpenMC–Nek5000 solvers: (a) axial temperature profile in the long rod and (b) radial temperature profile in the long rod at z = 100 cm
In addition to the single rod model, benchmarking tests were conducted on a more complex 17 × 17 fuel rod assembly model [56]. As with the previous cases, both the OpenMC-NekRS and OpenMC-Nek5000 simulations were provided with the same set of input parameters. The coupled simulation ran for seven Picard iterations, and the resulting solution fields were examined, including the velocity and temperature fields extracted at a height of z = 100 cm, as shown in Fig. 11. To thoroughly assess the consistency and reliability of the new OpenMC-NekRS solver, key QoIs were selected for comparison, as presented in Table 1. The results obtained from the NekRS based ENRICO closely align with those from the original Nek5000 based ENRICO, indicating an excellent agreement and further affirming the reliability of the OpenMC-NekRS solver. These benchmarking tests reinforce the confidence in utilizing the enhanced ENRICO framework with NekRS, offering a powerful and reliable solution for multiphysics simulations in complex nuclear reactor assemblies.

Cross-sectional view of assembly solutions at z = 100 cm from OpenMC–NekRS simulations: (a) velocity field within the assembly and (b) temperature field within the assembly
Comparison of QoI from benchmarking tests with the assembly model
QoI | OpenMC–NekRS | OpenMC–Nek5000 | Relative deviation
---|---|---|---
Peak fuel temperature (K) | 1048.36 | 1050.00 | 0.16%
Peak coolant temperature (K) | 595.98 | 597.30 | 0.22%
keff | 1.1669 | 1.1670 | 0.01%
4.3 Coupled NekRS-OpenMC Simulations of Small Modular Reactor Full Core.
After the successful verification of the NekRS-OpenMC coupling capability, a comprehensive simulation of an SMR full core was conducted on the Summit supercomputer at OLCF. Each simulation job utilized 300 Summit nodes, with a polynomial order of N = 3 for NekRS. During each Picard iteration, the predominant share of the simulation time was spent in the NekRS calculations: on average, the OpenMC session utilized approximately 54 min of the total 360-minute runtime, constituting only 15% of the overall computational expense. During each Picard iteration, NekRS executed 10,000 time steps with a time-step size of 0.5 ms. In total, ten Picard iterations were simulated. To assess the convergence of the Picard iterations, Fig. 12 depicts the maximum temperature change between iterations. The plot clearly indicates that convergence is achieved after four iterations, demonstrating the stability and efficiency of the iterative process.
Figure 13 presents the radial temperature distribution across the entire SMR reactor core at the axial midplane (z = 100 cm). The plot reveals distinct radial rings within each fuel pin, depicting the transitions from the fuel region to the gap and cladding. In this comprehensive simulation, a total power of 160 MW is deposited in the full-core model, resulting in a temperature rise of 56 K (100.8 °F) in the coolant flow, which closely aligns with the design specification of NuScale (ΔT = 100 °F). Notably, the peak fuel and coolant temperatures observed are 1212.0 K and 587.7 K, respectively. It is important to note that the turbulence modeling options employed in this study are intentionally simple. As part of ongoing efforts to enhance the simulation accuracy, additional simulations are planned with more sophisticated turbulence modeling and higher resolution to evaluate and quantify the uncertainty associated with the current results. To further improve the fidelity of the SMR system response, a natural-circulation model has been developed and integrated into the full-core simulations. For a more comprehensive understanding of this aspect, interested readers are referred to the work by Fang et al. [57], which provides in-depth details on this development and its implications. Details on the performance are provided in Ref. [58].

Radial temperature distribution (z = 100 cm) from coupled OpenMC–NekRS simulations of the full-core model
4.4 Hybrid Reynolds-Averaged Navier–Stokes/Large Eddy Simulation Modeling.
The RANS/LES hybrid approach is a novel and promising strategy that has been introduced to enhance the simulation of a nuclear reactor core. In this approach, the core is divided into different regions or zones, with a portion of the core simulated using LES while the remaining regions are handled by RANS modeling. This zonal hybrid strategy aims to combine the strengths of both approaches, leveraging the accuracy of LES in resolving turbulent flow features in specific regions while still benefiting from the computational efficiency of RANS for the majority of the core. The preliminary results obtained for a subset of the reactor using this approach have shown great promise, motivating further investigation and development to ultimately improve the full-core simulation without significantly increasing computational overheads. The project has pursued two parallel methods: the first models the hybrid RANS-LES using a multizonal mesh, and the second couples RANS and LES through overlapping meshes, utilizing the overset mesh (also known as "neknek") feature. As an illustrative example, a relatively simple 5 × 5 rod bundle is considered, and the computational grids for the multizonal and overlapping mesh approaches are depicted in Figs. 14 and 15, respectively. The elemental mesh resolution is uniform in the z direction and nonuniform in the x direction, clustering elements in the middle to ensure sufficient resolution for the LES region. These grid representations highlight the flexibility and versatility of the hybrid approach, showcasing the potential to optimize computational resources while obtaining high-fidelity results for complex reactor geometries. A demonstration of the capability is shown in Fig. 16, which presents a solution exhibiting both RANS and LES characteristics.

Cross sections of overlapping meshes for the same bundle at polynomial order 3: (a) 3 × 3 subchannel for LES, (b) 5 × 5 subchannel for RANS, and (c) overlapped subchannels

Demonstration of zonal hybrid approach for a rod bundle case (5 × 5). Instantaneous velocity magnitude on a streamwise cross section.
4.5 Additional Nuclear Energy Simulations.
The advances in performance achieved by ExaSMR have led to the adoption of NekRS for a variety of other nuclear engineering simulations. Particularly notable have been the advances in the simulation of pebble bed reactors, which have been scaled to full-core large eddy simulations including 350,000 pebbles, whereas previously reported simulations achieved on the order of hundreds to thousands of pebbles [59,60]. We note that pebble bed reactors operate at high Reynolds numbers compared to other applications involving packed beds. As the flow is highly turbulent, the creeping-flow assumption is not valid for the conditions typically encountered in nuclear pebble beds.
The simulations have been used to elucidate the flow dynamics in cylindrical packed beds, which are characterized by a nonhomogeneous porosity distribution that peaks near the walls of the cylinder. The simulations have demonstrated that the form constant in the pressure drop depends on the porosity and that the dependency closely follows metrics based on inertial effects [61,62]. A first-of-its-kind correlation for the form loss has been generated that can accurately predict the velocity distribution in the bed. We note that the wall region is very important in pebble bed reactors for a variety of reasons, including bypass flow and localized power peaking.
5 Wind Energy Applications
Through a collaboration between the ECP's ExaWind project at the National Renewable Energy Laboratory and the Center for Efficient Exascale Discretizations, the Nek5000/RS team has recently developed wall-modeled large-eddy simulation approaches for atmospheric boundary layers [10,13]. Such a capability is critical for applications such as wind farm analysis and dispersive transport in urban canyons. Here, we discuss the governing equations for wall-modeled large-eddy simulation in Nek5000/RS along with new subgrid-scale (SGS) turbulence modeling approaches [10]. We also present convergence and performance results, demonstrating that NekRS provides high-order convergence and that it outperforms low-order codes on various advanced architectures [13].
5.1 Large Eddy Simulation for Atmospheric Boundary Layer Flows.
Atmospheric boundary layer flows are characterized by turbulent mixing, vertical diffusion, vertical and horizontal exchange of heat, etc., over a large range of scales and thus present significant challenges for simulation codes, even at exascale. To quantify the effects of numerical modeling and discretization choices, the atmospheric boundary layer community has set up a sequence of benchmark problems, including the Global Energy and Water Cycle Experiment Atmospheric Boundary Layer Study (GABLS) [63]. These benchmarks represent the atmospheric boundary layer in regional and large-scale atmospheric models and are considered important for improved modeling in the study of wind energy, climate, and weather on all scales [64].
For the atmospheric LES, we solve the incompressible NS and potential temperature equations in a spatially filtered resolved-scale formulation in nondimensional form. Appropriate source terms are added for the Coriolis acceleration and buoyancy force.
In the case of the high-pass filter model discussed below, the forcing term would also include a damping term of the form −χ H(u_i), where χ is an order-unity multiplier and H(·) is a spatial high-pass filter that passes grid-scale fluctuations of its argument so that their amplitude is damped in time.
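In schematic form, the resolved-scale system can be written as follows; this is a sketch for orientation only, and the precise equations, including the Coriolis and buoyancy contributions to the forcing f_i and the thermal SGS terms, are given in Ref. [10].

$$
\frac{\partial \bar{u}_i}{\partial t} + \bar{u}_j \frac{\partial \bar{u}_i}{\partial x_j}
= -\frac{\partial \bar{p}}{\partial x_i}
+ \frac{\partial}{\partial x_j}\!\left(\frac{2}{Re}\,S_{ij} - \tau_{ij}\right) + f_i,
\qquad \frac{\partial \bar{u}_i}{\partial x_i} = 0,
$$

$$
\frac{\partial \bar{\theta}}{\partial t} + \bar{u}_j \frac{\partial \bar{\theta}}{\partial x_j}
= \frac{\partial}{\partial x_j}\!\left(\frac{1}{Pe}\,\frac{\partial \bar{\theta}}{\partial x_j} - \tau_{\theta j}\right),
\qquad S_{ij} = \frac{1}{2}\left(\frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i}\right),
$$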
where Re is the Reynolds number, Pe is the Peclet number, S_ij is the resolved-scale strain-rate tensor, and τ_ij and τ_θj are the SGS stress and flux tensors that require modeling.
5.2 Global Energy and Water Cycle Experiment Atmospheric Boundary Layer Study Benchmark.
We consider the GABLS benchmark [63], illustrated in Fig. 17, which is a well-documented stably stratified flow problem in which the ground temperature is cooler than the air temperature and continues to cool over the simulation duration. The computational domain is 400 m × 400 m × 400 m with the streamwise direction in x, the spanwise direction in y, and the vertical direction in z. The initial condition at t = 0 has constant velocity in the streamwise direction equal to the geostrophic wind speed of U = 8 m/s. The initial potential temperature is 265 K for 0 ≤ z ≤ 100 m and increases linearly at a rate of 0.01 K/m for 100 m ≤ z ≤ 400 m, with a reference potential temperature of 263.5 K. An initial perturbation of amplitude 0.1 K is added to the potential temperature field for z ≤ 50 m.
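A minimal sketch of this initial state is given below in dimensional (SI) form, assuming the 50 m perturbation region stated above; the actual case files impose the nondimensional equivalents.

```cpp
#include <cstdlib>

// Sketch of the GABLS initial condition described above (dimensional, SI units).
struct InitialState { double u, v, w, theta; };

InitialState gabls_initial_condition(double z /* height in m */)
{
    const double U_geo    = 8.0;     // geostrophic wind speed, m/s
    const double theta_0  = 265.0;   // potential temperature below 100 m, K
    const double dthetadz = 0.01;    // inversion strength above 100 m, K/m

    InitialState s;
    s.u = U_geo;                     // constant streamwise velocity
    s.v = 0.0;
    s.w = 0.0;
    s.theta = (z <= 100.0) ? theta_0 : theta_0 + dthetadz * (z - 100.0);

    // Random perturbation of amplitude 0.1 K near the surface to trigger turbulence.
    if (z <= 50.0)
        s.theta += 0.1 * (2.0 * std::rand() / double(RAND_MAX) - 1.0);
    return s;
}
```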
NekRS simulation of atmospheric boundary layer flow, showing the potential temperature on vertical planes and the streamwise velocity on a horizontal plane. The total number of grid points is n = 134 M with E = 64^3 and N = 8. The simulation was performed on OLCF's Summit using ten nodes with 60 NVIDIA V100 GPUs. (Simulation by Lindquist et al. [12].)
The Reynolds number is Re = U L_b/ν, where Lb = 100 m is the thickness of the initial thermal boundary layer and ν is the molecular viscosity. Under the stated conditions, Re is approximately 5 × 10^7, which precludes direct numerical simulation wherein all turbulent scales are resolved.
We consider periodic boundary conditions (BCs) in the streamwise and spanwise directions. At the top boundary (z = 400 m), a stress-free, rigid lid is applied for momentum, and the heat flux for the energy equation is set consistent with the 0.01 K/m temperature gradient initially prescribed in the upper region of the flow. At the bottom boundary, we use a wall model in which traction BCs are applied for the velocity. The specified shear stress comes from Monin–Obukhov similarity theory [65]. For the energy equation, a heat flux is applied that is derived from the same theory and a specified potential temperature difference between the flow at a height z1 and the surface. For the traction BCs on the horizontal velocity components, we followed the approaches of Refs. [66] and [67] in the context of the log-law. The normal component of the velocity is set to zero, and the traction BCs are imposed on the tangential velocity based on the horizontally averaged slip velocity that develops at the boundary. Further details regarding the wall model are provided in Ref. [10].
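For orientation, the neutral-limit log-law evaluation of the wall traction from the horizontally averaged slip velocity is sketched below. The actual model uses Monin–Obukhov similarity theory [65], which adds stability corrections appropriate to the stably stratified GABLS conditions; the roughness length z0 is a placeholder parameter here.

```cpp
#include <cmath>

// Simplified neutral log-law estimate of the wall shear stress used as a
// traction boundary condition; Monin-Obukhov stability corrections are omitted.
struct WallStress { double tau_xz, tau_yz; };

WallStress log_law_traction(double u1, double v1, // horizontally averaged slip velocity at height z1
                            double z1,            // height of first off-wall sample point, m
                            double z0)            // roughness length, m (placeholder value)
{
    const double kappa = 0.41;                    // von Karman constant
    const double speed = std::sqrt(u1 * u1 + v1 * v1);
    const double ustar = kappa * speed / std::log(z1 / z0);  // friction velocity

    WallStress t;
    // Stress magnitude u_*^2, directed opposite to the near-wall velocity.
    t.tau_xz = -ustar * ustar * u1 / (speed + 1e-30);
    t.tau_yz = -ustar * ustar * v1 / (speed + 1e-30);
    return t;
}
```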
5.3 Subgrid-Scale Models.
We have explored several SGS modeling approaches for the GABLS problem, including a mean-field eddy viscosity (MFEV) [68] for the anisotropic part of the stress tensor. The law of the wall is effected through the MFEV concept, and the approach originally used by Schumann [69] is used to convert the horizontally averaged traction to local values based on the local slip velocity in each of the horizontal directions. The isotropic part of the dissipation is provided by either a high-pass filter (HPF) [70] or a Smagorinsky (SMG) eddy viscosity [71] applied to the fluctuating strain rate. Here, we present some of the basic components of these approaches and compare their convergence behavior. Additional thermal components required for these models are presented in Ref. [10].
Here, C_K is a constant and ℓ is a mixing-length scale [10,68]. In the HPF model, isotropic fluctuations are damped by adding the term −χ H(u_i) to f_i, as mentioned earlier.
where the remaining terms and constants are defined in Refs. [10] and [13]. The isotropy factor γ is obtained from the relative magnitudes of the fluctuating and mean strain rates (see Refs. [10] and [13] for further detail).
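For reference, the core of a Smagorinsky eddy-viscosity evaluation is sketched below. In the present model it is applied to the fluctuating strain rate and modified by the isotropy factor γ [10,13]; the constant shown is only a typical textbook value.

```cpp
#include <cmath>

// Minimal Smagorinsky eddy-viscosity evaluation, nu_t = (Cs * Delta)^2 * |S|,
// with |S| = sqrt(2 S_ij S_ij).  This sketch shows the standard core only.
double smagorinsky_nu_t(const double S[3][3], // (fluctuating) strain-rate tensor
                        double delta,         // filter width / grid scale
                        double Cs = 0.17)     // Smagorinsky constant (typical value)
{
    double SS = 0.0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            SS += S[i][j] * S[i][j];
    const double Smag = std::sqrt(2.0 * SS);
    return (Cs * delta) * (Cs * delta) * Smag;
}
```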
Figure 18 shows the horizontally averaged streamwise and spanwise velocities at three different resolutions, n = 128^3, 256^3, and 512^3, which correspond to average grid spacings of 3.12 m, 1.56 m, and 0.79 m, respectively. The top-left panel shows relatively rapid convergence for MFEV+HPF, with a clearly converging profile at the intermediate resolution that is very close to the one at the finest resolution. The results for MFEV+SMG, while not as converged at the intermediate resolution as those of MFEV+HPF, show profiles that are fairly close at all three resolutions, which suggests that MFEV+SMG is perhaps more robust than MFEV+HPF. The third panel illustrates that both models converge to the same jet height, which is an important parameter for wind-farm modeling.

Horizontally averaged streamwise and spanwise velocities with MFEV+HPF and MFEV+SMG using traction boundary conditions, demonstrating convergence at three different resolutions
In addition to grid convergence, we also consider performance, which is a primary focus of exascale computing. Figure 19 shows NekRS strong-scaling results for the GABLS example on Summit. The simulations were run for 200 timesteps starting with the 512^3 solution at t = 6 h as the initial condition. Timings were averaged over the last 100 steps. The upper-left panel shows the standard time-per-step versus the number of V100s, P, for resolutions of 512^3, 1024^3, and 2048^3. These curves nearly collapse onto a single line when the data are replotted versus the number of points per GPU, n/P, as seen in the upper-right panel. From the parallel-efficiency plot in the lower panel, we see that 80% efficiency is realized at roughly 2–3 × 10^6 points per GPU, which corresponds to a wall-clock time of approximately 0.1–0.4 s per step. We reiterate the earlier remark that, at this strong-scale limit, the time per step is essentially independent of problem size.
6 Conclusions
This paper has presented the transformative performance enhancements enabled by the development of NekRS, an open-source GPU-based fluid-thermal simulation code. NekRS has been developed with a focus on efficiency and scalability, leveraging a kth-order accurate timesplit formulation coupled with scalable iterative solvers for time advancement. The high-order SEM used for spatial discretization enables fast, low-memory, matrix-free operator evaluation, contributing to the overall performance of NekRS. We discussed novel capabilities within NekRS and provided a validation example for a multijet mixing experiment that also yields valuable insights into complex flow structures. We then discussed large-scale energy applications.
In the context of nuclear energy applications, the integration of NekRS into the ENRICO framework, which includes the OpenMC and Shift MC codes for neutronics simulations and the spectral element CFD codes Nek5000 and NekRS for heat transfer and fluid dynamics simulations, has significantly advanced thermal-fluid calculations for nuclear reactors, delivering simulations that were unthinkable only a few years ago. This has been possible by leveraging the power of GPU-based leadership-class supercomputers such as Summit and Frontier. We have also developed a reduced-order modeling approach that effectively captures the macroscopic effects of spacer grids and mixing vanes on the coolant flow. This approach, which implements novel three-dimensional momentum sources, simulates flow swirling, intersubchannel mixing, and pressure loss; it significantly reduces computational costs and enables larger domain sizes to be studied, thereby extending the reach of NekRS in these applications. Furthermore, our work with NekRS on wall-modeled large-eddy simulations of atmospheric boundary layers for wind energy has demonstrated the potential of NekRS to provide valuable insights into complex flow physics relevant to wind-farm performance.
In conclusion, the advances in the performance and capabilities of NekRS have demonstrated its potential in a range of applications, including nuclear and wind energy. Future work will continue to leverage these performance enhancements to further advance our understanding of complex flow physics in energy systems.
Acknowledgment
The research used resources at the Argonne Leadership Computing Facility and at the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Funding Data
This material is based upon work supported by the U.S. Department of Energy (DOE), Office of Science, under Contract No. DE-AC02-06CH11357 and by the Exascale Computing Project (ECP) 17-SC-20-SC; Funder ID: 10.13039/100000015.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.