## Abstract

Topology optimization is one of the most flexible structural optimization methodologies. However, in exchange for its high level of design freedom, typical topology optimization cannot avoid multimodality, where multiple local optima exist. This study focuses on developing a gradient-free topology optimization framework to avoid being trapped in undesirable local optima. Its core is a data-driven multifidelity topology design (MFTD) method, in which the design candidates generated by solving low-fidelity topology optimization problems are updated through a deep generative model and high-fidelity evaluation. As its key component, the deep generative model compresses the original data into a low-dimensional manifold, i.e., the latent space, and randomly arranges new design candidates over the space. Although the original framework is gradient free, its randomness may lead to convergence variability and premature convergence. Inspired by a popular crossover operation of evolutionary algorithms (EAs), this study merges the data-driven MFTD framework and proposes a new crossover operation called latent crossover. We apply the proposed method to a maximum stress minimization problem in 2D structural mechanics. The results demonstrate that the latent crossover improves convergence stability compared to the original data-driven MFTD method. Furthermore, the optimized designs exhibit performance comparable to or better than that in conventional gradient-based topology optimization using the P-norm measure.

## 1 Introduction

Topology optimization, first proposed by Bendsøe and Kikuchi [1], enables the determination of an optimized material distribution for a structural optimization problem and offers a high level of design freedom [2]. While this attractive feature makes it applicable to various structural design problems, topology optimization faces challenges with multimodality, where multiple local optima exist in the solution space [3]. That is, gradient-based optimizers used in conventional topology optimization methods may fall into low-performance local optima. This intractable characteristic is often seen in strongly nonlinear problems, e.g., minimax problems; thus, it is challenging to obtain structures that exhibit high levels of performance.

One of the standard ways to overcome the problem of multimodality in engineering optimization applications is evolutionary algorithms (EAs) since they are gradient-free [4]. An EA, such as the genetic algorithm, mimics the evolutionary mechanisms of living organisms, and solutions are represented as strings of genes. The solution search is performed by applying three basic genetic operations: selection, crossover, and mutation, to a population of individuals. Each iteration of these genetic operations is referred to as a generation. The selection is an operation that retains individuals with relatively better objective function values in the population for the next generation. The crossover is an operation that partially exchanges genes between selected individuals to generate new individuals (offspring) that inherit traits from old ones (parents). However, if some individuals in the population have significantly higher fitness than others in the early stages of the search, they may weed out others by selection and crossover, leading to a loss of diversity and a high probability of premature convergence [5]. The mutation is an operation that introduces new genes into the population by changing a portion of the genes of selected individuals, which helps maintain diversity in the population. Several methods [6–9] have been proposed to solve topology optimization problems using EAs, taking advantage of their gradient-free nature. While they can perform a global search for strongly nonlinear problems, Sigmund [10] has pointed out issues with EA-based topology optimization. That is, topology optimization problems often require a large number of design variables, and the computational cost of the EA increases exponentially with the number of design variables due to the so-called *curse of dimensionality*.

As a potentially promising way to avoid the curse of dimensionality, some deep generative models can dramatically reduce the dimensionality of the topology optimization problem. Variational autoencoders (VAEs) [11] and generative adversarial networks (GANs) [12] are popular deep generative models. In a VAE, an encoder is built to compress high-dimensional data into a low-dimensional manifold, called latent space, and maps it to a probability distribution, while a decoder reconstructs high-dimensional data from the latent space. In a GAN, a generator creates new data samples by starting from random noise and trying to produce data that are indistinguishable from real data. A discriminator, on the other hand, assesses these generated samples and tries to distinguish them from real data. As a review paper [13] mentioned, relevant studies on deep generative models for engineering design problems have increased dramatically in recent years. As pioneering work, Guo et al. [14] proposed a data-driven indirect design representation for high-dimensional design problems, which iteratively optimizes the latent space of a VAE as the design variable field. Oh et al. [15] proposed a design framework that iteratively trains a GAN to generate a variety of designs. Kazemi et al. [16] proposed a method to generate conceptual designs using a GAN for multi-physics topology optimization problems.

On the basis of combining EAs and deep generative models, Yaji et al. [17] proposed a data-driven multifidelity topology design (MFTD) method that enables gradient-free topology optimization. The basic idea of data-driven MFTD is that design candidates, generated by solving low-fidelity topology optimization problems, are iteratively updated using an EA that guides queries to a high-fidelity analysis model. The key to this framework builds upon data-driven topology design [18], incorporating a VAE as a crossover-like operation for each optimization step. The effectiveness of the framework was demonstrated for topology optimization problems that are hard to solve directly with conventional methods, such as minimax and turbulent flow problems. However, since the generative process in a VAE is based on a uniform random sampling in the latent space, it is expected that the effectiveness of the approach can be improved if the crossover operation is adopted based on EAs.

This article proposes a particular crossover operation based on EAs, called *latent crossover*, for the data-driven MFTD framework. Specifically, simplex crossover (SPX) [19]—a crossover operator of real-coded genetic algorithms (RCGAs) [20]—is used for latent crossover. We apply the proposed method to a maximum stress minimization problem of an L-bracket and verify the effectiveness of latent crossover, comparing it with the original data-driven MFTD. We also discuss its usefulness by comparing the results of the proposed method with those of gradient-based topology optimization (GTO) using the $P$-norm measure for the maximum stress minimization problem.

## 2 Latent Crossover

In data-driven MFTD [17], whose details are described in Sec. 3, the high-dimensional material distribution data of the design candidates are encoded by a VAE into low-dimensional real-valued latent variables that correspond to EA genes, making the framework similar to the RCGA among EAs. Its high representation flexibility makes crossover more important in the RCGA than in the binary GA, and it has been the subject of various studies. For example, Kita and Yamamura [21] proposed a theory called the function specialization hypothesis concerning the selection and crossover operators in RCGAs, which includes the following ideas:

- The selection operator eliminates individuals with low fitness and, meanwhile, selects and replicates those with high fitness. Therefore, it is designed to narrow the population distribution gradually.
- The crossover operator transforms the distribution by combining parent individuals to generate offspring and is designed to retain the ability to generate new offspring for a finite population, but not to change the population distribution.

In data-driven MFTD, candidate solutions are generated through random sampling from the latent space of a VAE, so in terms of the genetic distribution and statistics of the population, we consider the probability distribution of the generated offspring. Figure 1 shows an example of the probability distribution for generating offspring in a two-dimensional latent space in the range from $-2$ to $+2$ for each dimension. The darker areas have a higher probability of generating offspring. Assuming that the distribution of the parent population shown in Fig. 1(a) is given, data-driven MFTD performs sampling based on a uniform distribution in the latent space, regardless of the distribution of the parent population. The resulting probability distribution of the generated offspring becomes the one shown in Fig. 1(b), so it cannot be said that the statistics of the parent population are inherited. Although the use of a VAE as a deep generative model enables a crossover-like operation in the original data-driven MFTD, this operation cannot be considered a crossover in the strict sense because it relies on random sampling. Since the input data follow a normal distribution in the latent space due to the nature of VAEs [11], generating offspring through sampling based on a normal distribution rather than a uniform distribution can be considered reasonable. However, as shown in Fig. 1(c), the probability distribution of the generated offspring does not follow the distribution of the parent population; therefore, the statistics of the parent population are not inherited in this case either. Based on the EA concept, preserving the diversity of the population helps prevent premature convergence, but crossover-like random sampling from the latent space can lead to an early loss of diversity in the population. This results in fluctuation in convergence and, in the worst case, failure to perform a global search, leading to the possibility of getting stuck in local optima.

As mentioned earlier, it is impossible to strictly inherit the statistical characteristics of the parent population through random sampling. According to its nature, a crossover operation generates offspring by targeting small areas for parents who are close together and large areas for those who are far apart [25]. Thus, applying latent crossover to the parent population in Fig. 1(a), the probability distribution of generated offspring is expected to become the one shown in Fig. 1(d). Therefore, it can be said that a crossover operation in the latent space, i.e., the latent crossover, is promising.
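The argument above can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes a hypothetical parent population clustered away from the origin of a two-dimensional latent space and compares its mean and variance with those of offspring drawn uniformly over $[-2,2]^2$ (as in Fig. 1(b)) and from a standard normal distribution (as in Fig. 1(c)). The cluster location, spread, and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parent population: a tight cluster away from the origin.
parents = rng.normal(loc=[1.0, -0.5], scale=0.2, size=(200, 2))

# Offspring as in Fig. 1(b): uniform over [-2, 2]^2, ignoring the parents.
uniform_offspring = rng.uniform(-2.0, 2.0, size=(5000, 2))

# Offspring as in Fig. 1(c): standard normal, also ignoring the parents.
normal_offspring = rng.standard_normal(size=(5000, 2))


def summarize(name, points):
    """Print and return the per-dimension mean and variance of a point set."""
    mean = points.mean(axis=0)
    var = points.var(axis=0)
    print(f"{name:>8}: mean={mean.round(2)}, var={var.round(2)}")
    return mean, var


parent_mean, parent_var = summarize("parents", parents)
uniform_mean, uniform_var = summarize("uniform", uniform_offspring)
normal_mean, normal_var = summarize("normal", normal_offspring)
```

In this sketch, both sampling schemes produce offspring centered near the origin with a much larger variance than the parent cluster, which is the sense in which the parent statistics are not inherited.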

## 3 Framework

### 3.1 Data-Driven MFTD With Latent Crossover.

By using the MFTD approach and a deep generative model, data-driven MFTD iteratively updates solution candidates in a gradient-free manner similar to EAs. Note that the latent space is updated at every optimization step. The schematic flow of the proposed data-driven MFTD with latent crossover is shown in Fig. 2, and the details of each step are explained here.

#### Initial Data Generation

#### Evaluation

The performance of candidate solutions is evaluated using a high-fidelity analysis model, which is used to compute the original multiple objective functions $J_i$ and $G_j$ in Eq. (1) with discrete $\gamma_e$ binarized to $\{0,1\}$.

#### Selection

As mentioned in Sec. 2, selection is a critical genetic operation in RCGAs. For problems as in Eq. (1), it is necessary to evaluate solutions using multiple objective functions and select those to be preserved in the next generation. This article uses the nondominated sorting genetic algorithm II (NSGA-II) [27] strategy as a selection algorithm, which selects candidates in a multi-objective manner by ranking them based on the Pareto dominance relation using distances in the objective function space. The nondominated candidate solutions are selected based on performance evaluation values from the high-fidelity model, and then a set of Pareto solutions is constructed.
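As a rough illustration of the dominance relation that underlies this selection step, the sketch below extracts the nondominated candidates from a small set of objective vectors. It is a simplified stand-in and not the full NSGA-II procedure, which also ranks the successive fronts and uses a distance measure in the objective space. The function name `nondominated_mask` and the sample (volume fraction, maximum stress) values are hypothetical, and both objectives are assumed to be minimized.

```python
import numpy as np


def nondominated_mask(objectives):
    """Return a boolean mask of nondominated rows (all objectives minimized)."""
    F = np.asarray(objectives, dtype=float)
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        # Row j dominates row i if it is no worse in every objective
        # and strictly better in at least one.
        no_worse = np.all(F <= F[i], axis=1)
        strictly_better = np.any(F < F[i], axis=1)
        if np.any(no_worse & strictly_better):
            mask[i] = False
    return mask


# Hypothetical (volume fraction, maximum stress) pairs for five candidates.
candidates = np.array([
    [0.30, 2.0],
    [0.40, 1.5],
    [0.35, 2.5],  # dominated by the first candidate
    [0.50, 1.2],
    [0.45, 1.6],  # dominated by the second candidate
])
mask = nondominated_mask(candidates)
pareto = candidates[mask]
```

The pairwise loop costs $O(N^2)$ comparisons, which is acceptable for populations of a few hundred candidates as used here.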

#### Crossover

A VAE is trained with the Pareto solution set as input to construct a latent space, where high-dimensional material distributions are encoded into low-dimensional latent variables. Here, it is important to note that the learning data are not accumulated iteratively but rather, a fixed number of data to be selected is predetermined, and a VAE is trained anew in each iteration. Latent crossover is performed using these latent variables to generate offspring in the latent space. Decoding the offspring generated by latent crossover yields new material distributions that inherit the characteristics of the input data, and candidate solutions are generated. The details of the VAE and the latent crossover operation are described in Secs. 3.2 and 3.3, respectively.

#### Mutation

The latent space of the VAE is constructed using the Pareto solution set of the current generation and corresponds to a subspace in which the solutions are distributed. Even if the mutation method of RCGAs, such as the nonuniform mutation operator [28], is applied in the latent space, its outcome is limited to a specific subspace against the whole solution space. This limitation exists because such a mutation only performs a local search in the subspace around the solutions distributed in the whole solution space. Thus, it cannot be expected to maintain the diversity of the population and prevent premature convergence, as discussed in Sec. 1.

This article uses the average value of material distributions in a given generation as a reference structure. This average distribution can be considered to be representative of the material distributions of the population. By solving the low-fidelity optimization problem with the constraint function of Eq. (3) and the reference structure, promising candidate solutions can be generated with unique features that are not present in the population. This approach enables a mutation-like operation, similar to the mutation in EAs, to maintain diversity and prevent premature convergence. It should be noted that the mutants added to the population through this operation are still limited to a specific subspace and may not search the whole solution space comprehensively.

### 3.2 Variational Autoencoder.

Here, the VAE trained with the architecture shown in Fig. 3 and the loss function of Eq. (5) constructs a latent space following a single standard normal distribution. In contrast, there are advanced generative models such as Gaussian mixture VAEs [29] whose latent space follows multiple distributions. For instance, on the basis of this idea, Tsumoto et al. [30] have proposed a clustering method for solutions obtained through topology optimization. Due to the search mechanism of evolutionary algorithms, data-driven MFTD could involve the training data being distributed into several clusters, and Gaussian mixture VAEs might provide better learning accuracy compared to the standard VAEs in such cases. However, as mentioned in Sec. 3, since VAEs are trained anew at each iteration in the optimization process, this study employs the aforementioned standard VAEs in terms of computational cost and learning stability.

Compared to simple dimensionality reduction using autoencoders, VAEs are trained by incorporating probabilistic variation through $\epsilon$, allowing for the estimation of the given dataset distribution, and can be used as a deep generative model for continuous data generation. When using material distributions as a dataset for topology optimization, essential features within the dataset are extracted by compressing them into dramatically smaller latent variables. Because they follow the standard normal distribution, latent variables do not take extremely large or small values. To represent all material distributions without excessive randomness, original data-driven MFTD [17] generates offspring by sampling uniform random numbers in $[-4,4]$ for each latent variable, a range that spans $\pm 4\sigma$ and thus covers more than 99.7% of the data. However, as mentioned in Sec. 2, generating offspring with a uniform probability distribution in the latent space, as shown in Fig. 1(b), regardless of the distribution of parent individuals, can be problematic. In this article, we perform latent crossover using the crossover operator explained in Sec. 3.3.
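For reference, the share of a standard normal distribution that lies within $\pm k\sigma$ can be checked with the error function, and the uniform sampling used by the original method can be written in a few lines. This is only a sketch under stated assumptions: `normal_coverage` is a hypothetical helper, the latent dimension of 8 is taken from Sec. 4.1, and the number of offspring is arbitrary.

```python
import math

import numpy as np


def normal_coverage(k):
    """Fraction of a standard normal distribution lying within +/- k sigma."""
    return math.erf(k / math.sqrt(2.0))


coverage_3_sigma = normal_coverage(3.0)
coverage_4_sigma = normal_coverage(4.0)

rng = np.random.default_rng(0)
latent_dim = 8    # latent dimension reported in Sec. 4.1
n_offspring = 5   # hypothetical number of offspring to draw

# Uniform sampling over [-4, 4] per latent variable, independent of the parents.
z = rng.uniform(-4.0, 4.0, size=(n_offspring, latent_dim))
```

Each row of `z` would then be passed to the decoder to obtain a material distribution; the decoder itself is omitted here.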

### 3.3 Simplex Crossover.

Due to the high degree of freedom of representing genes as real-valued vectors, the RCGA has limited offspring that can be generated from selected parent individuals using crossover operators, such as the single-point crossover commonly used in binary evolutionary algorithms. Several crossover operators for RCGAs [19,31,32] have been proposed to address this issue. This article uses the simplex crossover (SPX) [19] for a latent crossover operator. SPX is one of the multiparent crossover operators for RCGAs that generates offspring using three or more parent individuals and is consistent with the crossover design guidelines [22–24] as it inherits the average value and covariance matrix of the population.

When the search space is defined as the real $n$-dimensional space $\mathbb{R}^n$, where individuals are represented as vectors of real numbers, the algorithm for SPX is as follows:

- Randomly select $(n+1)$ parent individuals $P_0,P_1,\ldots,P_n$ from the population.
- Calculate the centroid $G$ of the parent individuals as follows: $$G=\frac{1}{n+1}\sum_{i=0}^{n}P_i \tag{7}$$
- Calculate variables $x_k$ and $C_k$ for $k=0,1,\ldots,n$ as follows: $$x_k=G+\epsilon(P_k-G) \tag{8}$$ $$C_k=\begin{cases}0 & (k=0)\\ r_{k-1}(x_{k-1}-x_k+C_{k-1}) & (k=1,\ldots,n)\end{cases} \tag{9}$$ Here, $\epsilon$ is the expansion rate parameter, and $\sqrt{n+2}$ is the recommended value for inheriting population statistics [19]. $r_k$ is obtained by transforming a uniform random number $u(0,1)$ in the interval $[0,1]$ as follows: $$r_k=\begin{cases}u(0,1)^{\frac{1}{k+1}} & (k=0,\ldots,n-1)\\ 1 & (k=n)\end{cases} \tag{10}$$
- Generate a child individual $C$ as follows: $$C=x_n+C_n \tag{11}$$
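The steps above can be sketched in NumPy as follows. This is an illustrative implementation rather than the authors' code. It assumes that the centroid is the arithmetic mean of the $n+1$ parents and that the expansion rate defaults to $\sqrt{n+2}$, the value commonly recommended for SPX; the function name `simplex_crossover` and the three sample parents are hypothetical.

```python
import numpy as np


def simplex_crossover(parents, rng, epsilon=None):
    """Generate one child from n + 1 parents in n dimensions using SPX."""
    P = np.asarray(parents, dtype=float)
    m, n = P.shape
    if m != n + 1:
        raise ValueError("SPX needs n + 1 parents for an n-dimensional space")
    if epsilon is None:
        epsilon = np.sqrt(n + 2)          # assumed default expansion rate

    G = P.mean(axis=0)                    # centroid of the parents
    x = G + epsilon * (P - G)             # expanded simplex vertices x_0..x_n

    C = np.zeros(n)                       # C_0 = 0
    for k in range(1, n + 1):
        # r_{k-1} = u(0,1)^(1/k), i.e., exponent 1/((k-1)+1)
        r = rng.uniform() ** (1.0 / k)
        C = r * (x[k - 1] - x[k] + C)     # C_k from C_{k-1}
    return x[n] + C                       # child C = x_n + C_n


rng = np.random.default_rng(0)
parents = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # n = 2, three parents
children = np.array([simplex_crossover(parents, rng) for _ in range(4000)])
```

In this two-dimensional example, the sample mean and per-dimension variance of `children` stay close to those of `parents`, which is the statistic-inheriting behavior that motivates SPX. With the eight-dimensional latent space of Sec. 4.1, the same function would take nine parents per child.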

## 4 Numerical Examples

### 4.1 Problem Setting.

Data-driven MFTD, as mentioned in Sec. 3.1, is a framework for multimodal optimization problems with high nonlinearity and targets problems where the low-fidelity optimization problem is formulated as an easily solvable pseudo-problem for the original one to be solved.

The design domain and boundary conditions for the L-bracket, as shown in Fig. 5, include fixing the upper edge and applying a vertical downward distributed load at the top right corner to avoid stress concentration. The length of the bracket is set to $L=2$, and the design domain is divided into 6400 square elements ($N=6400$). Young’s modulus of the structural material is set to 1, that of the void is set to $1\times10^{-9}$ instead of 0 to avoid a singular stiffness matrix, and Poisson’s ratio is set to 0.3.

As for the parameters set based on preliminary studies related to the overall procedure, the number of initial data and Pareto solutions from the selection operation are set to 100 and 300, respectively. Regarding the parameters related to the mutation operation, $N_{\mathrm{mut}}$ is set to 16, and $\tilde{G}_{\mathrm{mut}}^{\max}$ is set to 0.01. During the latent crossover, nine parent individuals are used by the SPX method because the dimension of the VAE latent space is 8.

### 4.2 Verification of Variational Autoencoder Model.

First, we verify the VAE model and parameters, which play a central role in data-driven MFTD. After preliminary studies on the hyperparameters, we establish the VAE architecture as shown in Fig. 3. The VAE is trained with 100 material distribution samples for 500 epochs, with a batch size of 20 and a learning rate of 0.001. The training is terminated if the loss function $L_{\mathrm{VAE}}$ of Eq. (5) does not improve for 50 consecutive iterations.
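The termination rule can be sketched as a simple patience counter. This is a minimal illustration under assumptions, not the training code used in the study: it interprets the criterion as 50 consecutive iterations without improvement, replaces the actual VAE training step with a precomputed sequence of loss values, and uses a hypothetical function name and loss curve.

```python
def run_with_early_stopping(losses, max_epochs=500, patience=50):
    """Consume loss values until max_epochs or `patience` stale epochs in a row."""
    best_loss = float("inf")
    stale = 0
    epochs_run = 0
    for loss in losses:
        if epochs_run >= max_epochs:
            break
        epochs_run += 1
        if loss < best_loss:
            best_loss = loss   # improvement: reset the patience counter
            stale = 0
        else:
            stale += 1         # no improvement in this epoch
            if stale >= patience:
                break
    return epochs_run, best_loss


# Hypothetical loss curve: decreases for 120 epochs, then plateaus.
losses = [1.0 / (e + 1) for e in range(120)] + [0.5] * 1000
epochs_run, best_loss = run_with_early_stopping(losses)
```

In a real training loop, each loss value would come from one pass over the 100 samples in batches of 20.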

Figure 6 shows the history of the loss function in Eq. (5) during training using the material distribution data at iteration 0 described in Sec. 4.4 as an example. The number of epochs is represented on a logarithmic scale to highlight the areas with significant changes in the loss function. The loss function converges smoothly, indicating that the VAE is appropriately trained under the investigated condition.

### 4.3 Verification of Latent Crossover Effect.

For the problem setup in Sec. 4.1, we compare the original and proposed data-driven MFTD frameworks. Since both methods involve random effects, we evaluate and compare them using the hypervolume indicator [41] over ten trials, which is normalized using the initial one. The hypervolume is a measure of the convergence performance of multi-objective optimization. In the case of two objectives, it is represented by the area formed by the reference point and the Pareto front in the objective space as shown in Fig. 7, so a larger hypervolume value means that the Pareto front has progressed. Although mutation is usually performed at regular intervals of iterations, we confirmed that in the case of this design problem, the mutants are selected only once at the beginning, and no mutants are selected as elite solutions thereafter. Therefore, we used the initial data composed of the mutants and initial solutions to compare them with the search performance by crossover without mutation. As this validation involves multiple computations due to the inclusion of randomness, the number of Pareto solutions created through selection has been set to 100 for computational efficiency.
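For two objectives, the hypervolume described above reduces to an area that can be computed with a single sweep over the sorted front. The sketch below assumes that both objectives are minimized; the function name `hypervolume_2d`, the reference point, and the sample fronts are hypothetical values chosen only to show the normalization by the initial hypervolume.

```python
import numpy as np


def hypervolume_2d(front, reference):
    """Area dominated by a 2D front (both objectives minimized) up to `reference`."""
    pts = np.asarray(front, dtype=float)
    ref = np.asarray(reference, dtype=float)
    # Only points that are strictly better than the reference contribute.
    pts = pts[np.all(pts < ref, axis=1)]
    # Sweep in ascending order of the first objective.
    pts = pts[np.argsort(pts[:, 0], kind="stable")]
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:
            # Add the horizontal strip newly dominated by this point.
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


reference = [4.0, 4.0]
initial_front = [[2.0, 3.0], [3.0, 2.0]]
current_front = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]

hv_initial = hypervolume_2d(initial_front, reference)
hv_current = hypervolume_2d(current_front, reference)
relative_hv = hv_current / hv_initial
```

A relative value above 1 indicates that the Pareto front has advanced with respect to the initial one.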

Figure 8 shows the iteration history of the hypervolume indicator over ten trials. Note that the value at each iteration is the relative hypervolume normalized by the initial one. In terms of the value at 100 iterations, random sampling in Fig. 8(a) shows a considerable variation in the range from 1.38 to 1.52, while the latent crossover in Fig. 8(b) remains stable in the range from 1.48 to 1.54. The average values of the hypervolume indicator over the ten trials are plotted in Fig. 9. Up to iteration 30, the value of random sampling is higher than that of latent crossover. However, after iteration 30, this relationship is reversed, and at iteration 100, the average value of random sampling is 1.45, while that of latent crossover is 1.50, indicating a difference of 5%. In addition, at iteration 100, the lower limit of the 95% prediction interval for the latent crossover case exceeds the upper limit for the random sampling case. A $t$-test was performed on the hypervolume values at iteration 100, and the $p$-value was 0.00180, which is less than 0.05. Therefore, the advantage of the latent crossover over the random sampling can be considered statistically significant.

In addition, we compare the performance of the best and worst cases among the ten trials shown in Fig. 9 in terms of the relative hypervolume value. Figure 10 presents a comparison of their performance. It is evident from Fig. 10 that the best case with latent crossover achieved the most advanced Pareto front. Even in the worst case with latent crossover, the Pareto front exhibits a spread in the objective function space, whereas in the worst case with random sampling, the Pareto front is highly contracted and fails to maintain diversity. This issue could be serious regarding the nature of EAs [5], as there is an increased risk that the optimized structures are local optima with poor performance.

The SPX operator used as the latent crossover operator gradually changes the population distribution while inheriting the statistics, so the increase in hypervolume is slower in the early stages of the search (up to iteration 30) compared to the random sampling. Therefore, this approach maintains diversity and prevents premature convergence, which leads to a more advanced Pareto front in the final iteration (at iteration 100) in Fig. 10. This improvement can be explained based on the theory that the balance between exploration and exploitation [33], i.e., expanding the Pareto front and advancing it, respectively, is significant in EAs. From these results and discussions, it can be concluded that data-driven MFTD achieved stable and high search performance with the latent crossover based on the theory of RCGAs.

### 4.4 Validity of Optimized Structure.

Next, we compare the structures obtained through data-driven MFTD with structures obtained through direct optimization using a gradient-based approach without relying on MFTD principles. Despite only solving the mean compliance minimization problem of Eq. (13) as the low-fidelity optimization problem, we investigate how closely the structures obtained by data-driven MFTD can approach the performance of structures obtained by conventional gradient-based optimization. In addition, we examine the differences between these structures.

Figure 11 illustrates the structures and performance comparison of results obtained through GTO and data-driven MFTD. First, we discuss the optimization results of data-driven MFTD.

Figure 12 shows the initial dataset obtained by solving the low-fidelity optimization problem in Eq. (13). The initial dataset, which consists of compliance minimization designs, has structures that cause stress concentration at their reentrant corners, whereas the optimized structures shown in Fig. 11 have rounded shapes with their reentrant corners smoothed out. The improved performance and reduced volume can be seen by comparing the plots of iteration 0 and iteration 400 in the objective function space shown in Fig. 11.

When comparing the optimization results of GTO and data-driven MFTD in Fig. 11, it can be confirmed that the solutions obtained from data-driven MFTD exhibit performance comparable to or better than those from GTO. This is particularly notable in the volume fraction range of 0.3 to 0.5. In the range of lower volume fractions from 0.2 to 0.3, GTO exhibits significant variations in structural performance due to the parameter of continuation thresholds. This suggests that it might be getting trapped in local minima with poor structural performance, likely due to the multimodality caused by the strong nonlinearity of the objective function in Eq. (14). In addition, even with the application of the Heaviside projection, complete removal of the grayscale is not achievable, and especially for low-volume structures, there is a tendency for discontinuities, leading to significant changes in maximum stress values before and after the binarization of $\gamma_e$, as pointed out by Kato et al. [45]. These effects result in the solutions obtained by GTO having a sparse distribution in the objective space. On the other hand, as described in Sec. 3, data-driven MFTD employs an evolutionary algorithm, enabling gradient-free solution updates. This means it is less affected by the multimodality of the objective function. In addition, using Eq. (12) for high-fidelity evaluation of the maximum stress itself with discrete $\gamma_e$, rather than using the $P$-norm stress with continuous $\gamma_e$ in Eq. (14), allows the obtained solutions to form an orderly Pareto front. Here, the poor performance of the data-driven MFTD solutions in the range of volume fractions from 0.2 to 0.3 may be attributed to the mutation method. As described in Sec. 3.1, in data-driven MFTD, we introduce an overlap constraint as a mutation method to solve the low-fidelity optimization problem, generating promising structures different from the reference design. The parameter $\tilde{G}_{\mathrm{mut}}^{\max}$, which controls the degree of overlap, uses a constant value independent of the volume. Therefore, while larger structures may be effectively mutated, smaller structures might face challenges in obtaining valid solutions. Due to the reduced effect of the mutation in low-volume regions, it is speculated that the method has led to a kind of local optimum. This suggests that there is room for improvement in the mutation strategy.

Comparing the optimized structures in Fig. 11, the designs obtained by GTO successfully avoid stress concentration at their reentrant corners. However, they consist of straight members and often have triangular or rectangular voids. One of the advantages of data-driven MFTD is that material distributions are represented as vectors and updated using a VAE, eliminating the need for sensitivity analysis. Therefore, as in Eq. (12), the maximum stress can be used directly as the objective function. This feature leads to overall curved structures with rounded appearances at their reentrant corners and elsewhere, as shown in Fig. 11, suggesting that stress concentration is further avoided. In addition, the optimized designs obtained through GTO exhibit various patterns, suggesting entrapment in local minima due to the multimodality of the $P$-norm stress in Eq. (14). On the other hand, the optimized designs obtained through data-driven MFTD exhibit nearly identical topologies regardless of volume, differing mainly in member thickness. Compared to GTO, data-driven MFTD achieves global search and appears to reach a promising structural topology. Optimized structures with volume fractions of 0.2–0.3, where these trends are clearly reflected, are shown in Fig. 13. In the case of GTO, it is evident that regardless of continuation thresholds, structures differ significantly even with only a 0.005 difference in volume fraction constraint. This confirms that solutions obtained through GTO are merely local solutions due to multimodality. On the other hand, the optimized structure obtained through data-driven MFTD in Fig. 13(e) maintains a consistent topology regardless of volume. This demonstrates an effective optimization, even for low-volume structures, where conventional GTO struggles, indicating resilience against the influence of multimodality.

As described earlier, it has been demonstrated that the data-driven MFTD framework can address the complex problem of maximum stress minimization by solving the simple problem of mean compliance minimization as a low-fidelity optimization problem. Compared to the solutions by conventional gradient-based optimization, the obtained structures exhibit comparable or better performance and have similar characteristics in terms of avoiding the stress concentration at reentrant corners. This finding suggests that data-driven MFTD may be capable of deriving promising solutions in a gradient-free manner, even in cases of strong multimodal problems where gradient-based optimization is more challenging or potentially infeasible. Note that using multiple initial values in gradient-based topology optimization might yield the optimized structures similar to or better than those obtained with data-driven MFTD. However, it is unclear which initial values should be employed, or whether better solutions exist in the first place. Compared to conventional gradient-based topology optimization, the result indicates that the data-driven MFTD method is likely to yield a unique set of Pareto solutions through an extensive search process.

To generate the data in Fig. 11, we ran both the data-driven MFTD and GTO codes on a 2.7 GHz AMD Ryzen Threadripper PRO 3995WX 64-Cores CPU. The VAE code for data-driven MFTD was run on an NVIDIA RTX A6000 GPU. The time required to generate the optimized structures in Fig. 11 was 33.7 min for GTO, while data-driven MFTD took 6.8 h. It should be noted that there are potential future improvements to accelerate data-driven MFTD, such as training a VAE every fixed number of iterations instead of every iteration and utilizing surrogate models for structural performance evaluation.

## 5 Conclusion

This article proposed a *latent crossover* strategy that performs crossover in the latent space of the VAE for the data-driven MFTD framework. Since the latent space is constructed with continuous real numbers, this article employed the SPX as a latent crossover operator based on the theoretical aspects of crossover in RCGAs. The results showed that the proposed method improves the search performance compared to the original method, which performs random sampling in the latent space. As an interesting aspect, this article confirms that the proposed method achieves almost the same performance as that of gradient-based topology optimization using the $P$-norm measure for the maximum stress minimization problem, despite only solving the mean compliance minimization problem as the low-fidelity topology optimization problem. Furthermore, it was found that the final results of the proposed method tend to achieve a similar topology, while the optimized results of the gradient-based method exhibit various patterns due to the multimodality caused by the strong nonlinearity of the $P$-norm measure. Hence, the data-driven MFTD approach is expected to yield a unique set of Pareto solutions through gradient-free searching.

The concept of latent crossover enables the integration of evolutionary algorithms and machine learning methods. In our future work, we plan to incorporate various types of evolutionary algorithms other than RCGAs, as well as VAE-based advanced machine learning methods into the proposed framework. In addition, to verify the efficacy of the proposed framework on different optimization problems, we consider developing a systematic formulation method for the low-fidelity optimization problem and plan to apply it to other multimodal problems involving strongly nonlinear physical phenomena.

## Acknowledgment

The second author was supported by JSPS KAKENHI (Grant Nos. 20KK0329, 20H02054, and 23H03799).

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.