Utilizing a neural network, individual down-axis images of combustion waves in the rotating detonation engine (RDE) can be classified according to the number of detonation waves present and their directional behavior. While the ability to identify the number of waves present within individual images might be intuitive, the further classification of wave rotational direction is a result of the detonation wave’s profile, which suggests its angular direction of movement. The application of deep learning is highly adaptive and, therefore, can be trained for a variety of image collection methods across RDE study platforms. In this study, a supervised approach is employed where a series of manually classified images is provided to a neural network for the purpose of optimizing the classification performance of the network. These images, referred to as the training set, are individually labeled as one of ten modes present in an experimental RDE. Possible classifications include deflagration, clockwise and counterclockwise variants of co-rotational detonation waves with quantities ranging from one to three waves, as well as single, double, and triple counter-rotating detonation waves. After training the network, a second set of manually classified images, referred to as the validation set, is used to evaluate the performance of the model. The ability to predict the detonation wave mode in a single image using a trained neural network substantially reduces computational complexity by circumventing the need to evaluate the temporal behavior of individual pixel regions over time. Results suggest that while image quality is critical, it is possible to accurately identify the modal behavior of detonation waves based on only a single image rather than a sequence of images or signal processing.
Successful identification of wave behavior using image classification serves as a steppingstone for further machine learning integration in RDE research and development of comprehensive real-time diagnostics.
Image analysis plays a critical role in combustion research as it not only provides a means to slow down highly dynamic processes for human and computer visualization [1,2] but also plays a role in permitting real-time control through digital interpretation. Chemiluminescence, or the spontaneous emission of light as excited chemical species return to their ground state, is often used as an indicator of heat release rate. Nori and Seitzman provided a detailed review of the use of chemiluminescence in the visible and ultraviolet spectrum emitted from excited-state species (CH*, OH*, C2*, and CO2*) to provide a qualitative and quantitative spatially resolved measure of significant chemical kinetics that can be attributed to deflagration flame properties such as heat release. Similarly, Shepherd compared Schlieren images of detonation waves with OH* chemiluminescence in order to accurately define the location of the wavefront. Time-resolved images of the propagating detonation wave (via chemiluminescence) could be used to ascertain an average detonation wave speed, which could serve as a measure of wave quality when compared with theoretical properties.
Detonation combustion is beginning to play a prominent role in the development of high-efficiency gas turbine engines. Pressure gain combustion (PGC) achieved through detonation, as opposed to deflagration, potentially offers a power cycle alternative to isobaric combustion of the idealized Brayton cycle, which could result in higher thermodynamic efficiency and power output. This increase in efficiency is the result of an increase in total pressure for the combustion products with lower entropy generation for the same heat release compared with deflagration-based combustion.
Rotating detonation engines (RDEs) offer a theoretical means of achieving pressure gain through detonation that could be applied in a gas turbine engine. RDEs function as a continuous detonation device in which a detonable fuel-oxidizer mixture is injected into an annular channel producing a self-sustained detonation wave that propagates around the channel circumference, as shown in Fig. 1. While an inherently unsteady process, the rapid progression of the detonation wave(s) within the combustor combined with optimized flow control produces an exhaust flow that could be introduced to turbomachinery with minimal impact to individual component efficiencies [8–10].
Time-based diagnostics such as dynamic pressure and chemi-ionization (flame ionization) are often used to characterize the performance of the RDE. Using conventional measures, a cross-correlation of multiple sensors can provide information describing detonation wave behaviors [11,12]. However, as point measurements, they are unable to provide instantaneous spatial resolution across the entire combustor and may confuse complex modes, so additional interpretation is often required to resolve modal changes and other phenomena.
Various studies have independently investigated detonation wave behavior by tracking azimuthal pixel intensity over time in high-speed images [13–15]. Time-varying polar pixel intensity within the annulus is commonly shown by way of contour plots, which are often referred to as detonation surfaces or X–t diagrams. Further analysis involving Fourier or Hough transforms can accurately evaluate the progression of individual detonation waves to determine wave number, frequency, and speed. Those approaches, however, rely on extensive post-processing and evaluation of a series of consecutive images captured at high framerates. This contrasts with the methodology described in this paper, which suggests that the interrogation of a single image using a data-driven machine learning model, coupled with a conventional high-speed pressure transducer, could ultimately lead to an accurate, real-time diagnostic for detonation wave mode determination including number, direction, and speed. Unlike the cross-correlation routine previously discussed, the use of imagery permits spatial resolution in the RDE annulus. In addition, the proposed single image method would not depend on the assumptions of wave number and direction that are often associated with conventional wave behavioral analysis in RDEs.
Traditional feature extraction and image analysis are time-consuming and computationally expensive processes. For studies of time-dependent processes, such as progressing detonation wave(s), extracting temporal details such as operating frequency requires a large number of images, often 1000 frames, depending on the desired resolution. As such, these methods are rarely considered for real-time applications. In fields such as radiology and facial recognition, traditional automated image processing methodologies are being replaced by various deep learning algorithms such as convolutional neural networks (CNNs). Unlike traditional artificial intelligence techniques, CNNs are well-understood tools and are often used for rapid feature extraction and image classification. CNNs perform feature extraction by using nonlinear activation functions to filter features of an image utilizing only relevant regions as opposed to repeatedly processing the complete image. This enables a drastic reduction in memory requirements and computational times for the analysis of a large number of images. These networks then classify images using a Softmax function which determines the probability that an image belongs to a certain classification.
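As a minimal sketch of the Softmax step mentioned above (an illustration, not the implementation used in this study), the snippet below converts a vector of raw class scores into probabilities and selects the most probable class. The class labels follow the mode naming convention used in this paper; the score values are invented for illustration.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw network outputs for six classes (values are made up)
labels = ["1CW", "1CCW", "2CW", "2CCW", "3CW", "3CCW"]
scores = [0.2, 0.1, 3.1, 1.4, 0.3, 0.0]

probabilities = softmax(scores)
# The predicted class is the one with the highest probability
predicted = labels[probabilities.index(max(probabilities))]
print(predicted, [round(p, 3) for p in probabilities])
```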
This study outlines the workflow to develop and train a data-driven model utilizing a CNN for the purpose of rapidly assessing the detonation mode in an RDE. This model is considered a data-driven model as it is trained on images that have been manually classified based on the known wave mode. It is important to note that while it may take tens or hundreds of thousands of images to train the model, once trained the user would be able to identify the detonation mode (wave number and direction) from a single image. When coupled with high-speed pressure data, an accurate real-time diagnostic for the wave mode including wave speed may be possible in a future work. Furthermore, a trained model stemming from the algorithm described here consists of two files (model and parametrized weights) which could easily be shared and implemented by others operating similar experiments by writing only a few lines of code.
As a first step, the current study is limited to outlining the algorithm followed to develop and train the data-driven CNN model to identify wave number and direction. Approximately 100,000 images were acquired, manually classified by visual inspection, and then divided into sets for training and validation. The training set consisted of a randomized series of classified images that define the mode of the detonation wave in the combustor. These images were used to optimize the parameterized weighting functions within the neural network using the TensorFlow and Keras libraries in Python. Once the network was trained, the validation image set was used to characterize the performance of the CNN. The validated CNN is then available for practical usage in rapid mode classification, detecting quasi-steady modal behavior as well as unsteady behavior and modal changes.
Classifying wave modes through image analysis has been the goal of multiple studies in the RDE community [13–15], but this method is the first to do so using individual images, reducing computational time significantly. At this time, possible benefits of different wave modes such as reduced losses or ease of high-pressure turbine integration are not yet fully understood by the RDE community. It is possible that a preference may arise between co- or counter-rotational (CR) modes, while it may also be possible to optimize annulus fill heights for a known number of waves. Although the value of this information is not yet fully known, it is apparent that being able to accurately classify these modes at speeds closer to the timescale of the RDE is comparatively beneficial.
Experimental Setup and Analysis
Tests for this study were conducted in the high-pressure combustion test facility at the Department of Energy’s (DOE) National Energy Technology Laboratory (NETL). The RDE utilized in this study is based on the 6-in. (152 mm) Air Force Research Laboratory (AFRL) geometry that has been widely examined in a number of academic studies [18–21]. Figure 2(a) provides an image of the RDE installed in the test facility at NETL. This rig is unique in that the RDE is installed in a ducted exhaust as opposed to being open as observed in many RDE research facilities. The ducted exhaust, shown in Fig. 2(b), combined with a high-temperature control valve (not shown) provides the ability to regulate the operating pressure independent of the RDE channel and exit geometry. This ability is beneficial for studies interested in the integration of the RDE with turbomachinery [8–10].
The 100-mm long combustor annulus has an outer diameter of 149 mm with a combustor gap width of 5 mm and an air injector to combustor area ratio of 0.2. The RDE was operated on hydrogen in air over a range of equivalence ratios while maintaining a constant air mass flowrate of 0.37 kg/s. Pre-combustion pressure was set at 0.18 MPa with a combustion air inlet temperature of 340 K.
As tests were performed on an uncooled rig, excessively high temperatures associated with the detonation (>1700 K) limited the test duration to 6–10 s to prevent damage to the experiment and sensitive instrumentation. A number of conventional measurements, such as dynamic and static pressure, OH* chemiluminescence via a UV bandpass filtered photomultiplier tube, ion probes to measure chemi-ionization from the detonation wave, and thermocouples were placed at various axial and azimuthal locations throughout the combustor. Time-dependent measurements such as pressure (measured using a semi-infinite tube pressure (ITP) coil and PCB model CA102B04/CA102B15 pressure transducers), OH*, and chemi-ionization were recorded at 250 kHz and provide a means to quantify the performance of the detonation with regards to wave speed, number of waves, and wave direction. To ascertain the average detonation wave speed, the power spectral density (PSD) of the time-based signals determines the dominant frequency which is used to define the detonation wave speed. However, this commonly used methodology for defining wave speed relies on approximating the number of waves that are present in the combustor as well as their direction of rotation.
An example of the PSD for dynamic pressure measured 10 mm downstream of the injection plane is shown in Fig. 3. The dominant frequency is defined by the maximum peak amplitude (energy per frequency) with lesser peaks associated with additional periodic content or harmonics of the dominant mode. For the example shown in Fig. 3, the dominant frequency was determined to be 7.1 kHz. The theoretical Chapman–Jouguet (CJ) wave frequency for the test RDE at the experimental conditions was found to be 4.45 kHz (fCJ = UCJ/(πDRDE)) using the SDToolbox and Cantera.
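The dominant-frequency step can be sketched as follows. This is an illustrative periodogram computed with a direct discrete Fourier transform on a synthetic signal, not the PSD routine used in the study; the signal (a 7.1 kHz tone plus a weaker harmonic), the 10 ms record length, and the 20 kHz search limit are assumptions chosen for the example. Only the 250 kHz sampling rate is taken from the text.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate_hz, max_freq_hz):
    """Return the frequency (Hz) of the largest periodogram peak up to max_freq_hz."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset
    resolution = sample_rate_hz / n        # frequency spacing between bins
    best_freq, best_power = 0.0, -1.0
    for k in range(1, int(max_freq_hz / resolution) + 1):
        # Direct DFT coefficient for bin k (slow, but dependency-free)
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = k * resolution, power
    return best_freq


SAMPLE_RATE_HZ = 250_000   # sampling rate quoted in the text
N_SAMPLES = 2_500          # assumed 10 ms record, giving 100 Hz resolution

# Synthetic stand-in for a dynamic pressure trace: 7.1 kHz tone plus a harmonic
signal = [math.sin(2 * math.pi * 7_100 * i / SAMPLE_RATE_HZ)
          + 0.3 * math.sin(2 * math.pi * 14_200 * i / SAMPLE_RATE_HZ)
          for i in range(N_SAMPLES)]

f_dominant = dominant_frequency(signal, SAMPLE_RATE_HZ, max_freq_hz=20_000)
print(f"dominant frequency: {f_dominant / 1000:.1f} kHz")
```

In practice an averaged estimator applied to measured data would replace the direct transform; the sketch only shows how the largest spectral peak defines the dominant frequency.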
As the theoretical CJ frequency is less than the measured dominant frequency, it is assumed that there are two waves present in the combustor, suggesting a wave frequency of 3.55 kHz, or 80% UCJ, for each individual wave in the combustor. Quantifying the wave speed and number provides a comparison for later methods based solely on image analysis that do not rely on underlying assumptions such as theoretical CJ wave speed.
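The arithmetic behind this inference can be written out as a short sketch. It encodes the assumption stated above, namely that no individual wave travels faster than the CJ speed, so the wave count is taken as the smallest integer that brings the per-wave frequency to or below the CJ frequency. The function name is hypothetical.

```python
import math


def infer_wave_count(f_dominant_hz, f_cj_hz):
    """Smallest wave count for which each wave is at or below the CJ frequency."""
    return max(1, math.ceil(f_dominant_hz / f_cj_hz))


F_DOMINANT_HZ = 7_100.0   # measured dominant frequency from the PSD
F_CJ_HZ = 4_450.0         # theoretical CJ wave frequency

wave_count = infer_wave_count(F_DOMINANT_HZ, F_CJ_HZ)
per_wave_hz = F_DOMINANT_HZ / wave_count
speed_ratio = per_wave_hz / F_CJ_HZ   # fraction of the CJ speed

print(wave_count, per_wave_hz, round(speed_ratio, 2))
```

The result reproduces the two-wave interpretation, with each wave at 3.55 kHz, or roughly 80% of the CJ value. The dependence on the assumed CJ frequency is the limitation that the image-based method aims to avoid.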
To capture images of detonation waves, a 50 mm × 20 mm thick quartz viewport is included in the elbow section of the ducted exhaust as shown in Figs. 2 and 4. The window utilizes a nitrogen purge to limit the temperature exposure to the hot exhaust gas. A UV-Nikkor 105 mm lens was coupled to a Video Scope VS4-1845HS-UV intensifier and a Photron FASTCAM SA-Z high-speed digital camera. The off-axis arrangement of the camera with mirrors (Fig. 4) provides an axial view of the RDE annulus while protecting the camera in case of failure of the viewport. Down-axis high-speed images of the RDE annulus are recorded at 50,000 frames per second (fps) with a resolution of 301 × 301 pixels and a gate width of 20 µs on the intensifier.
Detonation waves appear as high-intensity regions within the image and are constrained by the annular gap within the RDE. An example image, displayed in Fig. 5, shows a detonation wave in the right-most portion of the annulus, traveling around the path of the annular gap which is overlaid for visual reference.
The number and direction of detonation waves present in the annulus of an RDE is referred to as a mode. Possible modes commonly seen in RDEs include one or multiple waves traveling in the same rotational direction, either clockwise (CW) or counterclockwise (CCW), as well as CR and longitudinal pulse behavior. Throughout this text, a common naming convention will be used when referring to a specific mode, in which the wave quantity is directly followed by the wave direction. For example, when three counterclockwise detonation waves are present, the proper label is 3CCW. The RDE utilized in this study typically experiences only a limited number of modes during operation. Therefore, the dataset chosen to perform the initial training of the CNN includes six common modes: 1CW, 1CCW, 2CW, 2CCW, 3CW, and 3CCW. An example image of each of these modes is shown in Fig. 6. Additional modes, including counter-rotating and deflagration modes, are considered in a subsequent dataset discussed later in this paper.
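The naming convention described above can be expressed in a few lines. This is a small illustrative helper with hypothetical names; it covers only the co-rotating labels of the initial dataset.

```python
def mode_label(wave_count, direction):
    """Build a mode label such as '3CCW' from a wave count and a direction."""
    if direction not in ("CW", "CCW"):
        raise ValueError("direction must be 'CW' or 'CCW'")
    if wave_count < 1:
        raise ValueError("wave_count must be at least 1")
    return f"{wave_count}{direction}"


# The six co-rotating modes used for the initial training dataset
INITIAL_MODES = [mode_label(n, d) for n in (1, 2, 3) for d in ("CW", "CCW")]
print(INITIAL_MODES)
```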
It should be noted that the pixel intensity of the detonation waves was scaled, and the noise filtered to improve human readability of this figure. Utilization of the same images for training, validation, or analysis would not require such enhancement. Even with scaling, it is apparent that variations exist in the pixel intensity, such that as the total number of waves increases, overall pixel intensities decrease. This is due in part to a reduced fill height and a subsequent lesser volume of reactants available for each wave. An additional hindrance to wave visibility that occurs in the upper left quadrant of the image is due to a weld that is further discussed in the next subsection, Convolutional Neural Networks. However, even when a wave is out of view, the azimuthal spacing of the visible waves can be used to infer the presence of a wave occupying the annulus space not visible to the down-axis camera. Specifically, multiple waves are spaced somewhat evenly around the annulus circumference. For example, Fig. 6(e) shows an image where three waves are present, but only two are visible. Because the two visible waves are spaced at approximately 120 deg, instead of the 180 deg spacing expected of two-wave modes, the third wave’s presence is known. This concept is leveraged during the manual classification of the dataset and is inherited by the CNN.
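The spacing argument can be illustrated with a short sketch. It assumes co-rotating waves that are evenly spaced around the annulus and takes the azimuthal positions of the visible waves as given; the function name and the example angles are hypothetical, and this is not the labeling procedure used in the study, which was performed by visual inspection.

```python
def infer_wave_count_from_spacing(visible_angles_deg):
    """Estimate the total wave count from the smallest gap between visible waves."""
    if len(visible_angles_deg) < 2:
        # With fewer than two visible waves the spacing carries no information
        return len(visible_angles_deg)
    angles = sorted(a % 360 for a in visible_angles_deg)
    # Circular gaps between neighboring waves, including the wrap-around gap
    gaps = [(angles[(i + 1) % len(angles)] - angles[i]) % 360
            for i in range(len(angles))]
    smallest_gap = min(g for g in gaps if g > 0)
    # Evenly spaced waves are separated by 360 / N degrees
    return round(360 / smallest_gap)


print(infer_wave_count_from_spacing([20, 140]))   # two visible waves 120 deg apart
print(infer_wave_count_from_spacing([20, 200]))   # two visible waves 180 deg apart
```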
While human detection of the number of detonation waves among the images in Fig. 6 may be innately simple, discerning wave direction is much more difficult. Unlike other RDE image processing methods which use a collection of sequential images offering time dependencies, this study aims to draw conclusions from a single image. Although differences in some images depicting CW and CCW behavior are not obvious, detonation waves present themselves with distinct luminosity profiles which clearly suggest angular direction. These profiles are most clearly seen when only one detonation wave is present in either direction, as shown in Figs. 7(a) and 7(c).
To further acknowledge the profile differences of each wave direction, pixel intensities along the center radius of the annulus are plotted against azimuthal location in Figs. 7(b) and 7(d). Intensity profiles for each rotational direction are similar in structure, but profile characteristics occur in a reversed order along the azimuthal path, indicating opposite directions. In both instances, an abrupt spike in intensity caused by the blunt face of the detonation front is followed by a slow decay caused by the trailing profile. It is important to note that the intensity trends plotted in Fig. 7 are not an input for the CNN. The profile variations simply suggest the existence of differing wave profile features, which will foster directional classification of an individual image by a CNN.
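The asymmetry described above (an abrupt front followed by a slow decay) can be illustrated on a synthetic azimuthal profile. As the text notes, such profiles are not an input to the CNN; the sketch only shows that the order of the sharp edge and the trailing decay encodes the travel sense. The exponential tail shape and function names are assumptions, and the mapping of "+theta" or "-theta" onto CW or CCW depends on the angle convention of the image.

```python
import math


def synthetic_profile(front_index, n=360, decay=30.0, tail_sign=+1):
    """Blunt front at front_index with an exponential tail toward tail_sign * theta."""
    profile = []
    for i in range(n):
        distance = (tail_sign * (i - front_index)) % n  # samples behind the front
        profile.append(math.exp(-distance / decay))
    return profile


def travel_sense(profile):
    """Return '-theta' for a sharp rise then slow decay along theta, else '+theta'."""
    n = len(profile)
    # Circular first differences along increasing azimuth
    diffs = [profile[(i + 1) % n] - profile[i] for i in range(n)]
    steepest_rise = max(diffs)
    steepest_fall = -min(diffs)
    # The tail trails behind the front, so a tail at higher theta means the
    # wave is moving toward lower theta, and vice versa
    return "-theta" if steepest_rise > steepest_fall else "+theta"


tail_at_higher_theta = synthetic_profile(front_index=100, tail_sign=+1)
tail_at_lower_theta = synthetic_profile(front_index=100, tail_sign=-1)
print(travel_sense(tail_at_higher_theta), travel_sense(tail_at_lower_theta))
```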
As shown in Fig. 7, the accurate assessment of wave number is more intuitive than the prediction of a wave direction in a single image. An additional imaging error is introduced by the experimental setup. Due to a protruding weld in the exhaust section of the RDE, the downstream view of the upper-right segment in the annulus is obstructed. Therefore, a portion of the annulus is consistently represented by lower pixel intensities. This behavior is represented in Fig. 8, which displays a contour of the maximum Fourier coefficient at each pixel region throughout a sample of 100 frames. Approximately 25 deg of the upper portion of the annulus will not be visible, while an additional 50 deg in each upper quadrant is only partially visible. Due to the low visibility, the physical presence of a wave in this region may result in incorrect classification by the neural network. In these cases, a given mode may be classified as one with the same angular direction, but one fewer or one additional wave present. Because the neural network is trained on images that are manually classified as the correct mode, this confusion is expected to be very minimal.
Convolutional Neural Networks.
Convolutional neural networks are a type of deep learning network that excels at analyzing images for categorical classification. Analysis is accomplished by taking an input image and applying a set of trainable weights and biases to various aspects in the image, allowing the network to differentiate between the various features of an image. In this application, those features indicate differences between varying wave direction and number. The strength of CNNs over traditional neural networks is the application of convolutional layers which work by sliding a filter over a finite pixel region and taking the dot product between filters and regions of an image to produce an activation map. As the CNN becomes deeper with multiple convolutional layers, the dot product of the deeper layers inherits dot products of the previous convolutional layers allowing low-level features to be built into high-level features. In this type of network, each artificial neuron is connected to only a small part of the input volume, but they all have the same weights. The sharing of weights is referred to as parameter sharing and allows the CNNs to be locally connected rather than fully connected as in a traditional neural network. The ability of these networks to perform parameter sharing while having local connectivity reduces the number of trainable parameters in the system, i.e., neuron weights, leading to a more computationally efficient system.
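The sliding dot product described above can be sketched in plain Python. The example is illustrative only: the 5 × 5 image and the vertical-edge filter are invented, and in a trained CNN the filter weights would be learned rather than fixed by hand.

```python
def convolve2d_valid(image, kernel, stride=1):
    """Slide a kernel over an image and take the dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    activation_map = []
    for r in range(0, len(image) - kh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            # Dot product between the filter and the image region under it
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        activation_map.append(row)
    return activation_map


# Hypothetical 5 x 5 image containing a bright vertical streak
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
# Hand-picked filter that responds to left-to-right intensity changes
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

activation_map = convolve2d_valid(image, kernel)
for row in activation_map:
    print(row)
```

The same nine weights are reused at every position, which is the parameter sharing referred to in the text.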
Another important aspect of CNNs is the use of max pooling layers, which reduce the spatial dimensionality of an input image. Pooling layers play a key role in the computational efficiency of CNNs by reducing the dimensionality of the input data, resulting in fewer parameters and thereby lower computational expense. These max pooling layers operate as independent layers from the convolutional layers. Max pooling layers achieve dimensionality reduction by applying a filter of a specified size to an input image and taking the maximum pixel value within the filter location. The filter is then moved by some pixel distance, defined as a stride length, where the filter is again applied. The process is repeated until the entire image has been filtered and reduced.
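A minimal sketch of the max pooling operation described above, using an invented 4 × 4 input with a 2 × 2 filter and a stride of 2 pixels:

```python
def max_pool2d(image, pool_size=2, stride=2):
    """Keep the maximum value inside each pooling window."""
    pooled = []
    for r in range(0, len(image) - pool_size + 1, stride):
        pooled.append([
            max(image[r + i][c + j]
                for i in range(pool_size) for j in range(pool_size))
            for c in range(0, len(image[0]) - pool_size + 1, stride)
        ])
    return pooled


# Hypothetical 4 x 4 activation map
image = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool2d(image)   # each spatial dimension is halved
print(pooled)
```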
Convolutional neural networks have outperformed other types of classification algorithms in the ImageNet object identification challenge [24–27]. These networks are highly flexible and have been applied to a variety of specialized applications. Dering and Tucker showed that CNNs could be used to predict a product’s function given its form. Mao et al. utilized CNNs to predict unsteady wave forces on bluff bodies due to the free-surface wave motion. CNNs have also been used in conjunction with sensors to diagnose faults in rotating machinery, as shown by Xia et al. Neural networks have already been applied in the field of gas turbines through the work of Tong, which demonstrated the effectiveness of using machine learning to predict core sizes of high-efficiency turbofan engines.
Convolutional neural network designs are inspired by the connectivity patterns of neurons in the human brain as the architectures attempt to simulate human visual cortices. Images are broken down into discrete areas known as receptive fields in which neurons respond to stimuli only in that field. These fields are overlapped together to cover the entire visual area. By breaking down the image into these fields, it is possible to draw out the spatial and temporal dependencies in an image by applying relevant filters.
The arrangement and specification of convolutional, max pooling, and other layers is referred to as the CNN architecture. A multitude of unique architectures exist in the literature, consisting of various combinations of filters, layers, and neurons, together with tunable settings termed hyperparameters, that all attempt to achieve the same goals as quickly and efficiently as possible. Hyperparameters are set by the user rather than learned during training and can significantly affect the performance of a CNN. Typical hyperparameters are learning rate, activation function, kernel initializers, stride length, kernel size, and pooling size.
Convolutional Neural Network Methodology
Images considered in this study are recorded by a Photron FASTCAM SA-Z, at a rate of 50,000 fps and a gate width on the intensifier of 20 µs. Depending on the detonation wave speed and the number of waves present, this framerate captures waves at 10–15 azimuthal locations throughout each revolution.
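The relationship between framerate and the number of azimuthal locations captured per revolution is a simple ratio, sketched below. The 50,000 fps framerate and the 3.55 kHz per-wave frequency come from the text; the function names are hypothetical.

```python
FRAME_RATE_HZ = 50_000   # camera framerate quoted in the text


def frames_per_revolution(wave_frequency_hz, frame_rate_hz=FRAME_RATE_HZ):
    """Number of frames recorded while one wave completes one lap."""
    return frame_rate_hz / wave_frequency_hz


def wave_frequency_for(frames, frame_rate_hz=FRAME_RATE_HZ):
    """Per-wave revolution frequency that yields a given number of frames per lap."""
    return frame_rate_hz / frames


frames_at_3550 = frames_per_revolution(3_550)   # per-wave frequency from the PSD example
slowest_hz = wave_frequency_for(15)             # lower end of the 10-15 range
fastest_hz = wave_frequency_for(10)             # upper end of the 10-15 range

print(round(frames_at_3550, 1), round(slowest_hz), round(fastest_hz))
```

For the two-wave example discussed earlier, each wave is captured at roughly 14 positions per lap, and the stated range of 10 to 15 locations corresponds to per-wave frequencies between about 3.3 and 5 kHz.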
Figure 9 represents the algorithm used to train the CNN model, which includes four major efforts: Imaging, CNN Training, CNN Validation, and CNN End Usage. The Imaging portion includes manual classification, or labeling, of the images according to the number of waves present in the annulus as well as their direction. Images can be classified in several ways; here, a series of images recorded while the RDE maintains a quasi-steady mode is classified visually. Images considered in the initial development of this methodology include six classifications or wave modes. Those modes are CW and CCW variants of single, double, and triple co-rotational wave behaviors. Classified images are then split, as depicted in Fig. 9, into two sets labeled Training Image Set and Validation Image Set for separate use in CNN Training and CNN Validation processes, respectively.
In CNN Training, it is important to shuffle training images beforehand to ensure the CNN is incrementally considering small training batches that are representative of each possible classification. Shuffling removes the time-dependency by separating sequential images prior to classification. Randomized training does not prevent the future use of the trained CNN as a temporally resolved diagnostic, but just reduces the potential of bias error in the derivation of the model.
The incremental performance of the network is evaluated as an ability to accurately classify labeled validation images each time the network has updated its trainable weights for the entire set of training data. Because neural networks are universal approximators and can therefore overfit, evaluating on held-out validation images helps confirm that the CNN has not simply fit the set of images used for training. This process is repeated for several iterations, or epochs, until a desired validation accuracy is reached, completing the CNN Training effort. To complete CNN Validation, the Validation Image Set is fed to the Trained CNN. Once the proper validation accuracy of the Trained CNN is achieved, the CNN is considered to be a Validated CNN, which is suitable for CNN End Usage. In the CNN End Usage stage, shown in the lower-right portion of the flowchart and enclosed in a bold box in Fig. 9, newly recorded images can be fed to the Validated CNN for the classification of the wave mode. A user of the trained and validated CNN would only need to follow the three steps included in this portion of the algorithm in order to classify an image recorded during a test. This process significantly reduces the time required to determine the wave mode and, when ultimately coupled with instantaneous sensor data, could make the real-time determination of wave mode and speed possible.
While it is possible to design an entirely unique CNN architecture, previously successful architectures designed for image classification are incredibly valuable to the practitioner. This study considers some of the most notable, publicly available architectures, including AlexNet, VGG16, ResNet, and LeNet, as well as a lesser-known architecture, SqueezeNet, and adapts them to best fit the needs of high-speed multi-wave and bidirectional image classification. In each case, architectures are fully accessible and adjustable unlike other machine learning platforms often categorized as black box algorithms.
Networks considered in the present study were mostly designed for classification using images in the ImageNet database. Thus, certain parameters needed to be changed before training could be achieved. This requirement is due to the greater similarity among the RDE images compared with those used in the ImageNet database, i.e., dog, plane, and zebra. For example, differences between two images showing 1CW wave behavior and 1CCW wave behavior are incredibly small compared with differences between an image of a cat and that of a boat. The ImageNet competition factors in a top 5 classification error score in choosing a winner, so the CNNs were designed to maximize that score. However, within the RDE initially considered in this study, there are only six classes, so the architecture should maximize only the top 1 classification. In order to compensate for the similarity of the different RDE image classes, the learning rate of the system was reduced to 10⁻⁶, where a traditional learning rate is on the order of 10⁻³. Reducing the learning rate makes the network learn more slowly, with smaller changes to the weights of the system, so that it does not miss the finer details of the system. The smaller learning rate coupled with a “he normal” kernel initializer for the initial distribution of weights allowed the network to capture fine differences between wave directions, as exemplified in Fig. 7. A unique advantage of the dataset used in this study is that, because RDE detonation waves rotate around an annulus, each image could be rotated through any angle up to 360 deg and still represent a possible true image. These synthetically created images allowed the CNN to visualize each class of wave at all positions around the annulus.
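The rotation-based augmentation can be sketched as follows. To stay free of interpolation, this illustration is restricted to 90 deg steps on a tiny made-up image; the study describes rotation through the full 360 deg, which would require an image library. One point worth noting as general geometry rather than a statement from the study: a rotation preserves the apparent travel direction, so the label is unchanged, whereas a mirror flip reverses the apparent direction and would require swapping CW and CCW labels.

```python
def rotate90_cw(image):
    """Rotate a 2-D list of pixels by 90 deg clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment_with_rotations(image, label):
    """Return the image at 0, 90, 180 and 270 deg, all with the same label."""
    samples = []
    current = [list(row) for row in image]
    for _ in range(4):
        samples.append((current, label))   # rotation keeps the mode label valid
        current = rotate90_cw(current)
    return samples


# Hypothetical 2 x 2 "image" with distinct pixel values
image = [
    [1, 2],
    [3, 4],
]
augmented = augment_with_rotations(image, "1CW")
for rotated, label in augmented:
    print(label, rotated)
```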
Preferred Convolutional Neural Network Architecture Selection (SqueezeNet).
SqueezeNet, visualized in Fig. 10, is a unique architecture in the CNN design space. It was developed to achieve the same accuracy as AlexNet but with fewer parameters and a smaller model size. SqueezeNet is unique among CNNs because it incorporates a building block referred to as a fire module. Fire modules are built using three different strategies: maximizing 1 × 1 convolutional filters, reducing the input channels when using 3 × 3 filters, and downsampling on the backend of the network in order to increase the network’s activation map size. These three strategies are utilized in the fire module by the creation of a squeeze convolution layer comprised of only 1 × 1 filters which feeds an expand layer with a mixture of both 1 × 1 and 3 × 3 filters. By limiting the hyperparameters in the fire module, i.e., the size of the filters, the architecture can limit the number of channels exposed to the 3 × 3 filters, which reduces the number of trainable parameters, making the model smaller and more computationally efficient. The architecture is comprised of a single convolutional input layer followed by eight fire modules and ending with another single convolution layer. Max pooling is performed after the first layer, fourth fire layer, eighth fire layer, and last convolutional layer, all with a stride of 2 pixels. By using these strategies, SqueezeNet can achieve a comparable accuracy with 50 times fewer parameters and a model size that is less than 0.5 MB compared with larger architectures like AlexNet, which has a model size of 255 MB. These changes proved advantageous to the total amount of time needed to classify the images. The smaller architecture is less computationally expensive to run, while the fewer parameters result in fewer mathematical computations. Since the accuracy is on par with more computationally expensive models such as AlexNet or ResNet, the main benefit is the speed at which images can be classified.
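The parameter savings of a fire module can be illustrated by counting weights and biases. The channel counts below (96 input channels, 16 squeeze filters, 64 + 64 expand filters) are assumed values chosen for illustration and are not taken from this study; the comparison is against a plain 3 × 3 convolution with the same number of input and output channels.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one convolutional layer."""
    return in_channels * out_channels * kernel_size ** 2 + out_channels


def fire_module_params(in_channels, squeeze, expand_1x1, expand_3x3):
    """Parameters of a squeeze layer feeding parallel 1x1 and 3x3 expand layers."""
    return (conv_params(in_channels, squeeze, 1)      # squeeze: 1x1 filters only
            + conv_params(squeeze, expand_1x1, 1)     # expand: 1x1 branch
            + conv_params(squeeze, expand_3x3, 3))    # expand: 3x3 branch


# Assumed, illustrative channel counts
fire = fire_module_params(in_channels=96, squeeze=16, expand_1x1=64, expand_3x3=64)
plain = conv_params(in_channels=96, out_channels=128, kernel_size=3)

print(fire, plain, round(plain / fire, 1))
```

Because the 3 × 3 filters see only the 16 squeezed channels instead of all 96 input channels, the module in this example needs roughly nine times fewer parameters than the plain layer.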
Results and Discussion
Performance Comparison of Five Convolutional Neural Networks.
For training, classified images were shuffled randomly into a training set and validation set consisting of 65,600 and 16,400 images, respectively. For each architecture, a different number of epochs was needed to reach accuracy convergence. The accuracies for each architecture are shown in Fig. 11. The number of epochs at which the accuracy plateaus depends heavily on the number of convolutional layers included in the structure of the network as well as the subsequent volume of trainable parameters. As seen in Fig. 11, a deeper network such as AlexNet may require 100 training epochs while a shorter network such as LeNet requires just six. A reduced number of necessary training epochs results in a lower upfront computational effort, saving time and computing resources during training. The number of required epochs is considered when selecting a preferred network to optimize initial training time as well as validation speed.
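The shuffle-and-split step can be sketched as below. The 82,000 total and the resulting 65,600 and 16,400 set sizes follow from the counts given above, which imply a 20% validation fraction; the use of integer identifiers in place of image files, the fixed seed, and the function name are assumptions made for the example.

```python
import random


def shuffle_and_split(items, validation_fraction, seed=0):
    """Shuffle items, then split them into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # breaks up sequential frames
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Integer identifiers stand in for the classified image files
image_ids = range(82_000)
training, validation = shuffle_and_split(image_ids, validation_fraction=0.2)

print(len(training), len(validation))
```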
The final training and validation accuracies achieved by each model and their associated epochs are summarized in Table 1. An additional parameter is also tabulated to represent the speed at which the fully trained network can classify individual images from the validation dataset. The speed was recorded using a local desktop HP EliteDesk 800 G2 DM 65W computer with an Intel® Core™ i5-6500 CPU @ 3.2 GHz and 16.0 GB of RAM. Because each architecture achieves accuracies above 95%, the number of required training epochs and the classification speed are the driving factors when selecting the preferred architecture.
|Architecture||Training accuracy||Validation accuracy||Training epochs||Speed (s/frame)|
For both measures, SqueezeNet and LeNet surpass the performance of the remaining architectures, requiring few training epochs and offering efficient classification speeds. Although LeNet achieves similar accuracies using only a fraction of the epochs used by SqueezeNet, SqueezeNet’s improved classification speed provides a practical efficiency that is realized each time new images must be classified. This faster classification speed is an expected advantage of the fire modules, which are unique to the SqueezeNet architecture. In an experimental setting, SqueezeNet’s 45% improvement in classification speed over LeNet will result in significant time savings with little sacrifice in accuracy, and SqueezeNet is therefore chosen as the preferred neural network throughout the remainder of this text.
While processing speeds of other common imaging methods are not reported directly in the literature, those methods can be timed to provide a reference speed. Processing the same images evaluated throughout this study, the method presently employed at NETL displayed a classification speed of 0.5892 s/image. It is important to acknowledge that this timed classification is not optimized for efficient processing speeds, but it also offers the ability to calculate wave speed and frequency. However, each classification data point requires the analysis of at least 100 sequential images, translating to 58.92 s per classification, thus likely eliminating any possible consideration as a real-time diagnostic.
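The timing comparison can be reproduced with simple arithmetic. The sketch below uses only figures reported in this work: 0.5892 s/image and 100 images per classification for the existing method, and the 0.0229 s/frame SqueezeNet speed stated in the concluding summary. The variable names are chosen here for illustration.

```python
SEQUENCE_METHOD_S_PER_IMAGE = 0.5892  # existing image-sequence method
IMAGES_PER_CLASSIFICATION = 100       # minimum sequence length per data point
CNN_S_PER_FRAME = 0.0229              # SqueezeNet single-frame classification

# Time needed before the sequence-based method yields one classification.
sequence_latency = SEQUENCE_METHOD_S_PER_IMAGE * IMAGES_PER_CLASSIFICATION

# How many single-frame CNN classifications fit in that same time.
speedup = sequence_latency / CNN_S_PER_FRAME

# Classifications per second achievable with the CNN on the same hardware.
cnn_rate = 1.0 / CNN_S_PER_FRAME

print(f"sequence method: {sequence_latency:.2f} s per classification")
print(f"CNN speedup per classification: about {speedup:.0f}x")
print(f"CNN throughput: about {cnn_rate:.1f} classifications/s")
```

Note that this compares time per classification data point; the sequence-based method additionally returns wave speed and frequency, which a single-frame classification does not.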
Performance of Preferred Convolutional Neural Network: SqueezeNet.
The learning curve in Fig. 12 shows the training and validation accuracy as a function of epoch. While the training accuracy exhibits smooth behavior in Fig. 12, the validation accuracy does not always improve with each additional epoch. The somewhat sporadic behavior of the validation accuracy implies that the weights associated with the final epoch may not offer the highest validation accuracy. The magnitudes and effects of this behavior vary between architectures. For this reason, it is imperative to record and use the model weights from the epoch at which both accuracies are highest, rather than simply those of the final epoch, which is common practice in the application of CNNs.
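A minimal sketch of this bookkeeping is given below. It assumes that validation accuracy is the selection criterion and uses made-up accuracy values; the function name `track_best` is hypothetical, and in a real training loop the model weights would be saved at the point marked in the comment.

```python
def track_best(val_accuracy_per_epoch):
    """Return the 1-based epoch and value of the highest validation accuracy.

    Ties keep the earliest epoch, since only strict improvements replace the best.
    """
    best = {"epoch": None, "val_acc": float("-inf")}
    for epoch, val_acc in enumerate(val_accuracy_per_epoch, start=1):
        if val_acc > best["val_acc"]:
            # In a real training loop, the model weights would be saved here.
            best = {"epoch": epoch, "val_acc": val_acc}
    return best


# Made-up validation accuracies that do not improve monotonically.
history = [0.910, 0.950, 0.970, 0.985, 0.978, 0.981]
best = track_best(history)
print(f"best epoch: {best['epoch']} (validation accuracy {best['val_acc']:.3f})")
```

In this illustration the final epoch is not the best one, so keeping only the last set of weights would discard the better model from epoch 4.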
To visualize the final performance of SqueezeNet, a confusion matrix is compiled in Fig. 13. This matrix tabulates the number of each predicted classification for a given true classification. For example, Fig. 13 shows that 3900 1CW images are classified as 1CW by the network with 0 images being incorrectly classified as any other modes, which shows a 100% accuracy of the network on 1CW images. Figure 13 also displays lesser performances, such as the evaluation of 4000 images of known 3CW behavior in which a great majority of images are predicted correctly except for one and three images labeled as 2CW and 3CCW, respectively. These errors are a result of previously predicted visibility issues and directional confusion but remain insignificant.
To understand the performance of the model with respect to the number of images available in each classification set, a normalized confusion matrix is shown in Fig. 14. In this configuration, cells in each row display the percent of predictions of a given subset, totaling 100%. Intuitively, the ideal confusion matrix should have maximum intensity across the diagonal as a result of perfect classification of every image.
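The two matrix forms can be sketched as follows. The example rebuilds only the 3CW row quoted above (4000 images, with one labeled 2CW and three labeled 3CCW); the function names and the reduced three-class label list are assumptions made for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


def normalize_rows(matrix):
    """Convert each row to percentages of that row's total (rows sum to 100)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([100.0 * n / total if total else 0.0 for n in row])
    return normalized


# Reduced class list for illustration; the study uses more modes.
classes = ["2CW", "3CW", "3CCW"]
true_labels = ["3CW"] * 4000
predicted_labels = ["3CW"] * 3996 + ["2CW"] * 1 + ["3CCW"] * 3

counts = confusion_matrix(true_labels, predicted_labels, classes)
percent = normalize_rows(counts)
print("3CW counts: ", counts[1])
print("3CW percent:", [round(p, 3) for p in percent[1]])
```

Row normalization removes the effect of unequal subset sizes, so a class with few images can be compared directly with a class that has thousands.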
Other misclassifications of 2CW and 2CCW images are likely a result of a low sampling density, which is depicted in Fig. 13 showing that the total number of images present in the data subset is significantly lower than other classification subsets. Although this could likely be resolved with an increase of 2CW and 2CCW images in the training dataset, infrequent misclassifications are negligible due to the practical understanding of the RDE operation.
Accuracy improvements would be achieved if the architecture instead classified experimental images free from obstruction. With a full view of the annulus gap, identifying the number of waves present would become even more trivial, while the classification of wave direction would improve as a result of more visible profile features. This improvement in the experimental setup addresses the error associated with 86% of the total image misclassifications. Additionally, a more uniform sampling density across each classification type would likely improve the lower accuracies seen in modes 2CW and 2CCW.
Extension of Evaluated Modes: Counter-Rotating and Deflagration.
As noted throughout the text, modes experienced in the studied RDE are limited as they do not include counter-rotating behaviors. Figure 13, discussed previously, confirms proper CNN feature extraction for high-speed images containing co-rotating detonation waves within an RDE. To evaluate a more complex range of modal behaviors representative of those seen throughout the RDE community, an extended dataset is created and subjected to the same process steps outlined in Fig. 9. Images capturing deflagrative behavior, or operation including no establishment of detonation waves, are added as an additional classification. A lack of detonation waves does not directly correspond to the adopted naming convention and will, therefore, be referred to simply as Def. Images recorded down-axis of another well-studied RDE at Purdue University [35–37] containing single, double, and triple counter-rotating waves are utilized as well. According to the naming convention, these are denoted as 1CR, 2CR, and 3CR, respectively. Examples of the four new modes are shown in Fig. 15.
When acquiring a set of images for the purpose of training a CNN, two common sampling types are generally favored: experimental images recorded using an identical setup, or images recorded under vastly dissimilar conditions. Results presented in Figs. 13 and 14 represent the former due to consistent origin, experimental setup, and pixel dimensions across the dataset. An example of the latter would be a database encompassing various designs, framerates, filters, and other imaging factors associated with several RDE studies. Combining two unique datasets (NETL and Purdue) falls between the two preferred dataset structures but serves to show that complex modes can be accurately classified by the network. Because the combined dataset follows neither structure, some inherent confusion between the two datasets is expected.
Nonetheless, the performance of the CNN trained on the extended dataset, summarized using a normalized confusion matrix in Fig. 16, serves as proof of an overarching ability to simultaneously classify complex and co-rotational modes.
In both cases of notable error, specifically the misclassification of 2CCW and 3CW as 1CR, confusion between the two datasets is a major contributor. This confusion accounts for 14% and 7% error in 2CCW and 3CW classifications, respectively, and would be significantly reduced if the dataset better conformed to one of the preferred sampling strategies.
Since extension of the dataset was primarily performed to study the response of the CNN when exposed to complex modes, it is most important to note the high accuracies associated with those modes. Those values are indicated within the four lower-right cells in the diagonal. Deflagration, although not a mode of detonation, is perfectly classified across the dataset. Similarly, 1CR and 2CR images are each correctly classified with 99% accuracies.
Less accurate classification of 3CR images is in part a result of lower overall pixel intensity due to the presence of six detonation waves and a subsequent reduced fill height within the annulus. This trend is consistent with images previously presented in Fig. 6 and is the cause of a less defined wave profile in Fig. 15(c). However, the poorest performance among the complex modes, 93% classification accuracy within the 3CR subset, exceeds the lowest performance in the purely co-rotational analysis and therefore demonstrates the ability to classify complex modes by way of feature extraction.
To increase the performance of SqueezeNet on the RDE images, several image processing techniques could be applied when training. In addition to rotating images 360 deg around the annulus, sheer intensity could be added to each picture to account for different lighting conditions. Gaussian noise could also be applied to the images to help bolster the network and its ability to interpret noisier images captured under different conditions. These changes could help the network generalize better to other environmental conditions and make the system more robust; however, they will likely come at some cost to overall accuracy.
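A minimal sketch of such augmentations on a grayscale image stored as a nested list of pixel values is shown below. It is not the procedure used in this study: rotation is simplified to quarter turns (arbitrary angles would require interpolation), the intensity change is modeled as a simple multiplicative scaling, and all function names, the noise level, and the tiny 2 × 2 frame are assumptions for illustration.

```python
import random


def rotate_quarter_turn(image):
    """Rotate a 2D grid of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def clip(value, max_value=255):
    """Round to an integer pixel value and clamp it to [0, max_value]."""
    return min(max_value, max(0, round(value)))


def scale_intensity(image, factor):
    """Brighten (factor > 1) or darken (factor < 1) every pixel."""
    return [[clip(p * factor) for p in row] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to each pixel."""
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in image]


# Tiny stand-in frame; real frames would be full camera images.
frame = [[0, 50],
         [100, 200]]
rng = random.Random(0)  # seeded so the augmentation is repeatable

rotated = rotate_quarter_turn(frame)
brighter = scale_intensity(frame, 1.5)
noisy = add_gaussian_noise(frame, sigma=10.0, rng=rng)
print(rotated, brighter, noisy)
```

Each augmented copy keeps the label of the original frame except where the transformation changes the apparent rotation direction, which would need to be checked for any mirroring operation.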
Practical Application and Future Work.
Proper implementation of the proposed methodology results in a validated CNN which is easily employed as a diagnostic tool offering high accuracies, processing speeds, and possible data acquisition integration for real-time classification. End usage of the validated CNN, accentuated by a red outline in Fig. 9, can be applied to time-sequential images. Therefore, modal changes and instabilities can be observed through a series of frames. While the CNN will not directly report these behaviors, the continuous classification of each frame will make them innately obvious to the practitioner. For example, if consistent 2CW classifications are suddenly followed by consistent 3CCW classifications, it is apparent that a mode change has occurred. Similarly, if classifications constantly alternate between multiple modes, it is reasonable to assume unsteady modal behavior within the annulus. While the ability to detect mode changes may seem contrary to the previously suggested disregard of misclassification outliers, the timescale over which mode changes occur is much longer than that of a single frame. Therefore, relying on this practical understanding of the RDE, one outlier of 3CCW in a stable series of 2CW classifications is easily identified as an isolated error. Recognition of these phenomena can be easily automated if of significant interest.
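One possible way to automate this recognition is sketched below, assuming a sliding-window majority vote to suppress isolated outliers before mode changes are reported. The window length, function names, and label sequence are illustrative assumptions rather than part of the presented methodology.

```python
from collections import Counter


def smooth_labels(labels, window=5):
    """Replace each label with the most common label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighborhood = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighborhood).most_common(1)[0][0])
    return smoothed


def mode_changes(labels):
    """Return (frame_index, old_mode, new_mode) for each change in the sequence."""
    return [(i, labels[i - 1], labels[i])
            for i in range(1, len(labels))
            if labels[i] != labels[i - 1]]


# Made-up per-frame classifications: a stable 2CW series with one 3CCW
# outlier at frame 6, followed by a sustained change to 3CCW at frame 13.
frames = ["2CW"] * 6 + ["3CCW"] + ["2CW"] * 6 + ["3CCW"] * 8

raw_changes = mode_changes(frames)
filtered_changes = mode_changes(smooth_labels(frames))
print("raw:     ", raw_changes)
print("filtered:", filtered_changes)
```

The window length sets the trade-off: it must be short compared with the timescale of real mode changes but long enough to outvote isolated misclassifications.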
As mentioned in the previous subsection, Extension of Evaluated Modes: Counter-Rotating and Deflagration, the extended dataset does not strictly follow either of the two recommended data approaches. Because the data are neither specific to a single RDE nor a result of broad parametric sampling, the resultant network is not expected to be especially tuned to the experimental setup at NETL, nor robust enough to classify images from any given RDE. Therefore, the network best suited for future in-house work is the initial network trained only on images acquired at NETL. Finalized weights of both networks are viable, but those trained only on the subset of modes experienced in the rig of interest are better suited for practical applications specific to that rig.
With an established ability to classify wave modes quickly and accurately with a CNN trained using the proposed methodology, it is now possible to integrate that information with existing pressure correlation techniques to improve wave speed calculations, which is the focus of future work. While methods to calculate wave speed using pressure data already exist, those point measurements lack spatial resolution and may, therefore, confuse more complex modes or secondary behaviors. Additionally, those methods which rely only on pressure traces may require large sampling sizes and will, therefore, be unlikely to reach real-time feedback capabilities. However, with wave number and direction known, smaller windows of pressure traces may be quickly analyzed to determine wave frequency. A concept of the data selection is shown in Fig. 17, where after determining detonation wave number and rotational direction from an individual image using the CNN, only a very short sampling of pressure data from a single sensor is needed to find the wave velocity and frequency. It is important to note that this technique does not aim to analyze every image or every pressure reading. Instead, for each iteration i, a classification of a single image captured at ti and pressure data within a sample window of width ΔTSi will be acquired. Velocity calculations will be performed within the time window indicated as ΔtCi in Fig. 17. The next iteration, beginning after the calculations of the previous iteration are complete, will acquire the most recent data for the next loop, beginning at ti+1. In general, this means that only one image at the beginning of each loop will be considered, and the pressure data in the highlighted section are not considered. Both operations will be optimized for accuracy as well as processing speeds.
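The idea of combining a known wave count with a short pressure window can be sketched as follows. This is not the planned implementation: it assumes co-rotating waves passing a single fixed sensor, so that n waves produce n pressure peaks per revolution and the wave speed is π·D·f/n for passage frequency f and annulus diameter D. The synthetic sinusoidal signal, sample rate, threshold, and diameter are all hypothetical values chosen for illustration.

```python
import math


def upward_crossings(samples, threshold):
    """Indices where the signal rises from below the threshold to at or above it."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] < threshold <= samples[i]]


def passage_frequency(samples, sample_rate_hz, threshold):
    """Estimate the wave passage frequency (Hz) from the spacing of pressure rises."""
    crossings = upward_crossings(samples, threshold)
    if len(crossings) < 2:
        raise ValueError("window too short: need at least two wave passages")
    span_s = (crossings[-1] - crossings[0]) / sample_rate_hz
    return (len(crossings) - 1) / span_s


def wave_speed(passage_hz, n_waves, annulus_diameter_m):
    """Speed of each of n co-rotating waves passing one fixed sensor."""
    return math.pi * annulus_diameter_m * passage_hz / n_waves


# Hypothetical 1 ms pressure window: 10 kHz passage frequency sampled at 500 kHz.
SAMPLE_RATE_HZ = 500_000
pressure = [math.sin(2 * math.pi * 10_000 * i / SAMPLE_RATE_HZ) for i in range(500)]

n_waves = 2  # wave count as it would be supplied by the CNN classification
passage_hz = passage_frequency(pressure, SAMPLE_RATE_HZ, threshold=0.5)
speed = wave_speed(passage_hz, n_waves, annulus_diameter_m=0.1)
print(f"passage frequency: {passage_hz:.0f} Hz, wave speed: {speed:.0f} m/s")
```

Without the wave count from the image, the same passage frequency could not be attributed to one fast wave or several slower ones, which is the ambiguity the classification is intended to remove. Counter-rotating modes would need a different relation and are not handled by this sketch.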
Conclusions
The classification of the number and direction of detonation waves present in an RDE annulus during combustor operation is critical to advancing the control and integration of RDEs in gas turbine engines. A methodology for wave behavior classification was demonstrated in this work through the analysis of individual downstream high-speed images using a CNN. The goal of this work was to use CNNs to classify wave number and direction in multi-wave modal behavior in an experimental RDE. Classified modes included clockwise and counterclockwise variants of one, two, and three co-rotational detonation waves; single, double, and triple counter-rotating detonation waves; and deflagration. Direct high-speed images of detonation waves were collected down-axis of experimental RDEs at the US DOE NETL and Purdue University using high-speed digital cameras, then manually classified for the application of CNN architectures.
After surveying and successfully training five publicly available CNN architectures, SqueezeNet was chosen for its classification speed of 0.0229 s/frame. Achieving training and validation accuracies of 99.6% and 98.5%, respectively, SqueezeNet demonstrated the ability to consistently classify the number of waves present as well as their direction in individual images, and did so in a fraction of the time required by other common methods. This capability extends the application of this method to a variety of RDE imaging techniques regardless of framerate limitations. The one-time execution of the training methodology results in a trained and validated CNN available for unlimited end usage, classifying modes experienced in newly captured images.
Additional work to develop a CNN trained using an extensive image set representative of various RDE imaging efforts may be completed in the future and potentially be made available to the RDE community. Another potential application of CNNs in the field of RDE research is actively tracking the wave around the annulus. Object detection using three-dimensional CNNs would allow detonation waves to be tracked in time and would therefore permit the network to determine the frequencies of the waves in the annulus. This capability may be developed in future studies.
Extending the proposed methodology alongside conventional high-speed sensor data to achieve real-time detonation wave classification capabilities, including wave frequency, will be addressed in future CNN development. The demonstrated and proven approach offers an early example of CNN applications aiding in the development of RDE technologies.
Acknowledgment
This research was supported by an appointment to the National Energy Technology Laboratory Professional Internship Program, sponsored by the U.S. Department of Energy, administered by the Oak Ridge Institute for Science and Education (ORISE), and greatly assisted by additional images provided by Dr. Carson Slabaugh at Purdue University.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request. The authors attest that all data for this study are included in the paper. Data provided by a third party are listed in Acknowledgment.