References

1. Blanchard, G. T., S. Sundeen, and K. B. Baker (2009), Probabilistic identification of high-frequency radar backscatter from the ground and ionosphere based on spectral characteristics, Radio Sci., 44, RS5012, doi:10.1029/2009RS004141.

2. Bland, E. C., A. J. McDonald, S. de Larquier, and J. C. Devlin (2014), Determination of ionospheric parameters in real time using SuperDARN HF Radars, J. Geophys. Res. Space Physics, 119, 5830–5846, doi:10.1002/2014JA020076.

3. Burrell, A. G., S. E. Milan, G. W. Perry, T. K. Yeoman, and M. Lester (2015), Automatically determining the origin direction and propagation mode of high-frequency radar backscatter, Radio Sci., 50, 1225–1245, doi:10.1002/2015RS005808.

4. Cousins, E. D. P., and S. G. Shepherd (2012), Statistical characteristics of small-scale spatial and temporal electric field variability in the high-latitude ionosphere, J. Geophys. Res., 117, A03317, doi:10.1029/2011JA017383.

5. Ribeiro, A. J., J. M. Ruohoniemi, J. B. H. Baker, L. B. N. Clausen, S. de Larquier, and R. A. Greenwald (2011), A new approach for identifying ionospheric backscatter in midlatitude SuperDARN HF radar observations, Radio Sci., 46, RS4011, doi:10.1029/2011RS004676.

6. https://github.com/vtsuperdarn/davitpy

Summary

We tested K-Means and GMM clustering using eight available features. The traditional method uses only velocity and spectral width. Our method was consistently more accurate than the traditional method when compared to ground truth, across data from different dates. More accurate classification of ground scatter may help build a better picture of the weather conditions of the ionosphere. These preliminary results suggest that GMM is a promising algorithm for classifying backscatter; future refinement could increase the accuracy further and extend the algorithm to other radars. To increase accuracy, future work could try a 3D boxcar filter, as in Ribeiro et al. [2011], or discard low-variance features in preprocessing. Testing different numbers of clusters (instead of the default 5) and different velocity thresholds (instead of the default 10 m/s) for grouping clusters into GS and IS might also improve our results.

Project Code Link

Results Analysis

As the figure above shows, both K-Means and GMM outperform the traditional model. GMM performs especially well, reaching an accuracy of 94.1% on this specific day.
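Accuracy figures such as this one are taken here to mean the fraction of echoes whose predicted GS/IS class matches the empirical "ground truth" class. That definition is an assumption, since the write-up does not spell it out, and the function name and sample labels below are purely illustrative. A minimal sketch:

```python
def accuracy(predicted, truth):
    """Fraction of echoes whose predicted class matches the reference class."""
    if len(predicted) != len(truth):
        raise ValueError("label sequences must have the same length")
    if not truth:
        raise ValueError("no labels to compare")
    matches = sum(p == t for p, t in zip(predicted, truth))
    return matches / len(truth)


# Made-up labels for five echoes (not project data).
truth = ["GS", "GS", "IS", "IS", "IS"]
predicted = ["GS", "IS", "IS", "IS", "IS"]
score = accuracy(predicted, truth)  # 4 of the 5 labels agree
```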

To test how broadly the algorithm applies, we applied it to 28 days of data from the SAS radar, from late August to late September 2017. The accuracy of the three models (traditional, K-Means, and GMM) by date is shown below:

Results Analysis

Labels from an empirical model based on virtual height criteria and elevation angle measurements [Bland et al., 2014; Burrell et al., 2015] serve as “ground truth” for evaluating our unsupervised machine learning algorithms. Results are also compared with the traditional GS/IS identification method. K-Means and GMM generally achieve higher accuracy than the traditional method, with GMM outperforming K-Means. One example comparing the four models on 2017-09-20, using data from all beams of the SAS radar, is shown below, where red indicates IS and blue indicates GS.

DataSet & Methodology

The data come from SuperDARN HF radar observations made by the Saskatoon (SAS) radar. The database sas.db, covering 2017-08-28 to 2017-09-27, was generated using functions (updated_sd_db_generator.py) adapted from davitpy. It incorporates ground scatter and ionospheric scatter labels from the empirical and traditional models, as well as the features we need for K-Means and GMM. We chose the Saskatoon radar because it is one of a limited number of SuperDARN radars that provide reliable elevation angle measurements for the received backscatter. Elevation angle is the most reliable way to differentiate between IS and GS, but it is not available on all SuperDARN radars, which is why other classification methods are important.

Data with power less than 6 dB or with unreliable elevation angle measurements are excluded from the analysis. Eight features (Doppler velocity, spectral width, power, beam, range gate, time, transmitted frequency, and elevation angle) from all beams of the SAS radar on one day are used for K-Means and GMM clustering.
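The filtering and feature selection described here could look roughly like the sketch below. The field names, the record layout, and the convention that an unreliable elevation angle is stored as None or NaN are all assumptions made for illustration; the actual sas.db schema and the project's reliability test are not shown in this write-up.

```python
import math

# Hypothetical field names for the eight features (not the real sas.db schema).
FEATURES = ("velocity", "width", "power", "beam", "gate",
            "time", "freq", "elevation")


def keep_echo(echo, min_power_db=6.0):
    """Return True if the echo passes the power and elevation-angle checks."""
    elev = echo.get("elevation")
    # Assumption: an unreliable elevation angle is stored as None or NaN.
    if elev is None or math.isnan(elev):
        return False
    return echo["power"] >= min_power_db


def build_feature_rows(echoes, min_power_db=6.0):
    """Filter echoes and return one 8-element feature row per surviving echo."""
    return [[float(e[name]) for name in FEATURES]
            for e in echoes if keep_echo(e, min_power_db)]


# Made-up echoes: only the first one passes both checks.
echoes = [
    {"velocity": 2.0, "width": 5.0, "power": 12.0, "beam": 3, "gate": 20,
     "time": 0.0, "freq": 10500.0, "elevation": 15.0},
    {"velocity": 300.0, "width": 80.0, "power": 4.0, "beam": 3, "gate": 40,
     "time": 60.0, "freq": 10500.0, "elevation": 22.0},  # power below 6 dB
    {"velocity": -150.0, "width": 60.0, "power": 9.0, "beam": 7, "gate": 35,
     "time": 120.0, "freq": 10500.0, "elevation": float("nan")},  # no elevation
]
rows = build_feature_rows(echoes)
```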

Both K-Means and GMM are highly sensitive to initial conditions. To mitigate this, the number of initializations was set to 50. In short, this forces the algorithm to compare 50 different centroid seeds and keep the best one. Additionally, the GMM was initialized with k-means to aid convergence.
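One way to express this setup is with scikit-learn, whose `KMeans` and `GaussianMixture` classes both accept an `n_init` argument and, for the mixture model, `init_params="kmeans"`. The write-up does not name the library the project used, so treat this as a sketch under that assumption. The helper name `fit_clusterers` and the synthetic data are illustrative, and the cluster count of 5 is the value this project reports using.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture


def fit_clusterers(X, n_clusters=5, n_init=50, seed=0):
    """Fit K-Means and a GMM on X and return each model's cluster labels."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=n_init, random_state=seed)
    kmeans_labels = kmeans.fit_predict(X)

    gmm = GaussianMixture(
        n_components=n_clusters,
        n_init=n_init,            # keep the best of n_init initializations
        init_params="kmeans",     # start each GMM run from a k-means solution
        random_state=seed,
    )
    gmm_labels = gmm.fit_predict(X)
    return kmeans_labels, gmm_labels


# Synthetic stand-in for the eight-feature matrix (not radar data).
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(5, 8))
X = np.vstack([c + rng.normal(size=(80, 8)) for c in centers])

# n_init is lowered here only to keep the demonstration quick.
kmeans_labels, gmm_labels = fit_clusterers(X, n_init=5)
```

Fixing `random_state` makes the runs repeatable, which helps when comparing accuracy across dates.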

Even though we only want to separate the data into two classes, doing so directly proved quite difficult. Instead, we fit the model to several clusters and then combine the clusters, relying on the fact that GS and IS have different velocities. In the end we used 5 clusters for the clustering step and then grouped the resulting clusters into two classes: clusters with mean velocity less than 10 m/s are classified as ground scatter, and those with mean velocity larger than 10 m/s as ionospheric scatter. By learning several clusters before combining them based on their velocity, we achieve better performance than the previously known methods.
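The grouping step can be sketched in a few lines. The summary statistic (mean or median) is left as a parameter, and comparing its magnitude rather than its signed value to the 10 m/s threshold is an assumption. The function name and sample values are illustrative only.

```python
from collections import defaultdict
from statistics import median


def group_clusters(labels, velocities, threshold=10.0, stat=median):
    """Assign 'GS' or 'IS' to every echo from its cluster's summary velocity."""
    by_cluster = defaultdict(list)
    for label, v in zip(labels, velocities):
        by_cluster[label].append(v)
    # Assumption: the magnitude of the summary velocity is what gets compared.
    cluster_class = {
        label: "GS" if abs(stat(vs)) < threshold else "IS"
        for label, vs in by_cluster.items()
    }
    return [cluster_class[label] for label in labels]


# Made-up cluster labels and Doppler velocities in m/s (not project data).
labels = [0, 0, 1, 1, 2, 2]
velocities = [1.0, -3.0, 250.0, 310.0, -180.0, -220.0]
classes = group_clusters(labels, velocities)
```

Passing `stat=statistics.mean` switches the summary statistic without changing the rest of the logic.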

Database Link

Super Dual Auroral Radar Network Backscatter Clustering

Machine Learning Group 7

Backscatter echoes from Super Dual Auroral Radar Network (SuperDARN) high frequency (HF) radars can be categorized into different clusters according to their origin. Echoes from the ionosphere are called ionospheric scatter (IS). Depending on the height in the ionosphere at which they originate, they can be meteor, E-region, or F-region backscatter, all of which are characterized by large Doppler velocity and spectral width. In contrast, echoes scattered from the ground or sea are called ground scatter (GS) and are characterized by low Doppler velocity and spectral width.

A traditional method of separating IS and GS in SuperDARN data is given in Blanchard et al. [2009] and has been implemented in current versions of the SuperDARN software. It classifies scatter based on a piecewise function of velocity, power, and spectral width. The velocity of IS is exponentially distributed, but this classification method is biased and often misclassifies low-velocity IS as GS.

Ribeiro et al. [2011] attempt to improve on this method by using a “depth-first search” to cluster data in time and space and then classifying the clustered data based on time and velocity using a decision tree. This method is shown visually to produce a less biased exponential distribution of velocity for IS data, and it therefore improves on the method of Blanchard et al. [2009]. Cousins and Shepherd [2012] apply the K-means algorithm to cluster data by position and velocity only. They classify as GS any data cluster whose centroid has velocity magnitude < 100 m/s and standard deviation < 100 m/s. This criterion reliably removes all GS and leaves only IS, but it is also biased and misclassifies some low-velocity IS as GS.

Bland et al. [2014] focus on studying GS to determine ionospheric parameters, unlike most previous studies, which focus on IS. They use elevation angle-of-arrival data, which are reliably available only from certain SuperDARN radars such as the Saskatoon (SAS) radar, to obtain ground truth on whether data are IS or GS; this will be useful in this project for benchmarking. Burrell et al. [2015] outline an improved method of estimating elevation angle-of-arrival for SuperDARN radars. These data provide a reliable way of classifying GS and IS but are not available on all radars. They examine a problem with previous methods of estimating elevation angle, in which signals assumed to come from in front of the radar actually come from behind it, which introduces great uncertainty into the estimate. Using the differing characteristics of signals arriving from the rear and from the front, they develop an algorithm that separates the two and produces better elevation angle estimates.

Our objective for this project is to categorize the ionospheric backscatter and ground scatter from the SuperDARN HF radar observations using machine learning algorithms with an emphasis on refining feature selection. Our unsupervised learning algorithms will take advantage of clustering and good feature selection, with the objective of improving backscatter classification compared to traditional methods.

VT SuperDARN
