Investigating the performance of neural network backpropagation algorithms for TEC estimations using South African GPS data

. In this work, results obtained by investigating the application of different neural network backpropagation training algorithms are presented. This was done to assess the performance accuracy of each training algorithm in total electron content (TEC) estimations using identical datasets in models development and veriﬁcation processes. Investigated training algorithms are standard backpropagation (SBP), backpropagation with weight delay (BPWD), backpropagation with momentum (BPM) term, backpropagation with chunkwise weight update (BPC) and backpropagation for batch (BPB) training. These ﬁve algorithms are inbuilt functions within the Stuttgart Neural Network Simulator (SNNS) and the main objective was to ﬁnd out the training algorithm that generates the minimum error between the TEC derived from Global Positioning System (GPS) observations and the modelled TEC data. Another investigated algorithm is the MatLab based Levenberg-Marquardt back-propagation (L-MBP), which achieves convergence after the least number of iterations during training. In this paper, neural network (NN) models were developed using hourly TEC data (for 8 years: 2000–2007) derived from GPS observations over a receiver station located at Sutherland (SUTH) (32.38 ◦ S, 20.81 ◦ E), South Africa. Veriﬁcation of the NN models for all algorithms considered was performed on both “seen” and “unseen” data. Hourly TEC values over SUTH for 2003 formed the “seen” dataset. The “unseen” dataset consisted of hourly TEC data for 2002 and 2008 over Cape Town (CPTN) (33.95 ◦ S, 18.47 ◦ E) and SUTH, respectively. The models’ veriﬁcation showed that all algorithms investigated provide comparable results statistically, but differ sig-niﬁcantly in terms of time required to achieve convergence during input-output data training/learning. This paper therefore provides a guide to neural network users for choosing appropriate algorithms based on the availability of computation capabilities used for research.


Introduction
Total electron content (TEC) estimations using the neural network (NN) technique have been done over many years with relative success (e.g. Hernàndez-Pajares et al., 1997;Tulunay et al., 2004Tulunay et al., , 2006Leandro and Santos, 2007;Senalp et al., 2008;Yilmaz et al., 2009). The main work in the application of this nonlinear technique involves finding a relationship between known input and output parameters using a relevant training algorithm. A training algorithm or learning function has the purpose of adjusting connection weights between input and output layers to achieve the desired result for the system under characterisation (Haykin, 1994;Rojas, 1996;Zell et al., 1998). This study undertakes an investigation of some training algorithms employed in TEC models by different research groups. Specifically, this is a comparative study of performance levels of some algorithms for feed forward networks only. Considered algorithms are backpropagation for batch (BPB) training, backpropagation with momentum (BPM) term, backpropagation with chunkwise weight update (BPC), backpropagation with weight delay (BPWD), standard backpropagation (SBP) and Levernberg-Marquardt backpropagation (L-MBP). The first five training algorithms are inbuilt functions within the Stuttgart Neural Network Simulator (SNNS) software (Zell et al., 1998;Reczko et al., 1998). The sixth algorithm implemented is a These algorithms mainly differ in the way that weights are adjusted; otherwise, backpropagation is essentially implemented during training. Comprehensive details of these algorithms with mathematical descriptions of weight adjustments and procedure of their applications can be found in Rojas (1996); Zell et al. (1998), and Demuth et al. (2009). For its simplicity, SBP algorithm has been widely used for TEC modelling and mapping in both feed forward and recurrent neural networks (e.g. Leandro and Santos, 2007;Yilmaz et al., 2009;Habarulema et al., 2007Habarulema et al., , 2009Habarulema et al., , 2010. Research has shown that different problems require different training algorithms and types of neural networks. For example, in space weather applications involving predictions that use solar wind data as inputs, recurrent networks have been found to be more desirable (Lundstedt et al., 2002;Weigel et al., 2002Weigel et al., , 2003Vandegriff et al., 2005;Lundestedt, 2006;Habarulema et al., 2009;Heilig et al., 2010). Other ionospheric parameters, such as the critical frequency of the Eregion (foE) and critical frequency of the F2 layer (foF2), have been predicted using feed forward networks (e.g. Cander et al., 1998;Cander, 1998;McKinnell, 2002;McKinnell and Poole, 2004;Oyeyemi et al., 2006). Radial basis function networks have also been used in the generation of TEC data for ionospheric TEC mapping, as well as short term forecasting of foF2 (e.g. Chan and Canon, 2002;Yilmaz et al., 2009). TEC modelled and forecasted results for models, which utilised the L-MBP algorithm, have also been presented (Tulunay et al., 2006;Yilmaz et al., 2009). This algorithm is credited for its time savings during NN training/learning processes (Jang et al., 1997). While neural networks have been widely applied to ionospheric data, few resources are available which compare the performances of different algorithms with specific emphasis on ionospheric parameter modelling. Although this paper does not investigate all training algorithms, it serves as a guide to neural network users who may want to apply different training algorithms to their datasets, especially for ionospheric applications.
NN models are developed using a similar dataset with different training algorithms and verified on statistically similar (not necessarily identical) datasets (both "seen" and "unseen") to assess their performance levels. Training was done on a 2.4 GHz PC with 2 GB RAM. For fair comparisons, the same architecture -comprised of one input layer (6 input nodes), one hidden layer (9 hidden nodes) and one output layer -was used. In this way, it has been observed that the number of epochs/iterations required to achieve convergence or generalisation is the main determinant in the choice of the algorithm, especially for time needed and computing capability available. Table 1 shows the approximate time and iterations determined for each algorithm to achieve convergence. BPB has the highest number of iterations and hence takes longer to give the optimum solution. This may be related to the learning rate, which contributes to the "speed" with which training takes place. In BPB, the learning parameter is divided by the number of training patterns, making it significantly small, and it could therefore slow the training rate of the network on all training patterns (Zell et al., 1998). Results discussed in this paper were obtained from NN models developed using the South African data from the Sutherland station (

TEC from GPS
GPS TEC data were derived from observations made by dual frequency receivers located at SUTH and CPTN, both in South Africa. The values were derived using the adjusted spherical harmonic analysis (ASHA) algorithm, which makes use of the mapping function that assumes the ionosphere to be a single layer of height 350 km . A total of 9 years (2000-2008) of data was derived over SUTH, of which the 2000-2007 dataset was utilised in the development of the NN models. Hourly TEC data for 2008 was used in the final verification of the models along with CPTN data for the year 2002. The ASHA algorithm is based on spherical harmonic expansion and was adapted from the Schaer (1999) global model to be used as a regional model. It uses data from a local GPS receiver network and was chosen to estimate single station TEC results for comparative purposes with ionosonde measurements . Full details of the TEC derivation procedure from GPS observations using the ASHA algorithm (and its validation with different data sources) are presented in Opperman et al. (2007) and Opperman (2007).

Physical and geophysical data parameters
TEC is modelled as a function of known physical and geophysical parameters which influence its variability. These are seasonal and diurnal variations, as well as solar activity and geomagnetic activity. Seasonal and diurnal variations are effectively represented by day number of the year (DOY) and hour of the day (HOD), respectively. The measure of the solar activity is represented by the sunspot number, while the magnetic A index values account for the geomagnetic influence on TEC. Quantitatively, the magnetic and solar activities determined by Habarulema et al. (2007) for South African GPS TEC modelling were used in this study. These are the average of the previous 4-months for the daily sunspot number and the average of the previous eight 3-hourly magnetic A index values (A8) derived from the archived K-index data recorded at the Hermanus Magnetic Observatory (34.43 • S, 19.23 • E), South Africa. Diurnal and seasonal variation representations, sunspot number, and geographic latitudes and longitudes have been used previously in the modelling of foF2 and TEC using neural networks (e.g. Cander et al., 1998;McKinnell, 2002;McKinnell and Poole, 2004;Leandro and Santos, 2007).

Selected functions required for training
Data representations for parameters which influence TEC variability along with known TEC values make up the training patterns of the form [(x 1 , t 1 ), (x 2 , t 2 ), ..., (x n , t n )], where x i is the training pattern consisting of inputs, and t i is the target corresponding to x i (where i = 1, 2, ..., n). The update and initialisation functions used for BPB, BPC, BPM, BPWD and SBP algorithms are topological order and randomised weights, respectively. The initialisation function randomly selects the weights which are real numbers. Presenting an input pattern x i from the training set generates an output o i which is different from the target output t i . The aim is to ensure that o i and t i are identical for i = 1, 2, ..., n with the help of a training/learning algorithm (Haykin, 1994;Rojas, 1996;Zell et al., 1998); the ultimate goal is to minimise the error function of the network defined as where N ne is the error function of the network. Once N ne has been sufficiently minimised for the training dataset, input patterns not known to the network are presented, and the network is expected to recognise whether the new input patterns are similar to the learned patterns. Once this condition is met, the network generates a similar output (Rojas, 1996). This interpolation process is the one referred to as the verification of the NN models. Therefore, training/learning involves finding the optimal combination of weights that allows the network function to approximate a given function, in this case implicitly through known input and target training examples (Rojas, 1996). Topological order update simply means that the presented input training pattern propagates forward from the input layer through the hidden layer until the activation reaches the output layer (Zell  , 1998). No pattern remap function was used. The purpose of a pattern remap function is to quickly vary the desired output of the network without changing the pattern files during training, and non-use indicates that no remapping was done and thus all presented patterns were trained (Zell et al., 1998;Reczko et al., 1998). In the implementation of these functions, a similar network setup of 6 input nodes, 9 hidden nodes and 1 output node was used during the training of the network. The same architecture was used for the L-MBP algorithm.

Interpolation results
It is known that neural networks interpolate well within the input space, and therefore the network is expected to reproduce the dataset that was used to train it with relatively good accuracy (McKinnell, 2002;Habarulema et al., 2007). Since the main aim is to compare the performance accuracies of the training algorithms on the training dataset, part of the training dataset can still be used in the final verification of the models. This can be referred to as the determination of the correct application of the neural network technique on a particular dataset. The overall verification of the models was performed using Sutherland (2003 and and Cape Town (2002) datasets. While it is expected that the network should perform very well for SUTH 2003 data (part of the training dataset), the differences between algorithm performances should be evident if present. Figure 1 shows the scatter plot for hourly GPS TEC and modelled TEC values using different NN training algorithms over SUTH for 2003. Correlation coefficient values indicate a slightly better performance by the L-MBP algorithm compared to the rest of the considered algorithms. The widely used algorithms (SBP and L-MBP) in ionospheric modelling provide improved interpolation of TEC estimates. Although the model was tested on the "seen" dataset (2003), it was noted that all training algorithms achieved over 90 % accuracy (over the entire dataset, denoted as "all hours" in Table 2) in estimating TEC, and their modelling results are highly comparable.    Figure 2 shows the computed RMSE values between GPS TEC and modelled TEC using the six algorithms considered at local sunrise (04:00 UT), midday (10:00 UT), sunset (16:00 UT) and midnight (22:00 UT) over SUTH in 2003. Superimposed on these plots is the GPS TEC variability at these respective times. Table 2 is a summary of the correlation coefficients computed using GPS derived TEC and modelled TEC (from 6 algorithms) for hourly data, 04:00 UT, 10:00 UT, 16:00 UT and 22:00 UT over SUTH in 2003. For the "seen" data, it was observed that the L-MBP algorithm performs well for local sunrise and midday TEC estimates. Figure 3 shows the scatter plot for GPS TEC and modelled TEC for the "unseen" data over CPTN in 2002. Table 3 shows the corresponding correlation coefficients for local sunrise, midday, sunset and midnight hours. RMSE values for similar periods are graphically shown in Fig. 4 with superimposed GPS TEC values. The following summarises the observations made from Table 3

and Figs. 3 and 4:
-In terms of correlation coefficients, all algorithms achieved over 95 % prediction accuracy. However, to draw a conclusion on the overall performance of any algorithm, more than one statistical technique is required. An example is the SBP algorithm, which gives a correlation coefficient value for 16:00 UT that is less than those for BPM and BPWD algorithms (Table 3), but gives the least RMSE value for 16:00 UT (see Fig. 4).
-During periods of low TEC variability (04:00 UT and 22:00 UT) the accuracy of all algorithms reduce. Although it is assumed that the ionospheric shell height is 350 km in the derivation of TEC data, which were later used in models development as target values, in reality the shell height varies with time and solar activity levels.
Given that 350 km is the typical day-time ionospheric shell height in mid-latitude regions, the reduction in algorithm prediction accuracy during low TEC variability periods could be related to the assumption of a constant value. During nighttime (in absence of ionisation) the shell height increases.
-At local midday and sunset the SBP algorithm gives improved TEC estimates compared to other algorithms.
For any specific time selected, the performance differences of all the algorithms are not significantly large (as quantified by correlation coefficient and RMSE values), apart from results shown in Fig. 2 where the L-MBP algorithm interpolates ex-ceptionally well at 04:00 UT over SUTH. What is quite distinct is that the performance of all algorithms on "seen" data (SUTH 2003) is lower than their corresponding performance on "unseen" data (CPTN 2002). Figure 5 shows the GPS TEC data that was used for the model development, along with daily sunspot numbers to also demonstrate the correlation between solar activity and ionospheric TEC behaviour. This figure reveals that there was missing data for 2001 and almost half of 2004. The years 2003 and 2004 were in the declining phase of the sunspot cycle, and the absence of 2004 data could have contributed to the observed low prediction accuracies of the NN algorithms for the "seen" data.

Extrapolation results
Daily TEC values for January, February and March 2008 were modelled using NN models developed with the SUTH 2000-2007 dataset. This was done to assess the generalisation capability of the investigated algorithms in extrapolation  of TEC data outside the temporal range used in developing the models. Figure 6 shows average diurnal TEC variations for January, February and March 2008 over SUTH. While the variability of TEC values generated by the L-MBP algorithm can be seen to be far from other values (for January and February 2008), the prediction performances cannot be easily infered from this figure. Computed RMSE values between derived GPS TEC and modelled TEC are shown in Fig. 7. Generally, BPC, BPM, BPWD and SBP algorithms are consistent in their performances for the three months. Presented extrapolation results show that BPM and BPWD provide better TEC estimates with L-MBP giving the least accuracy for January 2008. This is also reflected in Table 4, which shows the RMSE values between GPS TEC and modelled TEC data for the first 10 days in January 2008. Statistically, average values indicate better performance for BPM and BPWD algorithms.
There are observed fluctutations in performance levels for different algorithms. BPC and SBP have almost the same performance for February 2008, while L-MBP gives better TEC estimates for March 2008. At this stage our results are therefore inconclusive about the preferred algorithm for ex-  generated by the L-MBP algorithm, which gives the smallest RMSE value for 5 January 2008. The Kp index had a constant value of 3.7 from ∼06:00-14:00 UT with Dst varying in the range of 15 to −18 nT. During this period, the AE index (not shown) reached a maximum value of 619 nT from 60 nT, and remained highly variable even on the next day when TEC showed an irregular and declining trend. The next high Kp index observed is 4 when the Dst index was −26 nT at 23:00 UT. A magnetic substorm occurred on 5 January 2008 (Xu-Dong et al., 2010), which could have caused an increase in TEC compared to the previous 3 January 2008 TEC values. Maximum TEC of ∼27 TECU is observed at 12:00 UT on 5 January 2008 in contrast to a maximum value of ∼20 TECU at 10:00 UT on 3 January 2008. Additionally, the foF2 value over Grahamstown (33.30 • S, 26.53 • E), South Africa, increased from 5.5 MHz on 4 January 2008 to 7.325 MHz on 5 January 2008 at 12:00 UT, and later reaching a maximum value of 7.725 MHz at 12:45 UT. This is a difference of 1.825 MHz for 4-5 January 2008 compared to 0.625 MHz for 3-4 January 2008 at 12:00 UT. An investigation of storm time TEC variability over South Africa during a geomagnetic storm of 15 May 2005 reported TEC enhancement and attributed it to the travelling ionospheric disturbances (Ngwira et al., 2012). It appears that substorm activity may also lead to TEC enhancement at South African midlatitude stations; however, a specific mechanism responsible for the observed event in this paper has not been investigated yet.

Conclusions
This paper has presented results comparing performance levels of some backpropagation algorithms for ionospheric TEC estimations. Similar to other sources (e.g. Jang et al., 1997; Yilmaz et al., 2009), it is evident that the L-MBP algorithm requires the fewest number of iterations, compared to other algorithms, to achieve generalisation. The reported convergence time (∼3 min) is expected to reduce with improved computing capacity. For TEC modelling, the differences in accuracy between the investigated algorithms is not very significant. What is worth noting, though, is the time it takes each algorithm to achieve convergence or generalisation. This time can increase or decrease depending on the size of the dataset under consideration. This investigation was conducted using a dataset of ∼50 300 data points. Each algorithm can generally be used depending on user requirements and available resources. For small datasets, the MatLab based L-MBP algorithm is sufficient. With more computing power, the other training algorithms can be used to slightly improve the accuracy. BPM and BPWD algorithms appear to achieve the same accuracy in a relatively short period of time and may be advantageous over the SBP algorithm, which has been widely used for modelling various ionospheric parameters. According to Zell et al. (1998), the momentum term (in the BPM) leads to the computation of the new weight change using the old weight change (during training), thereby mimimising oscillations associated with SBP for narrow minimum area error surfaces. BPM is a simple modification of SBP which accelerates the training/learning process (Rojas, 1996), and this is clearly evident in Table 1 in terms of the time and number of epochs required to achieve convergence.