Lumbar Moment Estimation of Engine Drivers in a Static Sitting Working Position by using Multiple Linear Regression

Davor SUMPOR 1, Sandro TOKIĆ 2, Jasna LEDER HORINA 3, Mislav Stjepan ŽEBEC 4

The paper presents a simpler and more precise model of lumbar moment prediction based on single linear regression or multiple linear regression with two predictors. The body mass index (BMI), as a predictor, combines two of the most important static anthropometric measures, height and mass, whose separate roles in lumbar moment prediction, as well as their mutual relations, have not been sufficiently investigated. This study analysed mass, height, age and BMI as lumbar moment predictors on a sample of 50 Croatian male engine drivers. Two prediction models were compared: (1) multiple linear regression with mass and height as predictors; (2) single linear regression with mass as the only predictor. The results confirmed the multiple regression model as the better one ($R^2 = 0.9015$, standard error of prediction 1.26), with mass as the best predictor. Surprisingly, the single regression model with mass as the only predictor explained only 3.6% less lumbar moment variance than the multiple regression model, with a standard error of prediction of 1.46 (its mean percentage relative error was only 0.8% higher than that of the multiple regression model). These findings suggest a high predictive potential of mass and height that should be verified on other subject samples.


INTRODUCTION
In both freight and passenger transport, depending on the local capacity and condition of the railway infrastructure, EU member states are investigating different implementation measures to motivate people to switch from their own cars to public transport [1], which is one of the most important measures for meeting the challenges of the 2050 Green Deal [2]. The levels (Level 0 to Level 3) and operating modes are fundamental concepts [3] of the European Train Control System (ETCS). As long as the engine driver has to be constantly present in the cab of the locomotive or railcar (current practice up to and including ETCS Level 2), estimating the physical workload in a static sitting working position remains a relevant research topic.
The engine drivers' task demand depends predominantly on changes of speed, unlike that of road vehicle drivers, who can simultaneously change both gear and direction. Fuller [4] recognised that the choice of speed is the primary means of keeping task difficulty within selected limits, and that these limits are subject to motivational influences. Therefore, choosing, maintaining or changing speed is the most important task in the engine driver's performance, with the greatest possible impact on the safety of the traffic process. The success of the engine driver's performance, as well as his physical effort measured by the lumbar moment, will be influenced, among other factors, by whether the grouped commands related to changing speed (the multipurpose controller for manual operation of the braking and/or accelerator module, as well as the "dead-man" function) are placed within the maximum or the normal hand reach. According to the guidelines of the Rail Safety and Standards Board of Great Britain [5], the current safety-related research priorities include, among other important factors, the design of the driver's cab.
The hypothesis investigated in this paper is that a simpler and more precise model of lumbar moment prediction than the previous ones can be built with single or multiple linear regression using at most two predictors.
In a previous study [6], the lumbar moment $M_{ly}$ at the L4/L5 vertebral level was predicted with the body mass index (BMI) as the predictor, for the hypothetically most unfavourable static sitting working position of engine drivers, as a measure of the driver's physical workload. The regression equation $M_{ly}(BMI)$ shows an acceptable correlation of medium strength, with correlation coefficient $r = 0.764$ (Equation 1).

$$M_{ly}(BMI) = 0.7847\, BMI + 3.0787 \quad (1)$$
The most unfavourable static sitting equilibrium working position (with a completely upright torso and head) implies that both of the engine driver's arms are stretched out horizontally. This mimics the hypothetical most unfavourable situation that occurs in reality in Croatia, where in older models of locomotives and railcars frequently used, manually operated commands are placed within the maximum arm reach. Therefore, regression Equation 1 refers to a 2D biomechanical stick model of an engine driver in the sagittal plane, in the least favourable hypothetical static sitting working position.
It is widely recognised that performance and all types of driver workload are related, and that this relationship is individual and non-linear. According to the open dynamic task-capability interface (TCI) model of the relation between task demand and driver capability [4], if "human factors" significantly affect task demand, and consequently the engine driver's performance through speed selection, this is clearly related to the availability and placement on the control panel of frequently used speed-change commands, such as the above-mentioned multipurpose controller for manual operation.
Therefore, a possible reason for a changed level of driver performance in the least favourable hypothetical static equilibrium working position may be the physical load caused by the driver's own increased body mass. Excessive body mass relative to the engine driver's height is expressed by BMI (Equation 2). BMI combines the two most important static anthropometric measures, standing height $h$ and body mass $m$. However, BMI is not the best choice of predictor, because it carries no information about how mass is distributed in the human body.

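$$BMI = \frac{m}{h^2} \quad (2)$$
where $m$ is the body mass in kilograms and $h$ the standing height in metres.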
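As a small illustration of Equation 2, the following Python sketch computes BMI assuming the usual units (mass in kilograms, height in metres); the function names are hypothetical. Since heights in this study were recorded in whole centimetres, a small wrapper converts them first.

```python
def bmi(mass_kg: float, height_m: float) -> float:
    """Body mass index (Equation 2): body mass divided by the square of standing height."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return mass_kg / height_m ** 2


def bmi_from_cm(mass_kg: float, height_cm: float) -> float:
    """Same as bmi(), for heights recorded in centimetres."""
    return bmi(mass_kg, height_cm / 100.0)


if __name__ == "__main__":
    # Hypothetical driver (not a study participant): 90.0 kg, 180 cm.
    print(f"BMI = {bmi_from_cm(90.0, 180):.2f}")
```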
However, the separate or joint influence of height and mass on the lumbar moment in the static sitting position has not been sufficiently investigated, and this is the second reason why BMI was not the best choice of predictor in the previous research using single linear regression. BMI contains the two most important static anthropometric measures, mass and height, which are not functionally related but are clearly stochastically related, with a moderate positive correlation. Therefore, the selection of mass, height and BMI as predictors is analysed and discussed in the chapter "Analysis and Selection of Potential Predictors".
Our own previous research [7] confirms that after the age of 30, the percentage of overweight and obese engine drivers (BMI ≥ 25) increases significantly: after the age of 30, it ranges between 72.3% and 92.31%, depending on the age group. A study carried out in 2011 in Slovenia [8] on 245 railway employees found 66.9% of the workers to be overweight or obese, with no significant difference between white-collar and blue-collar workers. In a study on 118 subjects, the influence of BMI on seat comfort was investigated [9], and relationships were determined between BMI and the size of the backrest contact surface, the pressure change and the maximum pressure. The considerably higher percentage of overweight and obese engine drivers in Croatia may be related to the shift work schedule, with irregular shifts that include night work, as well as to unhealthy dry food. It is important that BMI, as a predictor candidate, has adequate variability (mostly analysed by the coefficient of variation CV); otherwise it cannot correlate properly with other variables, including the lumbar moment.
Back in 1984, it was established that the lumbar moment and intra-abdominal pressure can be used to estimate human physical effort in static body positions [10]. The present additional study is based on previously measured static and kinematic anthropometric measurements of 50 male engine drivers from Croatia [6]. In the simplified calculation, i.e. prediction, of the lumbar moment $M_{ly}$, the candidate predictors are mass, height, BMI and age. Instead of additionally measuring the lengths of several arm segments (hands, forearms and upper arms) for the 50 participants and then calculating the segmental masses of the hands, forearms and upper arms (see Chapter 4), the predicted $M_{ly}$ can be obtained even more simply with a regression model with one predictor ($m$ or $h$) or two predictors ($m$ and $h$), which are simple and fast to measure.
Mass and height can be measured with a calibrated digital scale with a mechanical stadiometer, Tanita WB 3000 (digital scale accuracy class III), in the Laboratory for Applied Ergonomics in Traffic and Transport [11]. This device is inexpensive and enables relatively quick measurement of the two most important anthropometric measures, mass $m$ and height $h$, which were also the predictor candidates. Recent trends point to more modern but also more expensive devices such as 3D body scanners [12]. Anthropometric measurements obtained with such devices are more accurate, precise and repeatable, and can be applied in the ergonomic design of a sitting workplace.

METHOD
Participants. Fifty (50) male engine drivers from Croatia, aged 28 to 53 (8% of them left-handed), participated in the anthropometric measurements. In 2012, when the study was conducted, the engine driver population in the Republic of Croatia comprised 1,357 male engine drivers (there were no female engine drivers). Participants were selected taking into account the regional representativeness of the population, so engine drivers were measured in seven Croatian railway centres in various regions: Zagreb, Karlovac, Varaždin, Slavonski Brod, Knin, Rijeka and Zadar.
Instruments. Anthropometric measurements were performed with large and small Lafayette Instrument anthropometers (accuracy class II; sliding anthropometric callipers designed to measure the straight-line distance between two landmarks with an accuracy of 1 mm) and a calibrated digital scale with a mechanical stadiometer, Tanita WB 3000 (digital scale accuracy class III).
Procedure. The anthropometric measurements were always conducted at the same time of day, between 11:00 a.m. and 3:00 p.m., and always by the same person, one of the co-authors of the paper. All length-based anthropometric measures were measured to the nearest whole centimetre, while mass $m$ was measured to one decimal place (in kilograms). For certain length-based anthropometric measures (hand, forearm, upper arm), the measurement procedure depended on hand dominance: for right-handed subjects, they were measured on the non-dominant left side of the body, whereas for left-handed subjects they were measured on the non-dominant right side.

Data analysis design.
To reach the proposed research goal, the following data analysis procedures were conducted:
1) Potential predictors of the lumbar moment were analysed in terms of the distribution features that constitute the assumptions for single and multiple regression: variability and distribution normality (with a focus on skewness).
2) Potential predictors were selected by considering their distribution features and the correlation coefficients between the predictors and the lumbar moment values, as well as among the predictors themselves (after first checking for non-linear relations by comparing Pearson and Spearman correlations).
3) Lumbar moment values were calculated with single and multiple linear regression equations based on the selected predictors.
4) Two indicators of regression fit were calculated and compared: (i) the standard error of prediction and (ii) the mean percentage value of the relative error of the calculated lumbar moment.

ANALYSIS AND SELECTION OF POTENTIAL PREDICTORS
Before proposing the two optimal predictors of the lumbar moment from the correlation matrix of all candidate variables, the distribution-based assumptions for single and multiple regression were verified. The distribution-based statistics analysed were the mean $M$, the standard deviation $sd$, the coefficient of variation CV (Equation 3), the significance of distribution normality tests, and skewness test statistics. They are presented in Table 1 and explained below.
$$CV = \frac{sd}{M} \cdot 100\,\% \quad (3)$$
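As a sketch of how the Table 1 quantities $M$, $sd$ and CV (Equation 3) can be obtained, the following Python uses only the standard library. It assumes the sample standard deviation (n - 1 in the denominator), which the text does not state explicitly; the function name and the example values are hypothetical, not study data.

```python
import statistics


def describe(values):
    """Return the mean M, the sample standard deviation sd and the coefficient of variation CV in %."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    cv = sd / mean * 100.0         # Equation 3: CV = sd / M * 100 %
    return mean, sd, cv


if __name__ == "__main__":
    # Hypothetical values, for illustration only (not data from the study).
    heights_cm = [172, 178, 181, 185, 176]
    masses_kg = [78.5, 95.0, 88.2, 110.4, 83.1]
    for name, data in (("h", heights_cm), ("m", masses_kg)):
        mean, sd, cv = describe(data)
        print(f"{name}: M = {mean:.1f}, sd = {sd:.2f}, CV = {cv:.2f} %")
```

Comparing the CV of height with that of mass on any such sample makes the difference in relative variability discussed below directly visible.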
Positive values of the skewness coefficient (SC) in Table 1 mean that the distribution, concentrated around the median $C$, has a certain number of markedly higher results, i.e. a long tail on the right side. Therefore, the mean $M$ is expected to be greater than the median $C$ for the variables $m$, $h$ and BMI. The negative SC value in Table 1 for age means that age has a negatively skewed distribution (a long tail on the left side).
The standard error of the skewness coefficient, $SE_{SC}$, calculated (for each variable $x$) in Equation 5, depends only on the sample size, so it has the same value for all analysed variables, $SE_{SC} = 0.337$, as shown in Table 1.
To establish whether the obtained SC reflects real skewness of the distribution in the whole population of Croatian engine drivers, the significance of SC needs to be tested. The related null hypotheses were tested with the z-statistic:
$$z_{SC} = \frac{SC}{SE_{SC}} \quad (6)$$
Since $z_{SC}$ approximately follows the standard normal distribution under the null hypothesis, the skewness of the analysed variable is significant at the 95% or 99% level when $|z_{SC}|$ from Equation 6 exceeds 1.96 or 2.58, respectively. An additional reason for testing the significance of skewness is to find the source of a possible deviation of the observed variable from normality. If significant skewness is confirmed, it can cause problems in calculating and interpreting correlations when the correlated variables are significantly skewed in opposite directions.
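The skewness test can be sketched in Python as follows. The standard error uses the common formula $SE_{SC} = \sqrt{6n(n-1)/((n-2)(n+1)(n+3))}$, which gives 0.337 for n = 50, in agreement with Table 1. Equation 4 is not reproduced in this text, so the skewness itself is computed here with the adjusted Fisher-Pearson coefficient G1 (the statistic reported by SPSS and Excel `SKEW`), which is an assumption; the function names are hypothetical.

```python
import math


def skewness(values):
    """Adjusted Fisher-Pearson skewness coefficient G1, assumed as a stand-in for Equation 4."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)


def skewness_se(n):
    """Standard error of the skewness coefficient; it depends only on the sample size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def skewness_z(values):
    """z statistic of Equation 6 and whether it is significant at the 95 % and 99 % levels."""
    z = skewness(values) / skewness_se(len(values))
    return z, abs(z) > 1.96, abs(z) > 2.58


if __name__ == "__main__":
    print(round(skewness_se(50), 3))  # 0.337, the SE_SC value reported for n = 50
```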
The results in Table 1 show several distribution characteristics that are relevant for correlation and regression calculation:
- CV values indicate good but not optimal variability for correlation calculation, especially for height $h$. Reduced variability causes reduced covariation and leads to lower correlation coefficients [13]. Therefore, the bivariate correlations of the height variable are expected to be somewhat lower, since its variability is approximately four times smaller than that of the other variables.
- All variables except age are normally distributed (tested by the Kolmogorov-Smirnov, Shapiro-Wilk and chi-square tests), which suggests a possible distortion of correlation coefficients only for the age variable. The consequences of this distortion are confined to the correlations of age with other variables, mostly with BMI and $m$, which are skewed in the opposite direction to age [13].
Table 2 shows that the variables mass $m$ and height $h$ (expressed by the coefficient of variation CV) behave similarly for different groups of respondents, regardless of their sex, age and occupation. The variable $h$ is very homogeneous across all subjects (CV ranges from 2.97% to 3.66%), while $m$ has a substantially greater variability than $h$ (CV ranges from 15.4% to 21.15%). These findings on the high homogeneity of height across different samples are discussed later in the paper.
Table 3 shows the integrated Pearson-Spearman correlation matrix of all candidate variables for lumbar moment prediction: mass, height, age and BMI. The Pearson correlations are shown above the main diagonal of the matrix, and the Spearman correlations below it. This side-by-side presentation (Pearson vs Spearman) offers a quick check of whether the dependence between any two variables $x$ and $y$ is linear: since Spearman's Rho is more appropriate than Pearson's r for monotonic non-linear relations, Rho should be larger than r when a relation is non-linear [16]. Comparison of Pearson's r and Spearman's Rho (or $r_s$) for all 10 relations in the correlation matrix shows that every Rho is smaller than the related r, which suggests that all relationships between the candidate variables are linear. Therefore, the linearity assumption for the regression calculation is satisfied.
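The Pearson-versus-Spearman check described above can be sketched in plain Python: Spearman's Rho is computed as the Pearson correlation of the ranks, with tied values given averaged ranks. The function names and the `margin` threshold are hypothetical choices for illustration, and the check is a heuristic that only detects monotonic non-linearity.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's Rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def looks_nonlinear(x, y, margin=0.01):
    """Heuristic: |Rho| noticeably larger than |r| hints at a monotonic non-linear relation."""
    return abs(spearman(x, y)) - abs(pearson(x, y)) > margin
```

For a perfectly monotonic but curved relation such as $y = x^3$, Rho equals 1 while r stays below 1, so the check flags it; for a straight line both coincide.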
Since predictor selection in multiple regression requires (i) the highest possible correlation between each predictor and the criterion variable and (ii) the lowest possible correlation between the two predictors, the correlation matrix suggests the following: the best lumbar moment predictor is mass and the worst is age, so the second predictor should be chosen between height and BMI. Height is the better choice, since BMI correlates almost completely with mass (r = 0.904) and would not contribute separately to the prediction of $M_{ly}$, whereas height shares less than 30% of its variance with mass. Moreover, the reduced variability of height suggests that its correlations could only increase if its variability approached that of the other variables. Together with the satisfied normality and linearity assumptions, this leads to the conclusion that the optimal candidates for the two-predictor multiple regression $M_{ly}(x_1, x_2)$ are mass and height.
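The selection criterion can be made concrete with the standard formula for the squared multiple correlation of two predictors, computed only from the pairwise correlations. The Python sketch below (hypothetical function names) returns how much extra variance a second predictor adds to a model that already contains the first one; when the second predictor's correlation with the criterion is fully explained through the first predictor ($r_{y2} = r_{y1} r_{12}$), the gain is zero. The correlations in the usage example are hypothetical.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation R^2 of y on x1 and x2, from the three pairwise correlations."""
    if abs(r_12) >= 1.0:
        raise ValueError("the two predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)


def r2_gain(r_y1, r_y2, r_12):
    """Increase in explained variance when x2 is added to a model that already contains x1."""
    return r2_two_predictors(r_y1, r_y2, r_12) - r_y1 ** 2


if __name__ == "__main__":
    # Hypothetical correlations for two second-predictor candidates, with x1 = mass (r_y1 = 0.93).
    for name, r_y2, r_12 in (("candidate A", 0.70, 0.55), ("candidate B", 0.75, 0.90)):
        print(f"{name}: R^2 gain = {r2_gain(0.93, r_y2, r_12):.3f}")
```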

CALCULATION OF THE INDIVIDUAL LUMBAR MOMENT VALUE
This chapter is an abbreviated version of part of the previous paper [6]; it briefly explains how the lumbar moment of each individual male engine driver can be calculated in the sagittal plane (Figure 1), in the least favourable hypothetical static equilibrium working position, with both arms horizontally extended in the zone of maximum reach.
Knowing the body height and mass, the Donskij-Zacijorskij method [17] and multiple regression Equation 7 can be used to calculate the segmental masses $m_i$ of the hands, forearms and upper arms of the respondents, i.e. the engine drivers.

$$m_i = B_{0i} + B_{1i}\, m + B_{2i}\, h \quad (7)$$
The positions of the mass centres of the segments with masses $m_i$ were calculated according to Table 4, measured from the upper border of each body segment [18].
The weights of the body segments $F_{gzi}$, acting parallel to the z axis (hand weight $F_{gs}$, forearm weight $F_{gp}$, upper arm weight $F_{gn}$ and torso weight $F_{gt}$), were calculated according to Equation 8, and the lumbar moments $M_{ly}$ according to Equation 9 were obtained by reducing all forces $F_{gzi}$ of the segmental masses $m_i$ to the origin of the xz coordinate system (Figure 1).
$$F_{gzi} = 9.81\, m_i \quad (8)$$
In line with the considerations of Mairiaux et al. [10] and Muftić et al. [20], the origin of the xz coordinate system also represents the L4/L5 reduction point of the lumbar moment, i.e. the level between the fourth (penultimate) and fifth (last) lumbar vertebrae of the mobile part of the spine, counted from the top downwards.
For the calculation explained in this chapter, it is important to note that the following statically measured anthropometric measures were used as input variables: height, mass and the length-based measures (hand, forearm, upper arm). The segmental masses $m_i$ of the hands, forearms and upper arms were calculated by multiple regression using only height and mass as predictors (Equation 7). The lumbar moment calculated in this way is referred to in the following chapters as the "calculated lumbar moment $M_{ly}$ based on the measured anthropometric measures".
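The static reduction described in this chapter can be sketched in Python as a sum of segment weights $F_{gzi} = 9.81\,m_i$ multiplied by their horizontal lever arms about L4/L5. This is an illustration under stated assumptions, not the exact procedure of [6]: Equation 9 is not reproduced here, the Donskij-Zacijorskij coefficients and the Table 4 percentages are not given in this text (so they are passed in as parameters), height is assumed in centimetres in Equation 7, both arms are assumed to overlap in the 2D sagittal model, and the upright torso's weight is assumed to act through L4/L5. All names and numbers in the example are hypothetical.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2, as in Equation 8


@dataclass
class Segment:
    name: str
    mass_kg: float      # segmental mass m_i
    lever_arm_m: float  # horizontal distance from L4/L5 to the segment's mass centre
    count: int = 1      # 2 for paired segments (both arms) that overlap in the 2D sagittal model


def segment_mass(b0, b1, b2, body_mass_kg, height_cm):
    """Equation 7 form m_i = B0 + B1*m + B2*h; the B coefficients must come from [17]."""
    return b0 + b1 * body_mass_kg + b2 * height_cm


def arm_lever_arms(upper_m, fore_m, hand_m, c_upper, c_fore, c_hand, shoulder_x_m=0.0):
    """Lever arms of a horizontally extended arm. c_* are mass-centre positions as fractions
    of segment length measured from the upper border (cf. Table 4); shoulder_x_m is an
    assumed horizontal offset of the shoulder joint from L4/L5."""
    x_upper = shoulder_x_m + c_upper * upper_m
    x_fore = shoulder_x_m + upper_m + c_fore * fore_m
    x_hand = shoulder_x_m + upper_m + fore_m + c_hand * hand_m
    return x_upper, x_fore, x_hand


def lumbar_moment(segments):
    """Sum of the moments of the segment weights F_gzi = 9.81 * m_i about the L4/L5 origin, in N m."""
    return sum(s.count * G * s.mass_kg * s.lever_arm_m for s in segments)


if __name__ == "__main__":
    # All numbers below are hypothetical placeholders, not values from the study, [17] or [18].
    x_u, x_f, x_h = arm_lever_arms(0.33, 0.27, 0.19, 0.45, 0.42, 0.40)
    arm = [Segment("upper arm", 2.5, x_u, 2), Segment("forearm", 1.5, x_f, 2), Segment("hand", 0.5, x_h, 2)]
    torso = Segment("torso", 35.0, 0.0)  # upright torso: weight assumed to act through L4/L5
    print(f"M_ly = {lumbar_moment(arm + [torso]):.1f} N m")
```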

RESULTS
Since we proposed mass and height as the best predictors of the lumbar moment $M_{ly}$, the next task, based on the data of the present study, was to calculate the regression equation for predicting $M_{ly}(m,h)$ from $m$ and $h$ (Equation 10):
$$M_{ly}(m,h) = b_0 + b_1 m + b_2 h \quad (10)$$
The coefficients $b_0$, $b_1$ and $b_2$ are calculated in Equations 11-13:
$$\begin{aligned} b_0 &= \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2 \\ b_1 &= \beta_1 \frac{s_y}{s_{x_1}} \\ b_2 &= \beta_2 \frac{s_y}{s_{x_2}} \end{aligned} \quad (11\text{-}13)$$
where the standardised coefficients $\beta_1$ and $\beta_2$ are fully determined by the correlation matrix shown in Table 3 and calculated in Equations 14 and 15, taking into account that $y = M_{ly}$, $x_1 = m$ and $x_2 = h$:
$$\beta_1 = \frac{r_{yx_1} - r_{yx_2}\, r_{x_1x_2}}{1 - r_{x_1x_2}^2} \quad (14)$$
$$\beta_2 = \frac{r_{yx_2} - r_{yx_1}\, r_{x_1x_2}}{1 - r_{x_1x_2}^2} \quad (15)$$
Applying these formulas to the related values from Table 2 and Table 3 gives the target multiple regression equation for predicting $M_{ly}(m,h)$ from $m$ and $h$ (Equation 16). This prediction has a small standard error, $s_{y.x_1x_2} = 1.26$, and sends an important message: about 90% of the lumbar moment variance can be explained by an optimal linear combination of mass and height. Figure 2 illustrates this prediction. The separate contributions of the two predictors to such a good prediction were analysed through the standardised regression coefficients $\beta_1$ and $\beta_2$, i.e. $\beta_m$ and $\beta_h$, respectively. Their values ($\beta_m = 0.808$ and $\beta_h = 0.225$) clearly show that the contribution of mass to the prediction of $M_{ly}$ is about 3.6 times greater than that of height. This is not surprising given the related Pearson correlations in Table 3 ($r[M_{ly},m] = 0.93$ and $r[M_{ly},h] = 0.666$); what is surprising is the strength of the $M_{ly}$-$m$ correlation itself and the related predictive power of mass.
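Equations 11-15 can be sketched in Python as below (function names are hypothetical; `r_y1`, `r_y2` and `r_12` stand for $r_{yx_1}$, $r_{yx_2}$ and $r_{x_1x_2}$). The sketch also uses the standard identity $R^2 = \beta_1 r_{yx_1} + \beta_2 r_{yx_2}$ for two predictors: with the reported $\beta_m = 0.808$, $\beta_h = 0.225$, $r[M_{ly},m] = 0.93$ and $r[M_{ly},h] = 0.666$, it reproduces $R^2 \approx 0.901$.

```python
def standardised_betas(r_y1, r_y2, r_12):
    """Equations 14-15: standardised coefficients of y on x1 and x2 from the correlation matrix."""
    d = 1.0 - r_12 ** 2
    if d <= 0.0:
        raise ValueError("the two predictors must not be perfectly correlated")
    beta1 = (r_y1 - r_y2 * r_12) / d
    beta2 = (r_y2 - r_y1 * r_12) / d
    return beta1, beta2


def raw_coefficients(beta1, beta2, mean_y, sd_y, mean_1, sd_1, mean_2, sd_2):
    """Equations 11-13: convert standardised betas to b0, b1 and b2 using means and standard deviations."""
    b1 = beta1 * sd_y / sd_1
    b2 = beta2 * sd_y / sd_2
    b0 = mean_y - b1 * mean_1 - b2 * mean_2
    return b0, b1, b2


def r_squared_from_betas(beta1, beta2, r_y1, r_y2):
    """Explained variance of a two-predictor linear regression: R^2 = beta1*r_y1 + beta2*r_y2."""
    return beta1 * r_y1 + beta2 * r_y2


if __name__ == "__main__":
    # Check with the values reported in the text.
    print(round(r_squared_from_betas(0.808, 0.225, 0.93, 0.666), 3))  # 0.901
```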
To test the prediction of the lumbar moment $M_{ly}(m)$ from mass alone, we performed a single linear regression with the model $M_{ly} = b_0 + b_1 m$. This gave the following regression equation:
$$M_{ly}(m) = 2.0662 + 0.2513\, m \quad (18)$$
Of the two regression coefficients ($b_0 = 2.0662$ and $b_1 = 0.2513$), only $b_1$ is statistically significant ($t(b_0) = 1.513$ with $p = 0.137$, and $t(b_1) = 17.598$ with $p < 0.01$). This means that $b_0$ does not differ significantly from zero in the engine driver population, whereas the 95% confidence interval of $b_1$ excludes zero. The standard error of the single regression prediction is also small ($s_{y.x} = 1.46$) and does not differ much from that of the multiple regression prediction ($s_{y.x_1x_2} = 1.26$). This is expected from a comparison of the $M_{ly}$ variance explained by the single prediction ($r^2 = 0.865$) and by the multiple prediction ($R^2 = 0.901$), and it calls for a discussion of the usefulness of multiple regression compared with single regression prediction.
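A minimal Python sketch of the single regression step: an ordinary least-squares fit of $y = b_0 + b_1 x$ with the t-statistics of both coefficients and the standard error of prediction $s_{y.x}$ (n - 2 degrees of freedom), plus Equation 18 as reported for this sample. The function names are hypothetical, and p-values are omitted because they would need a t-distribution CDF (for example from SciPy).

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x with t-statistics and the standard error of prediction."""
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are required")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))  # standard error of prediction, n - 2 degrees of freedom
    se_b1 = s_yx / math.sqrt(sxx)
    se_b0 = s_yx * math.sqrt(1.0 / n + mx ** 2 / sxx)
    return {
        "b0": b0,
        "b1": b1,
        "s_yx": s_yx,
        "t_b0": b0 / se_b0 if se_b0 > 0 else math.copysign(math.inf, b0),
        "t_b1": b1 / se_b1 if se_b1 > 0 else math.copysign(math.inf, b1),
    }


def mly_from_mass(mass_kg):
    """Equation 18 as reported for the sample of 50 engine drivers."""
    return 2.0662 + 0.2513 * mass_kg


if __name__ == "__main__":
    # Hypothetical driver mass, for illustration only.
    print(f"M_ly(90 kg) = {mly_from_mass(90.0):.2f}")
```

Applying Equation 18 far outside the mass range of the measured sample would be an extrapolation.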
To evaluate the contribution of the two regression models of the present study (based on mass and height, and based on mass only) relative to the previous one (based on BMI), we created Tables 5 and 6.
Table 5 compares the analytically calculated lumbar moment values based on the measured static anthropometric measures [6] with the three regression predictions analysed above, given by Equations 1, 16 and 18.
The column $M_{ly}$ presents the individually calculated lumbar moment of each engine driver, as briefly described in Chapter 4 [6]. The column $M_{ly}(BMI)$ presents the lumbar moment predicted by single linear regression Equation 1, with BMI as the predictor [6]. The column $M_{ly}(m)$ presents the lumbar moment predicted by single linear regression Equation 18, with $m$ as the predictor. The column $M_{ly}(m,h)$ presents the lumbar moment predicted by multiple linear regression Equation 16, with $m$ and $h$ as predictors.
For the lumbar moment values predicted by regression Equations 1, 16 and 18, the mean percentage value of the relative error $E_{RM}$ with respect to the calculated lumbar moment $M_{ly}$ based on the measured static anthropometric measures [6] (shown in Table 6) was calculated over the entire sample of $n = 50$ engine drivers (Equation 19):
$$E_{RM} = \frac{100\,\%}{n} \sum_{i=1}^{n} \frac{\left| M_{ly,i} - \hat{M}_{ly,i} \right|}{M_{ly,i}} \quad (19)$$
where $M_{ly,i}$ is the calculated and $\hat{M}_{ly,i}$ the predicted lumbar moment of driver $i$. According to the results shown in Table 6, the multiple regression model $M_{ly}(m,h)$ has only a 0.8% lower mean percentage relative error $E_{RM}$ than the single regression model $M_{ly}(m)$. Moreover, predicting $M_{ly}$ with the multiple regression model requires measuring an additional anthropometric measure, body height $h$.
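The error measure can be sketched in Python as the mean of absolute relative errors, expressed in percent. The exact form of Equation 19 (in particular the use of absolute values) is an assumption here, and the function name and example values are hypothetical.

```python
def mean_relative_error_pct(calculated, predicted):
    """Mean percentage relative error E_RM, assuming absolute relative errors |M - M_hat| / M."""
    if len(calculated) != len(predicted) or not calculated:
        raise ValueError("need two non-empty sequences of equal length")
    if any(c == 0 for c in calculated):
        raise ValueError("calculated values must be non-zero")
    total = sum(abs(c - p) / c for c, p in zip(calculated, predicted))
    return 100.0 * total / len(calculated)


if __name__ == "__main__":
    # Hypothetical calculated and predicted lumbar moments for three drivers.
    m_calc = [22.4, 25.1, 27.9]
    m_pred = [23.0, 24.6, 28.3]
    print(f"E_RM = {mean_relative_error_pct(m_calc, m_pred):.2f} %")
```

Computing `mean_relative_error_pct` for each model's predictions against the same calculated values gives a direct comparison of the kind reported in Table 6.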

DISCUSSION
The engineering goals of the study were supported by the favourable statistical properties of the measured sample of engine drivers, and the analysis produced some interesting scientific findings in the field of anthropometry.
First, all anthropometric measures satisfied most of the statistical assumptions of linear regression (interval or ratio measurement scale, linear relationship between measures, distribution normality and appropriate variability), which enabled a valid interpretation of the obtained regression models. Only the height variable had a markedly reduced variability compared with the other anthropometric variables, and an even better model might be obtained if that variability were higher. Nevertheless, the data in Table 2 clearly show that height variability is consistently several times lower than mass variability across four different samples defined by sex and occupation [6,14,15], and this feature does not seem to be a local phenomenon [21,22].
The correlation matrix data (Table 3) mostly corroborate what is expected from the calculation formulas, but they also reveal two less expected findings. The first is the non-significant correlation between $h$ and BMI (despite the functional dependence of BMI on $h$), which might be explained by two arguments. Argument (1): BMI depends inversely quadratically on $h$ and simultaneously linearly on $m$, which is itself positively correlated with $h$; these two antagonistic relations cancel each other out. Argument (2): Pearson's correlation detects only linear dependence, which is not present in the BMI-$h$ relation.
Nevertheless, since Spearman's Rho did not detect a non-linear (monotonic) BMI-$h$ correlation either, the first argument seems to give the more reasonable explanation.
The second relevant correlational finding is the surprisingly high $M_{ly}$-$m$ correlation ($r = 0.93$), which requires further measurements on different samples, since it sends an important message: to determine the lumbar moment with a decent level of precision, only a weighing scale is needed.
Such a high $M_{ly}$-$m$ correlation leaves little room for any other anthropometric measure to contribute to the prediction in the multiple regression equation. That is why the independent contribution of mass to the prediction of $M_{ly}(m,h)$ is several times greater than that of height ($\beta_m = 3.59\,\beta_h$). Despite the high and significant multiple correlation ($R = 0.949$) and the significant regression coefficients, this raises a logical question: why not predict $M_{ly}$ with single linear regression, using mass as the only predictor?
The additional single regression model $M_{ly}(m)$ (Equation 18) gave a slightly weaker prediction than the target multiple regression model $M_{ly}(m,h)$ (Equation 16). The multiple regression model explained only 3.6% more $M_{ly}$ variance than the single regression model, with a standard error of prediction only 0.2 lower and a mean percentage relative error only 0.8% lower. The final choice between the two regression models will have to wait until they are tested on more diverse subject samples (different ages, larger samples, more representative of the general population), and will also depend on the measurement purpose and conditions.
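The closeness of the two standard errors can be checked from the reported $R^2$ values alone, using the standard relation $s = s_y\sqrt{(1-R^2)(n-1)/(n-k-1)}$ for a model with k predictors. In the ratio of the two standard errors, $s_y$ cancels, so the Python sketch below (hypothetical function name) needs only $r^2 = 0.865$, $R^2 = 0.901$ and n = 50; the resulting ratio agrees with 1.46/1.26 to two decimals.

```python
import math


def se_of_prediction(sd_y, r_squared, n, k):
    """Standard error of prediction of a linear model with k predictors and n observations."""
    return sd_y * math.sqrt((1.0 - r_squared) * (n - 1) / (n - k - 1))


if __name__ == "__main__":
    # sd_y cancels in the ratio, so 1.0 is used as a placeholder.
    ratio = se_of_prediction(1.0, 0.865, 50, 1) / se_of_prediction(1.0, 0.901, 50, 2)
    print(round(ratio, 2), round(1.46 / 1.26, 2))  # both round to 1.16
```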

CONCLUSION
These results indicate that, in overweight and obese drivers, mass is the most important predictor of increased physical load in the least favourable hypothetical static equilibrium working position (evaluated by predicting the lumbar moment $M_{ly}(m)$), if frequently used hand-operated commands are placed within the maximum arm reach.
Another important conclusion is that the lumbar moment of Croatian engine drivers can be calculated with high precision simply by measuring the drivers' mass and height. That means that we could
Promet - Traffic&Transportation. 2024;36(2):219-231.

Figure 1 - Two-dimensional stick model of the respondent in the sagittal plane [19]
Coefficients $b_0$, $b_1$ and $b_2$ obtained on the sample of 50 engine drivers are statistically significant ($t(b_0) = -3.669$ with $p = 0.01$, $t(b_1) = 14.781$ with $p < 0.01$, $t(b_2) = 4.124$ with $p < 0.01$), and their 95% confidence intervals for the whole engine driver population are:
- $b_0^*$: [-32.169, -9.395]
- $b_1^*$: [0.188, 0.248]
- $b_2^*$: [0.074, 0.214]
Multiple regression Equation 16 predicts the lumbar moment $M_{ly}(m,h)$ surprisingly well, as shown by the statistically significant multiple correlation coefficient R, i.e. the Pearson correlation between the values of $M_{ly}$ calculated from the measured anthropometric measures and the values of $M_{ly}(m,h)$ predicted by Equation 16: $R = 0.949$, $F = 215.002$, $df_1 = 2$, $df_2 = 47$, $p < 0.001$. The multiple correlation coefficient R is calculated in Equation 17.
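As a consistency check of the reported statistics, the overall F statistic of a regression with k predictors can be computed from $R^2$ alone, with $df_1 = k$ and $df_2 = n - k - 1$. The Python sketch below (hypothetical function name) reproduces F ≈ 215 for $R^2 = 0.9015$, n = 50 and k = 2, matching the reported value to rounding. A p-value would additionally need the F-distribution survival function (for example `scipy.stats.f.sf`), which is left out to keep the sketch dependency-free.

```python
def f_statistic(r_squared, n, k):
    """Overall F test statistic of a linear regression with k predictors and n observations."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must be in [0, 1)")
    df1, df2 = k, n - k - 1
    return (r_squared / df1) / ((1.0 - r_squared) / df2)


if __name__ == "__main__":
    print(round(f_statistic(0.9015, 50, 2), 1))  # 215.1, close to the reported F = 215.002
```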

Figure 2 - Relationship between the lumbar moment values $M_{ly}$ calculated from measured anthropometric measures and those predicted from mass and height by multiple regression, $M_{ly}(m,h)$

Table 1 - Descriptive statistics for testing single and multiple linear regression assumptions for male engine drivers
The skewness coefficient SC is calculated (for each variable x) in Equation 4. The SC values shown in Table 1 are positive or negative when the variable distribution is asymmetric.

Table 2 - Variability of m and h for subjects of different ages, sex and occupations [6,14,15]

Table 4 - Positions of the mass centres as a percentage of the body segment length* [18]
*measured from the upper border of the segment

Table 5 - Comparison of several lumbar moment calculation models and the related proposed predictors for the analysed sample of Croatian engine drivers

Table 6 - Calculation parameters of the mean percentage value of the relative error

As Table 6 shows, for this sample the lowest mean percentage value of the relative error $E_{RM}$ is obtained for the lumbar moment $M_{ly}(m,h)$ predicted by multiple regression Equation 16, with $h$ and $m$ as predictors.