Data Source Importance Evaluation for Highway Networks: A Complex Network-Based Approach

Data collection technologies, or data sources, are critical for highway network management. However, because management resources are limited, the importance of these data sources must be determined so that resources can be allocated reasonably. This study proposes a complex network-based method for evaluating the importance of multiple data sources in highway networks. The method consists of three main steps. First, the business-data source relation is identified and formulated for the highway network. Second, a business-data source complex network is built from the identified business-data source relations. Third, an entropy weight method is used to compute and rank the importance of data source nodes by combining three indexes computed on the complex network: degree centrality (DC), closeness centrality (CC) and structural holes (SC). The proposed method is applied and illustrated using the highway network of Xuzhou City, Jiangsu Province, China. The results show that the most important data source is the continuous traffic survey station, followed by the automatic gantry-based station and the vehicle detector-based system. The limitations, applications and future studies of the proposed approach are also discussed.


INTRODUCTION
The highway network is a critical infrastructure system supporting national social and economic development. Effective management of highway networks is therefore one of the essential tasks of highway management agencies around the world. For this purpose, highway management agencies have developed a variety of business models to fulfil the needs of highway network operations. The business models refer to the management tasks that highway management agencies have to perform to operate the highway network, such as performance measurement, incident detection or traffic volume prediction [1]. All these business models play a vital role in the improvement of highway network operations.
The highway management business models rely heavily on the data collected in the highway network [2,3]. Accordingly, many data collection technologies, or data sources, have been applied to highway networks around the world to collect and supply a large amount of data to support highway network management tasks [4,5]. There is a growing consensus that the application of such data is essential for enhancing the effectiveness of highway network operations.
However, constructing and maintaining these data collection facilities, or data sources, requires a significant amount of resources [6,7]. There is therefore a need to allocate limited resources across these data collection facilities appropriately in terms of supporting highway network management tasks. Consequently, it is necessary to identify and rank the importance of these data sources [8]. In doing so, highway management agencies can allocate limited resources based on this importance ranking, and well-maintained data collection facilities will be able to generate reliable highway network data that support highway network management tasks.
Therefore, the purpose of this paper is to provide a complex network-based data source importance evaluation approach. In this approach, the critical part is to develop a complex network connecting the data sources and the business processes or tasks of the highway network. Based on such a complex network, the importance of these data sources can then be evaluated and ranked. The proposed approach includes three steps. First, the business-data source relation is identified and formulated. Second, based on the identified relations, the business-data source complex network is constructed by integrating these business-data source relations. Third, based on this complex network, the importance of data sources is evaluated by integrating three conventional importance assessment indexes [9-11], namely, degree centrality (DC), closeness centrality (CC), and structural holes (SC) [12,13], using an entropy weight method [14,15].
The rest of the paper is organised as follows. After a literature review section, the proposed method is presented in detail. Then, the proposed method is illustrated in a case study using the highway network of Xuzhou City, Jiangsu Province, China. Finally, discussions and conclusions are provided.

LITERATURE REVIEW
In this section, studies concerning data source importance evaluation and the relationship between businesses and data sources are reviewed and summarised.
Data source importance evaluation is not new and has been investigated in many fields. In social sciences, Hanson-DeFusco evaluated the importance of data sources relating to triangulation in social sciences in decision-making and planning evaluations [16]. Also, Park et al. proposed a method to solve the problem of estimating node importance in knowledge graphs [17]. In the medical area, Narayan reviewed and evaluated the national mortality data from the Civil Registration System (CRS), Medical Certificate of Death (MCCD), and Sample Registration System (SRS) in India, showing that the SRS data source scored the highest [18]. In addition, Price and Burley assessed the choice of primary and secondary information sources and identified the main information sources for the potential use of current awareness in occupational diseases [19]. When evaluating secondary data sources in epidemiology, Sorensen et al. argued that if a data source is very relevant to a specific research question, then this data source is very important and cost-effective [20]. Hjørland explained why it is necessary to compare information sources with research frontiers: if findings on the research front change, the evaluation of information sources may also change [21]. In information science, Hjørland discussed 12 different methods for evaluating information sources [22]. In addition, in terms of the Worldwide Governance Indicators (WGI), Kaufmann et al. argued that the data source of business information providers is more important than other types of data sources because it provides more sample data [23]. In transportation, Wood and Regehr tested the effectiveness of different axle load data sources in road design through a series of hierarchical analyses [24]. In addition, Broach et al. affirmed the importance of the old "small" data sources in estimating the annual average daily bicycle traffic [25]. Jiang et al. tested various combinations of electronic smart card data and global positioning system data in terms of bus travel time prediction [26]. Despite these studies, research is still limited on the importance evaluation of data sources in transportation. In particular, the operation of highway networks involves many data collection facilities, which makes it necessary to evaluate the importance of these data sources.
The relationship between businesses and data sources is critical for evaluating the importance of data sources, and many studies have been conducted over the decades. Yang et al. analysed the connection between the data-driven business of circular economy and data sources and used scatter plots to represent this connection [27]. Khorashadizadeh et al. reviewed the knowledge map of COVID-19 management, constructing the relationship between various COVID-19 data and the concerns of the people [28]. Wan et al. used knowledge graphs and big data analysis to integrate various maritime business and navigation service data [29]. Kam et al. identified key supply chain locations and industry relationships through existing data sources of goods [30]. Nguyen and Cao proposed a new sorting function to effectively measure the correlation between XML data sources and a given query [31]. Tok et al. developed a California freight data repository, connecting freight businesses with data sources and providing lookup tables for each data source for the convenience of the users [32]. Tijssen et al. discussed the INDSCAL statistical model, revealing the structure of the relationship between multiple data sources and scientific entities [33]. The above review shows that studies in many fields have explored the relationship between businesses and data sources. However, such studies are still limited when it comes to transportation.
In summary, data source importance evaluation has been investigated widely in many fields, while in transportation, and in particular in highway network operations, which involve a rich set of data sources and business models, such studies are still limited. Therefore, this paper constructs a complex network connecting business models and multiple data sources, based on which a complex network-based method is proposed to evaluate the importance of data sources.

PROPOSED METHOD
In this section, the proposed method is presented, including an overview and detailed descriptions of each step of the method.

Overview
The method framework is shown in Figure 1. According to Figure 1, the businesses and data sources in the highway network are first analysed to construct the business-data source relation. After that, the business-data source complex network is constructed by defining the nodes and edges of the network. Finally, three indexes are computed and integrated to evaluate and rank the importance of the data sources.

Business-data source relation construction
Recall that business models such as incident detection are essential for highway network management, and data items collected from each data source are vital for supporting these business models. Therefore, it is natural to capture the relationship between the business models and the data sources, i.e. the business-data source relation, such that the data sources can be investigated in the context of highway network management business models.
The business-data source relation is the connection that relates business processes to data sources in the highway network. In this relationship, the businesses refer to the management tasks that highway management agencies have to perform to operate the highway network. Each business process can be decomposed into business data items, which are the inputs required to implement the business process. Naturally, a business process may have multiple business data items. For example, detecting an incident might need traffic volume and speed simultaneously. In contrast, the data sources refer to the detecting techniques or facilities that can provide operational data on the highway network. A data source might include multiple data source fields, and different data sources can generate the same data source field. For example, traffic volume can be collected from continuous traffic count stations or electronic toll collection systems, and a count station can collect volume, speed, and vehicle types at the same time.
After breaking the business processes into business data items and the data sources into data source fields, business data items can be related to data source fields, formulating a relationship as depicted in Figure 2. There will be a multitude of such relationships, and combined, they constitute the foundation for structuring the complex network to be used for data source importance evaluation. It is worthwhile to mention that these relationships are critical for the proposed approach in that they make it possible to evaluate the importance of different data sources in terms of highway network management tasks.
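The decomposition described above can be sketched as three lookup tables. The following is a minimal illustration only: the business, item, field and source names are hypothetical and are not taken from the paper's tables, and `sources_for_business` is a helper introduced here for illustration.

```python
from collections import defaultdict

# Hypothetical toy relation (illustrative names, not the paper's tables).
# Each business process lists the business data items it requires.
business_items = {
    "incident_detection": ["traffic_volume", "speed"],
    "volume_measurement": ["traffic_volume"],
    "speed_monitoring": ["speed"],
}
# Each data source lists the data source fields it can provide.
source_fields = {
    "count_station": ["volume", "speed", "vehicle_type"],
    "etc_system": ["volume"],
}
# Manually established links from business data items to data source fields.
item_fields = {
    "traffic_volume": ["volume"],
    "speed": ["speed"],
}


def sources_for_business(business):
    """Return the data sources supporting a business, traced via items and fields."""
    # Invert the source -> field table so each field knows its providers.
    field_to_sources = defaultdict(set)
    for source, fields in source_fields.items():
        for field in fields:
            field_to_sources[field].add(source)
    supporting = set()
    for item in business_items[business]:
        for field in item_fields.get(item, []):
            supporting |= field_to_sources[field]
    return sorted(supporting)
```

In this toy relation, traffic volume is supplied by both sources while speed is supplied only by the count station, which mirrors the point that different data sources can generate the same data source field.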

Business-data source complex network construction
As mentioned previously, establishing the business-data source relations makes it possible to investigate the data sources in the context of business models. However, as can be seen in the previous section, an individual business-data source relation diagram is developed for each business model, and the number of diagrams grows with the number of business models, rendering the analysis almost intractable. Therefore, this paper proposes to use a complex network to integrate these business-data source diagrams, i.e. to construct a business-data source complex network, to make the analysis of data source importance tractable.
Following the above discussion, after obtaining the business-data source relations, this paper constructs the business-data source complex network through a process of integration. For a complex network, the nodes and the edges need to be defined first. In this paper, the node definition is straightforward: four types of nodes, i.e. business node, business data item node, data source field node, and data source node, are defined according to the business-data source relation. Once the nodes are defined, the connections in the business-data source relation diagrams are naturally defined as the edges, directed from data source node to data source field node, then to business data item node, and finally to business node. Note that at this stage all the edges have the same weight, and the same pair of nodes may be connected by several edges originating from different diagrams. Therefore, in order to simplify the complex network, this paper keeps only one edge between two nodes and defines the weight of the edge as the total number of edges between the two nodes before the simplification. The schematic diagram of the resulting complex network is shown in Figure 3. Intuitively, from a connectivity point of view, if the weight of an edge is high, then the nodes connected by the edge will likely be more important.
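The integration step, in which parallel edges from different diagrams are merged into a single weighted edge, can be sketched as follows. This is a minimal illustration under stated assumptions: the two diagrams are hypothetical, the function name `build_weighted_edges` is introduced here, and the node IDs only borrow the paper's prefix convention (A for data sources, B for data source fields, C for business data items, D for businesses).

```python
from collections import Counter


def build_weighted_edges(diagrams):
    """Merge per-business relation diagrams into one weighted directed edge set.

    Each diagram is a list of (from_node, to_node) pairs. The weight of a
    merged edge is the number of times that pair occurs across all diagrams.
    """
    weights = Counter()
    for edges in diagrams:
        for from_node, to_node in edges:
            weights[(from_node, to_node)] += 1
    return dict(weights)


# Two hypothetical diagrams sharing the chain A1 -> B1 -> C1.
diagrams = [
    [("A1", "B1"), ("B1", "C1"), ("C1", "D1")],
    [("A1", "B1"), ("B1", "C1"), ("C1", "D2"), ("A2", "B2"), ("B2", "C1")],
]
weighted_edges = build_weighted_edges(diagrams)
```

Here the edges A1 to B1 and B1 to C1 receive weight 2 because both diagrams contain them, while every other edge keeps weight 1.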

Centrality index computation
Once the business-data source complex network is constructed, instruments offered by complex network theory can be applied to evaluate the importance of the nodes. In this paper, three centrality indexes are used to evaluate the importance of data source nodes: degree centrality (DC), closeness centrality (CC) and structural holes (SC). These three indexes are defined, in their standard forms, in Equation 1.
$$C_D(\nu_i) = \frac{k_i}{n-1}, \qquad C_c(\nu_i) = \frac{n-1}{\sum_{j \neq i} d_{ij}}, \qquad S_i = \sum_{j \neq i} \Bigl( p_{ij} + \sum_{q \neq i,j} p_{iq}\, p_{qj} \Bigr)^{2} \tag{1}$$
where $C_D(\nu_i)$ is the degree centrality of node $\nu_i$, $n$ is the number of nodes in the network, $k_i$ is the degree of node $\nu_i$, i.e. the number of edges associated with the node, $C_c(\nu_i)$ is the closeness centrality of node $\nu_i$, $d_{ij}$ is the shortest distance from node $\nu_i$ to node $\nu_j$, $S_i$ is the network constraint coefficient concerning the structural hole, $p_{ij}$ is the proportion of the connections between node $\nu_i$ and node $\nu_j$ among all connections of node $\nu_i$, and $q$ denotes a common neighbour of nodes $\nu_i$ and $\nu_j$, with $q \neq i$ and $q \neq j$.
In a complex network, the degree of a node is the number of edges adjacent to it, and the degree centrality of a node reflects how central it is among the neighbouring nodes directly connected to it. The higher the degree centrality of a node, the more important the node. Closeness centrality considers the average length of the shortest paths from a node to all other nodes. The greater the closeness centrality of a node, the more important the node, indicating that it is located near the centre of the network. A structural hole indicates a disconnection between nodes in the complex network. For node $\nu_i$, the structural hole index can be obtained by calculating the network constraint coefficient. The smaller the constraint coefficient, the more likely the node is to form a structural hole [34].
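The three indexes can be illustrated on a small graph. The sketch below is not the paper's implementation: it assumes an undirected, unweighted and connected graph stored as an adjacency dictionary, normalises degree and closeness by n - 1 (a common convention that the text does not confirm), and uses Burt's network constraint for the structural hole index. The function names and the star-shaped example are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Degree of each node divided by n - 1."""
    n = len(adj)
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}


def closeness_centrality(adj):
    """(n - 1) divided by the sum of shortest-path lengths (connected graph assumed)."""
    n = len(adj)
    result = {}
    for start in adj:
        # Breadth-first search gives hop-count distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[start] = (n - 1) / total if total else 0.0
    return result


def constraint(adj):
    """Burt's network constraint; smaller values suggest more structural holes."""
    def p(i, j):
        # Share of node i's connections that go to node j (unweighted case).
        return 1 / len(adj[i]) if j in adj[i] else 0.0

    result = {}
    for i in adj:
        total = 0.0
        for j in adj[i]:
            # Indirect investment in j through the other neighbours q of i.
            indirect = sum(p(i, q) * p(q, j) for q in adj[i] if q != j)
            total += (p(i, j) + indirect) ** 2
        result[i] = total
    return result


# Hypothetical star: one data source A1 linked to three data source fields.
star = {"A1": {"B1", "B2", "B3"}, "B1": {"A1"}, "B2": {"A1"}, "B3": {"A1"}}
dc = degree_centrality(star)
cc = closeness_centrality(star)
sc = constraint(star)
```

On this star, the hub A1 has the highest degree and closeness centrality and the lowest constraint, so all three indexes agree that it is the most important node. In a real network the three indexes may disagree, which motivates their integration.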

Calculate index weight
The previous section provides three indexes for evaluating the importance of data sources, and each of these indexes can be applied individually. However, current literature shows that evaluation integrating multiple indexes is more effective than evaluation using a single index [35,36]. In this sense, determining the weights becomes important for integrating these indexes. Therefore, in this section, this paper computes the weights for integrating the centrality indexes computed previously. For this purpose, the centrality indexes need to be normalised first to eliminate the impact of different dimensions. For normalisation, the node set is defined as $V = \{\nu_1, \nu_2, \ldots, \nu_n\}$, the centrality index set is $S = \{s_1, s_2, \ldots, s_m\}$, and a decision matrix $X$ is formulated as Equation 2. Note that in $X$, DC and CC are regarded as benefit indexes, and SC is regarded as a cost index. The formulas for normalising benefit indexes and cost indexes are shown in Equations 3 and 4, respectively.
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix} \tag{2}$$
$$f_{ij} = \frac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min}} \tag{3}$$
$$f_{ij} = \frac{x_j^{\max} - x_{ij}}{x_j^{\max} - x_j^{\min}} \tag{4}$$
where $x_{ij}$ is the index value of node $\nu_i$ under index $s_j$, $f_{ij}$ is the normalised value of node $\nu_i$ under index $s_j$, $x_j^{\min}$ is the minimum value among all nodes under index $s_j$, and $x_j^{\max}$ is the maximum value among all nodes under index $s_j$.
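The normalisation step can be sketched as follows, assuming the usual min-max forms for benefit and cost indexes. The function name `normalise`, the handling of a constant column, and the example values are assumptions made for illustration and are not taken from the paper.

```python
def normalise(matrix, kinds):
    """Min-max normalise a decision matrix column by column.

    matrix: list of rows, one row per node and one column per index.
    kinds:  "benefit" or "cost" for each column.
    """
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    result = []
    for row in matrix:
        new_row = []
        for x, low, high, kind in zip(row, lows, highs, kinds):
            span = high - low
            if span == 0:
                # Assumption: a constant column carries no information.
                new_row.append(0.0)
            elif kind == "benefit":
                new_row.append((x - low) / span)
            else:
                new_row.append((high - x) / span)
        result.append(new_row)
    return result


# Hypothetical index values for three nodes; columns are DC, CC and SC.
X = [[4, 10, 0.2], [2, 6, 0.6], [0, 2, 1.0]]
F = normalise(X, ["benefit", "benefit", "cost"])
```

After normalisation every column lies in [0, 1] and a larger value is always better, so the node with the smallest constraint coefficient receives the largest normalised SC value.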
Then, entropy is used to derive the weights for index integration. For this purpose, the information entropy $E_j$ of each index is computed as Equations 5 and 6.
$$E_j = -\frac{1}{\ln n} \sum_{i=1}^{n} r_{ij} \ln r_{ij} \tag{5}$$
$$r_{ij} = \frac{f_{ij}}{\sum_{i=1}^{n} f_{ij}} \tag{6}$$
where $r_{ij}$ is the proportion of the standardised index value of node $\nu_i$ to the total standardised index values of all nodes under index $s_j$.
Finally, the weight of each index is computed using the information entropy as Equation 7.
$$\omega_j = \frac{1 - E_j}{m - \sum_{k=1}^{m} E_k} \tag{7}$$
where $\omega_j$ is the weight of index $s_j$, and $m$ is the number of indexes, i.e. $m = 3$. Note that for each index, the weight indicates the dispersion of the index, and a greater weight indicates greater importance of the index in the integration.
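The entropy weight computation can be sketched as follows. This is a minimal illustration of the standard entropy weight method, assuming the convention that 0 ln 0 is treated as 0, that no column sums to zero, and that at least one column is non-uniform. The function name and the example matrix are hypothetical.

```python
import math


def entropy_weights(normalised):
    """Return one weight per index from a normalised decision matrix."""
    n = len(normalised)      # number of nodes
    m = len(normalised[0])   # number of indexes
    entropies = []
    for j in range(m):
        column = [row[j] for row in normalised]
        total = sum(column)  # assumed to be non-zero
        entropy = 0.0
        for value in column:
            r = value / total
            if r > 0:        # convention: 0 * ln(0) contributes nothing
                entropy -= r * math.log(r)
        entropies.append(entropy / math.log(n))
    denominator = m - sum(entropies)  # assumed to be non-zero
    return [(1 - e) / denominator for e in entropies]


# Hypothetical matrix: the first index separates the nodes sharply,
# the second is identical for all nodes.
weights = entropy_weights([[1.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
```

In this example the first index has zero entropy and the second has maximal entropy, so nearly all of the weight goes to the first index. This illustrates why an index that varies more across nodes receives a greater weight.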

Indexes integration and importance evaluation
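In this section, the index weights calculated in the previous step are first used to calculate the relative distances between each data source node and the most important data source node, as in Equation 8. The smaller the relative distance, the more important the node is.
$$H_i = \sum_{j=1}^{m} \omega_j \frac{f_j^{+} - f_{ij}}{f_j^{+} - f_j^{-}}, \qquad R_i = \max_{j} \left[ \omega_j \frac{f_j^{+} - f_{ij}}{f_j^{+} - f_j^{-}} \right] \tag{8}$$
where $H_i$ is the weighted sum of the relative distances, $R_i$ is the maximum weighted relative distance, $f_j^{+}$ is the value of the most important data source node under index $s_j$, and $f_j^{-}$ is the value of the least important data source node under index $s_j$. Note that $f_j^{+} = \max_i f_{ij}$ and $f_j^{-} = \min_i f_{ij}$ for benefit indexes, and $f_j^{+} = \min_i f_{ij}$ and $f_j^{-} = \max_i f_{ij}$ for cost indexes.
Then the integrated index $Q_i$, or the comprehensive relative distance, is computed as Equations 9 and 10.
$$Q_i = \sigma \frac{H_i - H^{-}}{H^{+} - H^{-}} + (1 - \sigma) \frac{R_i - R^{-}}{R^{+} - R^{-}} \tag{9}$$
$$H^{-} = \min_i H_i, \quad H^{+} = \max_i H_i, \quad R^{-} = \min_i R_i, \quad R^{+} = \max_i R_i \tag{10}$$
where $H^{+}$ is the largest weighted sum of the relative distances, $H^{-}$ is the smallest weighted sum of the relative distances, $R^{+}$ and $R^{-}$ are the largest and smallest maximum weighted relative distances, and $\sigma$ is the adjustment coefficient, set as 0.5.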
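The integration step can be sketched as a compromise ranking of the VIKOR type. The sketch below is an illustration under stated assumptions, not the paper's code: it takes a matrix that has already been normalised so that larger values are better for every index (hence the best value per index is the column maximum), it assumes that no column is constant and that the nodes do not all have identical distances, and the function name and example data are hypothetical.

```python
def compromise_index(normalised, weights, sigma=0.5):
    """Return (H, R, Q) lists; a smaller Q means a more important node."""
    columns = list(zip(*normalised))
    best = [max(col) for col in columns]
    worst = [min(col) for col in columns]
    H, R = [], []
    for row in normalised:
        # Weighted relative distance to the best value under each index.
        terms = [
            w * (b - f) / (b - wst)
            for f, w, b, wst in zip(row, weights, best, worst)
        ]
        H.append(sum(terms))   # weighted sum of the relative distances
        R.append(max(terms))   # maximum weighted relative distance
    h_min, h_max = min(H), max(H)
    r_min, r_max = min(R), max(R)
    Q = [
        sigma * (h - h_min) / (h_max - h_min)
        + (1 - sigma) * (r - r_min) / (r_max - r_min)
        for h, r in zip(H, R)
    ]
    return H, R, Q


# Hypothetical normalised values for three nodes and three indexes.
F = [[1.0, 1.0, 1.0], [0.5, 0.5, 0.5], [0.0, 0.0, 0.0]]
H, R, Q = compromise_index(F, [0.4, 0.3, 0.3])
ranking = sorted(range(len(Q)), key=lambda i: Q[i])
```

The first node attains the best value under every index, so its distances are all zero and it is ranked first. The adjustment coefficient sigma balances the overall distance H against the single worst distance R.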
Finally, the data source nodes are ranked according to $Q$, as in Equations 11 and 12. Note that a smaller $T_i'$ indicates a smaller relative distance between the node and the most important node, meaning that the node is more important.

CASE STUDY
This paper uses the highway network in Xuzhou City, Jiangsu Province, China for the case study.

Highway network introduction
The selected highway network in Xuzhou City has 8 national roads and 20 provincial roads, and by the year 2022, the total mileage was 1,590.6 kilometres. The density of the highway network is 141.29 kilometres per 100 square kilometres or 17.62 kilometres per 10,000 people. This highway network constitutes the backbone of the surface transportation system in Xuzhou City.
On this highway network, 16 major business processes or business tasks, with the business ID (identification) denoted by D 1 , …, D 16 , are conducted, as shown and explained in Table 1. To support these businesses, 6 major data collection methods, or data sources under evaluation, with the data source ID denoted by A 1 , …, A 6 , have been established, as shown in Table 2. According to Table 2, a continuous traffic survey station collects real-time vehicle flow information to form a large amount of traffic data. A vehicle detector-based system collects data such as vehicle speed, occupancy rate, and axle load through vehicle detection equipment. A manual toll collection system collects vehicle information when the toll is collected manually for a vehicle. An automatic gantry-based station collects and stores the vehicle data when the toll is collected automatically without stopping. A video-based vehicle detection system collects traffic data through real-time video analysis. Basic information on emergency bases mainly includes the type and amount of emergency resources, the emergency base identification number, and the geographical location of the emergency base, supporting the operation of emergency resource scheduling.

Business-data source relation construction
After analysing the business processes and data sources in the Xuzhou highway network, as shown in Tables 1 and 2, the business-data source relation is constructed. In doing so, this paper first breaks the business processes into business data items and breaks the data sources into data source fields, as shown in Tables 3 and 4. Note that a business data item is the conceptual or logical data item used in the business model, which can be found by examining the data flow of the business model, whereas a data source field is the physical or actual data field that can be collected from a data source. The two differ in that a business data item might be directed to multiple data source fields, i.e. a business data item might be supported by multiple data sources. After this decomposition, the definitions of the categories and the items in each category are listed for data source fields and business data items in Table 3, with the items denoted by B 1 , …, B 87 and C 1 , …, C 24 , respectively. As shown in Table 3, there are 24 business data items and 87 data source fields in total. In addition, the explanations of these items are listed in Table 4.
After obtaining the business models, business data items, data sources, and data source fields, as listed in Tables 1-3, the business-data source relations can be constructed. For this purpose, this paper sets up three connections: business to business data item, business data item to data source field, and data source field to data source. The connections between business model and business data item and between data source and data source field are the natural results of the decomposition mentioned previously. Therefore, the development of the business-data source relationship relies essentially on the connection between the business data item and the data source field. In this study, the authors extensively examined all the business data items and data source fields and established these connections manually.
Using traffic volume measurement as a typical example, the relationship is shown in Figure 4. As can be seen in Figure 4, the first column from the left is the name of the business (D 1 ), and the next column lists the business data items (C 1 , ..., C 10 ) required for this business, i.e. measuring the traffic volume. For each business data item, the required data source fields are shown in the third column from the left, and the fourth column gives the data sources that provide these data source fields. Note that each business data item links to multiple data source fields.
It can be seen that this relationship is complex, and for each business listed in Table 1, such a diagram can be formulated.Together, this paper will integrate these diagrams into a complex network for supporting data source importance evaluation.

Business-data source complex network construction
After generating all the business-data source relationship diagrams for the businesses listed in Table 1, this paper constructs the business-data source complex network shown in Figure 5 by integrating all the business-data source diagrams. Note that in a business-data source diagram, each edge or connection connects two items, such as a business model, business data item, data source, or data source field. During the integration process, this paper merges all the edges in all the business-data source relationship diagrams according to the starting and ending items of each edge. Consequently, all the business models, business data items, data source fields, and data sources, i.e. nodes, are displayed in a single diagram, with the integrated edges connecting all four types of nodes. In this way, a complex network is formulated, on which conventional instruments such as centrality measures can then be applied to evaluate the importance of the nodes. It can be seen clearly that in this complex network, some nodes show more complex connections to the rest of the nodes, implying potentially greater importance for these nodes in the network.
It is also worthwhile to mention that, based on such a complex network, it is possible to evaluate the importance of all types of nodes, while in this paper, only nodes representing the data sources are evaluated, i.e. nodes A 1 to A 6 , as denoted by yellow.

Centrality index computation
After establishing the complex network, the steps indicated in the flow chart of the proposed method, shown in Figure 1, can be followed. According to the flow chart, based on the complex network built in the previous section, the centrality indexes for the data source nodes are computed and ranked in this section, as shown in Table 5. First, all three indexes indicate that the continuous traffic survey station is the most important data source for the highway network. This is not surprising since this data source supplies a rich amount of data on the vehicles passing the detection zone. In fact, this type of data collection facility is the most important source for monitoring the operation of highway networks. In addition, the results for the different indexes differ on the importance of the other data source nodes, which indicates a need to use the entropy weight method for index integration.
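Ranking nodes under a single index, as is done for each column of a table such as Table 5, can be sketched as follows. The scores below are hypothetical and are not the values reported in the paper. The helper `rank_nodes` is introduced here for illustration. Degree and closeness centrality are ranked in descending order, while the constraint coefficient is ranked in ascending order because it is a cost index.

```python
def rank_nodes(scores, higher_is_better=True):
    """Map each node to its rank (1 = most important) under one index."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {node: position for position, node in enumerate(ordered, start=1)}


# Hypothetical index values for three data source nodes.
degree_scores = {"A1": 0.9, "A2": 0.3, "A3": 0.5}
constraint_scores = {"A1": 0.1, "A2": 0.8, "A3": 0.4}

degree_rank = rank_nodes(degree_scores)
constraint_rank = rank_nodes(constraint_scores, higher_is_better=False)
```

In this toy case both indexes give the same order, but when the orders differ across indexes, a weighted integration is needed to obtain a single ranking.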

Indexes integration and importance evaluation
Data source importance is further investigated in this section by integrating the three centrality index-based measures. To do so, the three data source node importance evaluation indexes are used to form the attribute set, i.e. S = {s 1 , s 2 , s 3 } = {DC, CC, SC}. An initial decision matrix is constructed as Equation 13 based on the results calculated for each index in Table 5, and is then normalised to a standardised decision matrix. Then, the entropy weight method is used to calculate the weights of each index. According to the method discussed previously, the weights of DC, CC and SC are found to be 0.3419, 0.3269 and 0.3312, respectively. Finally, the importance ranking results of the data source nodes are obtained by integrating the three indexes, with results shown in Table 6. From Table 6, it can be seen that the continuous traffic survey station is the most important data source in the Xuzhou highway network, followed by the automatic gantry-based station and the vehicle detector-based system. This finding is consistent with the findings based on individual indexes in that continuous traffic survey stations are the most important data source in supporting highway network operations. In addition, in terms of the degree of importance, the Q values directly represent the proportional relationship of the importance of the data sources. Accordingly, the importance of the continuous traffic survey station is about 4 times that of the automatic gantry-based station and the vehicle detector-based system, about 7 times that of the traffic video-based vehicle detection system, and about 10 times that of the manual toll collection system and the basic information of emergency bases. These observations suggest that the highway management agency should invest more resources in maintaining the continuous traffic survey stations in the highway network.

DISCUSSION AND CONCLUSION
Highway systems are important for supporting social and economic development. With the advancement of information technology, many data sources have been established based on various vehicle detecting techniques or systems in highway networks, and a rich amount of operational data has been collected and applied in highway network management. Under this circumstance, there is a need to allocate resources appropriately for constructing and maintaining such facilities in terms of supporting highway network management tasks, which raises the issue of identifying and evaluating the importance of such data sources. Targeting this issue, this paper proposes a complex network-based evaluation method. In this approach, the businesses and data sources in a highway network are systematically analysed, after which these businesses and data sources are broken into business data items and data source fields, respectively. Then the relationships between the business, business data item, data source, and data source field are formulated, based on which a business-data source complex network is built by integrating such relationships. Data source importance evaluation is then conducted by using the centrality indexes obtained from the complex network, both separately and collectively.
The application of the proposed complex network-based approach is illustrated using a highway network in Xuzhou City, Jiangsu Province, China. For this highway network, this paper analysed the relationship between highway network management businesses and data sources and built a business-data source complex network accordingly. The importance of data source nodes was then computed and ranked based on the complex network. The results show that among the multiple data sources in the highway network, the continuous traffic survey station is the most important data source. The proportional importance of the data sources is also shown. From the results, it can be inferred that continuous traffic survey stations should receive more attention when allocating construction and maintenance resources.
The limitations, applications, and future studies of the proposed approach are discussed as follows. First, as shown in this work, the construction of the business-data source relation is critical for building the complex network for importance analysis. Currently, this relationship is built through manual investigation of the business models and data sources, which is time-consuming and requires the analyst to be highly familiar with the highway network management field. This manual solution limits the application of the proposed approach to other highway networks.
Second, the proposed approach has multiple applications. As shown previously, the identified importance index for data sources can be directly applied to generate the plan for the construction and maintenance of data collection facilities. In addition, business tasks, and in particular business data items, can be investigated to show their relative importance for highway network management. Moreover, the complex network developed in the proposed approach can be applied to optimise the design of data collection facilities in terms of supporting highway network management tasks.
Finally, several directions for future work can be pursued. Artificial intelligence technology can be applied to mine and build relationships from the documents concerning highway network business tasks and data sources. This will help improve the efficiency of developing and applying such complex networks to more complicated highway networks. In addition, more centrality indexes and other weighting techniques can be explored under this complex network framework. Moreover, the proposed approach can be applied and tested to allocate resources for the construction and maintenance of highway network data collection facilities.

Figure 2 - Diagram of business-data source relation

Figure 3 - Schematic diagram of the complex network

Figure 4 - Diagram for traffic volume and data source relation

Table 1 - Business models summary

Table 2 - Data source summary

Table 5 - Ranking of data source node importance under three evaluation indexes

Table 6 - The importance ranking of data source nodes