4. Conclusions
Based on the observed results from the full data collection period, the study team is able to draw a number of conclusions regarding the use of truck fleet GPS data for the measurement of travel time across international border crossings, both in general terms and in terms specific to the Otay Mesa Crossing. These conclusions are offered in the sections that follow.
4.1 Data Validity
As discussed earlier in this report, GPS-based fleet tracking systems have been in commercial use for some time. They have gained the trust of users as an accurate means for locating assets for the purposes of managing operations and providing customers information about shipment status. As such, the team did not consider it necessary to examine the accuracy or measurement precision of the device used for the test. It is a commercially available solution that relies upon standard components and communications methods.
The primary goal of this test was to determine the degree to which such devices can adequately characterize travel time on a specific roadway network through a commercial vehicle border crossing. The secondary goal was to assess the viability of GPS-generated travel time data as a means to support real-time traveler information.
With respect to the primary goal, the devices and the processing methods employed to capture and analyze the GPS data succeeded. The combination of reverse geo-coding, which is a commonly employed practice in the use of GPS-based roadway speed monitoring, and the segregation of the border trip into geo-coded segments has proven to be a viable method. There are, however, some considerations that must be taken into account as border stakeholders contemplate its use on a regular basis.
First, the figures reflected in this report are based on a marginally representative sample. While the number of trips and their distribution throughout the day consistently exceeded the three to five percent threshold commonly required for probe vehicle-based traffic monitoring, the carriers that participated are similar to only a portion of those that cross the border. Both carriers are highly reputable companies that are C-TPAT qualified, and conduct a significant portion of their cross-border trips as FAST trips. As such, the test sample was not random, and may not be representative of the general population. However, it is important to note that all FAST trips conducted by all carriers experience the same travel patterns, and occupy the same portions of the infrastructure. In that regard, the figures shown here can be considered representative for all carriers with similar profiles.
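The penetration check described above comes down to the ratio of probe trips to total crossings in each period. The sketch below assumes hypothetical hourly counts of probe trips and total truck crossings (total crossings would have to come from a separate source, such as port counts) and flags hours that fall below the lower 3 percent bound. The function names are illustrative only.

```python
def penetration_rate(probe_trips: int, total_crossings: int) -> float:
    """Share of all crossings in a period that were made by GPS-equipped probe trucks."""
    if total_crossings <= 0:
        raise ValueError("total_crossings must be positive")
    return probe_trips / total_crossings


def hours_below_threshold(hourly_counts: dict[int, tuple[int, int]],
                          threshold: float = 0.03) -> list[int]:
    """Return the hours whose probe penetration falls below the threshold.

    hourly_counts maps hour of day -> (probe_trips, total_crossings); both
    counts are hypothetical inputs. The 3 percent default is the low end of
    the three to five percent range cited in the text.
    """
    return sorted(hour for hour, (probe, total) in hourly_counts.items()
                  if penetration_rate(probe, total) < threshold)
```

For example, 12 probe trips out of 300 crossings in an hour gives a rate of 0.04, which clears the 3 percent bound.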
The second consideration that must be taken into account is the location and length of the travel time measurement zone. At the outset of the project, carriers indicated to the study team that queues often extend to the Bellas Artes/Calle Doce intersection. However, the data gathered during the test suggests that traffic often flowed quite freely from there to the sorting gate upstream of the Mexican Customs Export screening point.
The implication is that care should be taken in deciding at what point the border crossing travel time clock should start. It is worthwhile to consider beginning measurement of an individual trip after a vehicle has reached the end of the queue waiting to cross. The time from that point until the crossing is completed is probably of more value to a border user.
In theory, if the first district is too short, or is positioned so far upstream that vehicles usually pass through it quickly, then it is probably of little value. This was not an issue for the calculation of border travel time during this project, since the number of trips with reports in District 1 was nearly identical to the number of trips with reports in District 2. This suggests that District 1 was appropriately sized. As the number of vehicles providing GPS data increases, this becomes less of an issue, since trips not reporting in the first district can be excluded without significantly compromising the dataset.
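One simple way to monitor the District 1 issue is to compare how many trips reported in each of the first two districts and to drop trips with no District 1 report. The sketch below assumes each trip has already been reduced to the set of district names in which it reported; the names and the data structure are hypothetical.

```python
def district_coverage(trips: dict[str, set[str]],
                      first: str = "District 1",
                      second: str = "District 2") -> tuple[int, int]:
    """Count trips with at least one report in the first and in the second district."""
    n_first = sum(1 for districts in trips.values() if first in districts)
    n_second = sum(1 for districts in trips.values() if second in districts)
    return n_first, n_second


def usable_trips(trips: dict[str, set[str]],
                 first: str = "District 1") -> dict[str, set[str]]:
    """Keep only trips that reported in the first district; the rest are excluded."""
    return {trip_id: ds for trip_id, ds in trips.items() if first in ds}
```

If the two counts returned by `district_coverage` stay close, as they did during this project, excluding the few trips without a District 1 report costs little data.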
Alternatively, if a lack of reports in the first district occurs with great frequency, it may be appropriate to forgo the use of districts altogether. Instead of a fixed measurement zone, a combination of vehicle speed data and data showing the linear distance to defined landmarks (both of which can be obtained using GPS technology) could be used to characterize the operating profile of each trip.
Finally, consideration must be given to the implications of the large standard deviations, and of the differences between the mean and median values, seen in the data. This level of variability presents particular challenges in formulating accurate values for current travel time, such as might be used in a traveler information system.
This is important because high levels of variability in the data can make it difficult to accurately characterize travel time fluctuations associated with non-recurring congestion, which has a profound effect on travel time reliability.
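The gap between mean and median is easy to see with a small set of travel times containing one long trip. The values below are hypothetical and serve only to show how a single outlier pulls the mean well above the median and inflates the standard deviation.

```python
import statistics


def summarize(times: list[float]) -> dict:
    """Count, mean, median, and sample standard deviation of travel times (minutes)."""
    return {
        "n": len(times),
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "stdev": statistics.stdev(times),
    }


# Hypothetical travel times (minutes): five typical trips and one long outlier.
summary = summarize([40, 42, 45, 47, 50, 180])
```

Here the median is 46 minutes while the mean is about 67 minutes, almost entirely because of the single 180-minute trip.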
A larger number of vehicles, spread across a more diverse set of carriers, should improve the data and lower the degree of variability. From a statistical standpoint, as the sample size increases, the influence of individual outliers on the mean values should diminish.
At this point, it is not clear at what sample size that improvement would reach an acceptable degree. The border environment, while similar to the greater roadway network in many ways, does have some unique characteristics. Most notably, it is very small in comparison to, for example, a freeway network around a metropolitan area. In a border zone such as the Otay Mesa crossing, where the trip length is short, travel speeds are typically low, and there is a great deal of commercial clutter in the immediate vicinity, extra care must be taken when defining algorithms for assignment of points to roadway links, and in defining measurement zones that are appropriately sized. On an open roadway network, where speeds are generally higher under normal conditions and monitored roadways are more easily segregated in a GIS overlay, the same level of precision is not required.
It is important to remember that some degree of variability will continue to exist simply due to factors such as variations in the proportion of vehicles participating in secure supply chain operations (such as the C-TPAT and FAST programs), fluctuations in the total number of vehicle crossings during a given period, the number of empty trucks versus loaded trucks passing through the crossing, and the application of various enforcement actions, such as the blocking procedure discussed earlier.
As for the GPS device installed in the trucks of the participating carriers, its performance is expected to be similar to that of a number of devices currently available commercially. The key performance specifications of the unit were its reporting accuracy and its recording and reporting intervals (3 minutes and less than 10 minutes, respectively). Any device that offers these specifications should suffice for the purposes of travel time reporting. It should be noted that the Otay crossing is located in an area that is largely devoid of features that could interfere with GPS signals, such as closely spaced tall buildings. For that reason, the performance achieved at Otay may not be achieved in locations where such conditions exist.
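The recording interval also bounds how precisely a single trip can be timed. Assume a fixed recording interval $\Delta$ and a travel time measured from the first report inside the measurement zone ($t_{\text{first}}$) to the last report inside it ($t_{\text{last}}$); this is an illustrative rule, not necessarily the one used in the project. Because the report before $t_{\text{first}}$ was outside the zone and the report after $t_{\text{last}}$ will be outside it, the true entry and exit times satisfy:

$$
t_{\text{entry}} \in (t_{\text{first}} - \Delta,\ t_{\text{first}}], \qquad t_{\text{exit}} \in [t_{\text{last}},\ t_{\text{last}} + \Delta)
$$

$$
0 \le T_{\text{true}} - T_{\text{meas}} = (t_{\text{exit}} - t_{\text{last}}) + (t_{\text{first}} - t_{\text{entry}}) < 2\Delta
$$

With $\Delta = 3$ minutes, a single measured trip can understate its true zone travel time by anywhere from zero to just under 6 minutes.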
4.2 Data Applicability
As discussed above, the results of this project indicate that the GPS data is clearly applicable to historical analysis. The current dataset has significant planning and analytical value. Using queue modeling techniques, border and transportation agencies should be able to easily examine and model the effects of various practices and configurations. They should also be able to examine real-world effects of such changes through before-after analysis of the data. This should be of particular value for assessing infrastructure modifications and changes in staffing practices.
The capture and use of real-time (or, more accurately, near real-time) data presents a set of challenges that were not completely addressed during the project, but the experience offers some useful insights. First, the information exchange and processing mechanisms employed for the project would not be suitable as deployed solutions. The study team received the data via weekly FTP file exchanges, processed the data manually through TransCAD GIS software, and ran algorithms against a static database to derive travel time values. This would not be an acceptable solution for use of the data in near real time.
These mechanisms would need to be replaced with appropriate communications and processing tools and protocols in order to be usable for applications that rely upon timely information processing, such as traveler information systems that might inform border users of current conditions. These sorts of solutions, which use GPS data, are presently in use by various traffic data information providers. For example, INRIX uses GPS data as a component of its travel speed information services under a program it is conducting for the I-95 Corridor Coalition.28
In order for the Otay data to be made available for such purposes, three major actions would be necessary. First, the data from the fleets would need to be received as part of a live data feed. One popular method is the use of Extensible Markup Language (XML) connections through a web services environment. These sorts of feeds are relatively easy to establish, and offer flexibility for modification as necessary. Second, the association of individual points received from units with an underlying GIS would need to be automated. Again, this activity is becoming more commonplace. In fact, this is being done as part of a separate project with FHWA.29 It involves the assignment of individual GPS data points (i.e., latitude, longitude, time stamp, travel heading, travel speed) to pre-defined roadway links, as designated by geo-coded base maps that are typically in use by state departments of transportation. Finally, the transfer of data back to a user (e.g., a DOT) would need to be automated. This could be accomplished in much the same way as the GPS data feed, using an XML data stream exchanged using web services.
Each of these actions, as cited above, is relatively commonplace, and none requires extensive programming. The investment necessary to accomplish this would be modest, though it would depend on the technical sophistication of both the entities performing the development and the end users.
The second major challenge associated with the use of the GPS data for real time or near real time operations is the interpretation and use of the data for decision-making purposes. As indicated previously, the data gathered for the Otay crossing over a one-year period indicates a high degree of variability for travel time. The implication is that the establishment of a reliable baseline value for the "typical" travel time for a defined period is difficult. It is against this baseline value that a user determines whether current conditions warrant a change in border usage (e.g., a change in departure time or crossing location).
For instance, a user might decide that, in order to affect their crossing decision, the current travel time must exceed the baseline ("normal") value by 25 percent. That means that if the expected travel time is 60 minutes, the current travel time would need to reach 75 minutes for that user to decide to delay a trip until a later time period. Consequently, the user must trust that the values are accurate and reliable in order to have confidence in his/her decision. The level of variability in the data captured during the test suggests that this user scenario would be difficult to accommodate without some means to mitigate the effects of the variability.
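The threshold logic in this scenario is a single multiplication, but making it explicit shows how much a few percentage points move the trigger. In the sketch below, the baseline is taken as the median of historical travel times for the period (an assumption; any agreed baseline would do), and the function names are hypothetical.

```python
import statistics


def baseline_minutes(history: list[float]) -> float:
    """Assumed baseline: the median of historical travel times for the period."""
    return statistics.median(history)


def trigger_time(baseline_min: float, threshold_pct: float) -> float:
    """Travel time at which the user's percentage threshold over baseline is reached."""
    return baseline_min * (100 + threshold_pct) / 100


def should_reconsider(current_min: float, baseline_min: float,
                      threshold_pct: float) -> bool:
    """True when current conditions meet or exceed the user's trigger."""
    return current_min >= trigger_time(baseline_min, threshold_pct)
```

For a 60-minute baseline, `trigger_time(60, 15)` gives 69 minutes and `trigger_time(60, 25)` gives 75 minutes. With high variability, a baseline that drifts by several minutes shifts the trigger just as much as a change in the user's percentage.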
Given this outcome, it is important to remember two things. First, there is no evidence to suggest that the data gathered during the project is inaccurate. The data simply indicates that the identification of a reliable, precisely defined mean (or median) value at the Otay crossing is difficult. Carrier input supports the conclusion that the observed degree of variability reflects actual conditions. Hence, this is not an indictment of the validity of GPS data. To the contrary, it appears to be evidence of its accuracy. As indicated above, this may become less of a concern as the volume of baseline data increases and more stringent filtering can be applied to reduce the effect of outliers on the mean and median values.
Second, it is important to remember that local carriers that use the crossing have few options for applying the information in its current state. Carriers must use the Otay crossing, since commercial vehicles are not permitted at San Ysidro, and the Calexico crossing is too far to the east to be of use for the local maquiladora traffic. Further, their delivery schedules (at least those of the carriers that participated in the project) are dictated by customers. Real time or near real time data might be more useful for the management of agency staff at border inspection points, or for long-haul truckers that use the port, while they are still far enough away to re-route or re-schedule to avoid delays.
Ultimately, a larger penetration level of trucks, from a broader cross-section of the user population at the crossing, may have a beneficial effect on the level of variability by driving down the standard deviation.
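The more stringent filtering mentioned earlier in this section could take many forms. One common choice, shown here as a stand-in because the report does not specify a method, is Tukey's interquartile-range (IQR) fences. The multiplier `k = 1.5` is the conventional default rather than a value derived from the Otay data, and the travel times in the usage note are hypothetical.

```python
import statistics


def iqr_filter(times: list[float], k: float = 1.5) -> list[float]:
    """Drop values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Requires at least two values; the order of the kept values is preserved.
    Quartiles use the statistics module's default ("exclusive") method.
    """
    q1, _, q3 = statistics.quantiles(times, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [t for t in times if low <= t <= high]
```

Applied to the hypothetical times `[40, 42, 45, 47, 50, 180]`, the filter removes the 180-minute trip, and the mean of the remaining values moves close to their median.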
4.3 Data Accessibility
The data model that was used to access motor carrier data for this project is one of several different approaches employed by GPS data providers. Many data providers obtain the GPS data they use for roadway monitoring from the provider of the GPS device and/or service. The data provider for this project chose to seek it directly from the carriers. Ultimately, regardless of the source of the data, if it is generated by a carrier's vehicles, then it is the property of the carrier. As such, the carrier can do with it whatever it desires. Fleet GPS device and service providers may have to obtain permission from their carrier customers in order to re-use the data, as was the case for this project.
This approach presents both benefits and challenges. One potential benefit is that, because the carrier already owns the data for its fleet, and only its fleet, it may be willing to grant access to the data for a modest sum. Further, by working directly with the carrier while agreeing to protect sensitive information from distribution, a data provider could structure its data agreements to allow for the re-use of information for multiple purposes without incurring additional cost.
Perhaps the most significant challenge to this approach is that which was encountered during the recruitment of carriers for participation in this project. The data provider was forced to endure a lengthy courtship process before the carriers finally agreed to grant access to their data. Ultimately, the carriers became more receptive as they recognized that the devices are useful for other functions, as well. Nonetheless, the level of time and effort necessary to execute agreements with the carriers indicates that this should be factored in whenever GPS fleet data is sought.
4.4 Data Value
Even with the level of variability demonstrated in the data gathered during this project, there appears to be significant value in GPS fleet data. It is important to remember that the nature of the GPS system is such that it continuously generates data, regardless of time or location. Hence, the very same devices could easily be monitored for other movements, and the data processed to quickly multiply its planning and operations value.
For instance, the data provider already possesses raw GPS data for Mexico-bound movements at the Otay crossing. This means that with the completion of some processing logic, travel time for southbound trucks during the same period could be calculated without additional data collection. Further, raw GPS data has already been recorded for movements both within the areas adjacent to and those beyond the Otay border zone. This data could be used not only for travel time calculation, but also for the identification of origins and destinations and route selection. This also underscores the importance of carefully considering the terms of any data agreement to ensure that the maximum value can be obtained.
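Identifying origins and destinations from raw traces usually starts by splitting each vehicle's trace at long stops. The sketch below treats any run of near-zero-speed reports lasting at least a minimum dwell time as a stop, and reports each movement between consecutive stops as one origin-destination pair. The speed and dwell thresholds and the report layout are assumptions; movement before the first detected stop or after the last one is not reported.

```python
def find_trips(reports: list[tuple[float, float, float, float]],
               stop_speed_kph: float = 2.0,
               min_dwell_s: float = 1800) -> list[tuple[tuple, tuple]]:
    """Return (origin, destination) coordinate pairs for movements between stops.

    reports: (t_seconds, lat, lon, speed_kph) tuples for one vehicle, sorted by time.
    A stop is a maximal run of reports at or below stop_speed_kph whose first and
    last reports are at least min_dwell_s apart (hypothetical thresholds).
    """
    dwells = []  # (first_index, last_index) of each qualifying stop
    i, n = 0, len(reports)
    while i < n:
        if reports[i][3] <= stop_speed_kph:
            j = i
            while j + 1 < n and reports[j + 1][3] <= stop_speed_kph:
                j += 1
            if reports[j][0] - reports[i][0] >= min_dwell_s:
                dwells.append((i, j))
            i = j + 1
        else:
            i += 1
    # A trip runs from the last report of one stop to the first report of the next.
    return [(reports[prev_end][1:3], reports[next_start][1:3])
            for (_, prev_end), (next_start, _) in zip(dwells, dwells[1:])]
```

Short stops, such as a single slow report at a signal, do not split a trip because they do not meet the dwell threshold.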
Finally, modern fleet GPS devices are becoming both more affordable and more capable, opening up additional opportunities to capture valuable information. One example is the capture of data from a vehicle's engine data bus, which would include such information as throttle position, travel speed, and fuel use by location. This has potential value for the identification of locations where excessive idling is occurring: important data for carriers seeking to improve fleet fuel efficiency, and for agencies examining the potential environmental value of infrastructure upgrades.
Accessing such data requires defining the appropriate levels of data access and use rights with the carriers or with service providers. This will require significant attention to the protection of carrier and driver identity, and to restrictions on the use of the data for ancillary purposes, such as public sector analyses.
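The excessive-idling analysis mentioned above could be approached by summing, per small grid cell, the time a truck spends with the engine running and zero speed. The field names (`engine_on`, `speed_kph`, `interval_s`), the grid size, and the reporting threshold below are hypothetical; actual field availability depends on the device and on the vehicle's data bus.

```python
from collections import defaultdict


def idle_hotspots(samples: list[dict],
                  cell_deg: float = 0.001,
                  min_idle_s: float = 600) -> dict[tuple[int, int], float]:
    """Total idle seconds per grid cell, keeping cells at or above min_idle_s.

    samples: dicts with hypothetical keys lat, lon, speed_kph, engine_on (bool),
    and interval_s (time represented by the sample, in seconds).
    Cells are integer indices of a lat/lon grid with spacing cell_deg.
    """
    idle: dict[tuple[int, int], float] = defaultdict(float)
    for s in samples:
        if s["engine_on"] and s["speed_kph"] == 0:
            cell = (round(s["lat"] / cell_deg), round(s["lon"] / cell_deg))
            idle[cell] += s["interval_s"]
    return {cell: secs for cell, secs in idle.items() if secs >= min_idle_s}
```

Cells returned by this function would be candidate locations for closer review by carriers or agencies; the thresholds would need calibration against observed conditions.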
4.5 Data Business Model Options
The previous sections discussed some of the conclusions associated with the various options for collection and use of GPS data. Obviously, this project was focused on the collection and analysis of a narrow segment of GPS data for the purposes of establishing the usefulness of GPS as a travel time measurement technology. It was also configured and priced accordingly, rather than as might be appropriate for what could be considered a "production" solution.
In order to establish a more meaningful representation of the costs a customer might incur for the implementation and operation of a GPS-based solution, the study team worked with Calmar to define potential price ranges for a series of data packages. These are provided in Table 6 below.
The table highlights three different data packages, each containing different types and amounts of data, processed to different levels. Each package is described according to eight different characteristics:
- Content – this describes, at a high level, the data that will be made available by the vendor.
- Data Set Size – this reflects the approximate number of vehicles from which GPS data would be acquired.
- Geographic Scope – this defines the geographic area within which data will be collected.
- Potential Uses – this provides a brief summary of some of the uses of the data, and what activities it might support.
- Potential Users – this contains a simplified list of potential consumers of the data, by type.
- Output – this defines the proposed format and delivery method for the dissemination of the data.
- Cost Range – this reflects an estimate of initial and ongoing costs associated with establishing and maintaining the proposed data feed.
- Time to Deploy – this offers an estimate of the time necessary to complete carrier recruitment, development of the data feed, and testing prior to deployment.
Data Package Characteristics | Basic Data Package | Enhanced Data Package | Full Data Package |
---|---|---|---|
Content | Mildly processed data including basic historical travel time, and raw GPS data with filtering to remove outliers | Fully processed travel time data, plus data traces and files indicating origins and destinations | Fully processed travel time data, plus data traces and files indicating origins and destinations |
Data Set Size | Approx. 200 trucks | Approx. 500 trucks | Approx. 1000 trucks |
Geographic Scope | Area similar to the project area, which includes a limited rectangular area that encompasses the length of the measurement zone, in both directions across the border, along the entire distance of the California border with Mexico | Area that extends from the border, along the entire distance of the California border with Mexico, to the Los Angeles basin to the north, and (to the extent available) to the Ensenada area in Baja California | Area that extends from the border, along the entire distance of the California border with Mexico, to the Los Angeles basin to the north, and (to the extent available) to the Ensenada area in Baja California |
Potential Uses | Establishment and refinement of historical travel time dataset to monitor changes over time due to various factors (e.g., facility alterations, operational procedure modification, effect of trade fluctuations, etc.) | Establishment and refinement of historical travel time dataset to monitor changes over time, plus information for route choice modeling and infrastructure investment planning, plus near real time data for integration with existing traveler information systems, for user decisions regarding facility staffing, traffic control, and trip planning | Establishment and refinement of historical travel time dataset to monitor changes over time, plus information for route choice modeling and infrastructure investment planning, plus near real time data for integration with existing traveler information systems, for user decisions regarding facility staffing, traffic control, and trip planning, provided via web site and/or wireless device notification |
Potential Users | Planning organizations, customs agencies, shippers, carriers | Transportation system managers, planning organizations, customs agencies, shippers, carriers | Transportation system managers, planning organizations, customs agencies, shippers, carriers, border users |
Output | Database file, updated monthly via ftp, plus basic graphical output similar to that produced for this project | Live XML (or similar) data feed suitable for supporting near real time operations and analysis, plus graphical summaries at regular intervals | Live XML (or similar) data feed suitable for supporting near real time operations and analysis, plus a fully functional user interface for interactive access to data |
Cost Range | $10K to $12K per month | $40K to $60K plus $14K to $20K per month | $80K to $120K plus $16K to $25K per month |
Time to Deploy | 3 months | 6-9 months | 8-12 months |
When viewing this information it is important to remember that the figures provided for price ranges are intentionally broad. This is necessary for two reasons. First, until detailed specifications are identified, precise estimates are impossible to make. Only after working with the potential customer can a data provider accurately establish pricing. Second, because this is a commercial venture for Calmar and other similar providers, sensitivity exists regarding the provision of specific pricing data in a public document such as this.
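As a rough illustration of how broad these ranges become over time, the sketch below totals the initial and monthly figures from Table 6 over a chosen number of months. It assumes the Basic package has no initial cost, since none is listed, and it ignores any price changes, discounts, or inflation; the function name is hypothetical.

```python
# (initial_low, initial_high, monthly_low, monthly_high) in $K, from Table 6.
# Assumption: the Basic package has no initial cost because none is listed.
PACKAGES = {
    "Basic": (0, 0, 10, 12),
    "Enhanced": (40, 60, 14, 20),
    "Full": (80, 120, 16, 25),
}


def cost_range(package: str, months: int) -> tuple[int, int]:
    """Low and high cumulative cost in $K after the given number of months."""
    init_lo, init_hi, mon_lo, mon_hi = PACKAGES[package]
    return init_lo + mon_lo * months, init_hi + mon_hi * months
```

Over 12 months this gives roughly $120K to $144K for the Basic package, $208K to $300K for the Enhanced package, and $272K to $420K for the Full package.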
A few assumptions must be considered with regard to the data outlined in the table. First, all data feeds would include basic quality assurance/quality control (QA/QC) provisions (i.e., erroneous and redundant points would be removed), and data would be tagged with trip begin and end identifiers where GPS data patterns suggest that a trip has begun or ended. Second, the data would primarily consist of latitude/longitude, date/time stamp, speed, link identification, and vehicle identification. Vehicle identification would be a sequentially applied anonymous number that would change daily, or according to some other user-defined criterion.
Finally, two limiting conditions would be placed on the data. Specifically, location information for carrier yards and other confidential locations (i.e., customer locations) would be removed, and activity within ports and rail yards would be excluded using geo-fencing. All locations would be designated as general originations/destinations.
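The QA/QC, anonymization, and geo-fencing provisions described in the two preceding paragraphs can be sketched as a single pass over the records. In this illustration, "erroneous" is assumed to mean out-of-range coordinates or an implausible speed, "redundant" means an exact repeat of vehicle, time, and position, the anonymous number restarts each calendar day, and fences are circles (real fences would more likely be polygons). Record keys, the speed limit, and rounding endpoints to two decimal places (roughly 1 km of latitude) are hypothetical choices. Trip begin/end tagging is not shown.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def _sort_key(r: dict) -> tuple[datetime, str]:
    return (r["time"], r["vehicle"])


def prepare_feed(records: list[dict],
                 fences: list[tuple[float, float, float]],
                 max_speed_kph: float = 150.0) -> list[dict]:
    """QA/QC, geo-fence masking, and daily anonymization in one pass.

    records: dicts with hypothetical keys vehicle, time (datetime), lat, lon, speed, link.
    fences: (lat, lon, radius_m) circles around yards, customer sites, ports, rail yards.
    """
    seen = set()
    id_maps: dict = {}  # calendar day -> {vehicle: anonymous number}
    out = []
    for r in sorted(records, key=_sort_key):
        if not (-90 <= r["lat"] <= 90 and -180 <= r["lon"] <= 180):
            continue  # erroneous coordinates
        if not (0 <= r["speed"] <= max_speed_kph):
            continue  # implausible speed
        key = (r["vehicle"], r["time"], r["lat"], r["lon"])
        if key in seen:
            continue  # redundant duplicate report
        seen.add(key)
        if any(haversine_m(r["lat"], r["lon"], f_lat, f_lon) <= radius
               for f_lat, f_lon, radius in fences):
            continue  # inside a geo-fence: drop the point
        day_map = id_maps.setdefault(r["time"].date(), {})
        anon_id = day_map.setdefault(r["vehicle"], len(day_map) + 1)
        clean = {k: v for k, v in r.items() if k != "vehicle"}
        clean["anon_id"] = anon_id
        out.append(clean)
    return out


def generalize(lat: float, lon: float, decimals: int = 2) -> tuple[float, float]:
    """Coarsen a trip endpoint so it identifies only a general area."""
    return round(lat, decimals), round(lon, decimals)
```

Because the anonymous number restarts daily, the same truck receives unrelated identifiers on different days, which limits the ability to reconstruct a carrier's operations from the feed.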
As the information in the table implies, there appear to be ample opportunities to recoup the costs associated with establishing and maintaining fleet GPS data beyond simply generating travel time values. Among the most immediately promising applications is the capture of transportation origin/destination data, which is of significant value for transportation planning purposes. Such data can be of use for establishing and validating travel demand models popular with planning staff, and can also offer significant insight into overall roadway usage patterns. Coupled with capabilities to connect to and extract information from engine control units (ECUs), this data could also be used to spatially examine truck operating speeds, idling, and emissions on the roadway network, allowing agencies and carriers to identify trouble spots and adjust investments and operations accordingly.
28 Information available on I-95 Corridor Coalition website at: http://www.i95coalition.org/i95/Projects/ProjectDatabase/tabid/120/agentType/View/PropertyID/107/Default.aspx
29 The project is the Real Time Traffic Monitoring (RTTM) component of the Cross-Town Improvement Project (C-TIP). More information is available at: http://www.ctip-us.com/Ctip/home.htm