Tracking Physical Asset Movement with LSTMs

Missing Pallets

The cost of missing pallets and other reusable packaging is sizeable industry-wide. For example, the Automotive Industry Action Group estimated a loss of $750 million per year within the North American auto industry, and the pallet loss rate could be as high as 15-20% (Deloitte). Loss may occur either through misplacement of assets or through intentional theft, with misplacement generally being the larger problem (Jerry Welcome, president of the Reusable Packaging Association). While most theft is committed by individuals or small groups, for instance people taking pallets to use as furniture (Welcome), the threat of organized operations cannot be ignored. For example, a Los Angeles County Sheriff's Department task force (the Plastic Industrial Theft Task Force) recovered $7.4 million in stolen plastic pallets and containers over two years, from 2011 to 2013.

Pallet and Asset Tracking

Transportation companies have invested in GPS and RFID tracking technology, capturing information faster and at a higher granularity than ever before. The hope is that higher visibility of goods will translate to improved operational processes. However, while more real-time data is being captured, it has not yet translated into the operational gains many had hoped for. Manually interpreting this volume of information is far too slow and expensive to be realistic. Instead, we believe the opportunity is ripe to put this data to use with modern data analytics and machine learning.

Anomaly Detection via One Minute Look Ahead Prediction

In service of this goal we propose an automated method for alerting on pallets or other trackable assets that are at high risk of loss. Underlying our approach is the assumption that items deviating significantly from an expected trajectory have an elevated risk of going missing. Our method can take advantage of high granularity tracking data by leveraging modern machine learning techniques, in particular deep neural networks, to build predictive models for inferring an item’s location in the near future.
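The alerting idea above can be sketched in a few lines: compare where the model expected the asset to be with where it was actually observed, and raise a flag when the gap is large. The snippet below is a minimal illustration and not the authors' implementation. The function names and the 500 m threshold are hypothetical, and a real deployment would need to tune the threshold against observed prediction error.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_off_track(predicted, observed, threshold_m=500.0):
    """Flag an asset observed far from its predicted position.

    `predicted` and `observed` are (lat, lon) tuples in degrees.
    The default threshold is a placeholder, not a value from the case study.
    """
    return haversine_m(*predicted, *observed) > threshold_m
```

In use, `predicted` would be the model's lookahead output from one minute earlier and `observed` the latest tracker reading for the same asset.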

Neural networks are a class of machine learning model inspired by the architecture of the human brain. They consist of multiple sequentially connected layers that progressively learn new representations of the input data such that the final representation can be accurately used to predict an objective of interest. By feeding such a network a sequence of prior object positions we are able to predict its location in the near future. To perform this task we make use of a class of neural network known as a Recurrent Neural Network (RNN), which is particularly well suited to handling ordered sequences of data-points (e.g. a time-series of object locations) as input (Karpathy, 2015). Specifically, we use the popular Long Short-Term Memory (LSTM) RNN layer, which has shown good performance on a variety of tasks. The architecture of an LSTM unit is illustrated in Figure 1 below.


Figure 1: Architecture of an LSTM layer. Figure courtesy of (Olah, 2015).

Though an in-depth explanation of LSTMs is beyond the scope of this post (interested readers are referred to Christopher Olah’s excellent description), the primary mechanism by which sequences are handled is the “hidden state” h_t. The hidden state effectively encodes previously seen data-points into a single vector that is maintained and updated as data flows through the network. This vector is completely internal to the LSTM unit and thus “hidden” from outside view. The output of the LSTM layer is a weighted combination of the hidden state and current input, allowing both past and present to be used for predicting the future. The main innovations that distinguish LSTMs from previous RNN implementations are tunable “forget” and “input” gates, which allow the module to selectively erase and update portions of the hidden state. This enables important information to persist in the hidden state over long periods of time, endowing the model with both “long” and “short” term memory.
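For readers who prefer code to diagrams, here is a minimal NumPy sketch of a single step of a standard LSTM cell, following the common textbook formulation rather than any code from our project. Note that in this formulation the forget and input gates act on a separate cell state c_t, and the hidden state h_t is derived from it through an output gate. The weight layout, sizes, and random inputs are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has shape (4 * hidden, hidden + n_inputs) and b has shape (4 * hidden,).
    Rows are ordered as: forget gate, input gate, candidate values, output gate.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate: what to erase from c
    i = sigmoid(z[1 * hidden:2 * hidden])  # input gate: what to write into c
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate values to write
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate: what to expose as h
    c_t = f * c_prev + i * g               # updated cell state
    h_t = o * np.tanh(c_t)                 # updated hidden state
    return h_t, c_t


# Run a short random sequence through the cell (untrained, random weights).
rng = np.random.default_rng(0)
n_inputs, hidden = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + n_inputs))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, n_inputs)):
    h, c = lstm_step(x_t, h, c, W, b)
```

After the loop, `h` summarizes all five inputs in a single vector of length `hidden`, which is what downstream layers consume.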

The LSTM unit receives sequential data as input. This includes features such as GPS coordinates and time of day that change over time. However, we have additional data that we wish to incorporate into our model that remains constant. Primarily this consists of identity information, such as a unique driver ID in our case study below, that remains static over the course of a particular time sequence. Our approach for such data is to learn an embedding, and merge the outputs of the embedding and LSTM layers for combined use by subsequent layers. An embedding is a mapping that encodes sparse, categorical data into dense real-valued vectors. One prominent use of this technique is the word2vec model, which turns words into vectors such that semantic meaning is preserved in the relative distances between embedding vectors (Mikolov, 2013). For instance, after training a word2vec model, “woman” and “mother” should be closer than “woman” and “Paris” in the embedding space. By learning embeddings for our static, high-cardinality categorical features we are able to efficiently translate them into numbers our model can interpret without producing the massive, sparse input vectors that a simpler strategy such as one-hot encoding would generate.
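To make the contrast with one-hot encoding concrete, the sketch below treats an embedding as a lookup table with one short row per ID. The table here is filled with random numbers purely for illustration, whereas in a trained model its rows are learned. The sizes (10,000 IDs, 8 dimensions) are assumptions, not values from our experiments.

```python
import numpy as np

n_ids, embed_dim = 10_000, 8
rng = np.random.default_rng(0)
# One dense row per ID; random here, learned during training in practice.
embedding_table = rng.normal(scale=0.05, size=(n_ids, embed_dim))


def embed(ids):
    """Look up the dense vector for each integer ID."""
    return embedding_table[np.asarray(ids)]


def one_hot(ids):
    """Encode each ID as a sparse vector with a single 1."""
    out = np.zeros((len(ids), n_ids))
    out[np.arange(len(ids)), ids] = 1.0
    return out


ids = [3, 42, 9_999]
dense = embed(ids)     # shape (3, 8)
sparse = one_hot(ids)  # shape (3, 10000)
```

Multiplying the one-hot matrix by the table gives the same result as the lookup, so the embedding carries the same identity information in 8 numbers per ID instead of 10,000.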

Putting everything together, the full architecture of our model is diagrammed in Figure 2 below.


Figure 2: Architecture of lookahead prediction network.

It includes the two previously discussed input branches operating in parallel, LSTM and embedding, for handling sequential and static categorical data respectively. A merge layer then concatenates the embedding and LSTM outputs so that they can be fed to a dense, fully connected layer. Activations from the dense layer are then passed to a final dense output layer which produces lookahead predictions, i.e. the predicted location of the object in the near future. This simple arrangement allows for seamless integration of both types of input data into a single cohesive model. Our implementation is developed in Python using the high-level Keras deep learning framework with TensorFlow as the computational backend.
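A minimal Keras sketch of this arrangement is shown below. Only the layer types and their wiring follow the description above. The sequence length, layer widths, activation, optimizer, and loss are all illustrative assumptions, and the function and layer names are hypothetical rather than taken from our code.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_lookahead_model(seq_len=10, n_features=7, n_ids=10_000,
                          embed_dim=8, lstm_units=64, dense_units=32):
    """LSTM branch + embedding branch, merged and fed to dense layers."""
    # Sequential branch: one row of features per time step.
    seq_in = layers.Input(shape=(seq_len, n_features), name="sequence")
    seq_out = layers.LSTM(lstm_units)(seq_in)

    # Static branch: a single integer ID mapped to a dense vector.
    id_in = layers.Input(shape=(1,), dtype="int32", name="asset_id")
    id_vec = layers.Flatten()(layers.Embedding(n_ids, embed_dim)(id_in))

    # Merge both branches, then predict a 2D position.
    merged = layers.Concatenate()([seq_out, id_vec])
    hidden = layers.Dense(dense_units, activation="relu")(merged)
    out = layers.Dense(2, name="lookahead_position")(hidden)

    model = keras.Model(inputs=[seq_in, id_in], outputs=out)
    model.compile(optimizer="adam", loss="mse")  # assumed training setup
    return model


model = build_lookahead_model()
```

Training would then call `model.fit` with a pair of arrays, one of shape (batch, seq_len, n_features) and one of shape (batch, 1), against target positions of shape (batch, 2).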

We have already obtained promising results with this approach by analyzing and predicting trajectories from the Beijing Taxi Dataset 2007. Example taxi trajectories from the data-set are overlaid on a map of the city in Figure 3.


Figure 3: Visualization of Beijing taxi data-set.

To validate our approach we use the LSTM + embedding model to predict taxi locations one minute into the future. Continuous sequential input features include latitude, longitude, timestamp, orientation, speed, and occupancy status. Orientation (car travel direction) is mapped to a 2D position on the unit circle to capture its periodicity, and other features are scaled to approximately unit magnitude. Car unique ID serves as the lone categorical input feature fed to the embedding layer.
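The orientation encoding deserves a brief illustration. Treated as a raw number, a heading of 359 degrees looks far from a heading of 1 degree even though the two directions are nearly identical. Mapping the angle to a point on the unit circle removes that discontinuity. The sketch below assumes headings are given in degrees, and the `scale` helper with its `center` and `spread` arguments is a hypothetical stand-in for whatever scaling a given feature needs.

```python
import math


def encode_heading(degrees):
    """Map a heading in degrees to (sin, cos) on the unit circle."""
    theta = math.radians(degrees)
    return math.sin(theta), math.cos(theta)


def scale(value, center, spread):
    """Shift and rescale a feature to roughly unit magnitude."""
    return (value - center) / spread


# 359 and 1 degrees differ by 358 as raw numbers,
# but their encoded points are close together on the circle.
gap = math.dist(encode_heading(359.0), encode_heading(1.0))
```

Here `gap` comes out to about 0.035, compared with a maximum possible distance of 2 between opposite headings.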

In Table 1 we compare one minute lookahead prediction results on a held-out test set from a fully trained model to other benchmarks. The first column lists average travel distance per minute; this is effectively the error we would expect if our model simply used the current car position as its expected location one minute into the future. The second column demonstrates results using the average of the three most recent positions as the predicted future location. Since cars travel mostly in straight lines, this actually increases error over the prior approach. The third column utilizes predictions from our LSTM+embedding model, but with a training set limited to data from just 5 cars. Unsurprisingly, this is too little data to train a robust model, and prediction accuracy is significantly degraded. Finally, the fourth column of Table 1 displays one minute lookahead prediction results from the LSTM+embedding model after training on data from 8000 cars. In this scenario generalization to unseen data has greatly improved and the benchmark from column 1 is easily surpassed.

Average distance traveled per minute: 391 m

Rolling average of 3 previous positions: 786 m

LSTM+embedding (trained on 5 cars): 2076 m

LSTM+embedding (trained on 8,000 cars): 152 m

Table 1: Comparison of average one minute lookahead prediction errors on the test set.
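The two non-learned baselines in Table 1 are simple enough to sketch directly. The toy example below is not computed from the taxi data. It assumes positions already projected to meters, one sample per minute, and a car moving in a straight line at 390 m per minute, and the function names are hypothetical.

```python
import math


def persistence_forecast(history):
    """Predict that the asset stays at its most recent position."""
    return history[-1]


def rolling_mean_forecast(history, window=3):
    """Predict the mean of the last `window` positions."""
    recent = history[-window:]
    n = len(recent)
    return (sum(p[0] for p in recent) / n, sum(p[1] for p in recent) / n)


def error_m(predicted, actual):
    """Euclidean error in meters between two planar positions."""
    return math.dist(predicted, actual)


# Straight-line motion along x at 390 m per minute.
history = [(0.0, 0.0), (390.0, 0.0), (780.0, 0.0)]
actual_next = (1170.0, 0.0)

persistence_error = error_m(persistence_forecast(history), actual_next)
rolling_error = error_m(rolling_mean_forecast(history), actual_next)
```

Under these assumptions the persistence error is 390 m and the rolling average error is 780 m. Averaging drags the prediction backward along the path, which is consistent with the rolling average baseline performing worse than simply using the current position.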

In Figure 4 we investigate the effect of training set size in finer detail, and visualize prediction results using a single taxi trajectory as an example. In the plots the blue line represents the actual taxi trajectory, while each red line connects the taxi's current position (the end anchored to the blue line) to its predicted one minute lookahead position (the other end). Ideally, the red and blue lines will overlap. Based on the plots, additional training data clearly improves prediction quality, with overlap steadily improving from non-existent to fairly robust. The largest remaining errors occur when the taxi makes an abrupt change in direction, i.e. turns right or left. This is to be expected, as previous data-points are less indicative of future position in that case. However, prediction quality still improves with additional training data even for this challenging scenario, indicating that the model may be learning about popular routes and driver habits in addition to simply extrapolating position based on current trajectory.


Figure 4: Model accuracy as a function of training data size for a single taxi. The blue line represents the actual taxi trajectory, while red lines indicate predicted one minute lookahead positions. Greater overlap between red and blue lines corresponds to more accurate model predictions. Axes are in units of meters.

The availability of cheap and accurate tracking mechanisms based on GPS and RFID technology has enabled companies to monitor their mobile assets in real-time. It has also introduced a new conundrum: how to make use of the flood of data such devices generate. It is clear that smart, automated methods must be used to extract actionable signal from this deluge. In this post we demonstrate an approach based on deep learning that can be used to alert on assets that have gone off track, and thus are at high risk of loss. Our proposed method hinges on the ability to accurately predict an object’s future location based on its recent trajectory. Using the publicly available Beijing Taxi data-set we show that such prediction is feasible using LSTM and embedding neural network layers, given sufficient training data. Further investigation using new data-sets should be pursued to provide additional validation for this promising approach.


Karpathy, A. The Unreasonable Effectiveness of Recurrent Neural Networks. (2015, May).

Olah, C. Understanding LSTM Networks. (2015, August).

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems, 3111-3119. (2013).


Jae Lew, Akash Sahoo, Pierre Dueck, Juan Carlos Asensio, Arshak Navruzyan



David Staub