I’m pretty entrenched in the Google ecosystem. I use all of Google’s web services, own a Pixel, and have my daily workflow tied to the whole platform. Like most people, I am aware that Google collects a lot of information from its users, but I never really stopped to question just how much and how often my data was being collected.
Not too long ago, Google launched Takeout, an online portal that allows users to export and download their data from the Google products they use, like Gmail and Maps. Given my newfound interest in Google’s tracking activity, I decided to take a look and see what information Google Maps has on me. I have been using Google Maps and Location Services on Android since 2012, so the data I get from this service should be fairly comprehensive.
The Takeout export downloads as a zip file, which contains a JSON file with entries like this:
{
  "timestampMs" : "1544617468622",
  "latitudeE7" : 400225589,
  "longitudeE7" : -750777557,
  "accuracy" : 82
}
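Each of these entries encodes a location measurement taken by Google, with GPS coordinates (latitude/longitude) and a timestamp. The coordinates are stored as integers (degrees multiplied by 10^7, hence the “E7” suffix), and the timestamp is in milliseconds, so it can be converted to a “normal date” by first dividing the number by 1,000 to get seconds since the Unix epoch. Although the file is a bit large (387 MB in my case), we can use R to easily visualize the dataset on a map (you can view the source code here):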
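As a rough illustration of that conversion, here is a minimal Python sketch (the analysis in this post was done in R; Python is used here only to show the arithmetic). The function name `parse_entry` is hypothetical, and the sketch assumes the timestamp counts milliseconds since the Unix epoch and that the E7 fields are degrees multiplied by 10^7.

```python
from datetime import datetime, timezone

def parse_entry(entry):
    """Convert one raw Takeout entry into a UTC datetime and decimal degrees.

    Assumes timestampMs is milliseconds since the Unix epoch and that the
    E7 fields are degrees scaled by 10**7 (both are assumptions of this sketch).
    """
    # timestampMs is stored as a string, so parse it before dividing
    seconds = int(entry["timestampMs"]) / 1000
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Undo the 10**7 scaling to get ordinary decimal degrees
    lat = entry["latitudeE7"] / 1e7
    lon = entry["longitudeE7"] / 1e7
    return when, lat, lon

# The sample entry shown above
entry = {
    "timestampMs": "1544617468622",
    "latitudeE7": 400225589,
    "longitudeE7": -750777557,
    "accuracy": 82,
}

when, lat, lon = parse_entry(entry)
print(when.date(), lat, lon)
```

For the sample entry, this yields a measurement taken on December 12, 2018 (UTC) at roughly 40.02° N, 75.08° W, which is in Philadelphia.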
Heatmap of my location during the last six years living in Philadelphia, PA
This “heatmap” shows the distribution of my location throughout the city of Philadelphia over the past six years. As seen on the map, I spend the majority of my time in Center City and Northeast Philly. My commute patterns are clearly visible: the darker cluster of dots traces my commute along the Market-Frankford Line. From a personal perspective, the map is not all that insightful, but it is nonetheless interesting for visualizing my travel patterns around the city. A deeper dive into the data is required to better understand how often Google is tracking my location.
The JSON export that I downloaded contains 1,123,198 observations over the course of 2,215 days (about 6 years). On average, that means that there are more than 21 measurements every hour, or about one measurement every 3 minutes. This is about 50% higher than the estimated average of 14 measurements per hour for Android devices (for reference, I’ve been exclusively using Android phones since 2012). If we take a closer look at the distribution of the timestamps, we can see that there is some variation depending on the time of day and day of the week (the R code can be found here):
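The averages above follow directly from the two totals. A quick Python check of the arithmetic, using only the figures quoted in this post (the variable names are illustrative):

```python
# Totals quoted in the post
observations = 1_123_198
days = 2_215
android_estimate_per_hour = 14  # estimated Android average cited above

per_day = observations / days
per_hour = per_day / 24
minutes_between = 60 / per_hour
ratio_to_estimate = per_hour / android_estimate_per_hour

print(f"{per_day:.0f} per day, {per_hour:.1f} per hour, "
      f"one every {minutes_between:.1f} minutes, "
      f"{ratio_to_estimate:.2f}x the Android estimate")
```

This works out to roughly 507 measurements per day, 21.1 per hour, or one every 2.8 minutes, which is about 1.5 times the cited Android estimate.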
The lines in the plot show the average number of location measurements taken by Google each hour, separated by days of the week. The dashed line indicates the average over all days of the week. Some key findings include:
- Between 4 PM and 10 PM, Google takes on average more than one location measurement every 2 minutes
- In the nighttime, Google takes on average only one measurement every 6 minutes
- Monday and Wednesday mornings are closely watched with many measurements, especially Wednesday mornings
- After a dip in the morning, measurements go up in the late afternoon and early evening, especially on Fridays
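For readers who want to reproduce this kind of breakdown, here is a minimal Python sketch of one way to compute it. It is not the R code linked above: the function name `hourly_rate_by_weekday` is hypothetical, and the averaging rule (dividing by the number of distinct dates with at least one measurement) is a simplifying assumption.

```python
from collections import defaultdict
from datetime import datetime, timezone

def hourly_rate_by_weekday(timestamps_ms, tz=timezone.utc):
    """Average number of measurements for each (weekday, hour) slot.

    weekday follows Python's convention: 0 = Monday ... 6 = Sunday.
    Simplification: each weekday is averaged over the distinct dates on
    which at least one measurement was recorded, so days with no data
    at all are not counted.
    """
    counts = defaultdict(int)  # (weekday, hour) -> total measurements
    dates = defaultdict(set)   # weekday -> distinct calendar dates seen
    for ms in timestamps_ms:
        dt = datetime.fromtimestamp(int(ms) / 1000, tz=tz)
        counts[(dt.weekday(), dt.hour)] += 1
        dates[dt.weekday()].add(dt.date())
    return {slot: n / len(dates[slot[0]]) for slot, n in counts.items()}

# Tiny made-up example: two measurements on one Wednesday and one
# measurement a week later, all within the same clock hour (UTC).
base = 1544617468622
week_ms = 7 * 24 * 60 * 60 * 1000
sample = [base, base + 60_000, base + week_ms]
rates = hourly_rate_by_weekday(sample)
print(rates)
```

The sketch buckets timestamps in UTC by default. To match a plot in local time, a time zone object would need to be passed as `tz` (for example, one for US Eastern time when the data comes from Philadelphia).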
I am a little surprised to see how often Google tracks me, even (and especially) at times when I am certain that I’m not using Google Location Services (e.g., 4 AM). This behavior is consistent with Douglas Schmidt’s research findings on Google’s passive data mining. Unlike active data mining, which happens every time you sign in to any of Google’s services, passive data mining can happen without any user intervention or knowledge. The location measurements shown above are a good example of that: measurements are being made even while I am asleep and not engaging with Google’s services.
In general, Google collects far more data than any other tech company out there. The sheer amount of data the platform holds on its users may make some uncomfortable. Personally, I value the convenience of Google services over privacy. Even though the company is collecting a lot of my data frequently (i.e., tracking my location about 500 times a day), Google uses the data it collects to offer a better user experience.
Google’s services are so advanced and useful because the company can leverage its massive data pool to train its machine learning algorithms, thereby improving its services. Google Maps, the most popular mapping app in the US, is a good example of this. Unlike direct competitors like Apple Maps, Google Maps makes the most of its data to provide the most accurate traffic conditions and even suggestions to its users. Google Duplex, which uses sophisticated machine learning to make restaurant reservations on behalf of users, is another recent example. This would not be possible without large amounts of data being collected, and it explains why Google takes the lead here. Ultimately, however, it is up to each individual to decide whether the convenience is worth the loss of privacy. Luckily, tools like R and Takeout can help provide better context so we can make more informed decisions about our privacy.
Carlos Bonilla is an Analyst at Econsult Solutions, Inc. with a focus on economic and environmental issues in urban areas. At ESI, Carlos applies a strong background in spatial analysis, cartography, and data visualization. Prior to joining the team, Carlos worked as a fellow for Azavea’s Summer of Maps, a program that pairs students with local and national non-profits to perform geospatial analysis.