I recently presented at Strata Data Conference in New York, an event created for data scientists. I was interested in how many of the presentations and exhibitor displays would highlight the importance of spatial data and incorporating Location Intelligence into their data science. Considering that an estimated 80% of data has a spatial component, one might expect it to be part and parcel of every software solution. Would everyone already have everything they need for Location Intelligence?
In some ways, I already knew the answer. I’ve attended similar events in the past, but it still surprises me that spatial analysis remains a fringe interest for this audience. At a basic level, everyone seems to get it: I walked by a significant number of vendors that were using a map for visual demonstrations. Did that mean they understood spatial processing and analysis? Unfortunately, no.
The majority were vendors that have some serious number-crunching capabilities and database technologies behind the scenes. The maps are just one of a series of charts that help the organization explain some capabilities of its technology. As one sales representative put it, “It’s pretty hard to show an in-memory database. People need to be able to see something to get their heads around it.” As a result, many vendor displays showed the freely available and commonly used New York taxi dataset. With over a billion taxi journeys, each with a pick-up and drop-off location, the dataset becomes a very large number of dots on a map. Most vendors could then demonstrate how to filter the data, say by time of day. However, when I started asking about spatial queries, the faces soon turned blank.
In fact, appreciating the spatial context massively enhances the depth of understanding of this New York taxi dataset. Various blog posts document how spatially joining the data to additional datasets provides insights into why people use taxis in the way that they do. Todd Schneider provides spatial analysis that reveals unexpected insights. For example, joining the data to census tracts and neighborhoods allows potential house hunters to see which neighborhoods are the quietest at night. Todd's analysis shows both the power of the spatial query and the responsibilities that come with it. He illustrates that, although the data has been stripped of customer names, in the suburbs users can easily start to see a pattern of when a wealthy homeowner might be leaving for work every day. Being able to anonymize datasets and protect personally identifiable information is something that all organizations have an urgent need to resolve.
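To make the idea of a spatial join concrete, here is a minimal sketch of joining points to zones with a point-in-polygon test (ray casting). It is an illustration only, not necessarily how the analysis mentioned above was done. The function names, the zone names, and the simple square zones are all hypothetical, and real census tracts would call for a spatial index and a proper geometry library.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`.

    `polygon` is a list of (x, y) vertices in order. Points exactly on an
    edge may be classified either way; this sketch does not treat them specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_pickups_by_zone(pickups, zones):
    """Spatially join pick-up points to zones and count points per zone.

    `zones` maps a (hypothetical) zone name to its polygon. Points that
    fall in no zone are ignored.
    """
    counts = {name: 0 for name in zones}
    for x, y in pickups:
        for name, polygon in zones.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assume zones do not overlap
    return counts


# Hypothetical example data: two adjacent square zones and four pick-ups.
zones = {
    "zone_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "zone_b": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
pickups = [(1, 1), (3, 1), (3, 0.5), (5, 5)]
zone_counts = count_pickups_by_zone(pickups, zones)
```

Filtering by time of day needs only the timestamp column; a query like this one needs the geometry of both datasets, which is the step that most of the map demos did not cover.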
Pitney Bowes Spectrum for Big Data products aim to solve exactly these kinds of issues for companies. There is a growing need to democratize data and to handle large volumes of it, including the increasing volume of sensor-based data that has location as a component. We are also seeing a growing desire to protect personally identifiable information.
One way companies can protect personally identifiable information is to aggregate data rather than supply individual transactions. Spatially binning data into a hierarchical grid structure and then aggregating the results retains much of the value of the spatial data without the concern that individuals can be identified. Locations are overlaid on a geohash, square, or rectangular grid and assigned a grid ID. The data within each grid cell is then aggregated, and cells containing too few data points are not reported. The grid is hierarchical, so if I have only one data point at a 10m resolution, I might choose to aggregate at a 100m resolution where I have 100 data points.
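The binning and suppression steps described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Spectrum implementation: it uses a plain square grid, assumes coordinates are already in a projected system measured in metres, and the function names, the cell sizes, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter
from math import floor


def cell_id(x, y, size):
    """Return the (column, row) index of the square cell of side `size` containing (x, y)."""
    return (floor(x / size), floor(y / size))


def aggregate(points, sizes=(10, 100, 1000), min_count=5):
    """Report counts per grid cell, rolling sparse cells up to coarser grids.

    Points are first binned at the finest cell size. Cells holding at least
    `min_count` points are reported; points in sparser cells are re-binned at
    the next coarser size. Points still in a sparse cell at the coarsest size
    are not reported at all.

    Returns a dict mapping (size, column, row) to a point count. Counts at
    coarser sizes cover only the points not already reported at a finer size.
    """
    result = {}
    remaining = list(points)
    for size in sizes:  # finest to coarsest
        counts = Counter(cell_id(x, y, size) for x, y in remaining)
        still_sparse = []
        for x, y in remaining:
            cid = cell_id(x, y, size)
            if counts[cid] >= min_count:
                result[(size, *cid)] = counts[cid]
            else:
                still_sparse.append((x, y))
        remaining = still_sparse
    return result


# Hypothetical example: a dense cluster, a loose cluster, and one isolated point.
dense = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]                    # one 10m cell
loose = [(205, 205), (215, 215), (225, 225), (235, 235), (245, 245)]  # one 100m cell
isolated = [(5055, 5055)]                                           # alone at every level
binned = aggregate(dense + loose + isolated)
```

In this example the dense cluster is reported at 10m, the loose cluster only at 100m, and the isolated point is suppressed entirely. A production scheme would also need to guard against an individual being recovered by differencing overlapping reports, which this sketch does not attempt.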
As the volume, variety, and velocity of datasets continue to grow, along with a hunger for knowledge from a growing army of data scientists, we continue to urge clients to consider location intelligence as an integral part of their business. Failure to understand the spatial component of data can lead to embarrassing mistakes over PII, as well as to missing some of the key drivers of customer behavior.
To find out more about how location analytics is driving new insights and providing tangible value for organizations, you can download this white paper.