Building a Custom Downloader for NYC’s PLUTO Data

I recently launched a custom download tool for NYC’s PLUTO data, which you can check out at http://chriswhong.github.io/plutoplus. Here’s a little bit about how and why I built it:

New York City’s Department of City Planning maintains a geospatial database of tax lots and their associated records that goes by the name of PLUTO (It’s a backronym that stands for Primary Land Use Tax Output). It includes polygons for the city’s 870,000+ tax lots, and over 80 attribute columns including assessments, easements, number of units, number of floors, zoning status, etc. It’s every Urban Planner’s dream dataset, and gives you a treasure trove of information for mapping and analyzing the built environment. PLUTO used to cost money, but is now available as a free download in ESRI Shapefile format from DCP’s time-honored Bytes of the Big Apple website (which pre-dates the Open Data portal and the Open Data law by many years. DCP are the hipsters of Open Data, they were doing it before it was cool).

The downloads consist of a shapefile for each of New York’s 5 boroughs, and range in size from 11MB for Manhattan and 65MB for Queens. Each shapefile contains many thousands of records, and can be quite unruly to work with depending on how much memory you have and what program you’re using. I found myself doing some freelance GIS work recently that involved analyzing a chunk of the city that spanned two boroughs. I ended up having to manually filter down the data from both shapefiles to the areas that I wanted, and then merge the two subsets together so that my study area and all attributes were together in a single file. Then, once I had tamed the geometries, there were 80 columns of attributes, most of which didn’t pertain to my analysis. I only needed 1!

Idea: Build a custom PLUTO downloader that allows the user to select a geographic area and a set of attributes, and get custom PLUTO data on demand.

To accomplish this, the data must live in a spatial database that can take in the parameters for the custom subset of data and return the results quickly over the web. CartoDB is a perfect fit for this problem, and will allow us to send a PostGIS SQL query via its SQL API and get back results in multiple formats.

At the end of the day, this project boils down to a single-page UI that helps the user build a SQL statement for the data they want.

Consolidate the data

The plan was to consolidate the PLUTO geometries for the entire city into a single table, and keep the attributes in a separate table. These two tables would be joined on the fly when the user requests data.

I loaded all 5 PLUTO shapefiles into CartoDB, and stripped them of all of their attribute columns except for BBL (BBL stands for borough, block, & lot and serves as a unique ID that we can use for joins). I then did successive UNIONs, combining them into a single dataset which I named plutoshapes.

I used the full-city non-spatial PLUTO CSV to populate the attributes table, which I named pluto14v2. (PLUTO 2014 version 2… the latest and greatest version) It occurred to me while writing this post that the row counts for these two datasets don’t match, so I’m going to have to figure out why sometime soon.

Make a Simple Visualization in CartoDB

The map used in the frontend will only be for reference purposes, and won’t actually show any data other than the geometries. Creating a CartoDB visualization with a simple SELECT * query and no data-driven styling will allow me to draw PLUTO’s polygons on the frontend’s map with just a few lines of code.

The Frontend

My frontend is a fullscreen map with sidebar, based on this CartoDB Twitter Bootstrap Template I built a while back. I’ll walk through the key functionality of script.js:

First, I set up the page. This involves creating a map using Cartodb’s javascript library, adding a dark basemap to it, then adding the generic PLUTO overlay that I created before.

//initialize map
var map = new L.Map('map', { 
  center: [40.70663644882689,-73.97815704345703],
  zoom: 14
});

//add a basemap
L.tileLayer('https://dnv9my2eseobd.cloudfront.net/v3/cartodb.map-4xtxp73f/{z}/{x}/{y}.png', {
  attribution: 'Mapbox Terms & Feedback'
}).addTo(map);

  //add cartodb named map
var layerUrl = 'https://cwhong.cartodb.com/api/v2/viz/2602ab80-0353-11e5-89a0-0e0c41326911/viz.json';

cartodb.createLayer(map, layerUrl)
  .addTo(map)
  .on('done', function(layer) {

  }).on('error', function() {
    //log the error
  });

layerUrl above is a CartoDB viz.json, which is essentially a structured version of the data query and styling rules that went into my generic PLUTO map. If you’re curious, you can load that URL and see what’s inside.

Next, I populate the list of PLUTO attributes from a static JSON file.

//populate fields list
$.getJSON('data/fields.json',function(data) {
  //iterate over the results and add each field name to the list in the sidebar
});

I could have retrieved this list of column names using CartoDB’s API and requesting one row of data from the pluto14v2 table, but I wanted to include the descriptions so I ended up making a static csv that I could easily transform into JSON.

Building the Query

With a preview map loaded, and a nice list of attribute columns with checkboxes (and a select all button if the user doesn’t want to scroll through 80 field names), the user can start panning and zooming the map and checking boxes for attributes. All the magic will happen when one of the download buttons is clicked:

$('.download').click(function(){

  var data = {};

  //get current view, download type, and checked fields
  var bbox = map.getBounds();
  data.type = $(this).attr('id');
  var checked = listChecked();

  //generate comma-separated list of fields
  data.fields = '';
  for(var i=0;i<checked.length;i++) {
    if(checked[i]!='bbl') {
      data.fields+= checked[i] + ',';
    }
  }
  data.fields=data.fields.slice(0,-1);


  data.bboxString = bbox._southWest.lng + ',' 
    + bbox._southWest.lat + ','
    + bbox._northEast.lng + ','
    + bbox._northEast.lat;

  var queryTemplate = 'https://cwhong.cartodb.com/api/v2/sql?skipfields=sbbl,cartodb_id,created_at,updated_at,name,description&format={{type}}&filename=pluto&q=SELECT * FROM plutoshapes a LEFT OUTER JOIN (SELECT bbl,{{fields}} FROM pluto14v2) b ON a.sbbl = b.bbl WHERE ST_INTERSECTS(ST_MakeEnvelope({{bboxString}},4326), a.the_geom)';


  var buildquery = Handlebars.compile(queryTemplate);

  var url = buildquery(data);

  console.log("Downloading " + url);

  window.open(url, 'My Download'); 

});

First, I get the bounding box (lat/lon for the southwest corner of the map & lat/lon for the northeast corner of the map) using leaflet’s .getBounds() method. I then call a function named listChecked() which returns an array of all of the PLUTO attributes that the user checked in the sidebar list.

I’m methodically building out an object named data, which I’ll use with handlebars.js to build out an API call. data.type is the file format the user has requested (based on which button they clicked). data.fields is a string consisting of the user’s selected attributes. data.bboxString is a transformation of the data inside the object returned by .getBounds() that I’ll be able to use in my PostGIS query. Handlebars takes this fully-formed data object and builds out a call to the CartoDB SQL API using this template:

https://cwhong.cartodb.com/api/v2/sql
?skipfields=sbbl,cartodb_id,created_at,updated_at,name,description
&format={{type}}
&filename=pluto
&q=SELECT * FROM plutoshapes a 
LEFT OUTER JOIN (
SELECT bbl,{{fields}} FROM pluto14v2
) b ON a.sbbl = b.bbl 
WHERE ST_INTERSECTS(ST_MakeEnvelope({{bboxString}},4326), a.the_geom)

https://cwhong.cartodb.com/api/v2/sql is the base SQL API url for my account. skipfields excludes columns from the results (columns that are present in all CartoDB tables but are not relevant to the user). format gets whatever value data.type holds, which will be either “csv”, “shapefile”, or “geojson”. filename simply defines the name of the resulting downloaded file. Finally, q is our SQL query, which I’ll break down below.

Remember, I stored the data in two separate tables, one with all of the geometries, and one with just the attributes. This query is doing a LEFT OUTER JOIN, starting with all of the polygons and then joining them to the attributes where the BBLs match. The comma-separated list of {{fields}} is injected by handlebars.js to specify which columns the user requested from the attributes table. The final WHERE clause contains the bounding box. ST_MAKEENVELOPE() is a PostGIS function that creates a rectangular polygon given a set of bounding box coordinates (which are also injected by handlebars.js as {{bboxString}}) Once the rectangle is defined, I use ST_INTERSECTS to select only those polygons in the PLUTO geometries dataset that are within the box or touching its edges (I could have also used ST_CONTAINS() here, but that would exclude all polygons that are not 100% within the bounding box)

Let’s see it in action. I’ll zoom in on Williamsburg, Brooklyn, and choose only the address, allzoning1, and assessland columns. When I click the geojson download button, the following API call is built:

https://cwhong.cartodb.com/api/v2/sql?skipfields=sbbl,cartodb_id,created_at,updated_at,name,description&format=geojson&filename=pluto&q=SELECT * FROM plutoshapes a LEFT OUTER JOIN (SELECT bbl,address,allzoning1,assessland FROM pluto14v2) b ON a.sbbl = b.bbl WHERE ST_INTERSECTS(ST_MakeEnvelope(-73.96154880523682,40.71133720811548,-73.95286917686462,40.71723302006985,4326), a.the_geom)

Compare this to the template and see where handlebars.js did its thing. Now I’ll open the downloaded file in QGIS.

There’s my data, containing only the 903 tax lots that were in my current map view, and only the three columns that I selected (plus BBL, the unique identifier for each polygon)! Woot woot.

So What?

This will be a major time saver for anyone who is uses PLUTO data and finds themselves starting from scratch with the city’s raw shapefiles. It’s also a proof of concept for how governments could better publish large geospatial datasets and I would LOVE to see someone fork this project and stand it up for another dataset in another city.

It turns out that fellow map nut and twitter pal Bill Morris has built a similar custom download tool for Burlington, Vermont, and has written a great blog post about what’s wrong with geo open data portals and why this “few bells and whistles” approach better serves a broad range of data consumers. His version has address search, topojson, and shareable URLs, all things I hope to add if there is any interest from the community.

For now, retrieving data for the current map view is the only option for choosing a specific area, but in the future I will allow the user to draw an arbitrary polygon OR choose a neighborhood, borough, or other relevant NYC geometry to get their data.

Check out the code on github.

Thanks for reading! Support Open Data, and build awesome things!

Jason Wheatley says:

May 26, 2015 at 9:19 pm

Chris – the seemingly orphan records between the assessment data and the parcel polys likely has to do with typical many to one owners to parcels in a planer space. This often occurs in apartments, where there are multiple owners within the same 2d land footprint. I can’t speak to NYCs data as I’ve not had a look, but I’ve produced several parcel datasets in my time that use this practice.

More importantly. Awesome tool!

David Brice says:

March 20, 2017 at 1:03 pm

Hi Chris

This tool has been incredibly useful for me and I recommend it to every nyc-based GIS user I speak to. Thank you so much for making this available. Could you update it to the current release of PLUTO (16v2, I think)? It would be very much appreciated!

David says:

May 7, 2017 at 8:39 am

Thanks for updating! Unfortunately though when I try to download anything with the entire city selected, the file that exports (whether it be .csv, shapefile, or geojson) only captures the parcels for JFK airport and the surrounding area (and random single parcels from around the city). I have tried using different browsers and computers, but this issue still occurs. It does not seem to matter which columns I select, or whether I export using the ‘Current Map View,’ ‘Neighborhood,’ or ‘Draw a Polygon,’ features to select the export area. See screenshot here https://ibb.co/fuc1rQ

It would be really awesome if you could look into this. Thanks!

John Baker says:

May 19, 2017 at 1:43 pm

Love this tool but it seems to have broken – I’m having trouble exporting specific columns for the whole city. It also seems to have issues with the .csv output.

Thanks so much for building this – I’m finding it super valuable.

Leave a Reply Cancel reply