Thursday 26 December 2013

Assigning addresses from Land Registry Prices Paid data

After the disappointment of the Land Registry INSPIRE land parcels, it is nice to report a large and useful open dataset from the same source: the Prices Paid data (LRPP). These are the actual prices paid for houses and flats in England and Wales from around late 1995 to the present.

New Residential Roads in England and Wales
Roads were identified from Land Registry Prices Paid data and matched by name to OSM highways within 2 km of the postcode centroid.


The data are not geolocated, but they contain postcodes, for which centroids are available. Most interestingly, the data contain house numbers and names. However, most of this is of relatively little direct use in OpenStreetMap without at least a cursory ground survey of each street.

Except, that is, for newly built streets! The data have a flag indicating whether a property is being sold for the first time. With this it should be possible to identify streets where every property in the LRPP data has this flag set to yes for its earliest sale.

After a couple of false starts I have found a reasonable compromise way of doing this. Apart from my algorithms turning out either too coarse or too fine, the biggest problem is that this flag is not particularly reliable. It may be set to yes for properties which had never been formally registered with the Land Registry (i.e., ones which had not been sold in the last 50 years or so), while some new-build properties clearly have the wrong flag. I have therefore set my selection criterion at 80% of the earliest sales being flagged as 'newbuild'.
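A minimal Python sketch of this selection step follows. It assumes each LRPP row has already been parsed into a hypothetical `Sale` record (postcode, street, PAON/SAON, sale date, and the old/new flag turned into a boolean, assumed True for 'Y'). It keeps only the earliest sale of each property, then keeps a street when at least 80% of those first sales are flagged as new builds. Grouping by street name within a postcode sector is my own assumption for localising the data, not necessarily how the roads on the map were derived.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Sale:
    postcode: str     # e.g. "GU23 7AB"
    street: str
    paon: str         # house number or name
    saon: str         # flat number etc., "" for houses
    sale_date: date
    new_build: bool   # assumed: LRPP old/new flag, True where it is "Y"

def sector(postcode: str) -> str:
    """Postcode sector, e.g. 'GU23 7AB' -> 'GU23 7' (hypothetical localisation key)."""
    return postcode.strip()[:-2].strip()

def find_new_roads(sales, threshold=0.8):
    """Return {(street, sector)} where >= threshold of first sales are new builds."""
    # 1. Keep only the earliest recorded sale of each property.
    earliest = {}
    for s in sales:
        prop = (s.postcode, s.street, s.paon, s.saon)
        if prop not in earliest or s.sale_date < earliest[prop].sale_date:
            earliest[prop] = s
    # 2. Tally new-build first sales per street within each postcode sector.
    tallies = defaultdict(lambda: [0, 0])  # [new_build_count, total_count]
    for s in earliest.values():
        t = tallies[(s.street, sector(s.postcode))]
        t[0] += s.new_build
        t[1] += 1
    # 3. Tolerate an unreliable flag: require a proportion, not unanimity.
    return {road for road, (new, total) in tallies.items() if new / total >= threshold}
```

Using a threshold rather than requiring every first sale to be flagged is what allows for the misflagged new builds described above.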

I used postcodes to localise the data, although there is enough information in the data to do this on the basis of local authority districts and place names. The complication is that these reflect the local government organisation in force at the time of sale, so using postcode centroids to narrow down the possible street-name matches was simply easier. I then matched the street name in the LRPP data to named highways in OpenStreetMap, and it is the latter which are mapped above.
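The name-matching step might look like the sketch below. It assumes each OSM highway has been reduced to a hypothetical `Highway` record with a name and a single representative point (a simplification; a fuller version would measure to the nearest node of the way), and it uses the haversine formula for the 2 km radius around the postcode centroid. The hypothetical `normalise` function only upper-cases names, drops apostrophes and collapses whitespace; abbreviations such as 'Rd' for 'Road' are not handled.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres

@dataclass
class Highway:
    osm_id: int
    name: str
    lat: float  # assumed: one representative point per highway
    lon: float

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def normalise(name):
    """Crude name key: LRPP streets are upper case, OSM names are mixed case."""
    return " ".join(name.upper().replace("'", "").split())

def match_street(street, centroid, highways, max_dist_m=2000):
    """OSM highways with the same normalised name within max_dist_m, nearest first."""
    lat, lon = centroid
    target = normalise(street)
    hits = [(haversine_m(lat, lon, h.lat, h.lon), h)
            for h in highways if normalise(h.name) == target]
    return [h for dist, h in sorted(hits, key=lambda dh: dh[0]) if dist <= max_dist_m]
```

Restricting by distance first is what keeps common names such as "Church Road" from matching every instance in the country.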

The advantage of looking at new roads is that all house numbers should appear in the data, and with judicious assumptions (based on arriving at the road from the centre of its named locality: odds on the left and evens on the right for longer roads; numbering running clockwise round shorter roads without an exit) it is possible to assign house numbers without a ground survey. Some additional data in the LRPP can help too: the type of house (detached, semi-detached, terrace, flat) is indicated, as, naturally, is its price.
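The two numbering assumptions can be written down as a short sketch. Each house plot, as traced from aerial imagery, is given a side (left or right, as seen arriving from the locality centre) and a distance along the road from its entrance. `Plot`, `number_odd_even` and `number_clockwise` are hypothetical names, and houses at the head of a cul-de-sac would have to be assigned to one side or the other by hand. The optional `skip` set allows for a missing number 13.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Plot:
    plot_id: str
    side: str         # "L" or "R", as seen arriving from the locality centre
    distance: float   # metres along the road from its entrance

def _numbers(start, step, skip):
    """Endless run of house numbers, leaving out any in `skip` (e.g. {13})."""
    for n in count(start, step):
        if n not in skip:
            yield n

def number_odd_even(plots, skip=frozenset()):
    """Longer road: odds on the left, evens on the right, rising away from the entrance."""
    odd, even = _numbers(1, 2, skip), _numbers(2, 2, skip)
    result = {}
    for p in sorted(plots, key=lambda p: p.distance):
        result[p.plot_id] = next(odd) if p.side == "L" else next(even)
    return result

def number_clockwise(plots, skip=frozenset()):
    """Cul-de-sac, clockwise viewed from above: up the left side, round the head,
    back down the right side, numbered 1, 2, 3, ..."""
    left = sorted((p for p in plots if p.side == "L"), key=lambda p: p.distance)
    right = sorted((p for p in plots if p.side == "R"), key=lambda p: p.distance, reverse=True)
    seq = _numbers(1, 1, skip)
    return {p.plot_id: next(seq) for p in left + right}
```

For example, with `skip={13}` a cul-de-sac of 14 plots is numbered 1 to 12, then 14 and 15. Comparing the resulting numbers with the property types and prices in the LRPP data is a quick check on which scheme a road actually uses.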

I tried this out for a road in Send Marsh, Surrey, built since I last lived there: Danesfield. Unfortunately I can't remember what was there before.


Danesfield, Send Marsh

This road illustrates some of the likely issues: 
  • How is it numbered? This is quite a long street, with 34 houses identified in the LRPP data. Although it is a cul-de-sac, it could be numbered using either scheme. My guess was that it would be numbered sequentially. The terrace of houses numbered 27-29 and the semi-detached number 30 were easy to relate to the aerial imagery. (Note that house 31 is incorrectly coded as detached in the data; the price paid belies this.)
  • Is there a number 13? I would expect a street of houses built for sale in Britain not to have a number 13. Number 13s may still be used for social housing, but even this is becoming rarer. I have noticed that houses 12 and 14 are often separated by a service driveway (as here): presumably this is a device to make the omission of 13 from the house numbering less obvious.
  • Is every house in the LRPP data? In this case, no. There are two, possibly three, extra houses on the road which are not accounted for by the LRPP data. However, working from the numbers that were obviously assigned correctly, it turned out that the first house on the left on entering the road is actually on Polesden Lane (see below). I have continued the numbering from 35 for the last two houses on the left, but in practice this needs a survey to confirm my assumption. I have no idea why these two properties are not in the data: they are substantial detached houses in a fairly desirable area of Surrey with good rail connections to London. It seems unlikely that they have been rented out; conceivably they remain in the ownership of the original landowner.
  • Are there related new builds on adjacent streets? The house on the corner of Polesden Lane and Danesfield and its neighbour are obviously new houses, judging from the aerial photos (and my own recollection of the road). As stated above, it took me a while to realise this house belonged on Polesden Lane. I then found two houses with the right postcode and price, named Kelban and Kareela. I have added these names to the houses but, of course, don't know which one is which. This does not matter much for finding the houses, and if the assignment is wrong it can be corrected.
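Checking the numbers found for a road against the questions above is easy to automate. This sketch, with hypothetical helper names, takes the PAON values recorded for one road and reports which numbers below the highest are absent, whether 13 is among them, which properties have only a name (like Kelban and Kareela), and the next free number for continuing the sequence over houses missing from the data.

```python
import re

def parse_number(paon):
    """Leading house number of a PAON such as '12' or '12A'; None for names like 'Kelban'."""
    m = re.match(r"\s*(\d+)", paon)
    return int(m.group(1)) if m else None

def numbering_report(paons):
    """Summarise the house numbering recorded for one road."""
    numbers = sorted({n for n in map(parse_number, paons) if n is not None})
    names = sorted(p for p in paons if parse_number(p) is None)
    top = numbers[-1] if numbers else 0
    missing = sorted(set(range(1, top + 1)) - set(numbers))
    return {
        "highest": top,
        "missing": missing,        # gaps: unsold, unrecorded, or deliberately skipped
        "no_13": 13 in missing,    # expected on streets built for sale
        "named": names,            # properties known only by name
        "next_free": top + 1,      # where to continue numbering unrecorded houses
    }
```

A gap at 13 alone is the expected pattern; other gaps suggest houses absent from the LRPP data, as on Danesfield, and are candidates for a survey.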
All in all, about 106,000 postcodes are covered by these new roads, and there are perhaps as many as 90,000 roads. For each road we can also derive a start_date for its construction and for the houses on it, which gives us a starting point for mapping building age in England and Wales too.
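Deriving a start_date might be sketched as below, assuming one record per property giving its road, an identifier and the date of its first (new-build) sale; the function name and record layout are hypothetical. The first sale date only approximates completion rather than the start of construction, and reducing it to a year is my own choice of precision for the OSM start_date tag.

```python
from datetime import date

def start_dates(first_sales):
    """first_sales: iterable of (road_key, property_key, first_sale_date).

    Returns (road_start, house_start): dicts of year strings, where a road's
    year is that of the earliest first sale of any property on it."""
    road_year = {}
    house_year = {}
    for road, prop, sold in first_sales:
        house_year[prop] = str(sold.year)
        if road not in road_year or sold.year < road_year[road]:
            road_year[road] = sold.year
    return {r: str(y) for r, y in road_year.items()}, house_year
```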
