Marguerite le Riche from our Geographic Information Systems (GIS) team was part of the first round of the Scottish Data Science Accelerator Programme, focusing on automatically extracting buildings from historic maps. Here’s what she learnt, and how this could inform future work at Registers of Scotland (RoS).
What is the Data Science Accelerator Programme?
The Data Science Accelerator Programme is a professional learning and development programme that gives aspiring data analysts from across the public sector the opportunity to develop their data science skills. The first round of the programme took place this year, as a collaborative initiative by the Scottish Government (SG), Registers of Scotland, National Records of Scotland (NRS) and the NHS Information Services Division, supported by the Data Lab at the University of Edinburgh.
Why did you want to be involved in the programme?
The GIS team is interested in building a house age dataset for Scotland by comparing historic maps with present-day Ordnance Survey (OS) maps and using the comparison to estimate when individual buildings were built. The National Library of Scotland (NLS) has a large collection of scanned historic maps, including the 25-inch-to-the-mile OS County Series maps that were produced between the 1840s and 1930s. Each county was mapped several times, so the maps are a rich source of data about how the Scottish built environment has changed over time.
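To make the comparison idea concrete, here is a minimal Python sketch of how a build date could be bracketed once each building has been matched across map editions. The input format, the function name `bracket_build_date` and the survey years in the example are hypothetical, and the sketch assumes that a building, once built, appears on every later map (demolition and rebuilding are ignored).

```python
from typing import Dict, Optional, Tuple


def bracket_build_date(presence: Dict[int, bool]) -> Tuple[Optional[int], Optional[int]]:
    """Return (not_before, not_after) survey years for one building.

    `presence` is a hypothetical input mapping each map edition's survey
    year to whether the building appears on that edition. Assumes the
    building never disappears once it has been built.
    """
    not_before = None  # latest survey year on which the building is absent
    not_after = None   # earliest survey year on which the building is present
    for year in sorted(presence):
        if presence[year]:
            not_after = year
            break
        not_before = year
    return not_before, not_after


# Illustrative years only: absent in 1855, present from 1896 onwards,
# so the building was most likely built between those two surveys.
print(bracket_build_date({1855: False, 1896: True, 1912: True, 2020: True}))
```

A `None` on either side means the date is open-ended: the building was already there on the earliest edition, or it has not appeared on any edition yet.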
However, to extract this data, the scanned map images need to be digitised into vector data, with polygons representing individual buildings. Standard GIS approaches are not up to the task of automating the digitisation process, so when I saw the invitation for applications, I knew that this would be a golden opportunity to learn new tools and approaches that would be invaluable to the project at RoS.
What was your project?
The title of my project was ‘Detecting buildings in historic maps using Open Computer Vision.’ My mentor was James Crone, a GIS Analyst at EDINA, who has applied computer vision and machine learning techniques to scanned maps from the post-war period. His work had focused on finding the locations of buildings rather than extracting their shapes, so shape extraction was my original contribution to the body of work. In the process, I learned how to apply computer vision and entry-level machine learning techniques, amongst other things. I had one day a week for 17 weeks to work on the project, which goes by very quickly, especially if your project is a bit on the ambitious side!
How will you bring what you’ve learned back to your GIS work at RoS?
I had never used either computer vision or machine learning before the accelerator, and I am very pleased with how much I was able to learn in a limited amount of time. It has definitely opened up new avenues of work for me. Using the sample data supplied by NLS, I was able to extract considerably better polygons than previous attempts, particularly for buildings that are represented with speckles in the printed maps. There is, however, still a lot of work to be done on buildings that are represented with hatching, shading and colouring, and there are plenty of unrelated problems to solve before we will have an actual ‘house age’ dataset! I’m looking forward to seeing where this research takes us in the future.