The Office for National Statistics Postcode Directory relates both current and terminated postcodes in the United Kingdom to a range of current statutory administrative, electoral, health and other area geographies.
For this exercise we need the postcode data and the 'PCON' field, which contains the relevant Parliamentary Constituency code for each postcode in the UK.
The ONS Postcode Directory download is a ZIP file of around 200MB, and it presents the data in three different forms:
- The complete dataset in CSV format
- The complete dataset as a large text file
- The data split into 124 individual CSV files based on postcode
In addition, there is a user guide and a documents folder containing a number of other CSV and XLSX files.
Caution: Don't try to open the large CSV file in Excel by double-clicking it. The file contains more records than an Excel worksheet can hold (the limit is 1,048,576 rows), so it will not load completely.
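If you want to confirm the size of the file before deciding how to handle it, a short script can count the records without loading them into memory. This is only a minimal sketch: the function names and the file name `postcodes.csv` are placeholders of mine, not names used in the ONS download.

```python
# Worksheet row limit in Excel 2007 and later.
EXCEL_MAX_ROWS = 1_048_576


def count_lines(path):
    """Count the lines in a file (header included) by streaming through it."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def fits_in_excel(path):
    """Return True if every line of the file would fit on one worksheet."""
    return count_lines(path) <= EXCEL_MAX_ROWS
```

For example, `fits_in_excel("postcodes.csv")` returns `False` for any file with more lines than the worksheet limit.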
Note that in the source CSV data, each field that contains text is wrapped in quotes. There are no commas within these fields, so it is safe to remove the quotes.
If you have a problem removing the quotes when importing the data, a workaround is to split the large file into smaller, more manageable chunks using external software and then remove the quotes from each chunk. Splitting the file into nine parts gives you batches of around 300,000 records each, which Excel can cope with easily.
Free software is available to split the data, such as CSV Chunker at:
http://www.scaled-solutions.com/blog/open-source-csv-file-splitter
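If you would rather not install a separate tool, the split can also be scripted. The following is a minimal sketch rather than a replacement for the tool above. It assumes the file is UTF-8 encoded and that no field contains a line break, and the function name, the `chunk_001.csv` naming scheme and the default chunk size are my own choices.

```python
from pathlib import Path


def split_csv(source, out_dir, rows_per_chunk=300_000, encoding="utf-8"):
    """Split a CSV file into chunks of at most `rows_per_chunk` data rows.

    Lines are copied unchanged (quotes included) and the header line is
    repeated at the top of every chunk. Returns the list of chunk paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    out_file = None
    try:
        # newline="" keeps the original line endings untouched.
        with open(source, "r", newline="", encoding=encoding) as src:
            header = src.readline()
            for index, line in enumerate(src):
                if index % rows_per_chunk == 0:
                    # Start a new chunk file and write the header into it.
                    if out_file is not None:
                        out_file.close()
                    path = out_dir / f"chunk_{len(chunk_paths) + 1:03d}.csv"
                    chunk_paths.append(path)
                    out_file = open(path, "w", newline="", encoding=encoding)
                    out_file.write(header)
                out_file.write(line)
    finally:
        if out_file is not None:
            out_file.close()
    return chunk_paths
```

Calling `split_csv("postcodes.csv", "chunks")` would write `chunks/chunk_001.csv`, `chunks/chunk_002.csv` and so on, where `postcodes.csv` is a placeholder for the large CSV file.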
Once the file is split into smaller chunks, the fastest way to remove the quotes is to open each split CSV file in Excel and re-save the file as a CSV file. This is an almost instant process and is much quicker than trying to remove quotes using a text editor. This will also reduce the file size of the CSV files and should speed up the import process when you import the data into your database.
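As an alternative to re-saving each file in Excel, the quotes can be removed with a few lines of script. This is a hedged sketch using Python's standard `csv` module. The function name is hypothetical, and it relies on the point made earlier that the quoted fields contain no commas.

```python
import csv


def strip_quotes(source, destination, encoding="utf-8"):
    """Rewrite a CSV file so that fields are no longer wrapped in quotes.

    The reader removes the surrounding quotes as it parses each row. The
    writer uses QUOTE_MINIMAL, so it only adds quotes back to a field that
    contains a comma, a quote character or a line break.
    Returns the number of rows written, header included.
    """
    rows_written = 0
    with open(source, newline="", encoding=encoding) as src, \
            open(destination, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Because the rows are streamed one at a time, this approach should also work on the complete file, not only on the split chunks.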
The two fields that matter for the exercise are the postcode field and PCON, which contains the parliamentary constituency code.
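Since only two fields are needed, it can help to cut the data down to those columns before importing it. The sketch below is one way to do that, not a prescribed method. The default column names `pcd` and `pcon` are assumptions about the header row, so check them against the user guide. The function name is also hypothetical. Header names are matched case-insensitively, so 'PCON' and 'pcon' are treated alike.

```python
import csv


def extract_columns(source, destination, columns=("pcd", "pcon"),
                    encoding="utf-8"):
    """Copy only the named columns from `source` into `destination`.

    Column names are matched case-insensitively against the header row.
    Raises KeyError if a requested column is missing.
    Returns the number of data rows written.
    """
    rows_written = 0
    with open(source, newline="", encoding=encoding) as src, \
            open(destination, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        # Map each lower-cased header name to its column position.
        positions = {name.strip().lower(): i for i, name in enumerate(header)}
        indexes = [positions[name.lower()] for name in columns]
        writer.writerow(columns)
        for row in reader:
            writer.writerow([row[i] for i in indexes])
            rows_written += 1
    return rows_written
```

The resulting file has one postcode and one constituency code per row, which keeps the later database import small.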
Although the data is also supplied as 124 separate files, I wouldn't advise using them individually unless you have a specific reason to, as 124 files are far too cumbersome to work with.