Data to Insight is not a software supplier in the traditional sense -- we're here to help local authority (LA) colleagues get work done and share it with each other. The best thing about Data to Insight is the community of analysts who are doing and sharing great work through this network.
We recognise that different LAs want different levels of engagement with our work. As such, we offer four key collaboration approaches to suit different users’ information needs.
The disproportionality tool lets you compare the ethnic breakdown of different populations. It combines your own data with DfE data to compare the numbers of people from each ethnic group across populations. It also performs statistical tests to determine whether differences in population proportions are statistically significant, and it produces plots visualising each population's breakdown by ethnic group.
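As a rough illustration of what testing a difference in population proportions can look like, here is a minimal sketch of a two-proportion z-test. This guide does not say which test the tool uses, so the choice of test, the function name, and the counts below are all assumptions made for illustration.

```python
import math

def two_proportion_z_test(count_a, total_a, count_b, total_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = count_a / total_a
    p_b = count_b / total_b
    # Pooled proportion under the null hypothesis that both populations share one rate.
    pooled = (count_a + count_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts: 30 of 200 people in one population, 1,500 of 20,000 in another.
z, p_value = two_proportion_z_test(count_a=30, total_a=200, count_b=1500, total_b=20000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the two proportions differ by more than chance alone would explain. The threshold used to call a result significant is a separate choice that this sketch does not make.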
There are three versions of the tool which all produce the same outputs. Guides for each version of the tool can be found below.
The first is the usability version of the tool. It lets users select LAs and time periods from dropdowns, but user data still has to be entered in the code. This isn't too hard: it just requires deleting the sample numbers and replacing them with your own. This version isn't recommended for more experienced Python users, for two reasons. First, the code added to make the tool more usable for non-Python users makes the code harder to read and understand, as it uses packages new users may be unfamiliar with, such as ipywidgets. Second, because users need to select from dropdowns, this version does not allow the use of the restart and run all button in Jupyter notebooks; cells must be run one at a time.
The second version of the tool is the dev version. This is the recommended version for people with some Python experience: it is built so that the entire notebook can be run in one go, it only uses packages users are likely to be familiar with, and the removal of ease-of-use features like dropdowns makes the code much easier to read, understand, and customise. To use it, you make some inputs in the code itself, such as replacing the sample LA and time period with your own, rather than selecting these from dropdowns.
The final version of the tool is the learning version. This is built on the dev version of the tool. In the simplest sense, it is the dev version with lines of code removed, and with example code and prompts to guide users towards writing the missing lines themselves. This will get users used to reading existing code to understand it, writing the type of code needed for data analysis, and contributing to pre-existing code bases. Because this version is built from the dev version, if you get stuck you can look back at the dev version's code to see what you should do.
Once you've replaced the example data with your own, ensure the cell containing the data is selected, and run it using the run button. Following this, click run twice; this should run a markdown cell (a cell with text descriptions) and then a cell with code in. This code cell may take some time to run (no more than 30 or so seconds, depending on your internet connection and computing power), as it gets data directly from the DfE website in the form of a zip file, unzips it, and then reads it into a dataframe in Python. Whilst Jupyter Notebooks is running code, an hourglass will appear in the left corner of the browser tab where the notebook is open. Note that this isn't instant, so if you click run and the hourglass doesn't appear right away, you don't need to hit run again; you just need to wait for the cell to start running. Once you've run these cells, scroll down until you see the dropdown menus, as seen in the image below. These dropdowns won't appear until the cell has finished running. You may see a red box with an error; don't worry about this.
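The unzip-and-read step described above can be sketched as follows. This is not the tool's own code: the function name, file name, and column names are hypothetical, and the sketch uses the standard library's csv module rather than a dataframe so that it runs without extra packages. It also builds a tiny stand-in zip in memory instead of downloading from the DfE website.

```python
import csv
import io
import zipfile

def read_csv_from_zip(zip_bytes, member_name):
    """Unzip one CSV file held in memory and return its rows as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as raw_file:
            text_file = io.TextIOWrapper(raw_file, encoding="utf-8", newline="")
            return list(csv.DictReader(text_file))

# Build a small stand-in zip in memory so the example runs offline.
# In the real workflow the bytes would come from a download instead, e.g.
# zip_bytes = urllib.request.urlopen(DFE_ZIP_URL).read()  # DFE_ZIP_URL is hypothetical
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.csv", "la_name,time_period,count\nEast Sussex,202122,10\n")

rows = read_csv_from_zip(buffer.getvalue(), "example.csv")
print(rows)
```

Downloading and unzipping a real file is what makes the notebook cell slow, which is why the hourglass can stay visible for a while.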
Use the dropdowns to select the LA and time period you want to use for comparison. Then, make sure you don't re-run this cell, or it will reload the dropdowns. Instead, click on the next cell and run it. From here on, you can just hit run for every cell until they've all run. Once they've all run, the data tables and plots will be visible throughout the notebook; just scroll through to see the ones you want. The comparison of ethnic population distribution is made in three ways: Relative Rate Index, Rate per 10,000, and Percentage. Breakdowns are also done for ethnic sub-groups and ethnic main-groups. Simply scroll to the visualisation you want, then copy and paste it wherever you want to use it.
When you load up the usability tool, you'll see text descriptions explaining how to use the tool and what it does. If you don't know how to use a notebook, see our Jupyter Notebooks guide here. If you give the descriptions a read, the first bit of code you get to will look like the image below. This is the setup section that asks the user to input their own ethnicity data. To do so, simply delete the number on the right hand side of the equals sign for each ethnicity and replace it with your own, ensuring you put back any comma you accidentally remove. The bit of code you're entering the values into is called a dictionary; you can tell because the variable ethnic_input is set equal to key:value pairs surrounded by curly braces. You don't need to understand this to use the tool, but it's worth explaining as dictionaries are very important in Python.
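For readers new to dictionaries, here is a minimal sketch of what one looks like and how its values are read and replaced. The variable name ethnic_input comes from the tool, but the group labels and counts below are made up; the real notebook defines its own keys.

```python
# Hypothetical ethnicity labels and counts; the real notebook defines its own keys.
ethnic_input = {
    "Group A": 120,
    "Group B": 45,
    "Group C": 30,
}

# Look up one value by its key.
print(ethnic_input["Group A"])

# Replace a value without touching the key, which is all the setup step requires.
ethnic_input["Group B"] = 50

# Add up every value to get the total population entered.
total = sum(ethnic_input.values())
print(total)
```

Each key is followed by a colon and its value, and each pair ends with a comma, which is why a deleted comma causes an error when the cell is run.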
Once you've done this, you'll need to input the LA and time period of your choice. For this to work, you'll need to get the spellings and punctuation exactly as they are in the source data, but don't worry: the notebook can list them for you. To get a list of all LAs and time periods, scroll to the bottom of the setup cell and find where the code to print the list of LA names and time periods is commented out (preceded by a #). If you want to see these lists, just delete the #s and run the cell.
If there is a # before a piece of code, it tells Python not to read it as code. Normally this is used to write comments in code, but it is also used to temporarily remove bits of code for testing. If you remove the #s from the code above and run the cell, it will print the list of time periods and LAs seen above (for the above, I removed the #s, ran the code to get the example lists, and re-added the #s to better illustrate the example). Once you've found the LA and time period you want, you'll need to copy and paste them into the correct part of the setup code. The place for this can be found at the bottom of the setup section, just under the dictionary where you entered your population data. Simply replace the example time period and LA (202122 and East Sussex) with the time period and LA you've chosen, ensuring that the name of the LA you've chosen is surrounded by either single quotes ('LA Name') or double quotes ("LA Name").
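The commenting and quoting rules described above can be sketched as follows. The variable names and the stand-in lists are hypothetical, and storing the time period as a number rather than text is an assumption; only the example values 202122 and East Sussex come from this guide.

```python
# Hypothetical variable names; the example values come from the guide.
time_period = 202122
la_name = 'East Sussex'      # single quotes
la_name_alt = "East Sussex"  # double quotes give the same string

# Hypothetical lists standing in for the values printed from the source data.
la_list = ["East Sussex", "Example LA"]
time_period_list = [202021, 202122]

# The next two lines are commented out, so Python skips them.
# Delete the leading # on each to print the lists.
# print(la_list)
# print(time_period_list)

# A quick check that the chosen values match the source data exactly.
print(la_name in la_list and time_period in time_period_list)
```

A check like the last line fails if the spelling, capitalisation, or punctuation differs from the source data at all, which is why copying and pasting from the printed lists is safer than typing.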
If you've followed these steps correctly, you can either run every individual cell until they've all run, or click the restart and run all cells button (it looks like a fast forward button) in the toolbar. Once they've all run, the data tables and plots will be visible throughout the notebook; just scroll through to see the ones you want. The comparison of ethnic population distribution is made in three ways: Relative Rate Index, Rate per 10,000, and Percentage. Breakdowns are also done for ethnic sub-groups and ethnic main-groups. Simply scroll to the visualisation you want, then copy and paste it wherever you want to use it.
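This guide does not define the three measures, so the sketch below uses commonly seen definitions as an assumption: a percentage share, a rate per 10,000 of a wider population, and a relative rate index as the ratio of one group's rate to a reference group's rate. The function names and counts are made up, and the tool's own calculations may differ.

```python
def percentage(group_count, total_count):
    """Share of a population that belongs to one group, as a percentage."""
    return 100 * group_count / total_count

def rate_per_10k(cohort_count, population_count):
    """Number in the cohort for every 10,000 people in the wider population."""
    return 10_000 * cohort_count / population_count

def relative_rate_index(group_rate, reference_rate):
    """How many times higher (or lower) one group's rate is than a reference group's."""
    return group_rate / reference_rate

# Hypothetical counts for two groups.
cohort = {"Group A": 30, "Group B": 120}
population = {"Group A": 2_000, "Group B": 16_000}

rates = {group: rate_per_10k(cohort[group], population[group]) for group in cohort}
rri = relative_rate_index(rates["Group A"], rates["Group B"])
share = percentage(cohort["Group A"], sum(cohort.values()))
print(rates, rri, share)
```

Under these assumed definitions, a relative rate index above 1 means the group appears in the cohort at a higher rate than the reference group, and below 1 means a lower rate.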
One thing to note about the dev version of the tool is that the code is written to be relatively simple (favouring simplicity over fast and optimised code), so if you want to make changes and customise it, you can and should. You can also copy and paste bits of code from here into your own work as and when needed.
To use the dev version, you need to be just a little more comfortable with Python than for the usability version, but the skills you need essentially boil down to replacing some variables in the code with your own. First things first, scroll down to the setup section, pictured below, and replace the example population data with your own. If you've used the usability version of the tool, you may notice some things are ordered differently. In the usability tool, packages are imported in what is technically the wrong place just to make things a bit easier. In this version, the packages are imported first, as they should be.
The guide for the learning version of the tool is very minimal, by design. Working out how code works and how to use it yourself is a key part of learning to code. You'll notice when trying to run any cell in the learning version that it probably won't run, and you'll get all sorts of errors. This is because the cells are not complete; completing them is for the user to do by following the prompts and examples in the code. The idea is for the user to read through the code and then use its examples to complete similar lines of code further through the worksheet. As you work through the worksheet version, you may want to refer to the dev version guide to help you get set up if things don't make sense. You can also refer to the completed code in the dev version of the tool if you get stuck.
Some tips for completing the worksheet:
When writing bits of code in cells, you may not be able to test the line you've just written because incomplete lines lower down in the cell will cause errors. Where this is the case, it might be worth commenting out the lines of code below where you are working so you can properly run and test every line as you go.
A really nice and commonly used way to test that the code you've written works is to use print statements and check that what is printed is what you expected. For instance, check that a data frame you've made has the data you expected.
When writing and testing functions, you'll likely call the function later in the code, not in the cell you wrote the function in. To test that the function works, however, it's worth calling the function at the bottom of the cell you're writing and testing the function in so it's easy to run, check outputs, and make changes.
It's probably a good idea to keep a version of the worksheet downloaded with no changes so that, if you mess up, you can revert to it.
It's a good idea to check the official documentation or google when you get stuck before checking the answers, as this is a skill you'll need to develop and will use all the time when coding.
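The tips above on print statements, test calls, and commenting out unfinished lines can be sketched as follows. The function and its made-up inputs are hypothetical and are not taken from the worksheet.

```python
def percentage_breakdown(counts):
    """Convert a dictionary of counts into a dictionary of percentages."""
    total = sum(counts.values())
    return {group: 100 * count / total for group, count in counts.items()}

# Temporary test call at the bottom of the cell, using made-up counts.
test_result = percentage_breakdown({"Group A": 1, "Group B": 3})
print(test_result)

# Unfinished lines further down can be commented out while testing:
# next_step = some_function_not_written_yet(test_result)
```

Using inputs small enough to work out by hand makes it easy to tell whether the printed result is right. Once the function behaves as expected, the temporary call and print can be removed.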