Python for PATCh
This page contains resources collated to help you learn enough Python to get started on the PATCh tool. It's intended to help new starters on D2I projects, like PATCh, get up to speed. It isn't meant to be an exhaustive list of resources or of everything you'll need to know. It's enough to get you going, to show you the kinds of things we can do with Python so you can get to work, and to give you an idea of what to Google or ask for help with if you get stuck.
The list is split into six sections. The first three, Intro to Python, Intro to data handling, and Intro to plotting, cover the minimum you'll want to be comfortable with to get started with the PATCh tool. By the time you've covered these, you'll already have skills you can add to your current workflow, or use to replace parts of it. The last three sections, Intro to stats in Python, Advancing plotting, and Advancing Python skills and use cases, take skills from the earlier sections and extend them to more complex outputs and tasks. This is where Python really starts to shine, but note that these sections are a) just the start of what Python can do, and b) not necessary if you're only interested in PATCh.
If, rather than working through a set of videos as in this resource, you'd rather follow one video from start to finish, freeCodeCamp videos like this one tend to be very good: Data Analysis with Python - Full Course for Beginners (Numpy, Pandas, Matplotlib, Seaborn). I haven't followed this video myself, though, so I can't vouch for it personally.
Finally, if you want to get started or involved but don't have Python installed yet, or can't install it, we have a guide here explaining how to set up JupyterLite (JL), a way to use Python in your browser that can be used safely with data stored locally. If you're using JL, you might also want to look at our guide to using Jupyter Notebooks here, as working in a notebook is slightly different to other ways of writing Python.
Intro to Python
This section contains resources to help you get started with Python. Whilst the examples don't have any obvious CS use cases straight away, they cover essential concepts for anyone who has never written code before. If you've never done any coding, it's worth starting here. If you've coded in other languages you can probably skip this section, but it's worth checking the Python documentation to see how these topics are handled in Python, as it may differ from what you're used to. Two key differences from many other languages: Python doesn't need you to declare the type of a variable, and Python for loops iterate directly over the items in a collection rather than counting through an index.
Variables - what they are, how they work, and how to use them:
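To make those two differences concrete, here is a small sketch. The team names are made up purely for illustration.

```python
# Python infers a variable's type from the value assigned to it;
# there is no type declaration such as `int count = 5;`.
count = 5
print(type(count))  # <class 'int'>

# The same name can later hold a value of a different type.
count = "five"
print(type(count))  # <class 'str'>

# A Python for loop iterates directly over the items in a collection,
# rather than over an index you increment yourself.
teams = ["Early Help", "Fostering", "Leaving Care"]  # hypothetical team names
for team in teams:
    print(team)

# If you also need each item's position, enumerate() supplies it.
for position, team in enumerate(teams):
    print(position, team)
```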
Functions - how to write code once to perform repeated actions:
For loops - getting code to repeat multiple times:
If/Else statements - getting things to happen only in certain instances:
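The four topics above combine naturally in even a tiny script. The sketch below assumes a hypothetical set of age bands (not any official definition): a function uses if/else to pick a band, and a for loop applies it to a list stored in a variable.

```python
def age_band(age):
    """Return an illustrative (hypothetical) age band for an age in years."""
    if age < 5:
        return "Under 5"
    elif age < 11:
        return "5 to 10"
    elif age < 16:
        return "11 to 15"
    else:
        return "16 and over"


ages = [2, 7, 14, 17]  # made-up example ages
bands = []
for age in ages:
    # Write the logic once in the function, reuse it for every item.
    bands.append(age_band(age))

print(bands)  # ['Under 5', '5 to 10', '11 to 15', '16 and over']
```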
Intro to data handling
This section introduces Pandas, the main Python package we'll use to handle data. At its simplest, it's what we use to read tabular data such as CSVs and Excel documents into Python, and to manipulate that data once it's loaded. Even for just these tasks it is very powerful, allowing complex selection, manipulation, statistical analysis, and exploration with very simple commands. Note that if you're using JupyterLite, you'll need to follow different instructions from those in the following videos if you want to read files directly from your computer rather than loading them into JL first. Once you have a DataFrame assigned to a variable, though, you're all good: Accessing local files — JupyterLite 0.1.0-beta.18 documentation
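As a rough sketch of the "very simple commands" mentioned above, the example below reads a small made-up CSV (held in a string so the snippet runs on its own), selects a column, filters rows, and summarises by group. With a real file you would pass a path to `pd.read_csv` instead; Excel files use `pd.read_excel`, which needs an extra engine such as openpyxl installed. The column names are hypothetical.

```python
import io

import pandas as pd

# Made-up data; in practice you'd use something like pd.read_csv("my_file.csv")
csv_text = """child_id,team,age
1,Early Help,4
2,Fostering,12
3,Early Help,9
4,Leaving Care,17
"""
df = pd.read_csv(io.StringIO(csv_text))

# Select a single column (returns a Series)
ages = df["age"]

# Keep only rows that meet a condition
older = df[df["age"] >= 10]

# Group and summarise in one line: mean age per team
mean_age_by_team = df.groupby("team")["age"].mean()

print(older)
print(mean_age_by_team)
```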
Intro to Pandas - this starts simply but covers key things like loading data from CSVs and directly from SQL databases:
Basic Pandas usage and features - ignore how the files are read in, as we won't be using the method shown in this video; once the files are loaded, everything works the same:
Basic Pandas data manipulation, working with dates and times, and exploration:
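Dates and times are a common sticking point, so here is a small sketch of the usual pattern: convert text to datetimes, subtract to get durations, and pull out parts of a date for grouping. The column names and dates are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "child_id": [1, 2, 3],
    "referral_date": ["2023-01-15", "2023-02-03", "2023-02-20"],    # made-up dates
    "assessment_date": ["2023-01-30", "2023-03-01", "2023-03-06"],
})

# Convert text columns into proper datetime values
for col in ["referral_date", "assessment_date"]:
    df[col] = pd.to_datetime(df[col])

# Subtracting two datetime columns gives timedeltas; .dt.days gives whole days
df["days_to_assessment"] = (df["assessment_date"] - df["referral_date"]).dt.days

# Extract part of a date (here the month) to group by
df["referral_month"] = df["referral_date"].dt.month

monthly = df.groupby("referral_month")["days_to_assessment"].mean()
print(monthly)
```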
Intro to plotting
This section introduces matplotlib, the most widely used plotting library in Python. Whilst its plots aren't the flashiest, they are highly customisable, and matplotlib can take a lot of the calculation work off your hands: it works out histogram bin counts and box plot quartiles for you and draws error bars directly, and, combined with NumPy, it makes fitting and plotting regression lines straightforward. In Excel, by contrast, you often need to do these calculations separately before you can make the plot. This section includes just one in-depth, long-form video covering the key features. The best approach is probably to watch the first part to get the basics, then skip ahead to the plots that interest you: Matplotlib Tutorial : Matplotlib Full Course
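A minimal sketch of matplotlib doing some of that work, using made-up data: `hist` counts the values in each bin for you, and `errorbar` draws means with error bars (the means and standard deviations themselves are computed with NumPy here). The "Group A/B" split is purely illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np

ages = np.array([2, 4, 4, 7, 9, 12, 13, 15, 16, 17])  # made-up data

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# hist() counts how many values fall into each bin and returns those counts
counts, bin_edges, _ = ax1.hist(ages, bins=[0, 5, 10, 15, 20])
ax1.set_title("Ages by band")

# errorbar() plots each mean with a +/- 1 standard deviation bar
groups = ["Group A", "Group B"]  # hypothetical split of the data
x = [0, 1]
means = [ages[:5].mean(), ages[5:].mean()]
stds = [ages[:5].std(), ages[5:].std()]
ax2.errorbar(x, means, yerr=stds, fmt="o", capsize=4)
ax2.set_xticks(x)
ax2.set_xticklabels(groups)
ax2.set_title("Mean age +/- 1 SD")

fig.savefig("example_plots.png")
```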
Intro to stats in Python
Whilst matplotlib and other libraries can handle some of the stats for us, they tend to use packages called NumPy and SciPy behind the scenes to do the numerical work. It's a good idea to learn how to do statistical work with these packages yourself, so you can see what's being done and do it independently when you need to. The videos in this section cover only a small part of what's possible. To see how full-featured even this one part of SciPy is, take a look at the statistical functions section of the SciPy docs: Statistical functions (scipy.stats) — SciPy v1.10.1 Manual.
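As a small taste of `scipy.stats`, the sketch below gets descriptive statistics for one group and runs an independent two-sample t-test between two groups. The numbers are made up; whether a t-test is appropriate for real data is a separate question the videos cover.

```python
import numpy as np
from scipy import stats

group_a = np.array([12.0, 15.0, 14.0, 10.0, 13.0])  # made-up measurements
group_b = np.array([18.0, 20.0, 17.0, 19.0, 21.0])

# describe() returns count, min/max, mean, variance (sample, ddof=1), skewness, kurtosis
summary = stats.describe(group_a)
print(summary.mean, summary.variance)

# Independent two-sample t-test (assumes equal variances by default)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)
```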
Intro to stats in Python - what different types of stats mean and how to use them:
Linear regression in Python - performing and understanding linear regressions is a key starting point for stats, so I've also included a video specifically on that in this section:
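For a simple linear regression, `scipy.stats.linregress` returns the slope, intercept, correlation coefficient and more in one call. The data below is made up to lie close to a straight line.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # made-up, roughly linear data

result = stats.linregress(x, y)
print(f"slope={result.slope:.3f}, intercept={result.intercept:.3f}, "
      f"r^2={result.rvalue**2:.3f}, p={result.pvalue:.4f}")

# Use the fitted line to predict y for each x
predicted = result.intercept + result.slope * x
```

The fitted `predicted` values can then be passed to matplotlib's `plot` to draw the regression line over a scatter of the raw data.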
Advancing plotting: Seaborn and Plotly
Once you're comfortable with plotting, Seaborn and Plotly are actually easier to use than matplotlib, as they let you write far less code and produce more visually pleasing graphs out of the box. However, an understanding of matplotlib is essential as the starting point for plotting in Python. Seaborn is built directly on top of matplotlib: it produces pleasing graphs and can be heavily customised using matplotlib commands. Plotly, by contrast, is entirely separate and, unlike Seaborn and matplotlib, is aimed at creating more than individual plots. Whilst it can be used for single plots, its real power is in how easily it can be used to make impressive interactive dashboards. Like the matplotlib section, the videos here are long-form videos that'll give you a good grounding and can be skipped around as necessary.
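To illustrate the point about Seaborn sitting on top of matplotlib, here is a sketch using made-up data: a single `sns.regplot` call draws the points, a fitted regression line and a confidence band, and because it returns an ordinary matplotlib `Axes`, standard matplotlib methods then customise it.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2.0, 4.1, 5.9, 8.2, 9.8, 12.1],  # made-up data
})

# One Seaborn call: scatter points, fitted regression line and confidence band
ax = sns.regplot(data=df, x="x", y="y")

# The result is a matplotlib Axes, so matplotlib methods customise it further
ax.set_title("Seaborn regplot customised with matplotlib")
ax.set_xlabel("x value")
ax.set_ylabel("y value")
plt.savefig("seaborn_example.png")
```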
Advancing Python skills and use cases
This section will evolve over time as we find more topics analysts would like introductions to. It includes resources for more advanced packages and more advanced use of key tools and concepts like Pandas and functions. In particular, the aim of this section is to provide resources that are a little more project-oriented: rather than showing the use of Python in the abstract, they show how the skills can be applied in real-world areas, although not directly CS-related ones.
Jupyter Widgets - a library that lets you add interactive elements to plots and dashboards, both inside and outside of Jupyter. These videos are very introductory, as your best port of call is the official documentation (Jupyter Widgets — Jupyter Widgets 8.0.2 documentation, ipywidgets.readthedocs.io):
Advancing functions - how to really start using functions in your work, including use cases:
Advancing Pandas - how to perform more complex work with Pandas, including use cases:
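As a small example of where functions and Pandas meet, the sketch below wraps a repeated calculation in a reusable function with a keyword argument, then chains it into a Pandas pipeline with `DataFrame.pipe`, `groupby` and named aggregation. `flag_long_waits`, the column names, and the 20-day threshold are all hypothetical, chosen only for illustration.

```python
import pandas as pd


def flag_long_waits(df, date_from, date_to, threshold_days=20):
    """Hypothetical helper: add the wait in days between two date columns,
    plus a True/False flag for waits longer than threshold_days."""
    out = df.copy()  # avoid modifying the caller's DataFrame
    out["wait_days"] = (pd.to_datetime(out[date_to]) - pd.to_datetime(out[date_from])).dt.days
    out["long_wait"] = out["wait_days"] > threshold_days
    return out


referrals = pd.DataFrame({
    "team": ["Early Help", "Early Help", "Fostering", "Fostering"],
    "referral_date": ["2023-01-01", "2023-01-10", "2023-01-05", "2023-02-01"],  # made-up
    "assessment_date": ["2023-01-15", "2023-02-20", "2023-01-12", "2023-03-10"],
})

summary = (
    referrals
    .pipe(flag_long_waits, "referral_date", "assessment_date", threshold_days=20)
    .groupby("team")
    .agg(
        cases=("long_wait", "count"),
        long_waits=("long_wait", "sum"),
        mean_wait=("wait_days", "mean"),
    )
)
print(summary)
```

Because the column names and threshold are parameters rather than hard-coded, the same function can be reused on other date pairs or with a different threshold.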