Type 3 CIN rules

Type 3 CIN rules check whether or not things happen within a group. A group is something like an individual CP plan or CIN plan. For instance, rule 8935 checks that you haven't opened a second CP plan whilst another is already open.

As always, select a rule, this time type 3, and assign yourself to it. Make a new branch named for the rule, and open a codespace on that branch. For type 3 rules, the template rule is 8896. If, at the time of reading this guide, 8896 is not merged and is still in template form, you can copy the code from the pull requests section of the cin_validator repo.

First off, as always, you need to get the tables and columns of interest by replacing the ones in the template. As type 3 rules use groups, you'll also need the column that identifies the group of interest: for CP plans that's CPPID, for CIN plans it's CINdetailsID, and so on. This is needed to organise and sort rows by group, and to make the link_id.

Next you need to update the @rule_definition decorator. This is the same process as for beginner and type 1 rules. Update code with the rule code, module with CINTable.TableName, message with the message from your rule, and affected_fields with the field your rule checks.

Next is the validate function. This is probably easier than writing the validate function for type 2 rules. First off, we need a dataframe containing the data for our table. This dataframe is called df, a typical name for dataframes inside functions. To populate it with the data from your table, simply replace ChildProtectionPlans with the table for your rule in the line:

df = data_container[ChildProtectionPlans]

Now time for the logic. First up, you'll notice that a new dataframe is made, called df_check. It's made by using .copy() on df. This is done so that changes can be made to the new dataframe without affecting the original, so the two can be compared later. Next up, you'll need to write your rule to work on df_check. Most type 3 rules are similar, so in most instances you'll simply need to replace the column names in the template with your own.

Rule 8935 checks that only one CP plan group has no end date. To find the failing rows, then, we need to return the rows where children have more than one open CP plan group. The easiest way to do this is to find rows where CP plan end dates are missing, then find the LAchildID and CPPID for children where this happens more than once, and return those as failing rows.

The logic in the template returns True for rows where the CPPendDate column of df_check is NaN, then takes those rows from df_check and assigns them back to the variable df_check, so that it only contains those rows.

After this, as the .count() method skips NaNs, the NaNs are replaced with 1s in the CPPendDate column. This allows us to use .count(), but if we used it right off the bat, we wouldn't be counting anything useful. We need to count the number of times a child has a CPPID without a CPPendDate. We can do this by using the .groupby() method on df_check, and giving it a list of appropriate columns to group the data by. We want to group by LAchildID and by CPPID so we know how many rows with missing end dates each child has for each CPPID. Finally, to complete the logic, we filter df_check, keeping only rows where the number in CPPendDate (which is now the count of empty CPPendDates for each child and CP plan group) is greater than 1.
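The steps above can be sketched with plain pandas as follows. This is a minimal illustration, not the template itself: the sample data is made up, the column names are written as strings rather than the imported column variables, and the placeholder is set by overwriting the column (every remaining row is missing its end date, so this has the same effect as replacing NaNs with 1).

```python
import pandas as pd

# Hypothetical sample data; column names follow the CP plan example in this guide.
df = pd.DataFrame(
    {
        "LAchildID": ["child1", "child1", "child2", "child2", "child3"],
        "CPPID": ["cpp1", "cpp1", "cpp2", "cpp2", "cpp3"],
        "CPPendDate": ["26/05/2000", None, None, None, "30/10/2001"],
    }
)

# Work on a copy so df keeps its original values for later comparison.
df_check = df.copy()

# Keep only the rows where the end date is missing.
df_check = df_check[df_check["CPPendDate"].isna()].copy()

# .count() skips missing values, so put a countable placeholder in the column.
# Every remaining row is missing its end date, so this matches replacing NaNs with 1.
df_check["CPPendDate"] = 1

# Count the rows with a missing end date for each child and plan group.
df_check = (
    df_check.groupby(["LAchildID", "CPPID"])["CPPendDate"].count().reset_index()
)

# Keep only the groups with more than one missing end date.
df_check = df_check[df_check["CPPendDate"] > 1]
```

In this sample only child2 has two rows with a missing end date for the same CPPID, so that is the only group left in df_check, while df itself is unchanged.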

Now, as with type 1 and 2 rules, you can make your issue_ids and ERROR ID. It's important to build these from things that can't be NaN and that df_check hasn't changed. For instance, as my rule uses empty CPPendDates, I can't use that column for the issue or ERROR ID. LAchildID and CPPID are a good choice in my case, as that should be a unique combination. Fill in both issue_ids and df['Error_ID'] so that you can select the rows from df that match the ones left in df_check.
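As a rough sketch of how the two IDs fit together, the example below builds both from LAchildID and CPPID and uses them to pick the failing rows out of df. The data is made up, df_check is a hand-written stand-in for the result of the rule logic, and the column name ERROR_ID is an assumption (this guide also writes it as Error_ID), so use whichever spelling the template uses.

```python
import pandas as pd

# Hypothetical original table, with the index named so row positions are kept.
df = pd.DataFrame(
    {
        "LAchildID": ["child1", "child1", "child2", "child2"],
        "CPPID": ["cpp1", "cpp1", "cpp2", "cpp2"],
        "CPPendDate": ["26/05/2000", None, None, None],
    }
)
df.index.name = "ROW_ID"

# Stand-in for df_check after the rule logic: one failing child/plan group.
df_check = pd.DataFrame(
    {"LAchildID": ["child2"], "CPPID": ["cpp2"], "CPPendDate": [2]}
)

# IDs of the failing groups, built from columns the logic did not alter.
issue_ids = tuple(zip(df_check["LAchildID"], df_check["CPPID"]))

# The same ID built for every row of the original table (column name assumed).
df["ERROR_ID"] = tuple(zip(df["LAchildID"], df["CPPID"]))

# Rows of df whose ID matches a failing group.
failing_rows = df[df["ERROR_ID"].isin(issue_ids)]
```

Because both IDs are built the same way from the same columns, every row of df that belongs to a failing group is selected, here the rows at positions 2 and 3.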

Remember to also update the table and columns in rule_context.push_type_3(). As this type of rule only checks one table, you pass a single table and its columns here, just as you would for a type 1 rule.

Now for the test_validate function. It's pretty much the same as for previous rule types. Write a sample dataframe with rows that should pass and rows that should fail, covering all possible combinations, and convert the date columns to datetime. In result = run_rule(), replace the table name before the colon with the table for your rule, and put your sample dataframe after it. Then update the assert statements for issue_columns/table to the columns and table relevant to your rule, and update the number of failing rows to reflect your sample data.
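A sample dataframe for a rule like 8935 might look like the sketch below. The rows, the variable name sample_cpp, and the day/month/year date format are assumptions made for illustration; use the columns and date format that your own rule and the template expect.

```python
import pandas as pd

# Hypothetical sample data covering passing and failing cases.
sample_cpp = pd.DataFrame(
    [
        # child1: one closed row and one open row for the same plan, so it passes.
        {"LAchildID": "child1", "CPPID": "cpp1", "CPPendDate": "26/05/2000"},
        {"LAchildID": "child1", "CPPID": "cpp1", "CPPendDate": None},
        # child2: two open rows for the same plan, so both rows fail.
        {"LAchildID": "child2", "CPPID": "cpp2", "CPPendDate": None},
        {"LAchildID": "child2", "CPPID": "cpp2", "CPPendDate": None},
    ]
)

# Convert the date column to datetime; missing values become NaT.
sample_cpp["CPPendDate"] = pd.to_datetime(
    sample_cpp["CPPendDate"], format="%d/%m/%Y", errors="coerce"
)
```

Writing the expected outcome next to each group of rows makes it easier to fill in the number of failing rows and their positions in the later assert statements.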

As always, you'll need to update expected_df to contain the error ID of failing rows. Also, as rows fail based on the result from other rows, you'll need to have the index number of both rows causing the conflict in ROW_ID.
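To show the shape this produces, the sketch below groups the failing rows by their error ID so that each ID carries the list of row positions involved, and writes a matching expected_df by hand. The data, the ERROR_ID column name, and the grouping step are assumptions for illustration; the template may build its issues dataframe differently.

```python
import pandas as pd

# Hypothetical table with the index named ROW_ID so row positions are kept.
df = pd.DataFrame(
    {
        "LAchildID": ["child1", "child1", "child2", "child2"],
        "CPPID": ["cpp1", "cpp1", "cpp2", "cpp2"],
    }
)
df.index.name = "ROW_ID"
df["ERROR_ID"] = tuple(zip(df["LAchildID"], df["CPPID"]))

# Assumed result of the rule logic: child2's plan group fails.
issue_ids = (("child2", "cpp2"),)

# Collect the positions of all rows that share each failing error ID.
df_issues = (
    df[df["ERROR_ID"].isin(issue_ids)]
    .reset_index()
    .groupby("ERROR_ID")["ROW_ID"]
    .apply(list)
    .reset_index()
)

# Hand-written expectation: one error ID, listing both conflicting rows.
expected_df = pd.DataFrame(
    [{"ERROR_ID": ("child2", "cpp2"), "ROW_ID": [2, 3]}]
)

# Compare the computed issues with the expectation.
matches = df_issues.equals(expected_df)
```

Each failing group appears once, with every row that contributes to the conflict listed in ROW_ID, which is why both index numbers are needed in expected_df.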

Last step now! Update the assert statement for rule definition and message to reflect your rule. Then run black on your rule's filepath in the terminal, commit your code, and make a pull request!