fantasyfootball.features
Module Contents
Classes
LagFeatureTransformer | Create lag features for each column in the dataframe by group.
MAFeatureTransformer | Create a moving average feature for each column in the dataframe by group.
CategoryConsolidatorFeatureTransformer | Reduce the number of categories in a categorical column.
TargetEncoderFeatureTransformer | Replace a categorical column with the average target value for each category.
FantasyFeatures | Create common fantasy football features for predictive modeling.
Attributes
- fantasyfootball.features.logger
- class fantasyfootball.features.LagFeatureTransformer(n_week_lag: list, lag_columns: list, player_group_columns: list)[source]
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Create lag features for each column in the dataframe by group.
- Parameters:
n_week_lag (list) – Number of weeks to lag the data
lag_columns (list) – Names of columns to lag
player_group_columns (list) – Names of columns to group by. For example, if you want to lag the data by player and season, you would pass in the list ["name", "season_year"].
- Returns:
Dataframe with lag features
- Return type:
X (pd.DataFrame)
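The grouped lag described above can be sketched with plain pandas. This is an illustration only, not the transformer's actual implementation: the helper name `add_lag_features` and the sample data are hypothetical, and the sketch assumes rows are already sorted by week within each group.

```python
import pandas as pd


def add_lag_features(df, lag_columns, n_week_lag, group_columns):
    """Return a copy of df with one <column>_lag_<n> column per (column, lag) pair."""
    out = df.copy()
    grouped = df.groupby(group_columns, sort=False)
    for column in lag_columns:
        for n in n_week_lag:
            # shift(n) moves values down n rows within each group, so the
            # first n rows of every group have no lagged value (NaN).
            out[f"{column}_lag_{n}"] = grouped[column].shift(n)
    return out


# Hypothetical sample data: two players, three weeks each.
df = pd.DataFrame(
    {
        "name": ["A", "A", "A", "B", "B", "B"],
        "season_year": [2021] * 6,
        "week": [1, 2, 3, 1, 2, 3],
        "rush_yds": [10.0, 20.0, 30.0, 5.0, 15.0, 25.0],
    }
)
lagged = add_lag_features(df, ["rush_yds"], [1, 2], ["name", "season_year"])
```

Grouping before shifting keeps one player's values from leaking into the first rows of the next player.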
- class fantasyfootball.features.MAFeatureTransformer(n_week_window: list, window_columns: list, player_group_columns: list)[source]
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Create a moving average feature for each column in the dataframe by group.
- Parameters:
n_week_window (list) – Number of weeks to average over
window_columns (list) – Names of columns to average over
player_group_columns (list) – Names of columns to group by. For example, if you want to average the data by player and season, you would pass in the list ["name", "season_year"].
- Returns:
Dataframe with moving average features
- Return type:
X (pd.DataFrame)
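A grouped moving average of this kind might look like the sketch below. It is not the transformer's own code: the helper name `add_moving_average_features` and the data are hypothetical, and the sketch assumes the window includes the current week and that rows are sorted by week within each group.

```python
import pandas as pd


def add_moving_average_features(df, window_columns, n_week_window, group_columns):
    """Return a copy of df with one <column>_ma_<n> column per (column, window) pair."""
    out = df.copy()
    for column in window_columns:
        for n in n_week_window:
            # rolling(n).mean() needs n observations, so the first n - 1
            # rows of each group are NaN.
            out[f"{column}_ma_{n}"] = df.groupby(group_columns, sort=False)[
                column
            ].transform(lambda s: s.rolling(window=n).mean())
    return out


# Hypothetical sample data: two players, three weeks each.
df = pd.DataFrame(
    {
        "name": ["A", "A", "A", "B", "B", "B"],
        "season_year": [2021] * 6,
        "week": [1, 2, 3, 1, 2, 3],
        "passing_yds": [10.0, 20.0, 30.0, 5.0, 15.0, 25.0],
    }
)
averaged = add_moving_average_features(
    df, ["passing_yds"], [2], ["name", "season_year"]
)
```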
- class fantasyfootball.features.CategoryConsolidatorFeatureTransformer(category_columns: list, threshold: float)[source]
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Reduce the number of categories in a categorical column.
Bins column values that fall below a threshold into a single ‘other’ category.
- Parameters:
category_columns (list) – Names of columns to consolidate
threshold (float) – Threshold for consolidating categories. For example, if you want to consolidate categories with less than 1% of the data, you would pass in the float 0.01.
- Returns:
Dataframe with consolidated categories
- Return type:
X (pd.DataFrame)
- fit_transform(X, y=None)[source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
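The thresholding idea behind CategoryConsolidatorFeatureTransformer can be illustrated as follows. This is a minimal sketch, not the class's implementation: the function name `consolidate_categories` is hypothetical, and it assumes the threshold is compared against each category's share of rows with a strict less-than.

```python
import pandas as pd


def consolidate_categories(series, threshold, other_label="other"):
    """Replace categories whose share of rows is below threshold with other_label."""
    shares = series.value_counts(normalize=True)
    rare = shares[shares < threshold].index
    # Keep values that are not rare; replace the rest with the catch-all label.
    return series.where(~series.isin(rare), other_label)


# Hypothetical sample: "NYJ" makes up 10% of rows and falls below a 20% threshold.
teams = pd.Series(["KC"] * 6 + ["BUF"] * 3 + ["NYJ"])
consolidated = consolidate_categories(teams, threshold=0.2)
```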
- class fantasyfootball.features.TargetEncoderFeatureTransformer(category_columns: list)[source]
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Replace a categorical column with the average target value for each category.
- Parameters:
category_columns (list) – Names of columns to target encode.
- Returns:
Dataframe with target encoded columns
- Return type:
X (pd.DataFrame)
- fit_transform(X, y=None)[source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
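Target encoding as described for TargetEncoderFeatureTransformer amounts to learning a per-category mean of the target and mapping it back onto the column. The sketch below shows that idea with a hypothetical helper, `target_encode`. How the real class handles unseen categories or smoothing is not stated here, so the sketch simply leaves unseen categories as NaN.

```python
import pandas as pd


def target_encode(train_categories, train_target, categories_to_encode):
    """Map each category to the mean target value observed for it in training."""
    means = train_target.groupby(train_categories).mean()
    # Categories never seen during training have no mean and map to NaN.
    return categories_to_encode.map(means)


# Hypothetical sample data.
train_categories = pd.Series(["KC", "KC", "BUF", "BUF"])
train_target = pd.Series([10, 20, 30, 50])
encoded = target_encode(
    train_categories, train_target, pd.Series(["BUF", "KC", "NYJ"])
)
```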
- class fantasyfootball.features.FantasyFeatures(df: pandas.DataFrame, y: str, position: str, player_group_columns: list = ['pid', 'name', 'team', 'season_year'], game_week_column: str = 'week')[source]
Create common fantasy football features for predictive modeling
- Parameters:
df (pd.DataFrame) – Dataframe containing player data by season
y (str) – Name of the target column in the dataframe
position (str) – Position of players to include in the dataframe
player_group_columns (list, optional) – Indicates which columns should be used to group players by. Defaults to ["pid", "name", "team", "season_year"].
game_week_column (str, optional) – Indicates week of season. Defaults to "week".
- Raises:
ValueError – If position is not a valid position
ValueError – If player_group_columns are not present in the dataframe
- property data: pandas.DataFrame
Returns the dataframe of the historical NFL Fantasy data.
- Returns:
Historical NFL Fantasy data.
- Return type:
pd.DataFrame
- filter_inactive_games(status_column: str = 'is_active') FantasyFeatures[source]
Filter out inactive games.
- Parameters:
status_column (str, optional) – Name of column indicating whether a player was active in a game. Defaults to "is_active".
- Returns:
FantasyFeatures object with inactive games removed.
- Return type:
FantasyFeatures
- static _calculate_n_games_played(df: pandas.DataFrame, player_group_columns: list) pandas.DataFrame[source]
Calculate the number of games played by each player in each season.
Helpful for filtering out players who have not played a certain number of games in a season, which can lead to issues when creating lag features.
- Parameters:
df (pd.DataFrame) – Dataframe to calculate the number of games played for each player.
player_group_columns (list) – Collection of columns that are unique to each player.
- Returns:
Dataframe with the number of games played for each player in each season.
- Return type:
pd.DataFrame
- filter_n_games_played_by_season(min_games_played: int) FantasyFeatures[source]
Filter out players who have not played a certain number of games in a season.
- Parameters:
min_games_played (int) – Minimum number of games a player must have played in a season.
- Returns:
FantasyFeatures object with filtered dataframe.
- Return type:
FantasyFeatures
- static _save_pipeline_feature_names(columns: List[str], feature_type: str, *values: Union[int, str]) List[str][source]
Saves the names of the features created by the pipeline as a list of strings.
The feature names are saved in the format <column_name>_<feature_type>_<value>. For example:
is_active_lag_1
rush_yds_lag_4
passing_yds_ma_2
- Parameters:
columns (List[str]) – List of column names to save.
feature_type (str) – Type of feature (e.g. lag, moving average, etc.).
values (Union[int, str]) – Additional parameters passed to the transformer.
- Returns:
List of feature names.
- Return type:
List[str]
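The naming format above can be reproduced with a short helper. This is a sketch with a hypothetical function name, `build_feature_names`. Pairing every column with every value, and the ordering of the result, are assumptions rather than documented behavior.

```python
from itertools import product
from typing import List, Union


def build_feature_names(
    columns: List[str], feature_type: str, *values: Union[int, str]
) -> List[str]:
    """Build names of the form <column_name>_<feature_type>_<value>."""
    return [
        f"{column}_{feature_type}_{value}"
        for column, value in product(columns, values)
    ]


lag_names = build_feature_names(["is_active", "rush_yds"], "lag", 1, 4)
ma_names = build_feature_names(["passing_yds"], "ma", 2)
```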
- static _validate_max_week(season_year: int, week_number: int) None[source]
Validates that the week number is not greater than the max week for the season.
- Parameters:
season_year (int) – Year of the season.
week_number (int) – Week number of the season.
- Raises:
ValueError – If the week number is greater than the max week (18) for the season for games after 2020.
ValueError – If the week number is greater than the max week (17) for the season for games prior to 2020.
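A check along these lines could be written as below. The function name `validate_max_week` is hypothetical and the code is only a sketch of the rule stated above. In particular, it assumes that seasons up to and including 2020 have 17 weeks, which the description does not state explicitly for 2020 itself.

```python
def validate_max_week(season_year: int, week_number: int) -> None:
    """Raise ValueError if week_number exceeds the season's last regular week."""
    # Assumption: 18 weeks for seasons after 2020, 17 weeks otherwise.
    max_week = 18 if season_year > 2020 else 17
    if week_number > max_week:
        raise ValueError(
            f"Week {week_number} exceeds the max week ({max_week}) "
            f"for the {season_year} season"
        )


validate_max_week(2021, 18)  # passes silently
validate_max_week(2019, 17)  # passes silently
```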
- static _validate_future_data_is_present(ff_data_dir: pathlib.PosixPath, max_week: int, data_sources: dict) bool[source]
Validates that the future data is present for the upcoming week. For example, if it is week
- Parameters:
ff_data_dir (PosixPath) – Path to the directory containing the future data.
max_week (int) – Max week number for the season + 1. For example, if the max week is 8, then the future, yet-to-be-played week is 9.
data_sources (dict) – A dictionary indicating the names of the data sources used in the fantasyfootball package. Note that when validating the future data, only those data sources with 'is_forward_looking' set to True are checked. Example:
data_sources = {
    "calendar": {
        "keys": ["team", "season_year"],
        "cols": ["date", "week", "team", "opp", "is_away", "season_year"],
        "is_required": True,
        "is_forward_looking": False,
    }
}
- Raises:
ValueError – If ‘week’ or ‘date’ is not present
ValueError – If ‘week’ is not equal to max_week
- log_transform_y() FantasyFeatures[source]
Log transform the y column.
- Parameters:
None –
- Returns:
FantasyFeatures object with log transformed y.
- Return type:
FantasyFeatures
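The exact transform that log_transform_y applies is not specified here. As an illustration only, the sketch below uses `numpy.log1p`, an assumption chosen so that weeks with zero points stay defined, on a hypothetical target column named `ff_pts`.

```python
import numpy as np
import pandas as pd

# Hypothetical target column of weekly fantasy points.
df = pd.DataFrame({"ff_pts": [0.0, 9.0, 99.0]})

# log1p computes log(1 + x), which is defined at x = 0.
df["ff_pts_log"] = np.log1p(df["ff_pts"])

# expm1 inverts log1p, recovering the original scale.
recovered = np.expm1(df["ff_pts_log"])
```

Predictions made on a log scale need the matching inverse (here `numpy.expm1`) before they can be read as fantasy points.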
- create_future_week() FantasyFeatures[source]
Creates a dataframe of future features for an upcoming NFL game week. For example, if ‘Week 8’ is the most recent completed set of games, a single row of features will be created for each player for ‘Week 9’.
- Returns:
FantasyFeatures object with a dataframe of future features appended to the historical data.
- Return type:
FantasyFeatures
- static _create_step_str(step: str, transformer_name: str, **params) str[source]
Creates a string representation of a pipeline step.
- Parameters:
step (str) – Description of what feature transformer is being used.
transformer_name (str) – Name of the feature transformer class.
**params (dict) – Parameters for the transformer.
- Returns:
String representation of the pipeline step.
- Return type:
str
- _validate_column_present(feature_columns: Union[str, list]) None[source]
Validates that a column is present in the dataframe prior to adding a new feature.
- Parameters:
feature_columns (Union[str,list]) – Columns to validate as present.
- Raises:
ValueError – If any of the feature columns are not present.
- add_lag_feature(n_week_lag: Union[int, List[int]], lag_columns: Union[str, List[str]]) FantasyFeatures[source]
Adds string representation of a lag step to the pipeline.
- Parameters:
n_week_lag (Union[int, List[int]]) – Number of weeks to lag.
lag_columns (Union[str, List[str]]) – Columns to lag.
- Returns:
Updated string representation of the pipeline steps.
- Return type:
FantasyFeatures
- add_moving_avg_feature(n_week_window: Union[int, List[int]], window_columns: Union[str, List[str]]) FantasyFeatures[source]
Adds string representation of a moving average step to the pipeline.
- Parameters:
n_week_window (Union[int, List[int]]) – Number of weeks to average across.
window_columns (Union[str, List[str]]) – Columns to average.
- Returns:
Updated string representation of the pipeline steps.
- Return type:
FantasyFeatures
- add_target_encoded_feature(category_columns: Union[str, list]) FantasyFeatures[source]
Adds string representation of a target encoded step to the pipeline.
- Parameters:
category_columns (Union[str, list]) – Columns to target encode.
- Returns:
Updated string representation of the pipeline steps.
- Return type:
FantasyFeatures
- consolidate_category_feature(category_columns: Union[str, list], threshold: float) FantasyFeatures[source]
Adds string representation of a category consolidator step to the pipeline.
- Parameters:
category_columns (Union[str, list]) – Columns to consolidate.
threshold (float) – Threshold for consolidating categories.
- Returns:
Updated string representation of the pipeline steps.
- Return type:
FantasyFeatures
- _remove_missing_feature_values(feature_df: pandas.DataFrame) pandas.DataFrame[source]
Removes rows that have missing values related to lag or salary columns.
When creating a lag, the first N weeks of data will be NA. This function removes those rows. Likewise, DraftKings and FanDuel do not publish salary data for the first week of each season. This function removes those rows if any fields from the salary data are included. If both salary and lag data are included, the maximum of the two will be used when removing rows.
- Parameters:
feature_df (pd.DataFrame) – Dataframe to remove rows from.
- Returns:
Dataframe with missing lag values or salary data removed.
- Return type:
pd.DataFrame
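The lag-related part of this cleanup can be sketched with `DataFrame.dropna`. This is an illustration under assumptions, not the method's implementation: lag columns are identified here by the `_lag_` naming pattern, and the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical feature dataframe: the week 1 lag has no prior week to draw from.
feature_df = pd.DataFrame(
    {
        "name": ["A", "A", "A"],
        "week": [1, 2, 3],
        "rush_yds": [10.0, 20.0, 30.0],
        "rush_yds_lag_1": [np.nan, 10.0, 20.0],
    }
)

# Assumption: lag features follow the <column_name>_lag_<value> pattern.
lag_columns = [column for column in feature_df.columns if "_lag_" in column]

# Drop rows where any lag feature is missing.
cleaned = feature_df.dropna(subset=lag_columns).reset_index(drop=True)
```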
- _replace_missing_salary_values_with_zero(feature_df: pandas.DataFrame) pandas.DataFrame[source]
Replaces missing salary values with zero.
Salary data is missing for the first week of each season. Also, when players are injured and questionable to play, a salary value may not be published.
- Parameters:
feature_df (pd.DataFrame) – Dataframe to replace missing salary values with zero.
- Returns:
Dataframe with missing salary values replaced with zero.
- Return type:
pd.DataFrame
- _forward_fill_future_week_cv(feature_df: pandas.DataFrame) pandas.DataFrame[source]
Forward fills CV values for future weeks.
- add_coefficient_of_variation(n_week_window: int) FantasyFeatures[source]
Add coefficient of variation (cv) for each player based on the trailing standard deviation and average of weekly fantasy points scored.
- Parameters:
n_week_window (int) – Number of trailing weeks to use for calculating the cv. Note that calculation occurs across seasons.
- Returns:
Dataframe with cv added as a column.
- Return type:
FantasyFeatures
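The coefficient of variation is the trailing standard deviation divided by the trailing mean. The sketch below is hypothetical, not the method's code. It groups by player only, so the window runs across seasons as noted above, and it assumes the window includes the current week, pandas' default sample standard deviation, and a points column named `ff_pts`.

```python
import pandas as pd


def add_cv(df, points_column, n_week_window, player_column):
    """Return a copy of df with a trailing coefficient of variation column 'cv'."""
    out = df.copy()
    grouped = df.groupby(player_column, sort=False)[points_column]
    trailing_std = grouped.transform(lambda s: s.rolling(n_week_window).std())
    trailing_mean = grouped.transform(lambda s: s.rolling(n_week_window).mean())
    # cv = standard deviation / mean over the same trailing window.
    out["cv"] = trailing_std / trailing_mean
    return out


# Hypothetical sample data: one player, three weeks.
df = pd.DataFrame(
    {
        "name": ["A", "A", "A"],
        "week": [1, 2, 3],
        "ff_pts": [10.0, 20.0, 30.0],
    }
)
with_cv = add_cv(df, "ff_pts", n_week_window=2, player_column="name")
```

A lower cv indicates a player whose weekly scoring is steadier relative to their average.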
- create_ff_signature() dict[source]
Creates a fantasy football ‘signature’, which includes the following steps:
Executes the previously created pipeline data transformations
Removes missing values stemming from lagged features or salary features
Replaces missing salary values with zero
- Returns:
The names of the new features created by the pipeline and the transformed dataframe.
- Return type:
dict