`fantasyfootball.features`

Module Contents

Classes

`LagFeatureTransformer`	Create lag features for each column in the dataframe by group.
`MAFeatureTransformer`	Create a moving average feature for each column in the dataframe by group
`CategoryConsolidatorFeatureTransformer`	Reduce the number of categories in a categorical column.
`TargetEncoderFeatureTransformer`	Replace a categorical column with the average target value for each category.
`FantasyFeatures`	Create common fantasy football features for predictive modeling

Attributes

logger

fantasyfootball.features.logger

class fantasyfootball.features.LagFeatureTransformer(n_week_lag: list, lag_columns: list, player_group_columns: list)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Create lag features for each column in the dataframe by group.

Parameters:

n_week_lag (list) – Number of weeks to lag the data
lag_columns (list) – Names of columns to lag
player_group_columns (list) – Names of columns to group by. For example, to lag the data by player and season, pass the list ["name", "season_year"].

Returns:

Dataframe with lag features

Return type:

X (pd.DataFrame)

fit(X, y=None)[source]

transform(X, y=None)[source]
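
The grouped lag described above can be sketched with plain pandas. This is a minimal illustration, not the transformer's actual implementation: the helper `add_lag_features`, the sample data, and the assumption that rows are already sorted by week within each group are all hypothetical. The output column names follow the `<column_name>_lag_<value>` pattern documented for this module.

```python
import pandas as pd


def add_lag_features(df, n_week_lag, lag_columns, player_group_columns):
    """Return a copy of df with one <column>_lag_<n> column per column and lag."""
    out = df.copy()
    # Assumes rows are already ordered by week within each group.
    grouped = df.groupby(player_group_columns, sort=False)
    for column in lag_columns:
        for n in n_week_lag:
            # shift(n) moves each group's values down n rows, so the first
            # n rows of every group become NaN.
            out[f"{column}_lag_{n}"] = grouped[column].shift(n)
    return out


# Hypothetical sample data: two players, three weeks each.
weekly = pd.DataFrame(
    {
        "name": ["A", "A", "A", "B", "B", "B"],
        "season_year": [2021] * 6,
        "week": [1, 2, 3, 1, 2, 3],
        "rush_yds": [50, 60, 70, 10, 20, 30],
    }
)
lagged = add_lag_features(weekly, [1], ["rush_yds"], ["name", "season_year"])
```

Because the shift is applied per group, player B's first week does not inherit player A's last value.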

class fantasyfootball.features.MAFeatureTransformer(n_week_window: list, window_columns: list, player_group_columns: list)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Create a moving average feature for each column in the dataframe by group

Parameters:

n_week_window (list) – Number of weeks to average over
window_columns (list) – Names of columns to average over
player_group_columns (list) – Names of columns to group by. For example, to average the data by player and season, pass the list ["name", "season_year"].

Returns:

Dataframe with moving average features

Return type:

X (pd.DataFrame)

fit(X, y=None)[source]

transform(X, y=None)[source]
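
A grouped moving average can be sketched as follows. The helper `add_moving_average_features` and the sample data are hypothetical, and the sketch assumes the window includes the current week; the actual transformer may define the window differently (for example, by excluding the current week).

```python
import pandas as pd


def add_moving_average_features(df, n_week_window, window_columns, player_group_columns):
    """Return a copy of df with one <column>_ma_<n> column per column and window."""
    out = df.copy()
    grouped = df.groupby(player_group_columns, sort=False)
    for column in window_columns:
        for n in n_week_window:
            # rolling(n).mean() is NaN until a group has n observations.
            out[f"{column}_ma_{n}"] = grouped[column].transform(
                lambda s, n=n: s.rolling(window=n).mean()
            )
    return out


# Hypothetical sample data: two players, three weeks each.
weekly = pd.DataFrame(
    {
        "name": ["A", "A", "A", "B", "B", "B"],
        "season_year": [2021] * 6,
        "week": [1, 2, 3, 1, 2, 3],
        "rush_yds": [50, 60, 70, 10, 20, 30],
    }
)
averaged = add_moving_average_features(weekly, [2], ["rush_yds"], ["name", "season_year"])
```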

class fantasyfootball.features.CategoryConsolidatorFeatureTransformer(category_columns: list, threshold: float)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Reduce the number of categories in a categorical column.

Bins column values that fall below a threshold into a single ‘other’ category.

Parameters:

category_columns (list) – Names of columns to consolidate
threshold (float) – Threshold for consolidating categories. For example, if you want to consolidate categories with less than 1% of the data, you would pass in the float 0.01.

Returns:

Dataframe with consolidated categories

Return type:

X (pd.DataFrame)

fit(X, y=None)[source]

transform(X, y=None)[source]

fit_transform(X, y=None)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)
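
The thresholding idea behind CategoryConsolidatorFeatureTransformer can be illustrated on a single column. This is a sketch under assumptions: the helper `consolidate_categories`, the label `"other"`, and the strict less-than comparison are illustrative choices, and a real fit/transform pair would learn the rare categories during `fit` and reuse them in `transform`.

```python
import pandas as pd


def consolidate_categories(series, threshold, other_label="other"):
    """Replace categories whose share of rows is below threshold with other_label."""
    # Share of rows per category, e.g. 0.01 means 1% of the data.
    shares = series.value_counts(normalize=True)
    rare = shares[shares < threshold].index
    # Keep values that are not rare; replace the rest with other_label.
    return series.where(~series.isin(rare), other_label)


# Hypothetical opponent column: PHI appears in only 10% of rows.
opponents = pd.Series(["DAL"] * 6 + ["NYG"] * 3 + ["PHI"])
consolidated = consolidate_categories(opponents, threshold=0.2)
```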

class fantasyfootball.features.TargetEncoderFeatureTransformer(category_columns: list)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Replace a categorical column with the average target value for each category.

Parameters: category_columns (list) – Names of columns to target encode.
Returns: Dataframe with target encoded columns
Return type: X (pd.DataFrame)

fit(X, y)[source]

transform(X, y=None)[source]

fit_transform(X, y=None)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)
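
Target encoding itself can be sketched in a few lines of pandas. The column names, the data, and the fallback to the overall mean for unseen categories are assumptions made for illustration; the transformer's own handling of unseen categories is not documented here.

```python
import pandas as pd

# Hypothetical training data: opponent and fantasy points scored.
train = pd.DataFrame(
    {
        "opp": ["DAL", "DAL", "NYG", "NYG", "PHI"],
        "points": [10.0, 20.0, 5.0, 7.0, 12.0],
    }
)

# "fit": learn the average target value for each category.
category_means = train.groupby("opp")["points"].mean()
overall_mean = train["points"].mean()

# "transform": replace each category with its learned mean.
# Unseen categories (WAS) fall back to the overall mean, an assumed choice.
new_rows = pd.DataFrame({"opp": ["DAL", "NYG", "WAS"]})
encoded = new_rows["opp"].map(category_means).fillna(overall_mean)
```

Learning the means on training rows only, then applying them to new rows, keeps target values from the rows being predicted out of the encoding.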

class fantasyfootball.features.FantasyFeatures(df: pandas.DataFrame, y: str, position: str, player_group_columns: list = ['pid', 'name', 'team', 'season_year'], game_week_column: str = 'week')[source]

Create common fantasy football features for predictive modeling

Parameters:

df (pd.DataFrame) – Dataframe containing player data by season
y (str) – Name of the column in df to use as the target (the y column).
position (str) – Position of players to include in the dataframe
player_group_columns (list, optional) – Indicates which columns should be used to group players by. Defaults to ["pid", "name", "team", "season_year"].
game_week_column (str, optional) – Indicates week of season. Defaults to “week”.

Raises:

ValueError – If position is not a valid position
ValueError – If player_group_columns are not present in the dataframe

property data: pandas.DataFrame

Returns the dataframe of the historical NFL Fantasy data.

Returns: Historical NFL Fantasy data.
Return type: pd.DataFrame

filter_inactive_games(status_column: str = 'is_active') → FantasyFeatures[source]

Filter out inactive games.

Parameters:

status_column (str, optional) – Name of column indicating whether a player was active in a game. Defaults to "is_active".

Returns:

FantasyFeatures object with inactive games removed.

Return type:

FantasyFeatures

static _calculate_n_games_played(df: pandas.DataFrame, player_group_columns: list) → pandas.DataFrame[source]

Calculate the number of games played by each player in each season.

Helpful for filtering out players who have not played a certain number of games in a season, which can lead to issues when creating lag features.

Parameters:

df (pd.DataFrame) – Dataframe to calculate the number of games played for each player.
player_group_columns (list) – Collection of columns that are unique to each player.

Returns:

Dataframe with the number of games played for each player in each season.

Return type:

pd.DataFrame

filter_n_games_played_by_season(min_games_played: int) → FantasyFeatures[source]

Filter out players who have not played a certain number of games in a season.

Parameters:

min_games_played (int) – Minimum number of games a player must have played in a season.

Returns:

FantasyFeatures object with filtered dataframe.

Return type:

FantasyFeatures

static _save_pipeline_feature_names(columns: List[str], feature_type: str, *values: Union[int, str]) → List[str][source]

Saves the names of the features created by the pipeline to a string.

The feature names are saved in the format <column_name>_<feature_type>_<value>. For example:

is_active_lag_1

rush_yds_lag_4

passing_yds_ma_2

Parameters:

columns (List[str]) – List of column names to save.
feature_type (str) – Type of feature (e.g. lag, moving average, etc.).
values (Union[int, str]) – Additional parameters passed to the transformer.

Returns:

List of feature names.

Return type:

List[str]
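
The naming format can be illustrated with a short helper. `build_feature_names` is a hypothetical stand-in, not the private method itself, and it assumes every column is combined with every value.

```python
from itertools import product
from typing import List, Union


def build_feature_names(
    columns: List[str], feature_type: str, *values: Union[int, str]
) -> List[str]:
    """Build names in the form <column_name>_<feature_type>_<value>."""
    # Assumption: each column is paired with each value.
    return [
        f"{column}_{feature_type}_{value}"
        for column, value in product(columns, values)
    ]


names = build_feature_names(["is_active", "rush_yds"], "lag", 1, 4)
```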

static _validate_max_week(season_year: int, week_number: int) → None[source]

Validates that the week number is not greater than the max week for the season.

Parameters:

season_year (int) – Year of the season.
week_number (int) – Week number of the season.

Raises:

ValueError – If the week number is greater than the max week (18) for seasons after 2020.
ValueError – If the week number is greater than the max week (17) for the 2020 season and earlier.
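
The check can be sketched as below. `validate_max_week` is a hypothetical stand-in for the private method, and treating the 2020 season as a 17-week season is an assumption about where the cutoff falls.

```python
def validate_max_week(season_year: int, week_number: int) -> None:
    """Raise ValueError if week_number exceeds the season's last regular-season week."""
    # Assumption: seasons after 2020 have 18 weeks; 2020 and earlier have 17.
    max_week = 18 if season_year > 2020 else 17
    if week_number > max_week:
        raise ValueError(
            f"Week {week_number} exceeds the maximum week ({max_week}) "
            f"for the {season_year} season."
        )


validate_max_week(2021, 18)  # valid, returns None

try:
    validate_max_week(2019, 18)
except ValueError as error:
    message = str(error)
```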

static _validate_future_data_is_present(ff_data_dir: pathlib.PosixPath, max_week: int, data_sources: dict) → bool[source]

Validates that the future data is present for the upcoming week. For example, if it is week

Parameters:

ff_data_dir (PosixPath) – Path to the directory containing the future data.
max_week (int) – Max week number for the season + 1. For example, if the max week is 8, then the future, yet to-be-played week is 9
data_sources (dict) – A dictionary indicating the names of the data sources used in the fantasyfootball package. Note that when validating the future data, only those data sources with 'is_forward_looking' set to True are checked. Example:

    data_sources = {
        "calendar": {
            "keys": ["team", "season_year"],
            "cols": ["date", "week", "team", "opp", "is_away", "season_year"],
            "is_required": True,
            "is_forward_looking": False,
        }
    }

Raises:

ValueError – If ‘week’ or ‘date’ is not present
ValueError – If ‘week’ is not equal to max_week

log_transform_y() → FantasyFeatures[source]

Log transform the y column.

Parameters: None
Returns: FantasyFeatures object with log transformed y.
Return type: FantasyFeatures

create_future_week() → FantasyFeatures[source]

Creates a dataframe of future features for an upcoming NFL game week. For example, if ‘Week 8’ is the most recent completed set of games, a single row of features will be created for each player for ‘Week 9’.

Returns: FantasyFeatures object with a dataframe of future features appended to the historical data.
Return type: FantasyFeatures

static _create_step_str(step: str, transformer_name: str, **params) → str[source]

Creates a string representation of a pipeline step.

Parameters:

step (str) – Description of what feature transformer is being used.
transformer_name (str) – Name of the feature transformer class.
**params (dict) – Parameters for the transformer.

Returns:

String representation of the pipeline step.

Return type:

str

_validate_column_present(feature_columns: Union[str, list]) → None[source]

Validates that a column is present in the dataframe prior to adding a new feature.

Parameters: feature_columns (Union[str, list]) – Columns to validate as present.
Raises: ValueError – If any of the feature columns are not present.

add_lag_feature(n_week_lag: Union[int, List[int]], lag_columns: Union[str, List[str]]) → FantasyFeatures[source]

Adds string representation of a lag step to the pipeline.

Parameters:

n_week_lag (Union[int, List[int]]) – Number of weeks to lag.
lag_columns (Union[str, List[str]]) – Columns to lag.

Returns:

Updated string representation of the pipeline steps.

Return type:

FantasyFeatures

add_moving_avg_feature(n_week_window: Union[int, List[int]], window_columns: Union[str, List[str]]) → FantasyFeatures[source]

Adds string representation of a moving average step to the pipeline.

Parameters:

n_week_window (Union[int, List[int]]) – Number of weeks to average across.
window_columns (Union[str, List[str]]) – Columns to average.

Returns:

Updated string representation of the pipeline steps.

Return type:

FantasyFeatures

add_target_encoded_feature(category_columns: Union[str, list]) → FantasyFeatures[source]

Adds string representation of a target encoded step to the pipeline.

Parameters: category_columns (Union[str, list]) – Columns to target encode.
Returns: Updated string representation of the pipeline steps.
Return type: FantasyFeatures

consolidate_category_feature(category_columns: Union[str, list], threshold: float) → FantasyFeatures[source]

Adds string representation of a category consolidator step to the pipeline.

Parameters:

category_columns (Union[str, list]) – Columns to consolidate.
threshold (float) – Threshold for consolidating categories.

Returns:

Updated string representation of the pipeline steps.

Return type:

FantasyFeatures

_remove_missing_feature_values(feature_df: pandas.DataFrame) → pandas.DataFrame[source]

Removes rows that have missing values related to lag or salary columns.

When creating a lag of N weeks, the first N weeks of data will be NA. This function removes those rows. Likewise, DraftKings and FanDuel do not publish salary data for the first week of each season. This function removes those rows if any fields from the salary data are included. If both salary and lag data are included, the maximum of the two will be used when removing rows.

Parameters: feature_df (pd.DataFrame) – Dataframe to remove rows from.
Returns: Dataframe with missing lag values or salary data removed.
Return type: pd.DataFrame

_replace_missing_salary_values_with_zero(feature_df: pandas.DataFrame) → pandas.DataFrame[source]

Replaces missing salary values with zero.

Salary data is missing for the first week of each season. Also, when players are injured and questionable to play, a salary value may not be published.

Parameters:

feature_df (pd.DataFrame) – Dataframe to replace missing salary values with zero.

Returns:

Dataframe with missing salary values replaced with zero.

Return type:

pd.DataFrame

_forward_fill_future_week_cv(feature_df: pandas.DataFrame) → pandas.DataFrame[source]

Forward fills CV values for future weeks.

add_coefficient_of_variation(n_week_window: int) → FantasyFeatures[source]

Add the coefficient of variation (cv) for each player, based on the trailing standard deviation and average of weekly fantasy points scored.

Parameters:

n_week_window (int) – Number of trailing weeks to use for calculating the cv. Note that calculation occurs across seasons.

Returns:

Dataframe with cv added as a column.

Return type:

FantasyFeatures
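
The coefficient of variation is the trailing standard deviation divided by the trailing mean. The sketch below shows that calculation with pandas; the helper `add_cv`, the column names, and the use of the sample standard deviation over a window that includes the current week are assumptions, not details confirmed by this page. Grouping by player only reflects the note that the calculation occurs across seasons.

```python
import pandas as pd


def add_cv(df, points_column, player_column, n_week_window):
    """Return a copy of df with a trailing coefficient of variation column."""
    out = df.copy()
    # Group by player only, so the window can span season boundaries.
    grouped = df.groupby(player_column, sort=False)[points_column]
    trailing_mean = grouped.transform(lambda s: s.rolling(n_week_window).mean())
    trailing_std = grouped.transform(lambda s: s.rolling(n_week_window).std())
    # cv = standard deviation / mean over the trailing window.
    out["cv"] = trailing_std / trailing_mean
    return out


# Hypothetical data: one player, three weeks of fantasy points.
scores = pd.DataFrame({"name": ["A", "A", "A"], "points": [10.0, 20.0, 30.0]})
with_cv = add_cv(scores, "points", "name", n_week_window=2)
```

A lower cv indicates a more consistent scorer relative to the player's own average.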

create_ff_signature() → dict[source]

Creates a fantasy football ‘signature’, which includes the following steps:

Executes the previously created pipeline data transformations

Removes missing values stemming from lagged features or salary features

Replaces missing salary values with zero

Returns: The names of the new features created by the pipeline and the transformed dataframe.
Return type: dict
