pyrplib package

Subpackages

Submodules

pyrplib.artificial module

pyrplib.artificial.addmossimple(D, start_index, end_index)

For a binary matrix D, create simple multiple optimal solutions in the range of teams specified. Indices are inclusive.

pyrplib.artificial.addnoise(D, percentnoise, low=0, high=1)

ADD NOISE

Replaces randomly chosen off-diagonal elements of D with values between low and high.
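
A minimal sketch of the idea, not the library's implementation. It assumes that noise means overwriting a fraction of the off-diagonal entries with random integers in [low, high], and that the percentage is taken relative to all n*n entries. The name add_noise_sketch, the seed argument, and the nested-list matrix are illustrative choices.

```python
import random

def add_noise_sketch(D, percentnoise, low=0, high=1, seed=None):
    """Return a copy of D with roughly percentnoise% of its entries
    (chosen among off-diagonal positions) replaced by random integers."""
    rng = random.Random(seed)
    n = len(D)
    noisy = [row[:] for row in D]  # leave the input untouched
    off_diagonal = [(i, j) for i in range(n) for j in range(n) if i != j]
    # Assumption: the percentage refers to all n*n entries of the matrix.
    count = min(len(off_diagonal), round(percentnoise / 100 * n * n))
    for i, j in rng.sample(off_diagonal, count):
        noisy[i][j] = rng.randint(low, high)
    return noisy

D = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
noisy = add_noise_sketch(D, percentnoise=20, seed=0)
```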

pyrplib.artificial.create_dataset(create_func, options)

Create a dataset using a create function and a function to generate the options used.

See example_create and example_get_create_options for an example create function and options function.

pyrplib.artificial.create_dataset_manual(D_matrices, options, create_code='manual')

Create a dataset by manually passing the D matrices as a list. The options are not used in any way. They are here if you want to include them.

pyrplib.artificial.cyclic(n)

Create a simple cycle D matrix of size n x n.
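
One plausible reading of a "simple cycle" matrix is sketched below: item i dominates item i+1, and the last item dominates the first. This interpretation and the name cyclic_sketch are assumptions; the library may build the cycle differently.

```python
def cyclic_sketch(n):
    """Build an n x n matrix in which i beats (i + 1) mod n exactly once."""
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        D[i][(i + 1) % n] = 1
    return D

D3 = cyclic_sketch(3)
```

Such a matrix has no clear best item, so every rotation of the ordering scores the same under a linear ordering objective.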

pyrplib.artificial.domfromranking(n, r, ngames, upset_func=<function <lambda>>)

DOM matrix from ranking

Simulates win/loss of individual games using the ranking vector (r) and the upset function. The upset function must take two rankings r1 and r2. r1 > r2. This function must return True/False depending on whether an upset occurred.
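
The simulation described above might look roughly like the following. This is a sketch under stated assumptions: a larger value in r marks the favorite, values in r are distinct, and opponents are drawn uniformly at random. The name dom_from_ranking_sketch and the seed argument are hypothetical.

```python
import random

def dom_from_ranking_sketch(r, ngames, upset_func, seed=None):
    """Simulate ngames and count wins in a dominance matrix D,
    where D[i][j] is the number of times i beat j."""
    rng = random.Random(seed)
    n = len(r)
    D = [[0] * n for _ in range(n)]
    for _ in range(ngames):
        i, j = rng.sample(range(n), 2)  # two distinct opponents
        # Order the pair so that the favorite has the larger ranking value.
        fav, dog = (i, j) if r[i] > r[j] else (j, i)
        if upset_func(r[fav], r[dog]):  # called with r1 > r2
            D[dog][fav] += 1
        else:
            D[fav][dog] += 1
    return D

# With an upset function that never fires, the favorite always wins.
D_no_upsets = dom_from_ranking_sketch([3, 2, 1], 50, lambda r1, r2: False, seed=1)
```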

pyrplib.artificial.domplusnoise(n, percentnoise, low=0, high=1)

Function creates a dominance graph and adds noise.

Input: n = number of rows/cols in D matrix

percentnoise = integer between 1 and n^2 representing the percentage of noise to add to D domgraph, e.g., if percentnoise = 10, then 10% of the n^2 elements will be noise

Example: ‘D = domplusnoise(6,20)’ creates a 6 by 6 matrix with 20% noise added to the dominance graph

pyrplib.artificial.emptyplusnoise(n, percentnoise, low=0, high=3)

EMPTY + NOISE

Function starts with an empty graph and adds some amount of noise.

Input: n = number of rows/cols in D matrix

percentnoise = integer between 1 and n^2 representing the percentage of noise to add to D, e.g., if percentnoise = 10, then 10% of the n^2 elements will be noise

Example: ‘D = emptyplusnoise(6,20)’ creates a 6 by 6 matrix with 20% noise added to the empty graph

pyrplib.artificial.example_create(options={'num_games': 1000, 'number_matrices': 10, 'number_of_rows_columns': 20, 'threshold': 3})

Example create function. These functions must return a dominance (D) matrix that is a pandas dataframe. Options is a dictionary. There is one required key/value which is the number_of_rows_columns. It may also have additional arguments.

pyrplib.artificial.example_create2(options={'num_games': 1000, 'number_matrices': 10, 'number_of_rows_columns': 20})

Example create function. These functions must return a dominance (D) matrix that is a pandas dataframe. Options is a dictionary. There is one required key/value which is the number_of_rows_columns. It may also have additional arguments.

pyrplib.artificial.example_create3(options)

Example create function. These functions must return a dominance (D) matrix that is a pandas dataframe. Options is a dictionary. There is one required key/value which is the number_of_rows_columns. It may also have additional arguments.

pyrplib.artificial.example_get_create_options()

Example set of options to be paired with example_create function.

pyrplib.artificial.example_get_create_options2()

Example set of options to be paired with example_create2 function.

pyrplib.artificial.example_get_create_options3()

pyrplib.artificial.hillsideplusnoise(n, percentnoise, low=1, high=5)

HILLSIDE + NOISE

Starts with a perfect hillside graph and then randomly perturbs the matrix at user specified percentage.

Input: n = number of rows/cols in D matrix

percentnoise = integer between 1 and n^2 representing the percentage of noise to add to D hillside, e.g., if percentnoise = 10, then 10% of the n^2 elements will be noise

Example: ‘D = hillsideplusnoise(6,20)’ creates a 6 by 6 matrix with 20% noise added to the hillside graph
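
For orientation, a matrix is commonly said to be in hillside form when entries never decrease from left to right along a row and never increase from top to bottom along a column. The sketch below builds one such matrix and checks the property. The construction D[i][j] = max(j - i, 0) is an illustrative choice and not necessarily the one the library uses; both function names are hypothetical.

```python
def perfect_hillside_sketch(n):
    """Upper-triangular matrix whose entries grow with distance from the diagonal."""
    return [[max(j - i, 0) for j in range(n)] for i in range(n)]

def is_hillside(D):
    """Check ascending rows (left to right) and descending columns (top to bottom)."""
    n = len(D)
    rows_ok = all(D[i][j] <= D[i][j + 1] for i in range(n) for j in range(n - 1))
    cols_ok = all(D[i][j] >= D[i + 1][j] for i in range(n - 1) for j in range(n))
    return rows_ok and cols_ok

H = perfect_hillside_sketch(3)
```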

CONVERT TO UNWEIGHTED

Function returns a modified version of D with percent of nonzero links removed

pyrplib.artificial.unweighted(D)

CONVERT TO UNWEIGHTED

Function returns an unweighted version of D
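
A minimal sketch of such a conversion, assuming that "unweighted" means every positive entry becomes 1 and everything else becomes 0. The name unweighted_sketch is illustrative.

```python
def unweighted_sketch(D):
    """Map each positive weight to 1 and every other entry to 0."""
    return [[1 if value > 0 else 0 for value in row] for row in D]

binary_D = unweighted_sketch([[0, 3, 0.5], [0, 0, 2], [1, 0, 0]])
```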

pyrplib.artificial.weakdomplusnoise(n, percentnoise, low=0, high=1)

Function creates a weak dominance graph and adds noise.

Input: n = number of rows/cols in D matrix

percentnoise = integer between 1 and n^2 representing the percentage of noise to add to D domgraph, e.g., if percentnoise = 10, then 10% of the n^2 elements will be noise

Example: ‘D = weakdomplusnoise(6,20)’ creates a 6 by 6 matrix with 20% noise added to the dominance graph

pyrplib.base module

class pyrplib.base.DInfo

Bases: object

A class to represent information about a dominance (D) matrix.

property D
property D_type
property command
property dataset_id
static from_json(file)

Static method that reads a DInfo object from a JSON file.

Returns

Returns a DInfo object

Return type

DInfo

property source_dataset_id
to_json()

Returns a JSON string representing the object.

Returns

Returns a JSON string representing the object.

Return type

str

class pyrplib.base.HillsideCard

Bases: LOPCard

class pyrplib.base.LOPCard

Bases: object

A class that represents the analysis, results, and metrics associated with running the LOP algorithm.

LOPCard can be saved as a JSON file that contains the following:

{
    "D": "<Dominance matrix and input to the LOP solver>",
    "obj": "<Optimal value of LOP>",
    "solutions": "<List of optimal orderings/permutations that result in an optimal value>",
    "max_tau_solutions": "<Two farthest orderings/permutations measured by Kendall tau (when available)>",
    "centroid_x": "<X*>",
    "outlier_solution": "<Optimal ordering/permutation that is farthest from centroid_x>",
    "dataset_id": "<Identifying ID>"
}
property D
add_solution(sol)

Adds a solution specified by a permutation/ordering.

Parameters

[sol] – [A permutation/ordering of type list or tuple]

property centroid_solution
property centroid_x
property dataset_id
static from_json(file)

Static method that reads a LOPCard object from a JSON file.

Returns

Returns a LOPCard object

Return type

LOPCard

property obj
property outlier_solution
property solutions
property source_dataset_id
to_json(file)

Returns a JSON string representing the object.

Returns

Returns a JSON string representing the object.

Return type

str

class pyrplib.base.MatricesInfo

Bases: object

A class to represent information about the matrix M and the vector b of a linear system Mx=b.

property b
property command
property dataset_id
static from_json(file)

Static method that reads a MatricesInfo object from a JSON file.

Returns

Returns a MatricesInfo object

Return type

MatricesInfo

property matrix
property source_dataset_id
to_json()

Returns a JSON string representing the object.

Returns

Returns a JSON string representing the object.

Return type

str

pyrplib.card module

class pyrplib.card.Card

Bases: ABC

The base Card abstract class.

property dataset_id
static get_contents(file)

Static method that reads a Card from a JSON file.

Parameters

[file] ([str]) – [file path or URL path to JSON file]

Returns

Returns a Pandas Series object

Return type

pandas.Series

load(dataset_id, options)

Load a Card using the dataset_id and the options.

Parameters
  • [dataset_id] – [Dataset ID]

  • [options] ([dict]) – [Dictionary of options]

property options
abstract prepare(processed_dataset)
abstract run()
property source_dataset_id
to_json()

Returns a JSON string representing the object.

Returns

Returns a JSON string representing the object.

Return type

str

abstract view()

class pyrplib.card.Hillside

Bases: LOP

A class that represents the analysis, results, and metrics associated with running the Hillside algorithm.

Hillside finds the optimal solution in Hillside form:

Chartier, Timothy P., et al. “Minimum violations sports ranking using evolutionary optimization and binary integer linear program approaches.” Proceedings of the Tenth Australian Conference on Mathematics and Computers in Sport, A. Bedford and M. Ovens, eds., MathSport (ANZIAM), New South Wales, Australia. 2010.

Hillside and LOP share the same metrics and analysis.

class pyrplib.card.LOP

Bases: Card

A class that represents the analysis, results, and metrics associated with running the LOP algorithm.

LOP card can be saved as a JSON file that contains the following:

{
    "dataset_id": "<Identifying Dataset ID>",
    "source_dataset_id": "<Identifying Source Dataset ID>",
    "D": "<Dominance matrix and input to the LOP solver>",
    "obj": "<Optimal value of LOP>",
    "solutions": "<List of optimal orderings/permutations that result in an optimal value>",
    "farthest_pair": "<Two farthest orderings/permutations measured by Kendall tau (when available)>",
    "tau_farthest_pair": "<Associated Kendall tau value (when available)>",
    "closest_pair": "<Two (not identical) closest orderings/permutations measured by Kendall tau (when available)>",
    "tau_closest_pair": "<Associated Kendall tau value (when available)>",
    "centroid_x": "<X*>",
    "outlier_solution": "<Optimal ordering/permutation that is farthest from centroid_x>",
    "method": "<Method which is LOP or Hillside>"
}
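
To make the "obj", "solutions", and "farthest_pair" fields concrete, the sketch below evaluates the standard linear ordering objective (the sum of D[a][b] over every pair in which a is ranked ahead of b) and finds all optimal orderings of a tiny matrix by brute force. The brute-force search is for illustration only and is not the library's solver. The Kendall tau here is the count of discordant pairs, which may differ from the variant the card reports.

```python
from itertools import combinations, permutations

def lop_objective(D, perm):
    """Sum D[a][b] over all pairs where a appears before b in perm."""
    return sum(D[a][b] for a, b in combinations(perm, 2))

def kendall_tau_distance(p, q):
    """Number of item pairs that p and q order differently."""
    position_in_q = {item: k for k, item in enumerate(q)}
    return sum(1 for a, b in combinations(p, 2)
               if position_in_q[a] > position_in_q[b])

D = [[0, 2, 1],
     [1, 0, 1],
     [1, 1, 0]]

# Enumerate every ordering; feasible only for very small matrices.
all_perms = list(permutations(range(len(D))))
obj = max(lop_objective(D, p) for p in all_perms)
solutions = [list(p) for p in all_perms if lop_objective(D, p) == obj]

# Pair of optimal orderings that disagree on the most item pairs.
farthest_pair = max(combinations(solutions, 2),
                    key=lambda pair: kendall_tau_distance(*pair))
```

Multiple optimal orderings, as in this example, are the situation the pair and centroid metrics are meant to summarize.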
property D
add_solution(sol)

Adds a solution specified by a permutation/ordering.

Parameters

[sol] – [A permutation/ordering of type list or tuple]

property beta
property centroid_solution
property centroid_x
property closest_pair
property farthest_pair
static from_json(file_link)

Static method that reads a LOP card object from a JSON file.

Returns

Returns a LOP card object

Return type

LOP

get_visuals()

Returns a dictionary with both dash and notebook ready visualizations.

Returns

Dash and notebook visuals

Return type

dict

property method
property obj
property outlier_solution
prepare(processed_dataset)

Prepare the data for analysis. For LOP this means filling in missing values in the dominance matrix and removing rows and columns with all 0’s.

Parameters

[processed_dataset] ([dataset.Processed]) – [Processed dataset object]

Returns

self

Return type

LOP

property r

Returns a rating vector using X*.

Returns

Rating vector derived from X*

Return type

pandas.Series

run()

Run the LOP analysis and compute the metrics.

property solutions
property tau_closest_pair
property tau_farthest_pair
view()

Returns the card contents in dash ready format.

Returns

List of HTML dash ready objects

Return type

list

property xstar

Return X* as a dataframe using the row and column names of D.

property xstar_r_r

Return X* optimally reordered.

class pyrplib.card.SystemOfEquations(method)

Bases: Card

A class that represents the analysis, results, and metrics associated with solving a system of equations to produce a ranking.

SystemOfEquations card can be saved as a JSON file that contains the following:

{
    "dataset_id": "<Identifying Dataset ID>",
    "source_dataset_id": "<Identifying Source Dataset ID>",
    "M": "<Matrix from Mx=b>",
    "b": "<Vector from Mx=b>",
    "r": "<Rating vector>",
    "ranking": "<Ranking vector>",
    "perm": "<Ordering/permutation>",
    "options": "<dictionary of options>",
    "games": "<Games (or more generally matchups) that are processed to produce M and b>",
    "teams": "<List of teams (or more generally items)>",
    "method": "<Method which is Massey or Colley>"
}
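
As an illustration of how M, b, r, and perm relate, the sketch below builds the textbook Colley system from a list of (winner, loser) games and solves it with plain Gaussian elimination. It is a sketch only: the way this class builds M and b from its games data and options may differ, and colley_system and solve_linear_system are hypothetical helper names.

```python
def colley_system(games, teams):
    """Textbook Colley system: M = 2I + games played on the diagonal,
    minus head-to-head counts off the diagonal; b = 1 + (wins - losses) / 2."""
    index = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    M = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser in games:
        i, j = index[winner], index[loser]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        b[i] += 0.5
        b[j] -= 0.5
    return M, b

def solve_linear_system(M, b):
    """Solve Mx = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[k]] for k, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for c in range(col, n + 1):
                A[row][c] -= factor * A[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        known = sum(A[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (A[row][n] - known) / A[row][row]
    return x

teams = ["A", "B", "C"]
games = [("A", "B"), ("A", "C"), ("B", "C")]  # (winner, loser)
M, b = colley_system(games, teams)
r = solve_linear_system(M, b)
# Ordering of team indices from highest to lowest rating.
perm = sorted(range(len(teams)), key=lambda k: r[k], reverse=True)
```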
property M
property b
static from_json(file_link)

Static method that reads a SystemOfEquations card object from a JSON file.

Returns

Returns a SystemOfEquations card object

Return type

SystemOfEquations

property games
property method
property perm
prepare(processed_dataset)

Prepare the data for analysis.

Parameters

[processed_dataset] ([dataset.Processed]) – [Processed dataset object]

Returns

self

Return type

SystemOfEquations

property r
property ranking
run()

Solve the system of equations and store the results.

property teams
view()

Returns the card contents in dash ready format.

Returns

List of HTML dash ready objects

Return type

list

pyrplib.data module

class pyrplib.data.Data(DATA_PREFIX)

Bases: object

A class that facilitates accessing the datasets for RPLIB.

This class reads the following TSV files:
  • {DATA_PREFIX}/unprocessed_datasets.tsv
    • Columns: Dataset ID, Dataset Name, Description, Type, Loader, Download links

    • Dataset ID - persistent unique ID for each dataset

    • Dataset Name - Short human readable name for the dataset

    • Description - Longer human readable description of the dataset

    • Type - Games|D matrix|Features|Structured Artificial

    • Loader - Class that is used to load the dataset (e.g., marchmadness.base.Unprocessed)

    • Download links - String of comma separated file links

  • {DATA_PREFIX}/processed_datasets.tsv
    • Columns: Dataset ID, Source Dataset ID, Index, Command, Type, Collection, Options, Last Processed Datetime, Identifier

    • Dataset ID - persistent unique ID for each processed dataset

    • Source Dataset ID - source dataset ID

    • Index - Index pointing into the source dataset to extract the specific value

    • Command - Python functional code statement describing how to process the data. May assume the following variables: data and index.

    • Type - resulting type of dataset (D|Games)

    • Collection - Name of collection for organization in the data directory

    • Options - JSON string of optional options

    • Last Processed Datetime - Last time this dataset was processed/updated

    • Identifier - Optional identifying string for the dataset

  • {DATA_PREFIX}/lop_cards.tsv
    • Columns: Dataset ID, Processed Dataset ID, Options, Last Processed Datetime

    • Dataset ID - persistent unique ID for each card

    • Processed Dataset ID - processed dataset ID used as input

    • Options - JSON string of optional options

    • Last Processed Datetime - Last time this dataset was processed/updated

  • {DATA_PREFIX}/hillside_cards.tsv
    • Columns: Dataset ID, Processed Dataset ID, Options, Last Processed Datetime

    • Dataset ID - persistent unique ID for each card

    • Processed Dataset ID - processed dataset ID used as input

    • Options - JSON string of optional options

    • Last Processed Datetime - Last time this dataset was processed/updated

  • {DATA_PREFIX}/massey_cards.tsv
    • Columns: Dataset ID, Processed Dataset ID, Options, Last Processed Datetime

    • Dataset ID - persistent unique ID for each card

    • Processed Dataset ID - processed dataset ID used as input

    • Options - JSON string of optional options

    • Last Processed Datetime - Last time this dataset was processed/updated

  • {DATA_PREFIX}/colley_cards.tsv
    • Columns: Dataset ID, Processed Dataset ID, Options, Last Processed Datetime

    • Dataset ID - persistent unique ID for each card

    • Processed Dataset ID - processed dataset ID used as input

    • Options - JSON string of optional options

    • Last Processed Datetime - Last time this dataset was processed/updated
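
The card files listed above share one column layout, so a row can be read with the standard csv module and its Options column decoded with json. The sketch below uses an in-memory sample instead of a real file; the row values, including the option key and the datetime format, are made up for illustration, and the Data class may read these files differently.

```python
import csv
import io
import json

# Made-up sample with the documented card columns (not real RPLIB data).
sample_tsv = (
    "Dataset ID\tProcessed Dataset ID\tOptions\tLast Processed Datetime\n"
    '1\t7\t{"example_option": 1}\t2021-01-01 00:00:00\n'
)

# QUOTE_NONE keeps the double quotes inside the JSON string untouched.
reader = csv.DictReader(io.StringIO(sample_tsv), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
cards = []
for row in reader:
    row["Options"] = json.loads(row["Options"])  # JSON string -> dict
    cards.append(row)
```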

load_card(dataset_id, card_type)
load_processed(dataset_id)
load_unprocessed(dataset_id)
save_colley_datasets()
save_hillside_datasets()
save_lop_datasets()
save_massey_datasets()
save_processed_datasets()

pyrplib.dataset module

class pyrplib.dataset.Processed

Bases: Unprocessed

Processed dataset labeled with a persistent and unique dataset_id

property command
dash_ready_data()

Returns dash ready data

property data

Returns a dataframe

property dataset_id
abstract static from_json(file)
property short_type
abstract size_str()
property source_dataset_id
to_json()
property type

Return the high level type of an element in data() as a string

class pyrplib.dataset.ProcessedD

Bases: Processed

Processed dominance (D) dataset object

static from_json(file)

Loads a ProcessedD file from a JSON file.

Parameters

[file] – [Path to local or http JSON file]

Returns

Returns a ProcessedD object.

Return type

ProcessedD

load(options={})

Load a processed dominance (D) dataset with options

size_str()

Size of dataset as a string

class pyrplib.dataset.ProcessedGames

Bases: Processed

Processed games dataset object

static from_json(file)

Loads a ProcessedGames file from a JSON file.

Parameters

[file] – [Path to local or http JSON file]

Returns

Returns a ProcessedGames object.

Return type

ProcessedGames

load(options={})

Load a processed games dataset with options

size_str()

Size of dataset as a string

class pyrplib.dataset.Unprocessed(dataset_id, links)

Bases: ABC

Unprocessed dataset labeled with a persistent and unique dataset_id

abstract dash_ready_data()

Returns dash ready data

data()

Returns a dataframe

abstract load(options={})

Code that loads the data from the links

abstract type()

Return the high level type of an element in data() as a string

view()

Standard view function for a dataset

view_item(index)

Standard view function for an item from a dataset

class pyrplib.dataset.UnprocessedType(value)

Bases: Enum

An enumeration.

D = 0
Features = 2
Games = 1

pyrplib.dataset.load_unprocessed(unprocessed_source_id, datasets_df)

Helper function to load unprocessed dataset.

Parameters
  • [unprocessed_source_id] – [Unprocessed dataset ID]

  • [datasets_df] – [Dataframe of datasets read from data.Data(DATA_PREFIX)]

Returns

Unprocessed dataset

Return type

dataset.Unprocessed

pyrplib.style module

pyrplib.style.get_standard_data_table(df, id)

Returns a dash data table with standard configuration.

pyrplib.style.get_standard_download_all_button(button_id, download_id, progress_id=None, collapse_id=None)

Return a standard download button.

pyrplib.style.view_item(item, id)

Helper function to view a single item.

pyrplib.transformers module

class pyrplib.transformers.ColumnCountTransformer(columns)

Bases: BaseEstimator, TransformerMixin

A class to convert a feature matrix to a dominance matrix in the standard sklearn transformer paradigm.

fit(X, y=None)
transform(X, y=None)

class pyrplib.transformers.ComputeDTransformer(direct_thres=0, spread_thres=0, team_range=None)

Bases: BaseEstimator, TransformerMixin

A class to convert games to a dominance matrix in the standard sklearn transformer paradigm.

fit(X, y=None)
transform(X, y=None)
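
The general idea of turning games into a dominance matrix can be sketched as counting wins per ordered pair of teams. The margin_threshold parameter below is a hypothetical stand-in for a score-margin cutoff; whether it corresponds to direct_thres or spread_thres is not stated in this reference, so treat it as an assumption.

```python
def games_to_D_sketch(games, teams, margin_threshold=0):
    """Count a win for team a over team b whenever a's score exceeds
    b's score by more than margin_threshold."""
    index = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    D = [[0] * n for _ in range(n)]
    for team1, team2, score1, score2 in games:
        if score1 - score2 > margin_threshold:
            D[index[team1]][index[team2]] += 1
        elif score2 - score1 > margin_threshold:
            D[index[team2]][index[team1]] += 1
    return D

games = [("A", "B", 70, 65), ("B", "C", 80, 60), ("C", "A", 55, 54)]
D_all = games_to_D_sketch(games, ["A", "B", "C"])
D_clear = games_to_D_sketch(games, ["A", "B", "C"], margin_threshold=3)
```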
pyrplib.transformers.count(games, teams)

Returns a processed direct matchup dominance matrix, processed indirect matchup dominance matrix, and the transformer used.

Parameters
  • [games] ([pandas.DataFrame]) – [DataFrame of games (matchups between items)]

  • [teams] ([list]) – [list of teams/items]

Returns

Tuple of processed D from direct matchups, processed D from indirect matchups, and the transformer

Return type

tuple

pyrplib.transformers.direct(D, ID, trans)

Returns the direct matchup (D) matrix from the arguments.

pyrplib.transformers.directplusindirect(D, ID, trans, indirect_weight=1.0)

Returns a processed D object that is a combination of D and ID using the indirect weight.

Returns

Processed dominance matrix that is a weighted combination of D and ID

Return type

processed_D
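
A weighted combination of this kind can be sketched as an elementwise sum, D + indirect_weight * ID. This assumes both matrices have the same shape and item order; the function name and the nested-list representation are illustrative, and the library operates on its own processed objects.

```python
def direct_plus_indirect_sketch(D, ID, indirect_weight=1.0):
    """Elementwise D + indirect_weight * ID for equally shaped matrices."""
    return [[d + indirect_weight * i for d, i in zip(d_row, id_row)]
            for d_row, id_row in zip(D, ID)]

combined = direct_plus_indirect_sketch([[0, 2], [1, 0]],
                                       [[0, 1], [3, 0]],
                                       indirect_weight=0.5)
```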

pyrplib.transformers.features_to_D(df_features, options={})

Convert a features matrix to a dominance matrix.

options["columns"] = list of columns you would like to convert

options["items"] = list of items you would like to use. Items must be in the index
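
One way such a conversion could work is to count, for each ordered pair of items, the number of selected columns in which the first item has the larger value. This counting rule and the "larger is better" direction are assumptions made for illustration; the sketch uses plain dictionaries instead of a DataFrame.

```python
def features_to_D_sketch(features, columns, items):
    """D[i][j] = number of columns where items[i] has a larger value than items[j]."""
    n = len(items)
    D = [[0] * n for _ in range(n)]
    for i, a in enumerate(items):
        for j, b in enumerate(items):
            if i != j:
                D[i][j] = sum(1 for c in columns
                              if features[a][c] > features[b][c])
    return D

features = {
    "x": {"speed": 3, "power": 5},
    "y": {"speed": 2, "power": 7},
    "z": {"speed": 1, "power": 1},
}
D_features = features_to_D_sketch(features, ["speed", "power"], ["x", "y", "z"])
```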

pyrplib.transformers.indirect(D, ID, trans)

Returns the indirect matchup (ID) matrix from the arguments.

pyrplib.transformers.process_D(D)

Returns a processed D object from a dominance matrix (pandas.DataFrame).

pyrplib.transformers.standardize_games_teams(games, teams, options={})

Returns a standardized version of games and teams with the expected column names as a ProcessedGames object.

options["team1_name"] = column in your dataframe that has team 1 names

options["team2_name"] = column in your dataframe that has team 2 names

options["team1_score"] = column in your dataframe that has team 1 score

options["team2_score"] = column in your dataframe that has team 2 score

options["team1_H_A_N"] = column in your dataframe that specifies home = 1, away = -1, or neutral = 0 for team 1

options["team2_H_A_N"] = column in your dataframe that specifies home = 1, away = -1, or neutral = 0 for team 2
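
The options act as a mapping from standard names to the column names in your own data. The sketch below shows that renaming step on plain dictionaries, assuming the standardized column names equal the option keys. The column names home, visitor, home_pts, and visitor_pts are invented for the example, and the real function returns a ProcessedGames object.

```python
def standardize_columns_sketch(rows, options):
    """Rename user-specific column names to the standard option keys."""
    # Invert the mapping: user column name -> standard name.
    rename = {user_col: standard for standard, user_col in options.items()}
    return [{rename.get(key, key): value for key, value in row.items()}
            for row in rows]

options = {
    "team1_name": "home",
    "team2_name": "visitor",
    "team1_score": "home_pts",
    "team2_score": "visitor_pts",
}
rows = [{"home": "A", "visitor": "B", "home_pts": 70, "visitor_pts": 65}]
standardized = standardize_columns_sketch(rows, options)
```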

Module contents