pyrplib package¶
Subpackages¶
Submodules¶
pyrplib.artificial module¶
- pyrplib.artificial.addmossimple(D, start_index, end_index)¶
For a binary matrix D, create simple multiple optimal solutions in the range of teams specified. Indices are inclusive.
- pyrplib.artificial.addnoise(D, percentnoise, low=0, high=1)¶
ADD NOISE
Function replaces random off-diagonal elements in D with values from low to high.
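The noise step described above can be pictured with a small standalone sketch. It does not call pyrplib: `add_noise_sketch` is a hypothetical helper, and treating `percentnoise` as a percentage of the n^2 entries and drawing integers uniformly from `low` to `high` are assumptions about the behaviour, not confirmed details of the library.

```python
import random

def add_noise_sketch(D, percentnoise, low=0, high=1, seed=0):
    """Return a copy of D with random off-diagonal entries overwritten.

    Assumptions (not taken from pyrplib): percentnoise is a percentage of
    the n*n entries, and replacement values are integers in [low, high].
    """
    rng = random.Random(seed)
    n = len(D)
    noisy = [row[:] for row in D]
    off_diagonal = [(i, j) for i in range(n) for j in range(n) if i != j]
    count = min(len(off_diagonal), round(percentnoise / 100 * n * n))
    # Pick distinct off-diagonal positions and overwrite each one.
    for i, j in rng.sample(off_diagonal, count):
        noisy[i][j] = rng.randint(low, high)
    return noisy

D = [[0] * 4 for _ in range(4)]
noisy = add_noise_sketch(D, 25, low=1, high=3)
```

The diagonal is never touched, and the input matrix is copied rather than modified in place.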
- pyrplib.artificial.create_dataset(create_func, options)¶
Create a dataset using a create function and a function to generate the options used.
See example_create and example_get_create_options for an example create function and options function.
- pyrplib.artificial.create_dataset_manual(D_matrices, options, create_code='manual')¶
Create a dataset by manually passing the D matrices as a list. The options are not used in any way. They are here if you want to include them.
- pyrplib.artificial.cyclic(n)¶
Create a simple cycle D matrix of size n x n.
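A simple cycle can be sketched in a few lines. The orientation chosen here (item i beats item i + 1, and the last item beats the first) is an assumption, and `cyclic_sketch` is a hypothetical name rather than the library function.

```python
def cyclic_sketch(n):
    """n x n matrix in which item i beats item (i + 1) % n exactly once."""
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        D[i][(i + 1) % n] = 1
    return D

D = cyclic_sketch(4)
for row in D:
    print(row)
```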
- pyrplib.artificial.domfromranking(n, r, ngames, upset_func=<function <lambda>>)¶
DOM matrix from ranking
Simulates the win/loss outcome of individual games using the ranking vector (r) and the upset function. The upset function must take two rankings, r1 and r2, with r1 > r2, and return True or False depending on whether an upset occurred.
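The contract for the upset function can be illustrated with a standalone sketch. Everything here is hypothetical: `make_upset_func` and `simulate_games_sketch` are illustrative names, the fixed upset probability is an arbitrary choice, and the sketch assumes that a larger value in r marks the favoured item and that the values in r are distinct.

```python
import random

def make_upset_func(probability, seed=0):
    """Build an upset function with a fixed upset probability.

    The returned callable takes two rankings r1 and r2 (r1 > r2) and
    returns True when an upset occurs.
    """
    rng = random.Random(seed)

    def upset(r1, r2):
        return rng.random() < probability

    return upset

def simulate_games_sketch(r, ngames, upset_func, seed=0):
    """Simulate ngames between random pairs and count wins in D.

    Assumption: a larger value in r marks the favoured item, so the
    upset function is always called with r1 > r2.
    """
    rng = random.Random(seed)
    n = len(r)
    D = [[0] * n for _ in range(n)]
    for _ in range(ngames):
        i, j = rng.sample(range(n), 2)
        fav, dog = (i, j) if r[i] > r[j] else (j, i)
        if upset_func(r[fav], r[dog]):
            D[dog][fav] += 1  # the underdog wins
        else:
            D[fav][dog] += 1  # the favourite wins
    return D

never = make_upset_func(0.0)
D = simulate_games_sketch([3, 2, 1], ngames=50, upset_func=never)
```

With an upset probability of zero, every game goes to the favourite, so all wins land above the diagonal.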
- pyrplib.artificial.domplusnoise(n, percentnoise, low=0, high=1)¶
Function creates a dominance graph and adds noise.
- Input: n = number of rows/cols in D matrix
- percentnoise = integer between 1 and n^2 representing the percentage of noise to add to D domgraph, e.g., if percentnoise = 10, then 10% of the n^2 elements will be noise
- Example: ‘D = domplusnoise(6,20)’ creates a 6 by 6 matrix with 20% noise added to the dominance graph
- pyrplib.artificial.emptyplusnoise(n, percentnoise, low=0, high=3)¶
EMPTY + NOISE
Function starts with an empty graph and adds some amount of noise.
- Input: n = number of rows/cols in D matrix
- percentnoise = integer between 1 and n^2 representing the percentage of noise to add to the empty D matrix, e.g., if percentnoise = 10, then 10% of the n^2 elements will be noise
- Example: ‘D = emptyplusnoise(6,20)’ creates a 6 by 6 matrix with 20% noise added to the empty graph
- pyrplib.artificial.example_create(options={'num_games': 1000, 'number_matrices': 10, 'number_of_rows_columns': 20, 'threshold': 3})¶
Example create function. These functions must return a dominance (D) matrix that is a pandas dataframe. Options is a dictionary. There is one required key/value which is the number_of_rows_columns. It may also have additional arguments.
- pyrplib.artificial.example_create2(options={'num_games': 1000, 'number_matrices': 10, 'number_of_rows_columns': 20})¶
Example create function. These functions must return a dominance (D) matrix that is a pandas dataframe. Options is a dictionary. There is one required key/value which is the number_of_rows_columns. It may also have additional arguments.
- pyrplib.artificial.example_create3(options)¶
Example create function. These functions must return a dominance (D) matrix that is a pandas dataframe. Options is a dictionary. There is one required key/value which is the number_of_rows_columns. It may also have additional arguments.
- pyrplib.artificial.example_get_create_options()¶
Example set of options to be paired with example_create function.
- pyrplib.artificial.example_get_create_options2()¶
Example set of options to be paired with example_create2 function.
- pyrplib.artificial.example_get_create_options3()¶
Example set of options to be paired with example_create3 function.
- pyrplib.artificial.hillsideplusnoise(n, percentnoise, low=1, high=5)¶
HILLSIDE + NOISE
Starts with a perfect hillside graph and then randomly perturbs the matrix at user specified percentage.
- Input: n = number of rows/cols in D matrix
- percentnoise = integer between 1 and n^2 representing the percentage of noise to add to D hillside, e.g., if percentnoise = 10, then 10% of the n^2 elements will be noise
- Example: ‘D = hillsideplusnoise(6,20)’ creates a 6 by 6 matrix with 20% noise added to the hillside graph
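As a rough illustration of what a perfect hillside matrix looks like before any noise is added, the sketch below builds one and checks the property. Hillside form is commonly described as rows that ascend from left to right and columns that descend from top to bottom. The specific matrix (entries j - i above the diagonal) and both function names are assumptions for illustration, not the construction used by the library.

```python
def perfect_hillside_sketch(n):
    """Upper-triangular matrix with D[i][j] = j - i for j > i."""
    return [[j - i if j > i else 0 for j in range(n)] for i in range(n)]

def is_hillside(D):
    """Check that rows are non-decreasing and columns are non-increasing."""
    n = len(D)
    rows_ok = all(D[i][j] <= D[i][j + 1]
                  for i in range(n) for j in range(n - 1))
    cols_ok = all(D[i][j] >= D[i + 1][j]
                  for i in range(n - 1) for j in range(n))
    return rows_ok and cols_ok

D = perfect_hillside_sketch(4)
perturbed = [row[:] for row in D]
perturbed[3][0] = 5  # a single noisy entry breaks the hillside property
```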
- pyrplib.artificial.removelinks(D, percent)¶
REMOVE LINKS
Function returns a modified version of D with percent of nonzero links removed
- pyrplib.artificial.unweighted(D)¶
CONVERT TO UNWEIGHTED
Function returns an unweighted version of D
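The two link-level helpers above can be pictured with plain nested lists. This is a sketch under assumptions: the function names are hypothetical, "unweighted" is taken to mean that every nonzero entry becomes 1, and `percent` is taken to be a percentage of the nonzero entries.

```python
import random

def unweighted_sketch(D):
    """Replace every nonzero entry with 1."""
    return [[1 if value != 0 else 0 for value in row] for row in D]

def remove_links_sketch(D, percent, seed=0):
    """Zero out roughly `percent` percent of the nonzero entries."""
    rng = random.Random(seed)
    links = [(i, j) for i, row in enumerate(D)
             for j, value in enumerate(row) if value != 0]
    count = round(percent / 100 * len(links))
    pruned = [row[:] for row in D]
    for i, j in rng.sample(links, count):
        pruned[i][j] = 0
    return pruned

D = [[0, 3, 2], [0, 0, 5], [1, 0, 0]]
U = unweighted_sketch(D)
P = remove_links_sketch(D, 50)
```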
- pyrplib.artificial.weakdomplusnoise(n, percentnoise, low=0, high=1)¶
Function creates a weak dominance graph and adds noise.
- Input: n = number of rows/cols in D matrix
- percentnoise = integer between 1 and n^2 representing the percentage of noise to add to D domgraph, e.g., if percentnoise = 10, then 10% of the n^2 elements will be noise
- Example: ‘D = weakdomplusnoise(6,20)’ creates a 6 by 6 matrix with 20% noise added to the dominance graph
pyrplib.base module¶
- class pyrplib.base.DInfo¶
Bases: object
A class to represent information about a dominance (D) matrix.
- property D¶
- property D_type¶
- property command¶
- property dataset_id¶
- static from_json(file)¶
Static method that reads a DInfo object from a JSON file.
- Returns
Returns a DInfo object
- Return type
DInfo
- property source_dataset_id¶
- to_json()¶
Returns a JSON string representing the object.
- Returns
Returns a JSON string representing the object.
- Return type
str
- class pyrplib.base.LOPCard¶
Bases: object
A class that represents the analysis, results, and metrics associated with running the LOP algorithm.
LOPCard can be saved as a JSON file that contains the following:
{
  "D": "<Dominance matrix and input to the LOP solver>",
  "obj": "<Optimal value of LOP>",
  "solutions": "<List of optimal orderings/permutations that result in an optimal value>",
  "max_tau_solutions": "<Two farthest orderings/permutations measured by Kendall tau (when available)>",
  "centroid_x": "<X*>",
  "outlier_solution": "<Optimal ordering/permutation that is farthest from centroid_x>",
  "dataset_id": "<Identifying ID>"
}
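To make the layout concrete, the sketch below round-trips a dictionary with the same keys through the standard `json` module. The value shapes (nested lists for matrices, a plain string for the ID) are assumptions for illustration; the class itself may serialize D and X* differently, and this sketch does not use LOPCard.

```python
import json

# Hypothetical card contents; only the key names come from the layout above.
card = {
    "D": [[0, 2], [1, 0]],
    "obj": 2,
    "solutions": [[0, 1]],
    "max_tau_solutions": None,  # not available for a single solution
    "centroid_x": [[0, 1], [0, 0]],
    "outlier_solution": [0, 1],
    "dataset_id": "example-1",
}

text = json.dumps(card, indent=2)  # JSON string that could be written to disk
restored = json.loads(text)        # parse it back into a dictionary
```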
- property D¶
- add_solution(sol)¶
Adds a solution specified by a permutation/ordering.
- Parameters
sol – A permutation/ordering of type list or tuple
- property centroid_solution¶
- property centroid_x¶
- property dataset_id¶
- static from_json(file)¶
Static method that reads a LOPCard object from a JSON file.
- Returns
Returns a LOPCard object
- Return type
LOPCard
- property obj¶
- property outlier_solution¶
- property solutions¶
- property source_dataset_id¶
- to_json(file)¶
Returns a JSON string representing the object.
- Returns
Returns a JSON string representing the object.
- Return type
str
- class pyrplib.base.MatricesInfo¶
Bases: object
A class to represent information about the matrix M and vector b of a linear system, i.e., Mx=b.
- property b¶
- property command¶
- property dataset_id¶
- static from_json(file)¶
Static method that reads a MatricesInfo object from a JSON file.
- Returns
Returns a MatricesInfo object
- Return type
MatricesInfo
- property matrix¶
- property source_dataset_id¶
- to_json()¶
Returns a JSON string representing the object.
- Returns
Returns a JSON string representing the object.
- Return type
str
pyrplib.card module¶
- class pyrplib.card.Card¶
Bases: ABC
The base Card abstract class.
- property dataset_id¶
- static get_contents(file)¶
Static method that reads a Card from a JSON file.
- Parameters
file (str) – File path or URL path to JSON file
- Returns
Returns a Pandas Series object
- Return type
pandas.Series
- load(dataset_id, options)¶
Load a Card using the dataset_id and the options.
- Parameters
dataset_id – Dataset ID
options (dict) – Dictionary of options
- property options¶
- abstract prepare(processed_dataset)¶
- abstract run()¶
- property source_dataset_id¶
- to_json()¶
Returns a JSON string representing the object.
- Returns
Returns a JSON string representing the object.
- Return type
str
- abstract view()¶
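The prepare/run/view contract can be sketched with the standard `abc` module. `CardSketch` and `TotalCard` are hypothetical classes that only mirror the three abstract methods listed above; returning `self` from `run()` is an assumption made so the calls can be chained.

```python
from abc import ABC, abstractmethod

class CardSketch(ABC):
    """Hypothetical stand-in for the abstract Card interface."""

    @abstractmethod
    def prepare(self, processed_dataset): ...

    @abstractmethod
    def run(self): ...

    @abstractmethod
    def view(self): ...

class TotalCard(CardSketch):
    """Toy card whose 'analysis' is just a sum of the input values."""

    def prepare(self, processed_dataset):
        self.values = list(processed_dataset)
        return self

    def run(self):
        self.total = sum(self.values)
        return self

    def view(self):
        return [f"total = {self.total}"]

card = TotalCard().prepare([1, 2, 3]).run()
print(card.view())
```

A subclass that leaves any of the three methods unimplemented cannot be instantiated, which is the point of declaring them abstract.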
- class pyrplib.card.Hillside¶
Bases: LOP
A class that represents the analysis, results, and metrics associated with running the Hillside algorithm.
Hillside finds the optimal solution in Hillside form:
Chartier, Timothy P., et al. “Minimum violations sports ranking using evolutionary optimization and binary integer linear program approaches.” Proceedings of the Tenth Australian Conference on Mathematics and Computers in Sport, A. Bedford and M. Ovens, eds., MathSport (ANZIAM), New South Wales, Australia. 2010.
Hillside and LOP share the same metrics and analysis.
- class pyrplib.card.LOP¶
Bases: Card
A class that represents the analysis, results, and metrics associated with running the LOP algorithm.
LOP card can be saved as a JSON file that contains the following:
{
  "dataset_id": "<Identifying Dataset ID>",
  "source_dataset_id": "<Identifying Source Dataset ID>",
  "D": "<Dominance matrix and input to the LOP solver>",
  "obj": "<Optimal value of LOP>",
  "solutions": "<List of optimal orderings/permutations that result in an optimal value>",
  "farthest_pair": "<Two farthest orderings/permutations measured by Kendall tau (when available)>",
  "tau_farthest_pair": "<Associated Kendall tau value (when available)>",
  "closest_pair": "<Two (not identical) closest orderings/permutations measured by Kendall tau (when available)>",
  "tau_closest_pair": "<Associated Kendall tau value (when available)>",
  "centroid_x": "<X*>",
  "outlier_solution": "<Optimal ordering/permutation that is farthest from centroid_x>",
  "method": "<Method which is LOP or Hillside>"
}
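For intuition about the "obj", "solutions", and pair fields, here is a brute-force sketch of the linear ordering problem on a tiny matrix. It is not the solver used by the card: the function names are hypothetical, exhaustive enumeration is only feasible for very small n, and the distance below counts discordant pairs, whereas the card may report a normalized Kendall tau value.

```python
from itertools import permutations

def lop_objective(D, perm):
    """Sum D[i][j] over all pairs where i is placed ahead of j."""
    return sum(D[perm[a]][perm[b]]
               for a in range(len(perm)) for b in range(a + 1, len(perm)))

def solve_lop_brute_force(D):
    """Enumerate every ordering and keep all that reach the best value."""
    scores = {p: lop_objective(D, p) for p in permutations(range(len(D)))}
    best = max(scores.values())
    return best, [list(p) for p, s in scores.items() if s == best]

def kendall_tau_distance(p, q):
    """Number of item pairs that p and q place in opposite order."""
    pos = {item: k for k, item in enumerate(q)}
    return sum(1 for a in range(len(p)) for b in range(a + 1, len(p))
               if pos[p[a]] > pos[p[b]])

D = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # a 3-cycle, so several optima tie
obj, solutions = solve_lop_brute_force(D)
distance = kendall_tau_distance(solutions[0], solutions[1])
```

Because the input is a cycle, three rotations of the same ordering share the optimal value, which is the situation the multiple-solution fields are meant to describe.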
- property D¶
- add_solution(sol)¶
Adds a solution specified by a permutation/ordering.
- Parameters
sol – A permutation/ordering of type list or tuple
- property beta¶
- property centroid_solution¶
- property centroid_x¶
- property closest_pair¶
- property farthest_pair¶
- static from_json(file_link)¶
Static method that reads a LOP card object from a JSON file.
- Returns
Returns a LOP card object
- Return type
LOP
- get_visuals()¶
Returns a dictionary with both dash-ready and notebook-ready visualizations.
- Returns
Dash and notebook visuals
- Return type
dict
- property method¶
- property obj¶
- property outlier_solution¶
- prepare(processed_dataset)¶
Prepare the data for analysis. For LOP this means filling in missing values in the dominance matrix and removing rows and columns with all 0’s.
- Parameters
processed_dataset (dataset.Processed) – Processed dataset object
- Returns
self
- Return type
LOP
- property r¶
Returns a rating vector using X*.
- Returns
Rating vector derived from X*
- Return type
pandas.Series
- run()¶
Run the LOP analysis and compute the metrics.
- property solutions¶
- property tau_closest_pair¶
- property tau_farthest_pair¶
- view()¶
Returns the card contents in dash-ready format.
- Returns
List of HTML dash ready objects
- Return type
list
- property xstar¶
Return X* as a dataframe using the row and column names of D.
- property xstar_r_r¶
Return X* optimally reordered.
- class pyrplib.card.SystemOfEquations(method)¶
Bases: Card
A class that represents the analysis, results, and metrics associated with solving a system of equations to produce a ranking.
SystemOfEquations card can be saved as a JSON file that contains the following:
{
  "dataset_id": "<Identifying Dataset ID>",
  "source_dataset_id": "<Identifying Source Dataset ID>",
  "M": "<Matrix from Mx=b>",
  "b": "<Vector from Mx=b>",
  "r": "<Rating vector>",
  "ranking": "<Ranking vector>",
  "perm": "<Ordering/permutation>",
  "options": "<dictionary of options>",
  "games": "<Games (or more generally matchups) that are processed to produce M and b>",
  "teams": "<List of teams (or more generally items)>",
  "method": "<Method which is Massey or Colley>"
}
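As a worked illustration of how M, b, r, and perm relate, the sketch below builds the standard Colley system (diagonal 2 plus games played, off-diagonal minus the number of meetings, right-hand side 1 + (wins - losses) / 2) and solves it exactly. The helper names and the (winner, loser) input format are assumptions; the card's own construction, options, and Massey variant are not reproduced here.

```python
from fractions import Fraction

def colley_system(games, teams):
    """Build the Colley matrix M and vector b from (winner, loser) pairs."""
    idx = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    M = [[Fraction(2 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [Fraction(1) for _ in range(n)]
    for winner, loser in games:
        w, l = idx[winner], idx[loser]
        M[w][w] += 1
        M[l][l] += 1
        M[w][l] -= 1
        M[l][w] -= 1
        b[w] += Fraction(1, 2)
        b[l] -= Fraction(1, 2)
    return M, b

def solve(M, b):
    """Gauss-Jordan elimination on exact fractions."""
    n = len(b)
    A = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]

teams = ["A", "B"]
M, b = colley_system([("A", "B")], teams)
r = solve(M, b)  # rating vector
perm = sorted(range(len(teams)), key=lambda k: r[k], reverse=True)
```

For a single game in which A beats B, the ratings come out as 5/8 and 3/8, so A is ordered first.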
- property M¶
- property b¶
- static from_json(file_link)¶
Static method that reads a SystemOfEquations card object from a JSON file.
- Returns
Returns a SystemOfEquations card object
- Return type
SystemOfEquations
- property games¶
- property method¶
- property perm¶
- prepare(processed_dataset)¶
Prepare the data for analysis.
- Parameters
processed_dataset (dataset.Processed) – Processed dataset object
- Returns
self
- Return type
SystemOfEquations
- property r¶
- property ranking¶
- run()¶
Solve the system of equations and store the results.
- property teams¶
- view()¶
Returns the card contents in dash-ready format.
- Returns
List of HTML dash ready objects
- Return type
list
pyrplib.data module¶
- class pyrplib.data.Data(DATA_PREFIX)¶
Bases: object
A class that facilitates accessing the datasets for RPLIB.
- This class reads the following TSV files:
- {DATA_PREFIX}/unprocessed_datasets.tsv
Columns: Dataset ID, Dataset Name, Description, Type, Loader, Download links
Dataset ID - persistent unique ID for each dataset
Dataset Name - Short human readable name for the dataset
Description - Longer human readable description of the dataset
Type - Games|D matrix|Features|Structured Artificial
Loader - Class that is used to load the dataset (e.g., marchmadness.base.Unprocessed)
Download links - String of comma separated file links
- {DATA_PREFIX}/processed_datasets.tsv
Columns: Dataset ID, Source Dataset ID, Index, Command, Type, Collection, Options, Last Processed Datetime, Identifier
Dataset ID - persistent unique ID for each processed dataset
Source Dataset ID - source dataset ID
Index - Index pointing into the source dataset to extract the specific value
Command - Python functional code statement describing how to process the data. May assume the following variables: data and index.
Type - resulting type of dataset (D|Games)
Collection - Name of collection for organization in the data directory
Options - JSON string of optional options
Last Processed Datetime - Last time this dataset was processed/updated
Identifier - Optional identifying string for the dataset
- {DATA_PREFIX}/lop_cards.tsv
Columns: Dataset ID, Processed Dataset ID, Options, Last Processed Datetime
Dataset ID - persistent unique ID for each card
Processed Dataset ID - processed dataset ID used as input
Options - JSON string of optional options
Last Processed Datetime - Last time this dataset was processed/updated
- {DATA_PREFIX}/hillside_cards.tsv
Columns: Dataset ID, Processed Dataset ID, Options, Last Processed Datetime
Dataset ID - persistent unique ID for each card
Processed Dataset ID - processed dataset ID used as input
Options - JSON string of optional options
Last Processed Datetime - Last time this dataset was processed/updated
- {DATA_PREFIX}/massey_cards.tsv
Columns: Dataset ID, Processed Dataset ID, Options, Last Processed Datetime
Dataset ID - persistent unique ID for each card
Processed Dataset ID - processed dataset ID used as input
Options - JSON string of optional options
Last Processed Datetime - Last time this dataset was processed/updated
- {DATA_PREFIX}/colley_cards.tsv
Columns: Dataset ID, Processed Dataset ID, Options, Last Processed Datetime
Dataset ID - persistent unique ID for each card
Processed Dataset ID - processed dataset ID used as input
Options - JSON string of optional options
Last Processed Datetime - Last time this dataset was processed/updated
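The TSV layouts above can be read with the standard library alone. The sketch below parses a one-row stand-in for processed_datasets.tsv and decodes the Options column from its JSON string. Every value in the row is made up for illustration, and this is not how the Data class itself loads the files.

```python
import csv
import io
import json

# Hypothetical contents of processed_datasets.tsv; every value is made up.
header = ["Dataset ID", "Source Dataset ID", "Index", "Command", "Type",
          "Collection", "Options", "Last Processed Datetime", "Identifier"]
row = ["1", "7", "2002", "process(data, index)", "D", "demo",
       json.dumps({"threshold": 3}), "2021-01-01 00:00:00", "demo-2002"]
tsv_text = "\t".join(header) + "\n" + "\t".join(row) + "\n"

# QUOTE_NONE keeps the double quotes inside the JSON string untouched.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
records = []
for record in reader:
    record["Options"] = json.loads(record["Options"])
    records.append(record)
```

In practice `io.StringIO(tsv_text)` would be replaced by an open file handle under the chosen data prefix.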
- load_card(dataset_id, card_type)¶
- load_processed(dataset_id)¶
- load_unprocessed(dataset_id)¶
- save_colley_datasets()¶
- save_hillside_datasets()¶
- save_lop_datasets()¶
- save_massey_datasets()¶
- save_processed_datasets()¶
pyrplib.dataset module¶
- class pyrplib.dataset.Processed¶
Bases: Unprocessed
Processed dataset labeled with a persistent and unique dataset_id
- property command¶
- dash_ready_data()¶
Returns dash ready data
- property data¶
Returns a dataframe
- property dataset_id¶
- abstract static from_json(file)¶
- property short_type¶
- abstract size_str()¶
- property source_dataset_id¶
- to_json()¶
- property type¶
Return the high level type of an element in data() as a string
- class pyrplib.dataset.ProcessedD¶
Bases: Processed
Processed dominance (D) dataset object
- static from_json(file)¶
Loads a ProcessedD file from a JSON file.
- Parameters
file – Path to local or http JSON file
- Returns
Returns a ProcessedD object.
- Return type
ProcessedD
- load(options={})¶
Load a processed dominance (D) dataset with options
- size_str()¶
Size of dataset as a string
- class pyrplib.dataset.ProcessedGames¶
Bases: Processed
Processed games dataset object
- static from_json(file)¶
Loads a ProcessedGames file from a JSON file.
- Parameters
file – Path to local or http JSON file
- Returns
Returns a ProcessedGames object.
- Return type
ProcessedGames
- load(options={})¶
Load a processed games dataset with options
- size_str()¶
Size of dataset as a string
- class pyrplib.dataset.Unprocessed(dataset_id, links)¶
Bases: ABC
Unprocessed dataset labeled with a persistent and unique dataset_id
- abstract dash_ready_data()¶
Returns dash ready data
- data()¶
Returns a dataframe
- abstract load(options={})¶
Code that loads the data from the links
- abstract type()¶
Return the high level type of an element in data() as a string
- view()¶
Standard view function for a dataset
- view_item(index)¶
Standard view function for an item from a dataset
- class pyrplib.dataset.UnprocessedType(value)¶
Bases: Enum
An enumeration.
- D = 0¶
- Features = 2¶
- Games = 1¶
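The three members listed above behave like any standard `Enum`. The sketch below redeclares them under a hypothetical name, `UnprocessedTypeSketch`, so that it runs without the library, and shows lookup by value and by name.

```python
from enum import Enum

class UnprocessedTypeSketch(Enum):
    """Standalone copy of the three members, for illustration only."""
    D = 0
    Games = 1
    Features = 2

kind = UnprocessedTypeSketch(1)        # look up a member by value
same = UnprocessedTypeSketch["Games"]  # look up a member by name
```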
- pyrplib.dataset.load_unprocessed(unprocessed_source_id, datasets_df)¶
Helper function to load unprocessed dataset.
- Parameters
unprocessed_source_id – Unprocessed dataset ID
datasets_df – Dataframe of datasets read from data.Data(DATA_PREFIX)
- Returns
Unprocessed dataset
- Return type
Unprocessed
pyrplib.style module¶
- pyrplib.style.get_standard_data_table(df, id)¶
Returns a dash data table with standard configuration.
- pyrplib.style.get_standard_download_all_button(button_id, download_id, progress_id=None, collapse_id=None)¶
Return a standard download button.
- pyrplib.style.view_item(item, id)¶
Helper function to view a single item.
pyrplib.transformers module¶
- class pyrplib.transformers.ColumnCountTransformer(columns)¶
Bases: BaseEstimator, TransformerMixin
A class to convert a feature matrix to a dominance matrix in the standard sklearn transformer paradigm.
- fit(X, y=None)¶
- transform(X, y=None)¶
- class pyrplib.transformers.ComputeDTransformer(direct_thres=0, spread_thres=0, team_range=None)¶
Bases: BaseEstimator, TransformerMixin
A class to convert games to a dominance matrix in the standard sklearn transformer paradigm.
- fit(X, y=None)¶
- transform(X, y=None)¶
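The fit/transform pattern for turning games into a dominance matrix can be sketched without scikit-learn. `CountWinsTransformer` is a hypothetical class: it simply counts how often one team beat another, ignores ties, and does not model the `direct_thres`, `spread_thres`, or `team_range` parameters of the real transformer. The tuple layout of each game is also an assumption.

```python
class CountWinsTransformer:
    """Hypothetical transformer: D[i][j] counts games in which i beat j."""

    def __init__(self, teams):
        self.teams = list(teams)

    def fit(self, X, y=None):
        # Nothing is learned from the data; fit exists for API symmetry.
        return self

    def transform(self, X, y=None):
        idx = {team: k for k, team in enumerate(self.teams)}
        n = len(self.teams)
        D = [[0] * n for _ in range(n)]
        for team1, score1, team2, score2 in X:
            if score1 > score2:
                D[idx[team1]][idx[team2]] += 1
            elif score2 > score1:
                D[idx[team2]][idx[team1]] += 1
        return D

games = [("A", 70, "B", 65), ("B", 80, "C", 60), ("A", 55, "C", 58)]
D = CountWinsTransformer(["A", "B", "C"]).fit(games).transform(games)
```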
- pyrplib.transformers.count(games, teams)¶
Returns a processed direct matchup dominance matrix, processed indirect matchup dominance matrix, and the transformer used.
- Parameters
games (pandas.DataFrame) – DataFrame of games (matchups between items)
teams (list) – List of teams/items
- Returns
Tuple of processed D from direct matchups, processed D from indirect matchups, and the transformer
- Return type
tuple
- pyrplib.transformers.direct(D, ID, trans)¶
Returns the direct matchup (D) matrix from the arguments.
- pyrplib.transformers.directplusindirect(D, ID, trans, indirect_weight=1.0)¶
Returns a processed D object that is a combination of D and ID using the indirect weight.
- Returns
Processed dominance matrix that is a weighted combination of D and ID
- Return type
processed_D
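One plausible reading of "a weighted combination of D and ID" is the entrywise sum D + indirect_weight * ID. The sketch below shows that reading on nested lists. The formula is an assumption, `combine_sketch` is a hypothetical name, and the real function also takes and returns library objects that are not modelled here.

```python
def combine_sketch(D, ID, indirect_weight=1.0):
    """Entrywise D + indirect_weight * ID (assumed form of the combination)."""
    return [[d + indirect_weight * i for d, i in zip(d_row, id_row)]
            for d_row, id_row in zip(D, ID)]

D = [[0, 2], [1, 0]]   # direct matchups
ID = [[0, 4], [0, 0]]  # indirect matchups
combined = combine_sketch(D, ID, indirect_weight=0.5)
```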
- pyrplib.transformers.features_to_D(df_features, options={})¶
Convert a features matrix to a dominance matrix.
options["columns"] = list of columns you would like to convert
options["items"] = list of items you would like to use. Items must be in the index
- pyrplib.transformers.indirect(D, ID, trans)¶
Returns the indirect matchup (ID) matrix from the arguments.
- pyrplib.transformers.process_D(D)¶
Returns a processed D object from a dominance matrix (pandas.DataFrame).
- pyrplib.transformers.standardize_games_teams(games, teams, options={})¶
Returns a standardized version of games and teams with the expected column names as a ProcessedGames object.
options["team1_name"] = column in your dataframe that has team 1 names
options["team2_name"] = column in your dataframe that has team 2 names
options["team1_score"] = column in your dataframe that has team 1 score
options["team2_score"] = column in your dataframe that has team 2 score
options["team1_H_A_N"] = column in your dataframe that specifies home = 1, away = -1, or neutral = 0 for team 1
options["team2_H_A_N"] = column in your dataframe that specifies home = 1, away = -1, or neutral = 0 for team 2
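An options dictionary of this shape maps each standard key to a column name in the caller's own data. The sketch below uses plain dictionaries instead of a dataframe. The source column names ("home", "visitor", and so on) are invented, and renaming columns to the option keys themselves is an assumption about what "expected column names" means, not confirmed library behaviour.

```python
# Hypothetical mapping from standard keys to the caller's column names.
options = {
    "team1_name": "home",
    "team2_name": "visitor",
    "team1_score": "home_pts",
    "team2_score": "visitor_pts",
}

def rename_columns_sketch(rows, options):
    """Rename source columns to the standard option keys."""
    return [{standard: row[source] for standard, source in options.items()}
            for row in rows]

rows = [{"home": "A", "visitor": "B", "home_pts": 70, "visitor_pts": 65}]
standardized = rename_columns_sketch(rows, options)
```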