ultra.utils package

Submodules

ultra.utils.data_utils module

class ultra.utils.data_utils.Raw_data(data_path=None, file_prefix=None, rank_cut=None)

Bases: object

__init__(data_path=None, file_prefix=None, rank_cut=None)

Initialize a dataset

Parameters

data_path – (string) the root directory of the experimental dataset.
file_prefix – (string) the prefix of the data to process, e.g. ‘train’, ‘valid’, or ‘test’.
rank_cut – (int) the maximum number of top documents considered in each list.

Returns

None

load_basic_data_information(data_path=None, file_prefix=None, rank_cut=None)

Load basic dataset information from data_path, including:

feature_size: the number of features for each query-document pair.
removed_feature_ids: the indices of the features to ignore.

Parameters: data_path – (string) the root directory of the experimental dataset.
Returns: None

load_data_in_ULTRA_format(data_path=None, file_prefix=None, rank_cut=None)

Read a dataset in ULTRA format, including:

rank_list_size: the maximum number of documents for a query in the data.
features: the feature vectors of each query-document pair.
dids: the doc ids for each query-document pair.
initial_list: the initial ranking list for each query.
qids: the query ids for each query.
labels: the relevance label for each query-document pair in the initial_list.
initial_scores: (if present) the initial ranking scores in the initial list for each query-document pair.
initial_list_lengths: the length of the initial list for each query.

Parameters

data_path – (string) the root directory of the experimental dataset.
file_prefix – (string) the prefix of the data to process, e.g. ‘train’, ‘valid’, or ‘test’.
rank_cut – (int) the maximum number of top documents considered in each list.

Returns

None

load_data_in_libsvm_format(data_path=None, file_prefix=None, rank_cut=None)

Read a dataset in libsvm format, including:

rank_list_size: the maximum number of documents for a query in the data.
features: the feature vectors of each query-document pair.
dids: the doc ids for each query-document pair (created by this program).
initial_list: the initial ranking list for each query (created according to the data sequence in the libsvm file).
qids: the query ids for each query (created by this program).
labels: the relevance label for each query-document pair in the initial_list.
initial_list_lengths: the number of candidate documents for each query.

Parameters

data_path – (string) the root directory of the experimental dataset.
file_prefix – (string) the prefix of the data to process, e.g. ‘train’, ‘valid’, or ‘test’.
rank_cut – (int) the maximum number of top documents considered in each list.

Returns

None
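The libsvm ranking format (as used by SVMrank) stores one query-document pair per line as `<label> qid:<qid> <feature_id>:<value> ...`. A minimal pure-Python sketch of how such lines can be grouped into the per-query fields listed above follows; it is not ULTRA's implementation. The helper name `parse_libsvm_lines`, the generated doc ids of the form `<qid>_<i>`, and the assumption of 1-based feature ids are all hypothetical, and `rank_cut` is not handled.

```python
from collections import OrderedDict


def parse_libsvm_lines(lines, feature_size):
    """Group libsvm ranking lines by qid, keeping the order of the file.

    Hypothetical helper. Returns qids, initial lists (indices into the flat
    feature list), labels, dense feature vectors, generated doc ids, and the
    length of each initial list.
    """
    groups = OrderedDict()  # qid -> [(label, dense feature vector), ...]
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        tokens = line.split()
        label = float(tokens[0])
        qid = tokens[1].split(":", 1)[1]  # token looks like "qid:<id>"
        vector = [0.0] * feature_size
        for token in tokens[2:]:
            feature_id, value = token.split(":", 1)
            vector[int(feature_id) - 1] = float(value)  # assumes 1-based ids
        groups.setdefault(qid, []).append((label, vector))

    qids, initial_list, labels, features, dids = [], [], [], [], []
    for qid, docs in groups.items():
        start = len(features)
        qids.append(qid)
        # The initial list follows the order of the lines in the file.
        initial_list.append(list(range(start, start + len(docs))))
        labels.append([label for label, _ in docs])
        for i, (_, vector) in enumerate(docs):
            features.append(vector)
            dids.append(f"{qid}_{i}")  # generated doc ids
    initial_list_lengths = [len(ids) for ids in initial_list]
    return qids, initial_list, labels, features, dids, initial_list_lengths
```

Because the initial list simply follows the line order of the file, any meaningful initial ranking has to be encoded in that order.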

pad(rank_list_size, pad_tails=True)

Pad a rank list with zero feature vectors when it is shorter than the required rank list size.

Parameters

rank_list_size – (int) the required size of a ranked list
pad_tails – (bool) Add padding vectors to the tails of each list (True) or the heads of each list (False)

Returns

None
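A sketch of the padding behaviour described above on plain Python lists, assuming each feature vector is a list of `feature_size` floats. `pad_ranked_list` is a hypothetical helper, not the method itself.

```python
def pad_ranked_list(features, rank_list_size, feature_size, pad_tails=True):
    """Pad a list of feature vectors with zero vectors up to rank_list_size.

    Hypothetical helper; lists already at or above the target size are
    returned unchanged.
    """
    missing = max(0, rank_list_size - len(features))
    padding = [[0.0] * feature_size for _ in range(missing)]
    # Tail padding keeps the real documents at the top positions.
    return list(features) + padding if pad_tails else padding + list(features)
```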

remove_invalid_data()

Remove query lists that contain no relevant items or fewer than 2 items.

self.feature_size = -1
self.rank_list_size = -1
self.removed_feature_ids = []
self.features = []
self.dids = []
self.initial_list = []
self.qids = []
self.labels = []
self.initial_scores = []
self.initial_list_lengths = []

Returns: None
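The filtering rule can be sketched as follows, assuming labels are numeric and any label > 0 counts as relevant (the description above does not define "relevant"). `remove_invalid_lists` is a hypothetical helper that works on parallel `qids`/`labels` lists rather than on a Raw_data object.

```python
def remove_invalid_lists(qids, labels):
    """Keep only queries whose list has a relevant item and at least 2 items.

    Hypothetical helper; a label > 0 is treated as relevant.
    """
    kept_qids, kept_labels = [], []
    for qid, list_labels in zip(qids, labels):
        has_relevant = any(label > 0 for label in list_labels)
        if len(list_labels) >= 2 and has_relevant:
            kept_qids.append(qid)
            kept_labels.append(list_labels)
    return kept_qids, kept_labels
```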

ultra.utils.data_utils.generate_ranklist(data, rerank_lists)

Create reranked lists based on the data and the rerank lists.

Parameters

data – (Raw_data) the dataset that contains the raw data
rerank_lists – (list<list<int>>) a list of reranked lists, in which each element is the original rank (position) of a document in the initial list.

Returns

qid_list_map – (map<list<int>>) a map from each qid to its reranked list of document ids.

Return type

map<list<int>>
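To illustrate how positions in a rerank list map back to document ids, here is a hypothetical sketch on plain lists. It assumes `initial_list[q][p]` indexes into a flat `dids` list, as in the fields described for Raw_data, and that out-of-range positions (for example, padding) should be skipped.

```python
def rerank_by_positions(qids, initial_list, dids, rerank_lists):
    """Map reranked positions back to document ids, keyed by qid.

    initial_list[q][p] is the index (into dids) of the document at position p
    of query q's initial list; rerank_lists[q] lists original positions in
    their new order. Positions outside the initial list are skipped.
    """
    qid_list_map = {}
    for q, qid in enumerate(qids):
        ranked = []
        for pos in rerank_lists[q]:
            if 0 <= pos < len(initial_list[q]):
                ranked.append(dids[initial_list[q][pos]])
        qid_list_map[qid] = ranked
    return qid_list_map
```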

ultra.utils.data_utils.generate_ranklist_by_scores(data, rerank_scores)

Create reranked lists based on the data and the reranking scores.

Parameters

data – (Raw_data) the dataset that contains the raw data
rerank_scores – (list<list<float>>) a list of score lists, in which each element is the reranking score for the document at that position in the initial list.

Returns

qid_list_map – (map<list<int>>) a map from each qid to its reranked list of document ids.

Return type

map<list<int>>
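Reranking by scores amounts to sorting each initial list by descending score. A hypothetical sketch on plain lists, assuming the same `initial_list`/`dids` layout as Raw_data and ignoring any scores beyond the length of the initial list (for example, scores for padded slots):

```python
def rerank_by_scores(qids, initial_list, dids, rerank_scores):
    """Sort each initial list by descending score and return dids per qid.

    rerank_scores[q][p] is the score of the document at position p of the
    initial list. Python's sort is stable (also with reverse=True), so ties
    keep their initial order.
    """
    qid_list_map = {}
    for q, qid in enumerate(qids):
        positions = range(len(initial_list[q]))
        order = sorted(positions, key=lambda p: rerank_scores[q][p], reverse=True)
        qid_list_map[qid] = [dids[initial_list[q][p]] for p in order]
    return qid_list_map
```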

ultra.utils.data_utils.merge_TFSummary(summary_list, weights)

ultra.utils.data_utils.output_ranklist(data, rerank_scores, output_path, file_name='test')

Create a TREC-format rank list by reranking the initial list with the reranking scores.

Parameters

data – (Raw_data) the dataset that contains the raw data
rerank_scores – (list<list<float>>) a list of score lists, in which each element is the reranking score for the document at that position in the initial list.
output_path – (string) the path for the output
file_name – (string) the name of the output set, e.g., ‘train’, ‘valid’, ‘test’.

Returns

None
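Each line of a TREC run file has six whitespace-separated columns: query id, the literal `Q0`, document id, rank, score, and a run tag. A hypothetical sketch of producing such lines from per-query (doc_id, score) pairs; the run tag `ULTRA` and the helper name are assumptions, and the exact columns that output_ranklist writes are not confirmed here.

```python
def format_trec_run(qid_doc_scores, run_tag="ULTRA"):
    """Return TREC run lines: "qid Q0 doc_id rank score run_tag".

    qid_doc_scores maps each qid to a list of (doc_id, score) pairs; documents
    are written in descending score order with ranks starting at 1.
    """
    lines = []
    for qid, doc_scores in qid_doc_scores.items():
        ranked = sorted(doc_scores, key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {doc_id} {rank} {score} {run_tag}")
    return lines
```

Writing the file is then a matter of joining the lines with newlines under the chosen output path.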

ultra.utils.data_utils.parse_TFSummary_from_bytes(summary_bytes)

ultra.utils.data_utils.read_data(data_path, file_prefix, rank_cut=None)

ultra.utils.hparams module

Hyperparameter values.

class ultra.utils.hparams.HParams(hparam_def=None, model_structure=None, **kwargs)

Bases: object

Class to hold a set of hyperparameters as name-value pairs.

A HParams object holds hyperparameters used to build and train a model, such as the number of hidden units in a neural net layer or the learning rate to use when training.

You first create a HParams object by specifying the names and values of the hyperparameters. To make them easily accessible, the parameter names are added as direct attributes of the class. A typical usage is as follows:

```python
from ultra.utils.hparams import HParams

# Create a HParams object specifying names and values of the model
# hyperparameters:
hparams = HParams(learning_rate=0.1, num_hidden_units=100)

# The hyperparameters are available as attributes of the HParams object:
hparams.learning_rate     # ==> 0.1
hparams.num_hidden_units  # ==> 100
```

Hyperparameters have a type, which is inferred from the type of the value passed at construction time. The currently supported types are: integer, float, string, and list of integer, float, or string.

You can override hyperparameter values by calling the parse() method, passing a string of comma-separated name=value pairs. This is intended to make it possible to override any hyperparameter values from a single command-line flag to which the user passes 'hyper-param=value' pairs. It avoids having to define one flag for each hyperparameter.

The syntax expected for each value depends on the type of the parameter. See parse() for a description of the syntax.

Example:

```python
# Define a command line flag to pass name=value pairs.
# For example using argparse:
import argparse

from ultra.utils.hparams import HParams

parser = argparse.ArgumentParser(description='Train my model.')
parser.add_argument('--hparams', type=str,
                    help='Comma separated list of "name=value" pairs.')
args = parser.parse_args()
...
def my_program():
    # Create a HParams object specifying the names and values of the
    # model hyperparameters:
    hparams = HParams(learning_rate=0.1, num_hidden_units=100,
                      activations=['relu', 'tanh'])

    # Override hyperparameter values by parsing the command line
    hparams.parse(args.hparams)

    # If the user passed --hparams=learning_rate=0.3 on the command line
    # then 'hparams' has the following attributes:
    hparams.learning_rate     # ==> 0.3
    hparams.num_hidden_units  # ==> 100
    hparams.activations       # ==> ['relu', 'tanh']

    # If the hyperparameters are in JSON format, use parse_json.
    # 'activations' is list-valued, so it is given as a JSON list:
    hparams.parse_json('{"learning_rate": 0.3, "activations": ["relu"]}')
```

__init__(hparam_def=None, model_structure=None, **kwargs)

Create an instance of HParams from keyword arguments.

The keyword arguments specify name-value pairs for the hyperparameters. The parameter types are inferred from the type of the values passed. The parameter names are added as attributes of the HParams object, so they can be accessed directly with dot notation: hparams.name.

Example:

```python
from ultra.utils.hparams import HParams

# Define 3 hyperparameters: 'learning_rate' is a float parameter,
# 'num_hidden_units' an integer parameter, and 'activation' a string
# parameter.
hparams = HParams(
    learning_rate=0.1, num_hidden_units=100, activation='relu')

hparams.activation  # ==> 'relu'
```

Note that a few names are reserved and cannot be used as hyperparameter names. If you use one of the reserved names, the constructor raises a ValueError.

Parameters

hparam_def – Serialized hyperparameters, encoded as a hparam_pb2.HParamDef protocol buffer. If provided, this object is initialized by deserializing hparam_def. Otherwise **kwargs is used.
model_structure – An instance of ModelStructure, defining the feature crosses to be used in the Trial.
**kwargs – Key-value pairs where the key is the hyperparameter name and the value is the value for the parameter.

Raises

ValueError – If both hparam_def and initialization values are provided, or if one of the arguments is invalid.

add_hparam(name, value)

Adds a {name, value} pair to the hyperparameters.

Parameters

name – Name of the hyperparameter.
value – Value of the hyperparameter. Can be one of the following types: int, float, string, int list, float list, or string list.

Raises: ValueError – If one of the arguments is invalid.

static from_proto(hparam_def, import_scope=None)

get(key, default=None): Returns the value of key if it exists, else default.

get_model_structure()

override_from_dict(values_dict)

Override hyperparameter values, parsing new values from a dictionary.

Parameters: values_dict – Dictionary of name:value pairs.
Returns: The HParams instance.
Raises: ValueError – If values_dict cannot be parsed.

parse(values, ignore_unknown_hyperparameters=True)

Override hyperparameter values, parsing new values from a string. See parse_values for more detail on the allowed format for values.

Parameters

values – String. Comma-separated list of name=value pairs, where 'value' must follow the syntax described in parse_values.
ignore_unknown_hyperparameters – Bool. Set to False to raise ValueError if a hyperparameter is unknown.

Returns: The HParams instance.
Raises: ValueError – If values cannot be parsed.

parse_json(values_json)

Override hyperparameter values, parsing new values from a JSON object.

Parameters: values_json – String containing a JSON object of name:value pairs.
Returns: The HParams instance.
Raises: ValueError – If values_json cannot be parsed.

set_from_map(values_map): DEPRECATED. Use override_from_dict.

set_hparam(name, value)

Set the value of an existing hyperparameter. This function verifies that the type of the value matches the type of the existing hyperparameter.

Parameters

name – Name of the hyperparameter.
value – New value of the hyperparameter.

Raises: ValueError – If there is a type mismatch.

set_model_structure(model_structure)

to_json(indent=None, separators=None, sort_keys=False)

Serializes the hyperparameters into JSON.

Parameters

indent – If a non-negative integer, JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0, or negative, will only insert newlines. None (the default) selects the most compact representation.
separators – Optional (item_separator, key_separator) tuple. Default is (', ', ': ').
sort_keys – If True, the output dictionaries will be sorted by key.

Returns

A JSON string.

values(): Return the hyperparameter values as a Python dictionary.

Returns: A dictionary with hyperparameter names as keys. The values are the hyperparameter values.

ultra.utils.hparams.parse_values(values, type_map, ignore_unknown_hyperparameters)

Parses hyperparameter values from a string into a python map.

values is a string containing comma-separated name=value pairs. For each pair, the value of the hyperparameter named name is set to value.

If a hyperparameter name appears multiple times in values, a ValueError is raised (e.g. 'a=1,a=2', 'a[1]=1,a[1]=2').

If a hyperparameter name appears in both an index assignment and a scalar assignment, a ValueError is raised (e.g. 'a=[1,2,3],a[0] = 1').

The value in name=value must follow the syntax for the type of the parameter:

Scalar integer: A Python-parsable integer value. E.g.: 1, 100, -12.
Scalar float: A Python-parsable floating point value. E.g.: 1.0, -.54e89.
Boolean: Either true or false.
Scalar string: A non-empty sequence of characters, excluding comma, spaces, and square brackets. E.g.: foo, bar_1.
List: A comma-separated list of scalar values of the parameter type enclosed in square brackets. E.g.: [1,2,3], [1.0,1e-12], [high,low].

When index assignment is used, the corresponding type_map key should be the list name. E.g. for "arr[1]=0" the type_map must have the key "arr" (not "arr[1]").

Parameters

values – String. Comma-separated list of name=value pairs, where 'value' must follow the syntax described above.
type_map – A dictionary mapping hyperparameter names to types. Note every parameter name in values must be a key in type_map. The values must conform to the types indicated, where a value V is said to conform to a type T if either V has type T, or V is a list of elements of type T. Hence, for a multidimensional parameter 'x' taking float values, 'x=[0.1,0.2]' will parse successfully if type_map['x'] = float.
ignore_unknown_hyperparameters – Bool. Set to False to raise ValueError if a hyperparameter is unknown.

Returns

A python map mapping each name to either:

a scalar value,
a list of scalar values, or
a dictionary mapping index numbers to scalar values.

(e.g. "x=5,L=[1,2],arr[1]=3" results in {'x':5,'L':[1,2],'arr':{1:3}})

Return type

dict

Raises

ValueError – If there is a problem with the input:
* If values cannot be parsed.
* If a list is assigned to a list index (e.g. 'a[1] = [1,2,3]').
* If the same name is assigned two different values (e.g. 'a=1,a=2', 'a[1]=1,a[1]=2', or 'a=1,a=[1]').
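The syntax rules above can be made concrete with a simplified parser. This is a sketch written from the description, not the module's code: `parse_values_sketch` is hypothetical, supports int, float, str and bool types, and rejects unknown names unless ignore_unknown_hyperparameters is set.

```python
import re

# name, optional [index], '=', then a bracketed list or a scalar, then ',' or end.
_PAIR_RE = re.compile(
    r"\s*(?P<name>[A-Za-z][\w.]*)(?:\[\s*(?P<index>\d+)\s*\])?\s*=\s*"
    r"(?:\[(?P<vals>[^\]]*)\]|(?P<val>[^,\[\]]*))\s*(?:,|$)"
)


def _cast(text, param_type):
    """Convert one scalar token to param_type (bool accepts true/false)."""
    text = text.strip()
    if param_type is bool:
        if text.lower() not in ("true", "false"):
            raise ValueError(f"Not a boolean: {text!r}")
        return text.lower() == "true"
    return param_type(text)


def parse_values_sketch(values, type_map, ignore_unknown_hyperparameters=False):
    """Simplified parser for comma-separated name=value pairs."""
    results, pos = {}, 0
    while pos < len(values):
        match = _PAIR_RE.match(values, pos)
        if match is None:
            raise ValueError(f"Could not parse: {values[pos:]!r}")
        pos = match.end()
        name, index = match.group("name"), match.group("index")
        if name not in type_map:
            if ignore_unknown_hyperparameters:
                continue
            raise ValueError(f"Unknown hyperparameter: {name}")
        param_type = type_map[name]
        if match.group("vals") is not None:
            if index is not None:
                raise ValueError(f"Cannot assign a list to {name}[{index}]")
            items = [v for v in match.group("vals").split(",") if v.strip()]
            value = [_cast(v, param_type) for v in items]
        else:
            value = _cast(match.group("val"), param_type)
        if index is None:
            if name in results:  # repeated scalar/list assignment
                raise ValueError(f"Multiple assignments to {name}")
            results[name] = value
        else:
            slot = results.setdefault(name, {})
            if not isinstance(slot, dict) or int(index) in slot:
                raise ValueError(f"Conflicting assignments to {name}")
            slot[int(index)] = value
    return results
```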

ultra.utils.metrics module

Defines ranking metrics as TF ops.

The metrics here are meant to be used during TF training. That is, a batch of instances in Tensor format is evaluated by ops. The metrics work with listwise Tensors only.

class ultra.utils.metrics.RankingMetricKey

Bases: object

Ranking metric key strings.

ARP = 'arp'

DCG = 'dcg'

ERR = 'err'

MAP = 'map'

MAX_LABEL = None

MRR = 'mrr'

NDCG = 'ndcg'

ORDERED_PAIR_ACCURACY = 'ordered_pair_accuracy'

PRECISION = 'precision'

ultra.utils.metrics.average_relevance_position(labels, predictions, weights=None, name=None)

Computes average relevance position (ARP).

This metric could also be called average_relevance_rank, but its acronym is easily confused with that of mean_reciprocal_rank. The name ARP is more distinctive and has historically been used for binary relevance as average_click_position.

Parameters

labels – A Tensor of the same shape as predictions.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
name – A string used as the name for this metric.

Returns

A metric for the weighted average relevance position.
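For a single list without weights, ARP reduces to the relevance-weighted mean of the 1-based ranks induced by sorting the predictions, ARP = Σ l_i · rank_i / Σ l_i. A pure-Python sketch under that assumption (the TF op additionally handles batches and weights); the function name is hypothetical.

```python
def arp_for_list(labels, scores):
    """Relevance-weighted mean of 1-based ranks after sorting by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(labels)
    if total == 0:
        return 0.0  # no relevant items: define ARP as 0 in this sketch
    return sum(labels[i] * rank for rank, i in enumerate(order, start=1)) / total
```

Lower is better: relevant items ranked near the top pull the average position down.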

ultra.utils.metrics.discounted_cumulative_gain(labels, predictions, weights=None, topn=None, name=None)

Computes discounted cumulative gain (DCG).

Parameters

labels – A Tensor of the same shape as predictions.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
topn – A cutoff for how many examples to consider for this metric.
name – A string used as the name for this metric.

Returns

A metric for the weighted discounted cumulative gain of the batch.
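A single-list, unweighted sketch of DCG, assuming the widely used exponential gain 2^l - 1 and logarithmic discount 1 / log2(rank + 1). The gain and discount used by this module are not stated above, so treat them as assumptions; `dcg_for_list` is hypothetical.

```python
import math


def dcg_for_list(labels, scores, topn=None):
    """Unweighted DCG of one list ranked by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if topn is not None:
        order = order[:topn]  # only the top `topn` positions contribute
    return sum((2 ** labels[i] - 1) / math.log2(rank + 1)
               for rank, i in enumerate(order, start=1))
```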

ultra.utils.metrics.expected_reciprocal_rank(labels, predictions, weights=None, topn=None, name=None)

Computes expected reciprocal rank (ERR).

Parameters

labels – A Tensor of the same shape as predictions. A value >= 1 means a relevant example.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
topn – A cutoff for how many examples to consider for this metric.
name – A string used as the name for this metric.

Returns

A metric for the weighted expected reciprocal rank of the batch.
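ERR models a user who scans down the list and stops at position r with probability R_r = (2^{l_r} - 1) / 2^{l_max}. The sketch below computes Σ_r (1/r) R_r Π_{j<r} (1 - R_j) for one unweighted list. Passing `max_label` explicitly is an assumption, since how this module chooses l_max (compare RankingMetricKey.MAX_LABEL) is not documented here; the function name is hypothetical.

```python
def err_for_list(labels, scores, max_label, topn=None):
    """Expected reciprocal rank of one list ranked by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if topn is not None:
        order = order[:topn]
    err, p_continue = 0.0, 1.0  # p_continue: prob. the user reaches this rank
    for rank, i in enumerate(order, start=1):
        stop = (2 ** labels[i] - 1) / 2 ** max_label  # satisfaction prob.
        err += p_continue * stop / rank
        p_continue *= 1.0 - stop
    return err
```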

ultra.utils.metrics.make_ranking_metric_fn(metric_key, topn=None, name=None)

Factory method to create a ranking metric function.

Parameters

metric_key – A key in RankingMetricKey.
topn – An integer specifying the cutoff of how many items are considered in the metric.
name – A string used as the name for this metric.

Returns

A metric fn with the following Args:

labels: A Tensor of the same shape as predictions representing graded relevance.
predictions: A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights: A Tensor of weights (read more from each metric function).
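The factory pattern can be sketched with `functools.partial`, which binds `topn` once so the returned function only needs labels and scores. The two metrics and the string keys below are hypothetical stand-ins mirroring RankingMetricKey values; they operate on plain Python lists, not Tensors.

```python
import functools


def _precision(labels, scores, topn=None):
    """Fraction of relevant items (label >= 1) among the top-ranked ones."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order if topn is None else order[:topn]
    return sum(labels[i] >= 1 for i in top) / len(top) if top else 0.0


def _reciprocal_rank(labels, scores):
    """1 / rank of the first relevant item, or 0.0 if there is none."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for rank, i in enumerate(order, start=1):
        if labels[i] >= 1:
            return 1.0 / rank
    return 0.0


def make_metric_fn(metric_key, topn=None):
    """Return fn(labels, scores) for a RankingMetricKey-style string."""
    if metric_key == "precision":
        # Bind the cutoff once; callers only pass labels and scores.
        return functools.partial(_precision, topn=topn)
    if metric_key == "mrr":
        return _reciprocal_rank  # MRR takes no cutoff in the API above
    raise ValueError(f"Unsupported metric key: {metric_key}")
```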

ultra.utils.metrics.mean_average_precision(labels, predictions, weights=None, topn=None, name=None)

Computes mean average precision (MAP). The implementation of MAP is based on Equation (1.7) in the following: Liu, T-Y “Learning to Rank for Information Retrieval” found at https://www.nowpublishers.com/article/DownloadSummary/INR-016

Parameters

labels – A Tensor of the same shape as predictions. A value >= 1 means a relevant example.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
topn – A cutoff for how many examples to consider for this metric.
name – A string used as the name for this metric.

Returns

A metric for the mean average precision.
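Using the usual definition of average precision, AP for one list averages precision@k over the positions k that hold relevant items, dividing by the number of relevant items; MAP is then the mean over lists. A sketch assuming "relevant" means label >= 1 as stated above, with `topn` and weights omitted; the function names are hypothetical.

```python
def average_precision_for_list(labels, scores):
    """Mean of precision@k over the ranks k of relevant items (label >= 1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    relevant = [labels[i] >= 1 for i in order]
    total = sum(relevant)
    if total == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / k  # precision@k at a relevant position
    return precision_sum / total


def mean_average_precision_for_lists(label_lists, score_lists):
    """Unweighted mean of per-list average precision."""
    aps = [average_precision_for_list(lab, sc)
           for lab, sc in zip(label_lists, score_lists)]
    return sum(aps) / len(aps)
```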

ultra.utils.metrics.mean_reciprocal_rank(labels, predictions, weights=None, name=None)

Computes mean reciprocal rank (MRR).

Parameters

labels – A Tensor of the same shape as predictions. A value >= 1 means a relevant example.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
name – A string used as the name for this metric.

Returns

A metric for the weighted mean reciprocal rank of the batch.
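MRR takes, for each list, the reciprocal of the rank of the first relevant item (label >= 1) and averages over lists. An unweighted pure-Python sketch; the function names are hypothetical.

```python
def reciprocal_rank_for_list(labels, scores):
    """1 / rank of the first relevant item, or 0.0 if there is none."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for rank, i in enumerate(order, start=1):
        if labels[i] >= 1:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank_for_lists(label_lists, score_lists):
    """Unweighted mean of per-list reciprocal ranks."""
    rrs = [reciprocal_rank_for_list(lab, sc)
           for lab, sc in zip(label_lists, score_lists)]
    return sum(rrs) / len(rrs)
```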

ultra.utils.metrics.normalized_discounted_cumulative_gain(labels, predictions, weights=None, topn=None, name=None)

Computes normalized discounted cumulative gain (NDCG).

Parameters

labels – A Tensor of the same shape as predictions.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
topn – A cutoff for how many examples to consider for this metric.
name – A string used as the name for this metric.

Returns

A metric for the weighted normalized discounted cumulative gain of the batch.
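NDCG divides the DCG of the predicted order by the DCG of the ideal order (labels sorted in descending order), so a perfect ranking scores 1. A single-list, unweighted sketch assuming exponential gain 2^l - 1 and a 1 / log2(rank + 1) discount, which may differ from this module's choices; the function names are hypothetical.

```python
import math


def _dcg(ranked_labels, topn=None):
    """DCG of labels already in ranked order (exponential gain, log2 discount)."""
    if topn is not None:
        ranked_labels = ranked_labels[:topn]
    return sum((2 ** label - 1) / math.log2(rank + 1)
               for rank, label in enumerate(ranked_labels, start=1))


def ndcg_for_list(labels, scores, topn=None):
    """DCG of the predicted order divided by DCG of the ideal order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    predicted = [labels[i] for i in order]
    ideal = sorted(labels, reverse=True)
    ideal_dcg = _dcg(ideal, topn)
    return _dcg(predicted, topn) / ideal_dcg if ideal_dcg > 0 else 0.0
```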

ultra.utils.metrics.ordered_pair_accuracy(labels, predictions, weights=None, name=None)

Computes the percentage of correctly ordered pairs.

For any pair of examples, we compare their orders determined by labels and predictions. They are correctly ordered if the two orders are compatible: a pair with labels l_i > l_j is correctly ordered if the predictions satisfy s_i > s_j. The weight for this pair is the weight of example i (the one with label l_i).

Parameters

labels – A Tensor of the same shape as predictions.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
name – A string used as the name for this metric.

Returns

A metric for the accuracy of ordered pairs.
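The pairwise rule above can be sketched for one unweighted list: consider every pair with l_i > l_j and count the share with s_i > s_j. The function name is hypothetical, and per-pair weights are omitted.

```python
def ordered_pair_accuracy_for_list(labels, scores):
    """Share of pairs with labels[i] > labels[j] that also have scores[i] > scores[j]."""
    correct = total = 0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                total += 1
                if scores[i] > scores[j]:
                    correct += 1
    return correct / total if total else 0.0
```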

ultra.utils.metrics.precision(labels, predictions, weights=None, topn=None, name=None)

Computes precision as weighted average of relevant examples.

Parameters

labels – A Tensor of the same shape as predictions. A value >= 1 means a relevant example.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
topn – A cutoff for how many examples to consider for this metric.
name – A string used as the name for this metric.

Returns

A metric for the weighted precision of the batch.
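For one unweighted list, precision with a cutoff is the fraction of relevant items (label >= 1) among the top `topn` positions. The sketch assumes the denominator is the number of items actually considered (all of them when `topn` is None); the function name is hypothetical.

```python
def precision_for_list(labels, scores, topn=None):
    """Fraction of relevant items (label >= 1) among the top-ranked ones."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order if topn is None else order[:topn]
    if not top:
        return 0.0
    return sum(1 for i in top if labels[i] >= 1) / len(top)
```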

ultra.utils.propensity_estimator module

class ultra.utils.propensity_estimator.BasicPropensityEstimator(file_name=None)

Bases: object

__init__(file_name=None)

Initialize a propensity estimator.

Parameters: file_name – (string) The path to the json file of the propensity estimator. None means creating the estimator from scratch.

getPropensityForOneList(click_list, use_non_clicked_data=False)

Compute the propensity weights for each result in a list with clicks.

Parameters

click_list – list<int> The list of clicks indicating whether each result is clicked (>0) or not (=0).
use_non_clicked_data – Set to True to give weights to non-clicked results; otherwise non-clicked results receive a weight of 0.

Returns

propensity_weights – A list of propensity weights for the corresponding results.

Return type

list<float>
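A common way to turn position propensities into weights is inverse propensity weighting. The sketch below assumes a hypothetical list `exam_probs` of examination probabilities per position and normalizes the weights relative to the top position, so a click at rank 1 gets weight 1; whether the estimators here normalize this way is an assumption, and the function name is hypothetical.

```python
def propensity_weights_for_list(click_list, exam_probs, use_non_clicked_data=False):
    """Inverse-propensity weights, relative to the first position.

    exam_probs[r] is an assumed examination probability for position r.
    """
    weights = []
    for rank, click in enumerate(click_list):
        if use_non_clicked_data or click > 0:
            weights.append(exam_probs[0] / exam_probs[rank])
        else:
            weights.append(0.0)  # non-clicked results get no weight
    return weights
```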

loadEstimatorFromFile(file_name)

Load a propensity estimator from a json file.

Parameters: file_name – (string) The path to the json file of the propensity estimator.

outputEstimatorToFile(file_name)

Export a propensity estimator to a json file.

Parameters: file_name – (string) The path to the json file of the propensity estimator.

class ultra.utils.propensity_estimator.OraclePropensityEstimator(click_model)

Bases: ultra.utils.propensity_estimator.BasicPropensityEstimator

__init__(click_model)

Initialize a propensity estimator.

Parameters: click_model – (ClickModel) The click model used to generate clicks.

getPropensityForOneList(click_list, use_non_clicked_data=False)

Compute the propensity weights for each result in a list with clicks.

Parameters

click_list – list<int> The list of clicks indicating whether each result is clicked (>0) or not (=0).
use_non_clicked_data – Set to True to give weights to non-clicked results; otherwise non-clicked results receive a weight of 0.

Returns

propensity_weights – A list of propensity weights for the corresponding results.

Return type

list<float>

loadEstimatorFromFile(file_name)

Load a propensity estimator from a json file.

Parameters: file_name – (string) The path to the json file of the propensity estimator.

outputEstimatorToFile(file_name)

Export a propensity estimator to a json file.

Parameters: file_name – (string) The path to the json file of the propensity estimator.

class ultra.utils.propensity_estimator.RandomizedPropensityEstimator(file_name=None)

Bases: ultra.utils.propensity_estimator.BasicPropensityEstimator

__init__(file_name=None)

Initialize a propensity estimator.

Parameters: file_name – (string) The path to the json file of the propensity estimator. None means creating the estimator from scratch.

estimateParametersFromModel(click_model, data_set)

Estimate propensity weights based on clicks simulated with a click model.

Parameters

click_model – (ClickModel) The click model used to generate clicks.
data_set – (Raw_data) The data set with rank lists and labels.

loadEstimatorFromFile(file_name)

Load a propensity estimator from a json file.

Parameters: file_name – (string) The path to the json file of the propensity estimator.

outputEstimatorToFile(file_name)

Export a propensity estimator to a json file.

Parameters: file_name – (string) The path to the json file of the propensity estimator.

ultra.utils.propensity_estimator.main()

ultra.utils.sys_tools module

ultra.utils.sys_tools.create_object(class_str, *args, **kwargs)

Find the class named by a string and create an object of that class.

Parameters: class_str – a string containing the name of the class
Raises: ValueError – If there is no class with the name.

ultra.utils.sys_tools.find_class(class_str)

Find the class named by a string.

Parameters: class_str – a string containing the name of the class
Raises: ValueError – If there is no class with the name.
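Resolving a class from a string is typically done by importing the module part of a dotted path and looking up the attribute. A sketch using `importlib.import_module`, assuming `class_str` is a fully qualified dotted path such as `package.module.ClassName`; the function names and error messages are hypothetical.

```python
import importlib


def find_class_by_path(class_str):
    """Resolve a dotted path such as "package.module.ClassName" to a class."""
    module_name, _, class_name = class_str.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a dotted class path, got {class_str!r}")
    try:
        return getattr(importlib.import_module(module_name), class_name)
    except (ImportError, AttributeError) as err:
        raise ValueError(f"Class {class_str} cannot be found") from err


def create_object_by_path(class_str, *args, **kwargs):
    """Instantiate the class named by class_str with the given arguments."""
    return find_class_by_path(class_str)(*args, **kwargs)
```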

ultra.utils.sys_tools.list_recursive_concrete_subclasses(base)

List all concrete subclasses of base recursively.

Parameters: base – a string containing the name of the class
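A recursive walk over `__subclasses__()` that keeps only classes with no abstract methods (checked with `inspect.isabstract`) can be sketched as follows. The sketch assumes `base` is a class object rather than a string; the function name is hypothetical.

```python
import inspect
from collections import deque


def list_concrete_subclasses(base):
    """Breadth-first walk of base.__subclasses__(), keeping non-abstract classes."""
    seen, queue, concrete = set(), deque([base]), []
    while queue:
        cls = queue.popleft()
        for sub in cls.__subclasses__():
            if sub in seen:
                continue
            seen.add(sub)
            queue.append(sub)  # descend further, even through abstract classes
            if not inspect.isabstract(sub):
                concrete.append(sub)
    return concrete
```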

ultra.utils.team_draft_interleave module

class ultra.utils.team_draft_interleave.TeamDraftInterleaving

Bases: object

__init__(): Initialize self. See help(type(self)) for accurate signature.

infer_winner(clicks)

interleave(rankings)

next_index_to_add(inter_result, inter_n, ranking, index)
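Team-draft interleaving (Radlinski et al.) alternates picks between two rankers: the team with fewer picks so far (ties broken by a coin flip) adds its highest-ranked document not yet in the list, and each click is credited to the team that contributed the clicked document. A sketch of the interleaving and winner inference; the function names, the (docs, teams) return value, and the -1 tie convention are assumptions rather than the TeamDraftInterleaving API.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length=None, rng=None):
    """Interleave two rankings with team-draft; returns (docs, teams).

    teams[k] is 0 if ranking_a contributed docs[k], 1 if ranking_b did.
    """
    if rng is None:
        rng = random.Random()
    rankings = (ranking_a, ranking_b)
    num_docs = len(set(ranking_a) | set(ranking_b))
    length = num_docs if length is None else min(length, num_docs)
    docs, teams, used = [], [], set()
    picks = [0, 0]
    while len(docs) < length:
        # The team with fewer picks goes next; a coin flip breaks ties.
        if picks[0] < picks[1] or (picks[0] == picks[1] and rng.random() < 0.5):
            team = 0
        else:
            team = 1
        candidates = [d for d in rankings[team] if d not in used]
        if not candidates:  # this ranking is exhausted; the other one picks
            team = 1 - team
            candidates = [d for d in rankings[team] if d not in used]
        doc = candidates[0]  # highest-ranked document not yet shown
        docs.append(doc)
        teams.append(team)
        used.add(doc)
        picks[team] += 1
    return docs, teams


def team_draft_winner(clicks, teams):
    """Credit each click to its team; 0 = A wins, 1 = B wins, -1 = tie."""
    credit = [0, 0]
    for click, team in zip(clicks, teams):
        if click > 0:
            credit[team] += 1
    if credit[0] == credit[1]:
        return -1
    return 0 if credit[0] > credit[1] else 1
```

Passing a seeded `random.Random` makes the coin flips, and therefore the interleaved list, reproducible.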
