ultra.utils package¶
Submodules¶
ultra.utils.data_utils module¶
-
class
ultra.utils.data_utils.
Raw_data
(data_path=None, file_prefix=None, rank_cut=None)¶ Bases:
object
-
__init__
(data_path=None, file_prefix=None, rank_cut=None)¶ Initialize a dataset
- Parameters
data_path – (string) the root directory of the experimental dataset.
file_prefix – (string) the prefix of the data to process, e.g. ‘train’, ‘valid’, or ‘test’.
rank_cut – (int) the maximum number of top documents considered in each list.
- Returns
None
-
load_basic_data_information
(data_path=None, file_prefix=None, rank_cut=None)¶ - Load basic dataset information from data_path including:
feature_size: the number of features for each query-document pair.
removed_feature_ids: the indices of the features to ignore.
- Parameters
data_path – (string) the root directory of the experimental dataset.
- Returns
None
-
load_data_in_ULTRA_format
(data_path=None, file_prefix=None, rank_cut=None)¶ - Read dataset in ULTRA format including:
rank_list_size: the maximum number of documents for a query in the data.
features: the feature vectors of each query-document pair.
dids: the doc ids for each query-document pair.
initial_list: the initial ranking list for each query.
qids: the query ids for each query.
labels: the relevance label for each query-document pair in the initial_list.
initial_scores: (if exists) the initial ranking scores in the initial list for each query-document pair.
initial_list_lengths: the length of the initial list for each query.
- Parameters
data_path – (string) the root directory of the experimental dataset.
file_prefix – (string) the prefix of the data to process, e.g. ‘train’, ‘valid’, or ‘test’.
rank_cut – (int) the maximum number of top documents considered in each list.
- Returns
None
-
load_data_in_libsvm_format
(data_path=None, file_prefix=None, rank_cut=None)¶ - Read dataset in libsvm format including:
rank_list_size: the maximum number of documents for a query in the data.
features: the feature vectors of each query-document pair.
dids: the doc ids for each query-document pair (created by this program).
initial_list: the initial ranking list for each query (created according to data sequence in the libsvm file).
qids: the query ids for each query (created by this program).
labels: the relevance label for each query-document pair in the initial_list.
initial_list_lengths: the number of candidate documents for each query.
- Parameters
data_path – (string) the root directory of the experimental dataset.
file_prefix – (string) the prefix of the data to process, e.g. ‘train’, ‘valid’, or ‘test’.
rank_cut – (int) the maximum number of top documents considered in each list.
- Returns
None
-
pad
(rank_list_size, pad_tails=True)¶ Pad a rank list with zero feature vectors when it is shorter than the required rank list size.
- Parameters
rank_list_size – (int) the required size of a ranked list
pad_tails – (bool) Add padding vectors to the tails of each list (True) or the heads of each list (False)
- Returns
None
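The padding behaviour described for pad can be sketched as below. This is a minimal illustration and not the library's implementation: the helper name `pad_rank_list` and the list-of-lists representation of feature vectors are assumptions made for the example.

```python
def pad_rank_list(features, rank_list_size, pad_tails=True):
    """Pad one rank list with zero vectors up to rank_list_size.

    features: list of feature vectors (lists of floats), one per document.
    Hypothetical helper; the real method works on a Raw_data instance.
    """
    missing = rank_list_size - len(features)
    if missing <= 0:
        return list(features)  # already long enough; nothing to pad
    feature_size = len(features[0]) if features else 0
    padding = [[0.0] * feature_size for _ in range(missing)]
    # pad_tails=True appends the padding, pad_tails=False prepends it.
    return list(features) + padding if pad_tails else padding + list(features)


docs = [[0.2, 0.7], [0.5, 0.1]]
padded_tail = pad_rank_list(docs, 4)
padded_head = pad_rank_list(docs, 4, pad_tails=False)
```

Lists that already meet the required size are returned unchanged in this sketch.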
-
remove_invalid_data
()¶ Remove query lists with no relevant items or fewer than 2 items.
self.feature_size = -1 self.rank_list_size = -1 self.removed_feature_ids = [] self.features = [] self.dids = [] self.initial_list = [] self.qids = [] self.labels = [] self.initial_scores = [] self.initial_list_lengths = []
- Returns
None
-
-
ultra.utils.data_utils.
generate_ranklist
(data, rerank_lists)¶ Create reranked lists based on the data and the reranked document ids.
- Parameters
data – (Raw_data) the dataset that contains the raw data
rerank_lists – (list<list<int>>) a list of rerank lists, in which each element is the original rank of a document in the initial list.
- Returns
(map<list<int>>) a map from each qid to its reranked document id list.
- Return type
qid_list_map
-
ultra.utils.data_utils.
generate_ranklist_by_scores
(data, rerank_scores)¶ Create reranked lists based on the data and the rerank scores.
- Parameters
data – (Raw_data) the dataset that contains the raw data
rerank_scores – (list<list<float>>) a list of score lists, in which each element is the reranking score for the document at that position in the initial list.
- Returns
(map<list<int>>) a map from each qid to its reranked document id list.
- Return type
qid_list_map
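Reranking an initial list by scores can be sketched as follows. The data layout here is an assumption for illustration only: `initial_list[i][j]` is taken to be an index into `dids`, and the function name `rerank_by_scores` is hypothetical rather than part of the package.

```python
def rerank_by_scores(qids, dids, initial_list, rerank_scores):
    """Return {qid: [doc ids sorted by descending rerank score]}.

    Assumed layout: initial_list[i][j] indexes into dids for the j-th
    document of query i, and rerank_scores[i][j] scores that position.
    """
    qid_list_map = {}
    for qid, doc_idxs, scores in zip(qids, initial_list, rerank_scores):
        # Sort positions by score, highest first; ties keep the initial order.
        order = sorted(range(len(doc_idxs)), key=lambda j: scores[j], reverse=True)
        qid_list_map[qid] = [dids[doc_idxs[j]] for j in order]
    return qid_list_map


result = rerank_by_scores(
    qids=["q1", "q2"],
    dids=["d0", "d1", "d2", "d3", "d4"],
    initial_list=[[0, 1, 2], [3, 4]],
    rerank_scores=[[0.1, 0.9, 0.5], [0.3, 0.2]],
)
```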
-
ultra.utils.data_utils.
merge_Summary
(summary_list, weights)¶
-
ultra.utils.data_utils.
output_ranklist
(data, rerank_scores, output_path, file_name='test')¶ Create a TREC-format rank list by reranking the initial list with reranking scores.
- Parameters
data – (Raw_data) the dataset that contains the raw data
rerank_scores – (list<list<float>>) a list of score lists, in which each element is the reranking score for the document at that position in the initial list.
output_path – (string) the path for the output
file_name – (string) the name of the output set, e.g., ‘train’, ‘valid’, ‘test’.
- Returns
None
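For orientation, the conventional TREC run layout has one line per ranked document with six whitespace-separated columns: query id, the literal `Q0`, document id, rank, score, and a run tag. The sketch below formats such lines; whether output_ranklist writes exactly these columns is not stated here, and the helper name `format_trec_lines` and the `run_tag` parameter are assumptions.

```python
def format_trec_lines(qid, ranked_doc_ids, scores, run_tag="run"):
    """Format one query's ranking as conventional TREC run lines."""
    lines = []
    for rank, (did, score) in enumerate(zip(ranked_doc_ids, scores), start=1):
        # Columns: qid, literal Q0, doc id, 1-based rank, score, run tag.
        lines.append(f"{qid} Q0 {did} {rank} {score} {run_tag}")
    return lines


trec_lines = format_trec_lines("q1", ["d1", "d2"], [0.9, 0.5])
```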
-
ultra.utils.data_utils.
read_data
(data_path, file_prefix, rank_cut=None)¶
ultra.utils.hparams module¶
Hyperparameter values.
-
class
ultra.utils.hparams.
HParams
(hparam_def=None, model_structure=None, **kwargs)¶ Bases:
object
Class to hold a set of hyperparameters as name-value pairs. A HParams object holds hyperparameters used to build and train a model, such as the number of hidden units in a neural net layer or the learning rate to use when training. You first create a HParams object by specifying the names and values of the hyperparameters. To make them easily accessible the parameter names are added as direct attributes of the class. A typical usage is as follows:
```python
# Create a HParams object specifying names and values of the model
# hyperparameters:
hparams = HParams(learning_rate=0.1, num_hidden_units=100)

# The hyperparameters are available as attributes of the HParams object:
hparams.learning_rate  # ==> 0.1
hparams.num_hidden_units  # ==> 100
```
Hyperparameters have type, which is inferred from the type of their value passed at construction time. The currently supported types are: integer, float, string, and list of integer, float, or string.
You can override hyperparameter values by calling the parse() method, passing a string of comma-separated name=value pairs. This is intended to make it possible to override any hyperparameter values from a single command-line flag to which the user passes ‘hyper-param=value’ pairs. It avoids having to define one flag for each hyperparameter.
The syntax expected for each value depends on the type of the parameter. See parse() for a description of the syntax.
Example:
```python
# Define a command line flag to pass name=value pairs.
# For example using argparse:
import argparse
parser = argparse.ArgumentParser(description='Train my model.')
parser.add_argument('--hparams', type=str,
                    help='Comma separated list of "name=value" pairs.')
args = parser.parse_args()
...
def my_program():
    # Create a HParams object specifying the names and values of the
    # model hyperparameters:
    hparams = HParams(learning_rate=0.1, num_hidden_units=100,
                      activations=['relu', 'tanh'])

    # Override hyperparameter values by parsing the command line
    hparams.parse(args.hparams)

    # If the user passed `--hparams=learning_rate=0.3` on the command line
    # then 'hparams' has the following attributes:
    hparams.learning_rate  # ==> 0.3
    hparams.num_hidden_units  # ==> 100
    hparams.activations  # ==> ['relu', 'tanh']

    # If the hyperparameters are in json format use parse_json:
    hparams.parse_json('{"learning_rate": 0.3, "activations": "relu"}')
```
-
__init__
(hparam_def=None, model_structure=None, **kwargs)¶ Create an instance of HParams from keyword arguments. The keyword arguments specify name-value pairs for the hyperparameters. The parameter types are inferred from the type of the values passed. The parameter names are added as attributes of the HParams object, so they can be accessed directly with the dot notation hparams._name_.
Example:
```python
# Define 3 hyperparameters: 'learning_rate' is a float parameter,
# 'num_hidden_units' an integer parameter, and 'activation' a string
# parameter.
hparams = HParams(
    learning_rate=0.1, num_hidden_units=100, activation='relu')
hparams.activation  # ==> 'relu'
```
Note that a few names are reserved and cannot be used as hyperparameter names. If you use one of the reserved names the constructor raises a ValueError.
- Parameters
hparam_def – Serialized hyperparameters, encoded as a hparam_pb2.HParamDef protocol buffer. If provided, this object is initialized by deserializing hparam_def. Otherwise **kwargs is used.
model_structure – An instance of ModelStructure, defining the feature crosses to be used in the Trial.
**kwargs – Key-value pairs where the key is the hyperparameter name and the value is the value for the parameter.
- Raises
ValueError – If both hparam_def and initialization values are provided, or if one of the arguments is invalid.
-
add_hparam
(name, value)¶ Adds {name, value} pair to hyperparameters.
- Parameters
name – Name of the hyperparameter.
value – Value of the hyperparameter. Can be one of the following types: int, float, string, int list, float list, or string list.
- Raises
ValueError – if one of the arguments is invalid.
-
static
from_proto
(hparam_def, import_scope=None)¶
-
get
(key, default=None)¶ Returns the value of key if it exists, else default.
-
get_model_structure
()¶
-
override_from_dict
(values_dict)¶ Override hyperparameter values, parsing new values from a dictionary.
- Parameters
values_dict – Dictionary of name:value pairs.
- Returns
The HParams instance.
- Raises
ValueError – If values_dict cannot be parsed.
-
parse
(values, ignore_unknown_hyperparameters=True)¶ Override hyperparameter values, parsing new values from a string. See parse_values for more detail on the allowed format for values.
- Parameters
values – String. Comma-separated list of name=value pairs where ‘value’ must follow the syntax described in parse_values.
ignore_unknown_hyperparameters – Bool. Set to False to raise ValueError if the hyperparameter is unknown.
- Returns
The HParams instance.
- Raises
ValueError – If values cannot be parsed.
-
parse_json
(values_json)¶ Override hyperparameter values, parsing new values from a json object.
- Parameters
values_json – String containing a json object of name:value pairs.
- Returns
The HParams instance.
- Raises
ValueError – If values_json cannot be parsed.
-
set_from_map
(values_map)¶ DEPRECATED. Use override_from_dict.
-
set_hparam
(name, value)¶ Set the value of an existing hyperparameter. This function verifies that the type of the value matches the type of the existing hyperparameter.
- Parameters
name – Name of the hyperparameter.
value – New value of the hyperparameter.
- Raises
ValueError – If there is a type mismatch.
-
set_model_structure
(model_structure)¶
-
to_json
(indent=None, separators=None, sort_keys=False)¶ Serializes the hyperparameters into JSON.
- Parameters
indent – If a non-negative integer, JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0, or negative, will only insert newlines. None (the default) selects the most compact representation.
separators – Optional (item_separator, key_separator) tuple. Default is (‘, ‘, ‘: ‘).
sort_keys – If True, the output dictionaries will be sorted by key.
- Returns
A JSON string.
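The indent, separators, and sort_keys parameters carry the same names as those of Python's standard `json.dumps`. Assuming to_json forwards them to that function (an assumption, since the implementation is not shown here), their effect on a plain dictionary of hyperparameter values looks like this:

```python
import json

# A plain dict standing in for the hyperparameter values (illustrative only).
hparam_values = {"num_hidden_units": 100, "learning_rate": 0.1}

# Compact form: explicit separators without spaces, keys sorted.
compact = json.dumps(hparam_values, separators=(",", ":"), sort_keys=True)

# Pretty-printed form: two-space indent, keys sorted.
pretty = json.dumps(hparam_values, indent=2, sort_keys=True)
```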
-
values
()¶ Return the hyperparameter values as a Python dictionary.
- Returns
A dictionary with hyperparameter names as keys. The values are the hyperparameter values.
-
-
ultra.utils.hparams.
parse_values
(values, type_map, ignore_unknown_hyperparameters)¶ Parses hyperparameter values from a string into a Python map. values is a string containing comma-separated name=value pairs. For each pair, the value of the hyperparameter named name is set to value.
If a hyperparameter name appears multiple times in values, a ValueError is raised (e.g. ‘a=1,a=2’, ‘a[1]=1,a[1]=2’).
If a hyperparameter name appears in both an index assignment and a scalar assignment, a ValueError is raised (e.g. ‘a=[1,2,3],a[0] = 1’).
The value in name=value must follow the syntax according to the type of the parameter:
* Scalar integer: A Python-parsable integer value. E.g.: 1, 100, -12.
* Scalar float: A Python-parsable floating point value. E.g.: 1.0, -.54e89.
* Boolean: Either true or false.
* Scalar string: A non-empty sequence of characters, excluding comma, spaces, and square brackets. E.g.: foo, bar_1.
* List: A comma-separated list of scalar values of the parameter type enclosed in square brackets. E.g.: [1,2,3], [1.0,1e-12], [high,low].
When index assignment is used, the corresponding type_map key should be the list name. E.g. for “arr[1]=0” the type_map must have the key “arr” (not “arr[1]”).
- Parameters
values – String. Comma-separated list of name=value pairs where ‘value’ must follow the syntax described above.
type_map – A dictionary mapping hyperparameter names to types. Note every parameter name in values must be a key in type_map. The values must conform to the types indicated, where a value V is said to conform to a type T if either V has type T, or V is a list of elements of type T. Hence, for a multidimensional parameter ‘x’ taking float values, ‘x=[0.1,0.2]’ will parse successfully if type_map[‘x’] = float.
ignore_unknown_hyperparameters – Bool. Set false to raise ValueError if the hyper-parameter is unknown.
- Returns
A Python map mapping each name to either:
* A scalar value.
* A list of scalar values.
* A dictionary mapping index numbers to scalar values.
(e.g. “x=5,L=[1,2],arr[1]=3” results in {‘x’:5,’L’:[1,2],’arr’:{1:3}})
- Raises
ValueError – If there is a problem with input:
* If values cannot be parsed.
* If a list is assigned to a list index (e.g. 'a[1] = [1,2,3]').
* If the same rvalue is assigned two different values (e.g. 'a=1,a=2', ‘a[1]=1,a[1]=2’, or ‘a=1,a=[1]’).
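The parsing rules above can be illustrated with a much-simplified sketch. It is not the package's parser: the function name `parse_pairs` and the regular expression are assumptions, booleans and unknown-name handling are left out, and every name is required to be in type_map.

```python
import re

# One "name=value" or "name[index]=value" pair. The value is either a
# bracketed list or a scalar containing no commas or square brackets.
_PAIR = re.compile(
    r"\s*(?P<name>\w+)(\[(?P<index>\d+)\])?\s*=\s*"
    r"(?P<value>\[[^\]]*\]|[^,\[\]]*)\s*(,|$)"
)


def parse_pairs(values, type_map):
    """Parse 'x=5,L=[1,2],arr[1]=3' into a dict using type_map casts."""
    results = {}
    pos = 0
    while pos < len(values):
        match = _PAIR.match(values, pos)
        if match is None:
            raise ValueError("Cannot parse: %r" % values[pos:])
        pos = match.end()
        name, raw = match.group("name"), match.group("value")
        if name not in type_map:
            raise ValueError("Unknown hyperparameter: %s" % name)
        cast = type_map[name]
        if match.group("index") is not None:
            if raw.startswith("["):
                raise ValueError("Cannot assign a list to a list index.")
            index = int(match.group("index"))
            slot = results.setdefault(name, {})
            # Reject mixing scalar and index assignment, and repeated indices.
            if not isinstance(slot, dict) or index in slot:
                raise ValueError("Multiple assignments to: %s" % name)
            slot[index] = cast(raw)
        else:
            if name in results:
                raise ValueError("Multiple assignments to: %s" % name)
            if raw.startswith("["):
                items = raw[1:-1].split(",")
                results[name] = [cast(item) for item in items if item.strip()]
            else:
                results[name] = cast(raw)
    return results


parsed = parse_pairs("x=5,L=[1,2],arr[1]=3", {"x": int, "L": int, "arr": int})
```

Duplicate assignments such as 'a=1,a=2' raise ValueError in this sketch, mirroring the documented behaviour.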
ultra.utils.metrics module¶
Defines ranking metrics as TF ops.
The metrics here are meant to be used during the TF training. That is, a batch of instances in the Tensor format are evaluated by ops. It works with listwise Tensors only.
-
class
ultra.utils.metrics.
RankingMetricKey
¶ Bases:
object
Ranking metric key strings.
-
ARP
= 'arp'¶
-
DCG
= 'dcg'¶
-
ERR
= 'err'¶
-
MAP
= 'map'¶
-
MAX_LABEL
= None¶
-
MRR
= 'mrr'¶
-
NDCG
= 'ndcg'¶
-
ORDERED_PAIR_ACCURACY
= 'ordered_pair_accuracy'¶
-
PRECISION
= 'precision'¶
-
-
ultra.utils.metrics.
average_relevance_position
(labels, predictions, weights=None, name=None)¶ Computes average relevance position (ARP).
This can also be named as average_relevance_rank, but this can be confusing with mean_reciprocal_rank in acronyms. This name is more distinguishing and has been used historically for binary relevance as average_click_position.
- Parameters
labels – A Tensor of the same shape as predictions.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
name – A string used as the name for this metric.
- Returns
A metric for the weighted average relevance position.
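As a rough illustration, ARP for a single unweighted list can be computed as the relevance-weighted mean of 1-based rank positions. This plain-Python sketch assumes that definition and ignores the weights argument and batching; the function name `arp_single_list` is hypothetical.

```python
def arp_single_list(labels, predictions):
    """Relevance-weighted average of 1-based positions for one list."""
    # Rank documents by descending prediction score.
    order = sorted(range(len(labels)), key=lambda i: predictions[i], reverse=True)
    total_relevance = sum(labels)
    if total_relevance == 0:
        return 0.0  # no relevant documents; defined as 0 in this sketch
    weighted_positions = sum(labels[i] * (rank + 1) for rank, i in enumerate(order))
    return weighted_positions / total_relevance


arp = arp_single_list(labels=[0, 1, 1], predictions=[0.9, 0.5, 0.1])
```

Here the two relevant documents sit at positions 2 and 3, so the result is their mean position.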
-
ultra.utils.metrics.
discounted_cumulative_gain
(labels, predictions, weights=None, topn=None, name=None)¶ Computes discounted cumulative gain (DCG).
- Parameters
labels – A Tensor of the same shape as predictions.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
topn – A cutoff for how many examples to consider for this metric.
name – A string used as the name for this metric.
- Returns
A metric for the weighted discounted cumulative gain of the batch.
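A single-list, unweighted DCG can be sketched in plain Python as below. The gain 2^label - 1 and the log2 position discount are common conventions assumed here; the exact gain and discount used by this module may differ, and `dcg_single_list` is a hypothetical name.

```python
import math


def dcg_single_list(labels, predictions, topn=None):
    """DCG of one list, ranking documents by descending prediction."""
    order = sorted(range(len(labels)), key=lambda i: predictions[i], reverse=True)
    if topn is not None:
        order = order[:topn]
    # Assumed gain: 2^label - 1; assumed discount: 1 / log2(position + 1).
    return sum(
        (2 ** labels[i] - 1) / math.log2(rank + 2) for rank, i in enumerate(order)
    )


dcg_full = dcg_single_list(labels=[3, 2, 0], predictions=[0.1, 0.9, 0.5])
dcg_top1 = dcg_single_list(labels=[3, 2, 0], predictions=[0.1, 0.9, 0.5], topn=1)
```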
-
ultra.utils.metrics.
expected_reciprocal_rank
(labels, predictions, weights=None, topn=None, name=None)¶ Computes expected reciprocal rank (ERR).
- Parameters
labels – A Tensor of the same shape as predictions. A value >= 1 means a relevant example.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
topn – A cutoff for how many examples to consider for this metric.
name – A string used as the name for this metric.
- Returns
A metric for the weighted expected reciprocal rank of the batch.
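ERR is usually defined through a cascade model: the user scans down the list and stops at each document with a probability derived from its label. The sketch below assumes the common stop probability (2^label - 1) / 2^max_label and takes `max_label` as an explicit argument; both choices, and the name `err_single_list`, are assumptions rather than this module's confirmed behaviour.

```python
def err_single_list(labels, predictions, max_label, topn=None):
    """Expected reciprocal rank of one list under a cascade model."""
    order = sorted(range(len(labels)), key=lambda i: predictions[i], reverse=True)
    if topn is not None:
        order = order[:topn]
    err = 0.0
    not_stopped = 1.0  # probability the user reaches the current rank
    for rank, i in enumerate(order, start=1):
        stop_prob = (2 ** labels[i] - 1) / (2 ** max_label)
        err += not_stopped * stop_prob / rank
        not_stopped *= 1.0 - stop_prob
    return err


err_good = err_single_list(labels=[1, 0], predictions=[0.9, 0.1], max_label=1)
err_bad = err_single_list(labels=[0, 1], predictions=[0.9, 0.1], max_label=1)
```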
-
ultra.utils.metrics.
make_ranking_metric_fn
(metric_key, topn=None, name=None)¶ Factory method to create a ranking metric function.
- Parameters
metric_key – A key in RankingMetricKey.
topn – An integer specifying the cutoff of how many items are considered in the metric.
name – A string used as the name for this metric.
- Returns
A metric fn with the following Args:
labels: A Tensor of the same shape as predictions representing graded relevance.
predictions: A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights: A Tensor of weights (read more from each metric function).
-
ultra.utils.metrics.
mean_average_precision
(labels, predictions, weights=None, topn=None, name=None)¶ Computes mean average precision (MAP). The implementation of MAP is based on Equation (1.7) in the following: Liu, T-Y “Learning to Rank for Information Retrieval” found at https://www.nowpublishers.com/article/DownloadSummary/INR-016
- Parameters
labels – A Tensor of the same shape as predictions. A value >= 1 means a relevant example.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
topn – A cutoff for how many examples to consider for this metric.
name – A string used as the name for this metric.
- Returns
A metric for the mean average precision.
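For a single list, average precision sums precision at each relevant position and divides by the number of relevant documents; MAP is the mean of that value over queries. The unweighted sketch below assumes this standard definition and treats labels >= 1 as relevant, as the parameter description states. The name `average_precision` is hypothetical.

```python
def average_precision(labels, predictions, topn=None):
    """Average precision of one list; labels >= 1 count as relevant."""
    order = sorted(range(len(labels)), key=lambda i: predictions[i], reverse=True)
    if topn is not None:
        order = order[:topn]
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] >= 1:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    total_relevant = sum(1 for label in labels if label >= 1)
    return precision_sum / total_relevant if total_relevant else 0.0


ap = average_precision(labels=[1, 0, 1], predictions=[0.9, 0.8, 0.7])
```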
-
ultra.utils.metrics.
mean_reciprocal_rank
(labels, predictions, weights=None, name=None)¶ Computes mean reciprocal rank (MRR).
- Parameters
labels – A Tensor of the same shape as predictions. A value >= 1 means a relevant example.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
name – A string used as the name for this metric.
- Returns
A metric for the weighted mean reciprocal rank of the batch.
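The reciprocal rank of one list is 1 divided by the position of the first relevant document, and MRR averages it over lists. A minimal unweighted sketch, with the hypothetical name `reciprocal_rank`:

```python
def reciprocal_rank(labels, predictions):
    """1 / rank of the first relevant (label >= 1) document, else 0."""
    order = sorted(range(len(labels)), key=lambda i: predictions[i], reverse=True)
    for rank, i in enumerate(order, start=1):
        if labels[i] >= 1:
            return 1.0 / rank
    return 0.0  # no relevant document in the list


batch_labels = [[0, 0, 1], [1, 0, 0]]
batch_predictions = [[0.2, 0.9, 0.5], [0.8, 0.3, 0.1]]
# Mean over the lists in the batch.
mrr = sum(
    reciprocal_rank(l, p) for l, p in zip(batch_labels, batch_predictions)
) / len(batch_labels)
```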
-
ultra.utils.metrics.
normalized_discounted_cumulative_gain
(labels, predictions, weights=None, topn=None, name=None)¶ Computes normalized discounted cumulative gain (NDCG).
- Parameters
labels – A Tensor of the same shape as predictions.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
topn – A cutoff for how many examples to consider for this metric.
name – A string used as the name for this metric.
- Returns
A metric for the weighted normalized discounted cumulative gain of the batch.
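NDCG divides the DCG of the predicted ranking by the DCG of the ideal ranking (labels sorted in descending order). The sketch below is a single-list, unweighted illustration; the gain 2^label - 1 and log2 discount are assumed conventions, and the function names are hypothetical.

```python
import math


def _dcg(sorted_labels):
    """DCG of labels already placed in ranked order."""
    return sum(
        (2 ** label - 1) / math.log2(rank + 2)
        for rank, label in enumerate(sorted_labels)
    )


def ndcg_single_list(labels, predictions, topn=None):
    """NDCG of one list: DCG of the ranking over DCG of the ideal ranking."""
    order = sorted(range(len(labels)), key=lambda i: predictions[i], reverse=True)
    ranked = [labels[i] for i in order][:topn]  # [:None] keeps the full list
    ideal = sorted(labels, reverse=True)[:topn]
    ideal_dcg = _dcg(ideal)
    return _dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


ndcg_imperfect = ndcg_single_list(labels=[3, 2, 0], predictions=[0.1, 0.9, 0.5])
ndcg_perfect = ndcg_single_list(labels=[3, 2, 0], predictions=[0.9, 0.5, 0.1])
```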
-
ultra.utils.metrics.
ordered_pair_accuracy
(labels, predictions, weights=None)¶ Computes the percentage of correctly ordered pairs.
For any pair of examples, we compare their orders determined by labels and predictions. They are correctly ordered if the two orders are compatible, that is, labels l_i > l_j and predictions s_i > s_j. The weight for this pair is the weight of l_i.
- Parameters
labels – A Tensor of the same shape as predictions.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
name – A string used as the name for this metric.
- Returns
A metric for the accuracy of ordered pairs.
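The pairwise comparison described above can be sketched for one unweighted list as follows. The function name `pair_accuracy_single_list` is hypothetical, and per-pair weights are omitted for brevity.

```python
def pair_accuracy_single_list(labels, predictions):
    """Fraction of pairs with l_i > l_j that also have s_i > s_j."""
    correct = 0
    total = 0
    n = len(labels)
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:  # only pairs with a strict label order
                total += 1
                if predictions[i] > predictions[j]:
                    correct += 1
    return correct / total if total else 0.0


accuracy = pair_accuracy_single_list(labels=[2, 1, 0], predictions=[0.9, 0.1, 0.5])
```

In this example two of the three ordered pairs agree with the predictions.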
-
ultra.utils.metrics.
precision
(labels, predictions, weights=None, topn=None, name=None)¶ Computes precision as weighted average of relevant examples.
- Parameters
labels – A Tensor of the same shape as predictions. A value >= 1 means a relevant example.
predictions – A Tensor with shape [batch_size, list_size]. Each value is the ranking score of the corresponding example.
weights – A Tensor of the same shape as predictions or [batch_size, 1]. The former case is per-example and the latter case is per-list.
topn – A cutoff for how many examples to consider for this metric.
name – A string used as the name for this metric.
- Returns
A metric for the weighted precision of the batch.
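Unweighted precision for one list is the share of relevant documents among the top-ranked ones. A minimal sketch, assuming labels >= 1 mean relevant (as stated for the labels parameter) and using the hypothetical name `precision_single_list`:

```python
def precision_single_list(labels, predictions, topn=None):
    """Fraction of relevant documents among the topn ranked documents."""
    order = sorted(range(len(labels)), key=lambda i: predictions[i], reverse=True)
    if topn is not None:
        order = order[:topn]
    if not order:
        return 0.0
    return sum(1 for i in order if labels[i] >= 1) / len(order)


p_at_1 = precision_single_list([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6], topn=1)
p_at_2 = precision_single_list([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6], topn=2)
```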
ultra.utils.propensity_estimator module¶
-
class
ultra.utils.propensity_estimator.
BasicPropensityEstimator
(file_name=None)¶ Bases:
object
-
__init__
(file_name=None)¶ Initialize a propensity estimator.
- Parameters
file_name – (string) The path to the json file of the propensity estimator. ‘None’ means creating from scratch.
-
getPropensityForOneList
(click_list, use_non_clicked_data=False)¶ Compute the propensity weights for each result in a list with clicks.
- Parameters
click_list – list<int> The list of clicks indicating whether each result is clicked (>0) or not (=0).
use_non_clicked_data – Set True to give weights to non-clicked results, otherwise the non-clicked results would have 0 weights.
- Returns
list<float> A list of propensity weights for the corresponding results.
- Return type
propensity_weights
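How the weights are derived is not described here. One common choice is inverse propensity weighting, where a clicked result at a given position receives the weight 1 / propensity of that position. The sketch below assumes that scheme; the function name and the `position_propensity` argument are hypothetical and do not reflect the estimator's internal state.

```python
def inverse_propensity_weights(click_list, position_propensity,
                               use_non_clicked_data=False):
    """Assumed scheme: weight = 1 / propensity of the result's position."""
    weights = []
    for position, click in enumerate(click_list):
        if click > 0 or use_non_clicked_data:
            weights.append(1.0 / position_propensity[position])
        else:
            weights.append(0.0)  # non-clicked results get zero weight
    return weights


clicked_only = inverse_propensity_weights([1, 0, 1], [1.0, 0.5, 0.25])
all_results = inverse_propensity_weights(
    [1, 0, 1], [1.0, 0.5, 0.25], use_non_clicked_data=True
)
```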
-
loadEstimatorFromFile
(file_name)¶ Load a propensity estimator from a json file.
- Parameters
file_name – (string) The path to the json file of the propensity estimator.
-
outputEstimatorToFile
(file_name)¶ Export a propensity estimator to a json file.
- Parameters
file_name – (string) The path to the json file of the propensity estimator.
-
-
class
ultra.utils.propensity_estimator.
OraclePropensityEstimator
(click_model)¶ Bases:
ultra.utils.propensity_estimator.BasicPropensityEstimator
-
__init__
(click_model)¶ Initialize a propensity estimator.
- Parameters
file_name – (string) The path to the json file of the propensity estimator. ‘None’ means creating from scratch.
-
getPropensityForOneList
(click_list, use_non_clicked_data=False)¶ Compute the propensity weights for each result in a list with clicks.
- Parameters
click_list – list<int> The list of clicks indicating whether each result is clicked (>0) or not (=0).
use_non_clicked_data – Set True to give weights to non-clicked results, otherwise the non-clicked results would have 0 weights.
- Returns
list<float> A list of propensity weights for the corresponding results.
- Return type
propensity_weights
-
loadEstimatorFromFile
(file_name)¶ Load a propensity estimator from a json file.
- Parameters
file_name – (string) The path to the json file of the propensity estimator.
-
outputEstimatorToFile
(file_name)¶ Export a propensity estimator to a json file.
- Parameters
file_name – (string) The path to the json file of the propensity estimator.
-
-
class
ultra.utils.propensity_estimator.
RandomizedPropensityEstimator
(file_name=None)¶ Bases:
ultra.utils.propensity_estimator.BasicPropensityEstimator
-
__init__
(file_name=None)¶ Initialize a propensity estimator.
- Parameters
file_name – (string) The path to the json file of the propensity estimator. ‘None’ means creating from scratch.
-
estimateParametersFromModel
(click_model, data_set)¶ Estimate propensity weights based on clicks simulated with a click model.
- Parameters
click_model – (ClickModel) The click model used to generate clicks.
data_set – (Raw_data) The data set with rank lists and labels.
-
loadEstimatorFromFile
(file_name)¶ Load a propensity estimator from a json file.
- Parameters
file_name – (string) The path to the json file of the propensity estimator.
-
outputEstimatorToFile
(file_name)¶ Export a propensity estimator to a json file.
- Parameters
file_name – (string) The path to the json file of the propensity estimator.
-
-
ultra.utils.propensity_estimator.
main
()¶
ultra.utils.sys_tools module¶
-
ultra.utils.sys_tools.
create_object
(class_str, *args, **kwargs)¶ Find the corresponding class based on a string of class name and create an object.
- Parameters
class_str – a string containing the name of the class
- Raises
ValueError – If there is no class with the name.
-
ultra.utils.sys_tools.
find_class
(class_str)¶ Find the corresponding class based on a string of class name.
- Parameters
class_str – a string containing the name of the class
- Raises
ValueError – If there is no class with the name.
-
ultra.utils.sys_tools.
list_recursive_concrete_subclasses
(base)¶ List all concrete subclasses of base recursively.
- Parameters
base – a string containing the name of the class
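Recursively listing concrete subclasses is typically done with `__subclasses__()` and `inspect.isabstract`. The sketch below shows that general technique; it takes a class object rather than a class-name string, and the helper name and example classes are assumptions, so it may differ from how this module resolves base.

```python
import inspect
from abc import ABC, abstractmethod


def list_concrete_subclasses(base):
    """Collect all non-abstract subclasses of base, at any depth."""
    found = []
    for sub in base.__subclasses__():
        if not inspect.isabstract(sub):
            found.append(sub)
        # Recurse even through abstract subclasses to reach concrete ones.
        found.extend(list_concrete_subclasses(sub))
    return found


class Shape(ABC):
    @abstractmethod
    def area(self):
        ...


class Polygon(Shape):  # still abstract: area() is not implemented
    pass


class Square(Polygon):
    def area(self):
        return 1.0


class Circle(Shape):
    def area(self):
        return 3.14


names = sorted(cls.__name__ for cls in list_concrete_subclasses(Shape))
```

Polygon is skipped because it leaves the abstract method unimplemented, while Square is still found through it.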