ultra.input_layer package

Submodules

ultra.input_layer.base_input_feed module

The base class that defines the API needed to implement an input data feed.

class ultra.input_layer.base_input_feed.BaseInputFeed(model, batch_size, hparam_str)

Bases: abc.ABC

This class implements an input layer for unbiased learning to rank experiments.
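As a rough illustration of what deriving from abc.ABC implies for subclasses, the sketch below defines a toy base class with the same abstract method names and a minimal subclass. ToyInputFeed and ListFeed are hypothetical names invented for this example; they are not part of ultra, and the method bodies are placeholders rather than the library's logic.

```python
from abc import ABC, abstractmethod


class ToyInputFeed(ABC):
    """Hypothetical stand-in mirroring the abstract methods listed in this section."""

    @abstractmethod
    def get_batch(self, data_set, check_validation=False):
        ...

    @abstractmethod
    def get_next_batch(self, index, data_set, check_validation=False):
        ...

    @abstractmethod
    def get_data_by_index(self, data_set, index, check_validation=False):
        ...


class ListFeed(ToyInputFeed):
    """Toy subclass: data_set is a plain list and a batch is a slice of it."""

    def __init__(self, batch_size):
        self.batch_size = batch_size

    def get_batch(self, data_set, check_validation=False):
        # Placeholder: a real feed would draw a random batch here.
        return data_set[:self.batch_size]

    def get_next_batch(self, index, data_set, check_validation=False):
        return data_set[index:index + self.batch_size]

    def get_data_by_index(self, data_set, index, check_validation=False):
        return data_set[index:index + 1]


feed = ListFeed(batch_size=2)
next_batch = feed.get_next_batch(1, [10, 20, 30, 40])

# A class that leaves any abstract method unimplemented cannot be instantiated.
try:
    ToyInputFeed()
    instantiated = True
except TypeError:
    instantiated = False
```

A subclass becomes instantiable only once every abstract method has a concrete implementation, which is why each feed class below lists the same three methods.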

MAX_SAMPLE_ROUND_NUM = 100

abstract __init__(model, batch_size, hparam_str)

Create the input feed.

Parameters

model – (BasicModel) The model we are going to train.
batch_size – the size of the batches generated in each iteration.
hparam_str – the hyper-parameters for the input layer.
session – the current tensorflow Session (used for online learning).
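The format of hparam_str is not specified in this section. Assuming, purely for illustration, that it is a comma-separated list of name=value pairs that override typed defaults, the parsing could be sketched as follows. parse_hparams and the parameter names use_oracle, eta and max_rounds are hypothetical and are not taken from ultra.

```python
def parse_hparams(hparam_str, defaults):
    """Override typed defaults with values from a 'name=value,name=value' string."""
    params = dict(defaults)
    for pair in filter(None, (p.strip() for p in hparam_str.split(","))):
        name, _, raw = pair.partition("=")
        name = name.strip()
        if name not in params:
            raise KeyError(f"unknown hyper-parameter: {name}")
        default = params[name]
        if isinstance(default, bool):
            # bool("false") would be True, so booleans are parsed explicitly.
            params[name] = raw.strip().lower() in ("1", "true", "yes")
        else:
            # Cast the raw string to the type of the default value.
            params[name] = type(default)(raw.strip())
    return params


# Hypothetical defaults for an input feed.
defaults = {"use_oracle": False, "eta": 1.0, "max_rounds": 100}
parsed = parse_hparams("use_oracle=true,max_rounds=50", defaults)
```

Keeping the defaults typed means an empty string leaves every setting unchanged, and a misspelled name fails loudly instead of being silently ignored.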

abstract get_batch(data_set, check_validation=False)

Get a random batch of data and prepare it for step. Typically used for training.

To feed data into step(..), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters

data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
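One plausible reading of the re-indexing described for get_batch is a transpose: each data case is a list over rank positions, and the feed needs one vector per position that spans the whole batch. The sketch below shows that idea with plain lists. to_batch_major and the padding value -1 are assumptions made for this example, not names or conventions confirmed by ultra.

```python
def to_batch_major(cases, pad_value=-1):
    """Turn per-query lists into one vector per rank position.

    cases[i][j] is the j-th entry of query i. The result r satisfies
    r[j][i] == cases[i][j], with short lists padded by pad_value.
    """
    max_len = max(len(case) for case in cases)
    padded = [list(case) + [pad_value] * (max_len - len(case)) for case in cases]
    # zip(*rows) transposes a rectangular list of rows.
    return [list(column) for column in zip(*padded)]


# Three queries with 3, 2 and 3 candidate documents respectively.
docid_cases = [[3, 7, 1], [4, 2], [9, 8, 5]]
docid_inputs = to_batch_major(docid_cases)
```

Here docid_inputs[0] holds the first-ranked document of every query in the batch, docid_inputs[1] the second-ranked, and so on.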

abstract get_data_by_index(data_set, index, check_validation=False)

Get the data at the specified index and prepare it for step.

Parameters

data_set – (Raw_data) The dataset used to build the input layer.
index – the index of the data.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format for a later call to step(…).
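The check_validation flag that appears on every method in this class can be pictured as a simple filter on the relevance labels of each list. The sketch below is a minimal illustration under the assumption that a "positive label" means a label greater than zero; keep_case is a hypothetical helper, not a function from ultra.

```python
def keep_case(labels, check_validation=False):
    """Return False when validation is on and the list has no positive label."""
    if not check_validation:
        return True
    return any(label > 0 for label in labels)


# One label list per query; the first query has no relevant document.
label_lists = [[0, 0, 0], [0, 2, 1], [1, 0, 0]]
kept = [labels for labels in label_lists if keep_case(labels, check_validation=True)]
```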

abstract get_next_batch(index, data_set, check_validation=False)

Get the next batch of data starting from a specific index and prepare it for step. Typically used for validation.

To feed data into step(..), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters

index – the index of the data at which the batch starts.
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
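Because get_next_batch takes an explicit index, a validation pass can walk the dataset in consecutive, non-overlapping batches. The sketch below only computes which data indices each batch would cover. The helper names are hypothetical, and truncating the final, shorter batch is an assumption made for this example.

```python
def batch_starts(num_queries, batch_size):
    """Start index of each consecutive batch."""
    return range(0, num_queries, batch_size)


def batch_indices(index, num_queries, batch_size):
    """Data indices covered by the batch that starts at index."""
    return list(range(index, min(index + batch_size, num_queries)))


# Seven queries read in batches of three.
batches = [batch_indices(start, 7, 3) for start in batch_starts(7, 3)]
```

Each data index appears in exactly one batch, which is what a validation pass needs and what random sampling in get_batch does not guarantee.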

static preprocess_data(data_set, hparam_str, exp_settings)

Preprocess the data for model creation based on the input feed.

Parameters

data_set – (Raw_data) The dataset used to build the input layer.
hparam_str – the hyper-parameters for the input layer.
exp_settings – (dictionary) The dictionary containing the model settings.

ultra.input_layer.click_models module

ultra.input_layer.click_simulation_feed module

Simulate click data based on human annotations.

See the following paper for more information on the simulation data.

Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR ‘18

class ultra.input_layer.click_simulation_feed.ClickSimulationFeed(model, batch_size, hparam_str)

Bases: ultra.input_layer.base_input_feed.BaseInputFeed

Simulate clicks based on human annotations.

This class implements an input layer for unbiased learning to rank experiments by simulating click data based on both the human relevance annotation of each query-document pair and a predefined click model.
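This section does not describe the predefined click model. As one common choice, a position-based model clicks a document with probability equal to the examination probability at its rank multiplied by a click probability derived from its relevance label. The sketch below illustrates that idea. The function names, the examination probabilities, the label scale and the noise term are all assumptions for illustration and are not taken from ultra.

```python
import random


def click_probabilities(labels, exam_probs, max_label=4, noise=0.1):
    """P(click at rank i) = P(examined at rank i) * P(click | relevance label)."""
    probs = []
    for rank, label in enumerate(labels):
        # Map the graded label to a click probability in [noise, 1].
        relevance = noise + (1 - noise) * (2 ** label - 1) / (2 ** max_label - 1)
        probs.append(exam_probs[rank] * relevance)
    return probs


def simulate_clicks(labels, exam_probs, rng, **kwargs):
    """Draw one 0/1 click per rank from the probabilities above."""
    probs = click_probabilities(labels, exam_probs, **kwargs)
    return [int(rng.random() < p) for p in probs]


exam_probs = [1.0, 0.5, 0.25]  # assumed examination probability per rank
labels = [4, 0, 2]             # human relevance annotations in ranked order
probs = click_probabilities(labels, exam_probs)
clicks = simulate_clicks(labels, exam_probs, random.Random(0))
```

The simulated clicks are biased by position: a relevant document at a low rank is clicked less often than the same document at the top, which is the bias that unbiased learning to rank tries to correct.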

__init__(model, batch_size, hparam_str)

Create the input feed.

Parameters

model – (BasicModel) The model we are going to train.
batch_size – the size of the batches generated in each iteration.
hparam_str – the hyper-parameters for the input layer.

get_batch(data_set, check_validation=False)

Get a random batch of data and prepare it for step. Typically used for training.

To feed data into step(..), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters

data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).

get_data_by_index(data_set, index, check_validation=False)

Get the data at the specified index and prepare it for step.

Parameters

data_set – (Raw_data) The dataset used to build the input layer.
index – the index of the data.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format for a later call to step(…).

get_next_batch(index, data_set, check_validation=False)

Get the next batch of data starting from a specific index and prepare it for step. Typically used for validation.

To feed data into step(..), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters

index – the index of the data at which the batch starts.
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).

prepare_sim_clicks_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=True)

ultra.input_layer.deterministic_online_simulation_feed module

Simulate online learning process and click data based on human annotations.

See the following paper for more information on the simulation data.

Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR ‘18

class ultra.input_layer.deterministic_online_simulation_feed.DeterministicOnlineSimulationFeed(model, batch_size, hparam_str)

Bases: ultra.input_layer.base_input_feed.BaseInputFeed

Simulate online learning to rank and click data based on human annotations.

This class implements an input layer for online learning to rank experiments by simulating click data based on both the human relevance annotation of each query-document pair and a predefined click model.
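In an online setting, the ranking shown to the simulated user comes from the current model rather than from a fixed log. A deterministic version of that loop can be sketched as: sort documents by model score, look up the labels in the presented order, then apply a click function. The names below are hypothetical, and the toy click function stands in for whatever click model is configured. This is an illustration of the idea, not ultra's implementation.

```python
def rerank_by_score(doc_ids, scores):
    """Return doc_ids sorted by descending model score (ties keep input order)."""
    order = sorted(range(len(doc_ids)), key=lambda i: -scores[i])
    return [doc_ids[i] for i in order]


def present_and_click(doc_ids, scores, labels_by_doc, click_fn):
    """Rank with the current scores, then simulate clicks on that ranking."""
    ranking = rerank_by_score(doc_ids, scores)
    ranked_labels = [labels_by_doc[doc] for doc in ranking]
    return ranking, click_fn(ranked_labels)


def top2_positive_clicks(ranked_labels):
    """Toy click function: click relevant documents in the top two ranks only."""
    return [int(rank < 2 and label > 0) for rank, label in enumerate(ranked_labels)]


ranking, clicks = present_and_click(
    doc_ids=["d1", "d2", "d3"],
    scores=[0.2, 0.9, 0.5],
    labels_by_doc={"d1": 3, "d2": 0, "d3": 1},
    click_fn=top2_positive_clicks,
)
```

In this toy run the most relevant document, d1, receives no click because the model ranked it last. That feedback loop between the model and the clicks it collects is what separates online simulation from offline click simulation.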

__init__(model, batch_size, hparam_str)

Create the input feed.

Parameters

model – (BasicModel) The model we are going to train.
batch_size – the size of the batches generated in each iteration.
hparam_str – the hyper-parameters for the input layer.
session – the current tensorflow Session (used for online learning).

get_batch(data_set, check_validation=False)

Get a random batch of data and prepare it for step. Typically used for training.

To feed data into step(..), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters

data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).

get_data_by_index(data_set, index, check_validation=False)

Get the data at the specified index and prepare it for step.

Parameters

data_set – (Raw_data) The dataset used to build the input layer.
index – the index of the data.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format for a later call to step(…).

get_next_batch(index, data_set, check_validation=False)

Get the next batch of data starting from a specific index and prepare it for step. Typically used for validation.

To feed data into step(..), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters

index – the index of the data at which the batch starts.
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).

prepare_true_labels_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=False)

simulate_clicks_online(input_feed, check_validation=False)

Simulate an online environment by reranking documents and collecting clicks.

Parameters

input_feed – (dict) The input_feed data.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).

ultra.input_layer.direct_label_feed module

Create batch data directly based on labels.

See the following paper for more information on the simulation data.

Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR ‘18

class ultra.input_layer.direct_label_feed.DirectLabelFeed(model, batch_size, hparam_str)

Bases: ultra.input_layer.base_input_feed.BaseInputFeed

Feed data with human annotations.

This class implements an input layer for unbiased learning to rank experiments by directly feeding the model with the true labels of each query-document pair.

__init__(model, batch_size, hparam_str)

Create the input feed.

Parameters

model – (BasicModel) The model we are going to train.
batch_size – the size of the batches generated in each iteration.
hparam_str – the hyper-parameters for the input layer.

get_batch(data_set, check_validation=False)

Get a random batch of data and prepare it for step. Typically used for training.

To feed data into step(..), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters

data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).

get_data_by_index(data_set, index, check_validation=False)

Get the data at the specified index and prepare it for step.

Parameters

data_set – (Raw_data) The dataset used to build the input layer.
index – the index of the data.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format for a later call to step(…).

get_next_batch(index, data_set, check_validation=False)

Get the next batch of data starting from a specific index and prepare it for step. Typically used for validation.

To feed data into step(..), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters

index – the index of the data at which the batch starts.
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).

prepare_true_labels_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=True)

ultra.input_layer.interleaving_deterministic_online_simulation_feed module

ultra.input_layer.stochastic_online_simulation_feed module

Simulate online learning process and click data based on human annotations.

See the following paper for more information on the simulation data.

Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR ‘18

class ultra.input_layer.stochastic_online_simulation_feed.StochasticOnlineSimulationFeed(model, batch_size, hparam_str)

Bases: ultra.input_layer.base_input_feed.BaseInputFeed

Simulate online learning to rank and click data based on human annotations.

This class implements an input layer for online learning to rank experiments by simulating click data based on both the human relevance annotation of each query-document pair and a predefined click model.
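This section does not say how the stochastic variant produces its rankings. One common way to make a ranking stochastic is to sample a permutation from a Plackett-Luce distribution over the model scores, so that higher-scored documents are more likely, but not certain, to appear first. The sketch below does this with the Gumbel-max trick. Whether ultra uses this exact scheme is an assumption, and sample_ranking and temperature are hypothetical names.

```python
import math
import random


def sample_ranking(scores, rng, temperature=1.0):
    """Sample a permutation from a Plackett-Luce model over scores / temperature.

    Adding independent Gumbel noise to each scaled score and sorting in
    descending order yields a Plackett-Luce sample (Gumbel-max trick).
    """
    noisy = []
    for i, score in enumerate(scores):
        # random() can return exactly 0.0, so clamp to keep log() defined.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((score / temperature + gumbel, i))
    # Return document positions ordered from highest to lowest noisy score.
    return [i for _, i in sorted(noisy, reverse=True)]


ranking = sample_ranking([2.0, 0.5, 0.1], random.Random(0))
```

A lower temperature concentrates the samples on the score-sorted order, while a higher temperature moves them toward a uniform shuffle, which gives the simulated system a way to explore lower-ranked documents.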

__init__(model, batch_size, hparam_str)

Create the input feed.

Parameters

model – (BasicModel) The model we are going to train.
batch_size – the size of the batches generated in each iteration.
hparam_str – the hyper-parameters for the input layer.
session – the current tensorflow Session (used for online learning).

get_batch(data_set, check_validation=False)

Get a random batch of data and prepare it for step. Typically used for training.

To feed data into step(..), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters

data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).

get_data_by_index(data_set, index, check_validation=False)

Get the data at the specified index and prepare it for step.

Parameters

data_set – (Raw_data) The dataset used to build the input layer.
index – the index of the data.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format for a later call to step(…).

get_next_batch(index, data_set, check_validation=False)

Get the next batch of data starting from a specific index and prepare it for step. Typically used for validation.

To feed data into step(..), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters

index – the index of the data at which the batch starts.
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).

prepare_true_labels_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=False)

simulate_clicks_online(input_feed, check_validation=False)

Simulate an online environment by reranking documents and collecting clicks.

Parameters

input_feed – (dict) The input_feed data.
check_validation – (bool) Set True to ignore data with no positive labels.

Returns

input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).

Module contents

ultra.input_layer.list_available()