ultra.input_layer package
Submodules
ultra.input_layer.base_input_feed module
The base class that defines the API needed to implement an input data feed.
- class ultra.input_layer.base_input_feed.BaseInputFeed(model, batch_size, hparam_str)
Bases: abc.ABC
This class implements an input layer for unbiased learning to rank experiments.
- MAX_SAMPLE_ROUND_NUM = 100
- abstract __init__(model, batch_size, hparam_str)
Create the model.
- Parameters
model – (BasicModel) The model we are going to train.
batch_size – the size of the batches generated in each iteration.
hparam_str – the hyper-parameters for the input layer.
session – the current tensorflow Session (used for online learning).
- abstract get_batch(data_set, check_validation=False)
Get a random batch of data and prepare it for step. Typically used for training.
To feed data in step(...), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.
- Parameters
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
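The re-indexing described for get_batch can be pictured as a transpose from one list per query to one list per rank position. The following is a minimal sketch of that idea only, not the library's code; the function name `to_batch_major` and the padding value are assumptions made for illustration.

```python
from typing import List


def to_batch_major(cases: List[List[int]], pad_value: int = -1) -> List[List[int]]:
    """Transpose per-query lists into per-position lists.

    cases[i][j] is the j-th document of query i. The result r satisfies
    r[j][i] == cases[i][j]; shorter lists are padded with pad_value.
    """
    max_len = max((len(case) for case in cases), default=0)
    padded = [case + [pad_value] * (max_len - len(case)) for case in cases]
    # zip(*padded) turns the (batch, length) layout into (length, batch).
    return [list(column) for column in zip(*padded)]


batch = to_batch_major([[10, 11, 12], [20, 21]])
print(batch)  # [[10, 20], [11, 21], [12, -1]]
```

Each inner list of the result holds the documents at one rank position across the whole batch.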
- abstract get_data_by_index(data_set, index, check_validation=False)
Get one data case at the specified index and prepare it for step.
- Parameters
data_set – (Raw_data) The dataset used to build the input layer.
index – the index of the data case.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format to call step(...) later.
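As an illustration of what check_validation implies, the sketch below skips a query whose documents carry no positive label. It is a hypothetical stand-in: `pick_case` and the list-of-label-lists layout are assumptions and do not reflect the real Raw_data structure.

```python
from typing import List, Optional


def pick_case(labels_per_query: List[List[int]], index: int,
              check_validation: bool = False) -> Optional[List[int]]:
    """Return the labels of one query, or None when the query is skipped."""
    labels = labels_per_query[index]
    # With check_validation set, a query without any positive label is ignored.
    if check_validation and not any(label > 0 for label in labels):
        return None
    return labels


toy_labels = [[0, 0, 0], [0, 2, 1]]
print(pick_case(toy_labels, 0, check_validation=True))  # None
print(pick_case(toy_labels, 1, check_validation=True))  # [0, 2, 1]
```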
- abstract get_next_batch(index, data_set, check_validation=False)
Get the next batch of data from a specific index and prepare it for step. Typically used for validation.
To feed data in step(...), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.
- Parameters
index – the index of the data from which the batch is created.
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
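A validation pass typically walks the dataset in order, one batch per starting index. The sketch below shows only that index arithmetic. The helper names are hypothetical, and whether the library truncates or pads the final batch is not stated in this reference; truncation is assumed here.

```python
from typing import Iterator, List


def next_batch_indices(index: int, data_size: int, batch_size: int) -> List[int]:
    """Indices of the batch starting at `index`, truncated at the dataset end."""
    return list(range(index, min(index + batch_size, data_size)))


def iterate_batches(data_size: int, batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive, non-overlapping index batches covering the dataset."""
    index = 0
    while index < data_size:
        yield next_batch_indices(index, data_size, batch_size)
        index += batch_size


all_batches = list(iterate_batches(data_size=5, batch_size=2))
print(all_batches)  # [[0, 1], [2, 3], [4]]
```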
- static preprocess_data(data_set, hparam_str, exp_settings)
Preprocess the data for model creation based on the input feed.
- Parameters
data_set – (Raw_data) The dataset used to build the input layer.
hparam_str – the hyper-parameters for the input layer.
exp_settings – (dictionary) The dictionary containing the model settings.
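To show how an abstract feed of this shape is extended, here is a self-contained sketch built on a stand-in base class. `SketchInputFeed` and `ListFeed` are hypothetical names, the dataset is a plain list of label lists rather than Raw_data, and only get_batch is mirrored; a real subclass would inherit from BaseInputFeed and implement every abstract method listed above.

```python
import random
from abc import ABC, abstractmethod


class SketchInputFeed(ABC):
    """Hypothetical stand-in that mirrors the documented constructor only."""

    def __init__(self, model, batch_size, hparam_str):
        self.model = model
        self.batch_size = batch_size
        self.hparam_str = hparam_str

    @abstractmethod
    def get_batch(self, data_set, check_validation=False):
        """Return (input_feed, info_map) for one random batch."""


class ListFeed(SketchInputFeed):
    """Toy feed over a list of per-query label lists."""

    def __init__(self, model, batch_size, hparam_str, seed=0):
        super().__init__(model, batch_size, hparam_str)
        self.rng = random.Random(seed)

    def get_batch(self, data_set, check_validation=False):
        # Optionally drop queries that have no positive label.
        candidates = [
            i for i, labels in enumerate(data_set)
            if not check_validation or any(label > 0 for label in labels)
        ]
        chosen = [self.rng.choice(candidates) for _ in range(self.batch_size)]
        input_feed = {"labels": [data_set[i] for i in chosen]}
        info_map = {"query_indices": chosen}
        return input_feed, info_map


feed = ListFeed(model=None, batch_size=4, hparam_str="")
input_feed, info_map = feed.get_batch([[0, 0], [1, 0], [0, 2]], check_validation=True)
```

Instantiating `SketchInputFeed` directly raises `TypeError`, which is how the abstract base enforces that subclasses supply the batch logic.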
-
ultra.input_layer.click_models module
ultra.input_layer.click_simulation_feed module
Simulate click data based on human annotations.
See the following paper for more information on the simulation data:
Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR '18.
- class ultra.input_layer.click_simulation_feed.ClickSimulationFeed(model, batch_size, hparam_str)
Bases: ultra.input_layer.base_input_feed.BaseInputFeed
Simulate clicks based on human annotations.
This class implements an input layer for unbiased learning to rank experiments by simulating click data based on both the human relevance annotation of each query-document pair and a predefined click model.
- __init__(model, batch_size, hparam_str)
Create the model.
- Parameters
model – (BasicModel) The model we are going to train.
batch_size – the size of the batches generated in each iteration.
hparam_str – the hyper-parameters for the input layer.
- get_batch(data_set, check_validation=False)
Get a random batch of data and prepare it for step. Typically used for training.
To feed data in step(...), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.
- Parameters
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
- get_data_by_index(data_set, index, check_validation=False)
Get one data case at the specified index and prepare it for step.
- Parameters
data_set – (Raw_data) The dataset used to build the input layer.
index – the index of the data case.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format to call step(...) later.
- get_next_batch(index, data_set, check_validation=False)
Get the next batch of data from a specific index and prepare it for step. Typically used for validation.
To feed data in step(...), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.
- Parameters
index – the index of the data from which the batch is created.
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
- prepare_sim_clicks_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=True)
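To make "simulating click data from relevance annotations and a predefined click model" concrete, the sketch below uses a position-based model: a click happens when the position is examined and the document is perceived as relevant. The function name, the examination probabilities, the noise level, and the mapping from label to click probability are all assumptions chosen for illustration; this reference does not specify which click model or parameters the class uses.

```python
import random
from typing import List, Optional


def simulate_clicks(labels: List[int], exam_probs: List[float],
                    max_label: int = 4, noise: float = 0.1,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Sample one click (0 or 1) per ranked document.

    labels[rank] is the relevance label of the document shown at `rank`;
    exam_probs[rank] is the assumed probability that the user examines it.
    """
    rng = rng or random.Random()
    clicks = []
    for rank, label in enumerate(labels):
        p_exam = exam_probs[rank] if rank < len(exam_probs) else 0.0
        # Assumed mapping: higher labels are clicked more often, with a noise floor.
        p_rel = noise + (1.0 - noise) * (2 ** label - 1) / (2 ** max_label - 1)
        clicks.append(1 if rng.random() < p_exam * p_rel else 0)
    return clicks


sampled = simulate_clicks([4, 0, 2], [1.0, 0.5, 0.33], rng=random.Random(0))
```

Because examination probability drops with rank, relevant documents shown lower in the list receive fewer clicks, which is the position bias that unbiased learning to rank methods try to correct.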
-
ultra.input_layer.deterministic_online_simulation_feed module
Simulate online learning process and click data based on human annotations.
See the following paper for more information on the simulation data:
Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR '18.
- class ultra.input_layer.deterministic_online_simulation_feed.DeterministicOnlineSimulationFeed(model, batch_size, hparam_str)
Bases: ultra.input_layer.base_input_feed.BaseInputFeed
Simulate online learning to rank and click data based on human annotations.
This class implements an input layer for online learning to rank experiments by simulating click data based on both the human relevance annotation of each query-document pair and a predefined click model.
- __init__(model, batch_size, hparam_str)
Create the model.
- Parameters
model – (BasicModel) The model we are going to train.
batch_size – the size of the batches generated in each iteration.
hparam_str – the hyper-parameters for the input layer.
session – the current tensorflow Session (used for online learning).
- get_batch(data_set, check_validation=False)
Get a random batch of data and prepare it for step. Typically used for training.
To feed data in step(...), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.
- Parameters
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
- get_data_by_index(data_set, index, check_validation=False)
Get one data case at the specified index and prepare it for step.
- Parameters
data_set – (Raw_data) The dataset used to build the input layer.
index – the index of the data case.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format to call step(...) later.
- get_next_batch(index, data_set, check_validation=False)
Get the next batch of data from a specific index and prepare it for step. Typically used for validation.
To feed data in step(...), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.
- Parameters
index – the index of the data from which the batch is created.
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
- prepare_true_labels_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=False)
- simulate_clicks_online(input_feed, check_validation=False)
Simulate an online environment by reranking documents and collecting clicks.
- Parameters
input_feed – (dict) The input_feed data.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
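The rerank-then-click loop can be sketched as follows: sort the documents of a query by the current model's scores, then ask a click model for feedback on the new ordering. This is an illustrative outline under stated assumptions; `rerank_and_click`, the score list, and the toy click rule are hypothetical and are not taken from the library.

```python
from typing import Callable, List, Tuple


def rerank_and_click(scores: List[float], labels: List[int],
                     click_fn: Callable[[List[int]], List[int]]
                     ) -> Tuple[List[int], List[int]]:
    """Rank documents by descending score and collect clicks on that ranking.

    Returns (order, clicks): order[r] is the original index of the document
    shown at rank r, and clicks[r] is the click observed at rank r.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_labels = [labels[i] for i in order]
    clicks = click_fn(ranked_labels)
    return order, clicks


def top2_clicks(ranked_labels: List[int]) -> List[int]:
    """Toy deterministic click rule: click relevant documents in the top two ranks."""
    return [1 if label > 0 and rank < 2 else 0
            for rank, label in enumerate(ranked_labels)]


order, clicks = rerank_and_click([0.1, 0.9, 0.5], [0, 2, 1], top2_clicks)
print(order, clicks)  # [1, 2, 0] [1, 1, 0]
```

In an online setting the clicks collected this way become the training signal for the next model update, so the ranking the model produces shapes the data it later learns from.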
-
ultra.input_layer.direct_label_feed module
Create batch data directly based on labels.
See the following paper for more information on the simulation data:
Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR '18.
- class ultra.input_layer.direct_label_feed.DirectLabelFeed(model, batch_size, hparam_str)
Bases: ultra.input_layer.base_input_feed.BaseInputFeed
Feed data with human annotations.
This class implements an input layer for unbiased learning to rank experiments by directly feeding the model with the true labels of each query-document pair.
- __init__(model, batch_size, hparam_str)
Create the model.
- Parameters
model – (BasicModel) The model we are going to train.
batch_size – the size of the batches generated in each iteration.
hparam_str – the hyper-parameters for the input layer.
- get_batch(data_set, check_validation=False)
Get a random batch of data and prepare it for step. Typically used for training.
To feed data in step(...), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.
- Parameters
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
- get_data_by_index(data_set, index, check_validation=False)
Get one data case at the specified index and prepare it for step.
- Parameters
data_set – (Raw_data) The dataset used to build the input layer.
index – the index of the data case.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format to call step(...) later.
- get_next_batch(index, data_set, check_validation=False)
Get the next batch of data from a specific index and prepare it for step. Typically used for validation.
To feed data in step(...), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.
- Parameters
index – the index of the data from which the batch is created.
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
- prepare_true_labels_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=True)
-
ultra.input_layer.interleaving_deterministic_online_simulation_feed module
ultra.input_layer.stochastic_online_simulation_feed module
Simulate online learning process and click data based on human annotations.
See the following paper for more information on the simulation data:
Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR '18.
- class ultra.input_layer.stochastic_online_simulation_feed.StochasticOnlineSimulationFeed(model, batch_size, hparam_str)
Bases: ultra.input_layer.base_input_feed.BaseInputFeed
Simulate online learning to rank and click data based on human annotations.
This class implements an input layer for online learning to rank experiments by simulating click data based on both the human relevance annotation of each query-document pair and a predefined click model.
- __init__(model, batch_size, hparam_str)
Create the model.
- Parameters
model – (BasicModel) The model we are going to train.
batch_size – the size of the batches generated in each iteration.
hparam_str – the hyper-parameters for the input layer.
session – the current tensorflow Session (used for online learning).
- get_batch(data_set, check_validation=False)
Get a random batch of data and prepare it for step. Typically used for training.
To feed data in step(...), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.
- Parameters
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
- get_data_by_index(data_set, index, check_validation=False)
Get one data case at the specified index and prepare it for step.
- Parameters
data_set – (Raw_data) The dataset used to build the input layer.
index – the index of the data case.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format to call step(...) later.
- get_next_batch(index, data_set, check_validation=False)
Get the next batch of data from a specific index and prepare it for step. Typically used for validation.
To feed data in step(...), it must be a list of batch-major vectors, while the data here contains single length-major cases. The main logic of this function is therefore to re-index the data cases into the proper format for feeding.
- Parameters
index – the index of the data from which the batch is created.
data_set – (Raw_data) The dataset used to build the input layer.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
- prepare_true_labels_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=False)
- simulate_clicks_online(input_feed, check_validation=False)
Simulate an online environment by reranking documents and collecting clicks.
- Parameters
input_feed – (dict) The input_feed data.
check_validation – (bool) Set True to ignore data with no positive labels.
- Returns
input_feed – a feed dictionary for the next step.
info_map – a dictionary containing some basic information about the batch (for debugging).
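This reference does not say how the stochastic feed draws its rankings. One common way to rerank stochastically is to sample documents one at a time with probability proportional to the exponential of their scores (a Plackett-Luce model). The sketch below shows that approach as an assumption only; `sample_ranking` is a hypothetical helper, not a library function.

```python
import math
import random
from typing import List


def sample_ranking(scores: List[float], rng: random.Random) -> List[int]:
    """Sample a full ranking, picking each next document by softmax weight."""
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        # Subtract the current maximum for numerical stability before exp().
        max_score = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - max_score) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking


ranking = sample_ranking([2.0, 0.5, 1.0], random.Random(0))
```

Higher-scored documents tend to appear earlier, but any permutation has nonzero probability, which gives the learner exploration that a deterministic sort does not.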
-