ultra.input_layer package

Submodules

ultra.input_layer.base_input_feed module

The base class that defines the full API needed to implement an input data feed.

class ultra.input_layer.base_input_feed.BaseInputFeed(model, batch_size, hparam_str)

Bases: abc.ABC

This class implements an input layer for unbiased learning to rank experiments.

MAX_SAMPLE_ROUND_NUM = 100
abstract __init__(model, batch_size, hparam_str)

Create the input feed.

Parameters
  • model – (BasicModel) The model we are going to train.

  • batch_size – the size of the batches generated in each iteration.

  • hparam_str – the hyper-parameters for the input layer.

  • session – the current TensorFlow Session (used for online learning); note that session does not appear in the signature above.
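The hparam_str argument packs the input layer's hyper-parameters into a single string. Below is a minimal sketch of how such a string could be merged into a set of defaults. It assumes the comma-separated `name=value` format that TensorFlow's `HParams.parse` accepts; the exact grammar used by these input layers is not documented here. `parse_hparam_str` and the default keys are hypothetical.

```python
from typing import Any, Dict


def parse_hparam_str(hparam_str: str, defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a 'name=value,name=value' string into a copy of `defaults`.

    Hypothetical helper: each value is cast to the type of its default,
    and unknown or malformed entries raise ValueError so typos fail loudly.
    """
    params = dict(defaults)
    if not hparam_str.strip():
        return params
    for item in hparam_str.split(","):
        name, sep, raw = item.partition("=")
        name, raw = name.strip(), raw.strip()
        if not sep or name not in params:
            raise ValueError(f"Unknown or malformed hyper-parameter: {item!r}")
        default = params[name]
        if isinstance(default, bool):
            # bool("False") is True, so booleans need explicit parsing.
            params[name] = raw.lower() in ("true", "1", "yes")
        else:
            params[name] = type(default)(raw)
    return params


# Hypothetical defaults for an input layer.
defaults = {"list_size": 10, "click_noise": 0.1, "shuffle": False}
print(parse_hparam_str("list_size=5,shuffle=True", defaults))
# {'list_size': 5, 'click_noise': 0.1, 'shuffle': True}
```

Rejecting unknown names is a design choice of this sketch: a misspelled hyper-parameter then raises an error rather than being silently ignored.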

abstract get_batch(data_set, check_validation=False)

Get a random batch of data and prepare it for step(). Typically used for training.

To be fed to step(), data must be a list of batch-major vectors, whereas the data here consists of individual length-major cases. The main job of this function is therefore to re-index the data cases into the proper format for feeding.
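The re-indexing described above is essentially a transpose. Each case holds one query's candidate list (length-major), while step() expects one vector per list position whose entries run over the batch (batch-major). The following is a minimal sketch that assumes plain Python lists of document ids and a hypothetical padding id of -1 for lists shorter than the target size. `to_batch_major` is a hypothetical name.

```python
from typing import List


def to_batch_major(cases: List[List[int]], list_size: int, pad_id: int = -1) -> List[List[int]]:
    """Re-index length-major cases into batch-major vectors.

    cases[b][k] is the k-th candidate of the b-th query in the batch.
    The result holds `list_size` vectors with result[k][b] == cases[b][k];
    lists longer than `list_size` are truncated and shorter ones padded.
    """
    padded = []
    for case in cases:
        trimmed = case[:list_size]
        padded.append(trimmed + [pad_id] * (list_size - len(trimmed)))
    return [[padded[b][k] for b in range(len(padded))] for k in range(list_size)]


cases = [[11, 12, 13], [21, 22]]  # two queries with 3 and 2 candidates
print(to_batch_major(cases, list_size=3))
# [[11, 21], [12, 22], [13, -1]]
```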

Parameters
  • data_set – (Raw_data) The dataset used to build the input layer.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)
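When check_validation is set, queries whose candidate lists carry no positive label are skipped. The sketch below shows one way random batch sampling could honour that flag. It assumes labels are stored per query as lists of graded relevance values. The names `sample_batch_indices` and `max_rounds` are hypothetical. The cap on redraws is a safeguard added for this sketch; its default of 100 mirrors MAX_SAMPLE_ROUND_NUM above, but whether the class uses that constant this way is an assumption.

```python
import random
from typing import List, Optional


def sample_batch_indices(
    labels: List[List[float]],
    batch_size: int,
    check_validation: bool = False,
    max_rounds: int = 100,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Draw `batch_size` query indices uniformly at random, with replacement.

    With check_validation=True, a query whose labels are all non-positive is
    rejected and redrawn, at most `max_rounds` times per batch slot.
    """
    rng = rng or random.Random()
    indices = []
    for _ in range(batch_size):
        for _ in range(max_rounds):
            i = rng.randrange(len(labels))
            if not check_validation or any(y > 0 for y in labels[i]):
                indices.append(i)
                break
        else:  # no break: every draw for this slot was rejected
            raise RuntimeError(f"No query with a positive label found in {max_rounds} draws")
    return indices


labels = [[0, 0, 0], [0, 2, 1], [1, 0, 0]]  # query 0 has no positive label
print(sample_batch_indices(labels, batch_size=4, check_validation=True, rng=random.Random(0)))
```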

abstract get_data_by_index(data_set, index, check_validation=False)

Get a single data item at the specified index and prepare it for step().

Parameters
  • data_set – (Raw_data) The dataset used to build the input layer.

  • index – the index of the data

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format for calling step() later.

abstract get_next_batch(index, data_set, check_validation=False)

Get the next batch of data, starting at a specific index, and prepare it for step().

Typically used for validation.

To be fed to step(), data must be a list of batch-major vectors, whereas the data here consists of individual length-major cases. The main job of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters
  • index – the starting index of the data used to create the batch.

  • data_set – (Raw_data) The dataset used to build the input layer.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)
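get_next_batch is typically called in a loop that walks the dataset in steps of batch_size, with index as the starting position of each batch. The following is a self-contained sketch of that pattern. It uses a hypothetical stand-in feed, `ToyFeed`, whose get_next_batch returns a feed dictionary together with an info_map. The dictionary keys are illustrative only.

```python
from typing import Any, Dict, List, Tuple


class ToyFeed:
    """Hypothetical stand-in that follows the get_next_batch calling convention."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size

    def get_next_batch(self, index: int, data_set: List[Any],
                       check_validation: bool = False) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        # Take up to batch_size items starting at `index`; this toy returns a
        # short final batch, whereas a real feed might pad it instead.
        items = data_set[index:index + self.batch_size]
        input_feed = {"queries": items}
        info_map = {"start": index, "size": len(items)}
        return input_feed, info_map


def iterate_validation(feed: ToyFeed, data_set: List[Any]) -> List[Dict[str, Any]]:
    """Visit every item exactly once, one batch at a time."""
    infos = []
    offset = 0
    while offset < len(data_set):
        _, info_map = feed.get_next_batch(offset, data_set, check_validation=False)
        infos.append(info_map)
        offset += feed.batch_size
    return infos


print(iterate_validation(ToyFeed(batch_size=4), list(range(10))))
# [{'start': 0, 'size': 4}, {'start': 4, 'size': 4}, {'start': 8, 'size': 2}]
```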

static preprocess_data(data_set, hparam_str, exp_settings)

Preprocess the data for model creation based on the input feed.

Parameters
  • data_set – (Raw_data) The dataset used to build the input layer.

  • hparam_str – the hyper-parameters for the input layer.

  • exp_settings – (dictionary) The dictionary containing the model settings.
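To add a new input layer, a subclass implements the abstract methods listed above. The sketch below is self-contained. It re-declares a local stand-in `BaseInputFeed` with the same method names and signatures. This stand-in is not the real ultra class, whose constructor and helpers may do more. The stand-in is then filled in with a trivial feed that serves true labels. `SimpleLabelFeed` and the `(docids, labels)` dataset layout are hypothetical. For simplicity, every toy method returns a (feed dict, info dict) pair, and get_batch uses the global `random` module.

```python
import abc
import random
from typing import Any, Dict, Tuple

Batch = Tuple[Dict[str, Any], Dict[str, Any]]  # (feed dict, info dict)


class BaseInputFeed(abc.ABC):
    """Local stand-in mirroring the documented abstract methods (not the ultra class)."""

    @abc.abstractmethod
    def __init__(self, model, batch_size, hparam_str): ...

    @abc.abstractmethod
    def get_batch(self, data_set, check_validation=False): ...

    @abc.abstractmethod
    def get_data_by_index(self, data_set, index, check_validation=False): ...

    @abc.abstractmethod
    def get_next_batch(self, index, data_set, check_validation=False): ...


class SimpleLabelFeed(BaseInputFeed):
    """Hypothetical feed; data_set is a list of (docids, labels) pairs."""

    def __init__(self, model, batch_size, hparam_str):
        self.model = model
        self.batch_size = batch_size
        self.hparam_str = hparam_str

    def _make(self, data_set, indices) -> Batch:
        # Gather the selected queries into a feed dict plus a small info dict.
        input_feed = {
            "docids": [data_set[i][0] for i in indices],
            "labels": [data_set[i][1] for i in indices],
        }
        return input_feed, {"indices": list(indices)}

    def get_batch(self, data_set, check_validation=False) -> Batch:
        # Random training batch; optionally skip queries without a positive label.
        pool = [i for i, (_, labels) in enumerate(data_set)
                if not check_validation or any(y > 0 for y in labels)]
        return self._make(data_set, [random.choice(pool) for _ in range(self.batch_size)])

    def get_data_by_index(self, data_set, index, check_validation=False) -> Batch:
        return self._make(data_set, [index])

    def get_next_batch(self, index, data_set, check_validation=False) -> Batch:
        # Sequential batch starting at `index`, truncated at the end of the data.
        end = min(index + self.batch_size, len(data_set))
        return self._make(data_set, range(index, end))


data = [([1, 2], [0, 1]), ([3, 4], [0, 0]), ([5, 6], [2, 0])]
feed = SimpleLabelFeed(model=None, batch_size=2, hparam_str="")
print(feed.get_next_batch(0, data)[0]["docids"])  # [[1, 2], [3, 4]]
```

Because `__init__` is itself declared abstract, the base class cannot be instantiated directly. Only a subclass that overrides all four methods can be.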

ultra.input_layer.click_models module

ultra.input_layer.click_simulation_feed module

Simulate click data based on human annotations.

See the following paper for more information on the simulation data.

  • Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR ’18.

class ultra.input_layer.click_simulation_feed.ClickSimulationFeed(model, batch_size, hparam_str)

Bases: ultra.input_layer.base_input_feed.BaseInputFeed

Simulate clicks based on human annotations.

This class implements an input layer for unbiased learning to rank experiments by simulating click data based on both the human relevance annotation of each query-document pair and a predefined click model.
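To illustrate how relevance annotations and a click model combine, here is a minimal position-based model (PBM) sketch. In this model, a click requires both examination, which depends only on rank, and attraction, which depends only on the graded label. The examination curve (1/rank)^eta and the noisy label mapping eps + (1 - eps)(2^y - 1)/(2^ymax - 1) are commonly used choices in the unbiased learning to rank literature. They are assumptions here: the click model that ClickSimulationFeed actually loads is configured separately and may differ. All function names are hypothetical.

```python
import random
from typing import List, Optional


def examination_prob(rank: int, eta: float = 1.0) -> float:
    """P(examined | rank) = (1 / rank) ** eta, with ranks starting at 1."""
    return (1.0 / rank) ** eta


def click_prob_given_examined(label: int, max_label: int, eps: float = 0.1) -> float:
    """Map a graded label to P(click | examined), with click noise eps."""
    return eps + (1.0 - eps) * (2 ** label - 1) / (2 ** max_label - 1)


def simulate_pbm_clicks(labels: List[int], max_label: int = 4, eta: float = 1.0,
                        eps: float = 0.1, rng: Optional[random.Random] = None) -> List[int]:
    """Sample 0/1 clicks for a ranked list of graded labels under a PBM."""
    rng = rng or random.Random()
    clicks = []
    for rank, label in enumerate(labels, start=1):
        # A click needs examination (position-dependent) and attraction (label-dependent).
        p = examination_prob(rank, eta) * click_prob_given_examined(label, max_label, eps)
        clicks.append(1 if rng.random() < p else 0)
    return clicks


# Labels of the documents in displayed order (top first), on a 0-4 scale.
print(simulate_pbm_clicks([4, 0, 2, 1], rng=random.Random(42)))
```

Raising eta makes lower positions less likely to be examined, which strengthens the position bias that unbiased learning to rank methods are meant to correct.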

__init__(model, batch_size, hparam_str)

Create the input feed.

Parameters
  • model – (BasicModel) The model we are going to train.

  • batch_size – the size of the batches generated in each iteration.

  • hparam_str – the hyper-parameters for the input layer.

get_batch(data_set, check_validation=False)

Get a random batch of data and prepare it for step(). Typically used for training.

To be fed to step(), data must be a list of batch-major vectors, whereas the data here consists of individual length-major cases. The main job of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters
  • data_set – (Raw_data) The dataset used to build the input layer.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)

get_data_by_index(data_set, index, check_validation=False)

Get a single data item at the specified index and prepare it for step().

Parameters
  • data_set – (Raw_data) The dataset used to build the input layer.

  • index – the index of the data

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format for calling step() later.

get_next_batch(index, data_set, check_validation=False)

Get the next batch of data, starting at a specific index, and prepare it for step().

Typically used for validation.

To be fed to step(), data must be a list of batch-major vectors, whereas the data here consists of individual length-major cases. The main job of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters
  • index – the starting index of the data used to create the batch.

  • data_set – (Raw_data) The dataset used to build the input layer.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)

prepare_sim_clicks_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=True)

ultra.input_layer.deterministic_online_simulation_feed module

Simulate the online learning process and click data based on human annotations.

See the following paper for more information on the simulation data.

  • Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR ’18.

class ultra.input_layer.deterministic_online_simulation_feed.DeterministicOnlineSimulationFeed(model, batch_size, hparam_str)

Bases: ultra.input_layer.base_input_feed.BaseInputFeed

Simulate online learning to rank and click data based on human annotations.

This class implements an input layer for online learning to rank experiments by simulating click data based on both the human relevance annotation of each query-document pair and a predefined click model.

__init__(model, batch_size, hparam_str)

Create the input feed.

Parameters
  • model – (BasicModel) The model we are going to train.

  • batch_size – the size of the batches generated in each iteration.

  • hparam_str – the hyper-parameters for the input layer.

  • session – the current TensorFlow Session (used for online learning); note that session does not appear in the signature above.

get_batch(data_set, check_validation=False)

Get a random batch of data and prepare it for step(). Typically used for training.

To be fed to step(), data must be a list of batch-major vectors, whereas the data here consists of individual length-major cases. The main job of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters
  • data_set – (Raw_data) The dataset used to build the input layer.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)

get_data_by_index(data_set, index, check_validation=False)

Get a single data item at the specified index and prepare it for step().

Parameters
  • data_set – (Raw_data) The dataset used to build the input layer.

  • index – the index of the data

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format for calling step() later.

get_next_batch(index, data_set, check_validation=False)

Get the next batch of data, starting at a specific index, and prepare it for step().

Typically used for validation.

To be fed to step(), data must be a list of batch-major vectors, whereas the data here consists of individual length-major cases. The main job of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters
  • index – the starting index of the data used to create the batch.

  • data_set – (Raw_data) The dataset used to build the input layer.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)

prepare_true_labels_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=False)

simulate_clicks_online(input_feed, check_validation=False)

Simulate the online environment by reranking documents and collecting clicks.

Parameters
  • input_feed – (dict) The input_feed data.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)
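simulate_clicks_online reranks the candidate documents with the current model and collects clicks on the new order. The sketch below assumes that the deterministic variant shows documents in descending score order. That is a reading of the class name, not something stated here. For clicks, it uses a noise-free cascade rule as a stand-in for whatever click model is configured: the simulated user clicks the first document whose label reaches a threshold, then stops. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def rerank_by_score(docids: Sequence[int], scores: Sequence[float]) -> List[int]:
    """Return docids sorted by descending score; ties keep their input order."""
    order = sorted(range(len(docids)), key=lambda i: -scores[i])
    return [docids[i] for i in order]


def cascade_clicks(ranked_labels: Sequence[int], threshold: int = 1) -> List[int]:
    """Noise-free cascade: click the first document with label >= threshold, then stop."""
    clicks = [0] * len(ranked_labels)
    for k, label in enumerate(ranked_labels):
        if label >= threshold:
            clicks[k] = 1
            break
    return clicks


def rerank_and_click(docids: Sequence[int], scores: Sequence[float],
                     labels_by_doc: Dict[int, int],
                     threshold: int = 1) -> Tuple[List[int], List[int]]:
    """Show documents in model-score order, then simulate clicks on that order."""
    ranked = rerank_by_score(docids, scores)
    clicks = cascade_clicks([labels_by_doc[d] for d in ranked], threshold)
    return ranked, clicks


ranked, clicks = rerank_and_click(
    docids=[7, 8, 9], scores=[0.2, 0.9, 0.5], labels_by_doc={7: 2, 8: 0, 9: 1})
print(ranked, clicks)  # [8, 9, 7] [0, 1, 0]
```

The example shows why online simulation matters. Document 7 has the highest label, but the current model ranks it last, so the simulated user clicks document 9 instead.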

ultra.input_layer.direct_label_feed module

Create batch data directly based on labels.

See the following paper for more information on the simulation data.

  • Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR ’18.

class ultra.input_layer.direct_label_feed.DirectLabelFeed(model, batch_size, hparam_str)

Bases: ultra.input_layer.base_input_feed.BaseInputFeed

Feed data with human annotations.

This class implements an input layer for unbiased learning to rank experiments by directly feeding the model with the true labels of each query-document pair.

__init__(model, batch_size, hparam_str)

Create the input feed.

Parameters
  • model – (BasicModel) The model we are going to train.

  • batch_size – the size of the batches generated in each iteration.

  • hparam_str – the hyper-parameters for the input layer.

get_batch(data_set, check_validation=False)

Get a random batch of data and prepare it for step(). Typically used for training.

To be fed to step(), data must be a list of batch-major vectors, whereas the data here consists of individual length-major cases. The main job of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters
  • data_set – (Raw_data) The dataset used to build the input layer.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)

get_data_by_index(data_set, index, check_validation=False)

Get a single data item at the specified index and prepare it for step().

Parameters
  • data_set – (Raw_data) The dataset used to build the input layer.

  • index – the index of the data

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format for calling step() later.

get_next_batch(index, data_set, check_validation=False)

Get the next batch of data, starting at a specific index, and prepare it for step().

Typically used for validation.

To be fed to step(), data must be a list of batch-major vectors, whereas the data here consists of individual length-major cases. The main job of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters
  • index – the starting index of the data used to create the batch.

  • data_set – (Raw_data) The dataset used to build the input layer.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)

prepare_true_labels_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=True)

ultra.input_layer.interleaving_deterministic_online_simulation_feed module

ultra.input_layer.stochastic_online_simulation_feed module

Simulate the online learning process and click data based on human annotations.

See the following paper for more information on the simulation data.

  • Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR ’18.

class ultra.input_layer.stochastic_online_simulation_feed.StochasticOnlineSimulationFeed(model, batch_size, hparam_str)

Bases: ultra.input_layer.base_input_feed.BaseInputFeed

Simulate online learning to rank and click data based on human annotations.

This class implements an input layer for online learning to rank experiments by simulating click data based on both the human relevance annotation of each query-document pair and a predefined click model.

__init__(model, batch_size, hparam_str)

Create the input feed.

Parameters
  • model – (BasicModel) The model we are going to train.

  • batch_size – the size of the batches generated in each iteration.

  • hparam_str – the hyper-parameters for the input layer.

  • session – the current TensorFlow Session (used for online learning); note that session does not appear in the signature above.

get_batch(data_set, check_validation=False)

Get a random batch of data and prepare it for step(). Typically used for training.

To be fed to step(), data must be a list of batch-major vectors, whereas the data here consists of individual length-major cases. The main job of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters
  • data_set – (Raw_data) The dataset used to build the input layer.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)

get_data_by_index(data_set, index, check_validation=False)

Get a single data item at the specified index and prepare it for step().

Parameters
  • data_set – (Raw_data) The dataset used to build the input layer.

  • index – the index of the data

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

The triple (docid_inputs, decoder_inputs, target_weights) for the constructed batch, in the proper format for calling step() later.

get_next_batch(index, data_set, check_validation=False)

Get the next batch of data, starting at a specific index, and prepare it for step().

Typically used for validation.

To be fed to step(), data must be a list of batch-major vectors, whereas the data here consists of individual length-major cases. The main job of this function is therefore to re-index the data cases into the proper format for feeding.

Parameters
  • index – the starting index of the data used to create the batch.

  • data_set – (Raw_data) The dataset used to build the input layer.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)

prepare_true_labels_with_index(data_set, index, docid_inputs, letor_features, labels, check_validation=False)

simulate_clicks_online(input_feed, check_validation=False)

Simulate the online environment by reranking documents and collecting clicks.

Parameters
  • input_feed – (dict) The input_feed data.

  • check_validation – (bool) Set True to ignore data with no positive labels.

Returns

  • input_feed – a feed dictionary for the next step.

  • info_map – a dictionary containing some basic information about the batch (for debugging).

Return type

tuple (input_feed, info_map)
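For the stochastic variant, one common way to turn model scores into a randomized ranking is Plackett-Luce sampling. The next document is drawn repeatedly, without replacement, with probability proportional to exp(score / T). Whether StochasticOnlineSimulationFeed uses exactly this scheme is not confirmed here. `sample_plackett_luce` is a hypothetical illustration whose output could then be passed to a click simulator.

```python
import math
import random
from typing import List, Optional, Sequence


def sample_plackett_luce(docids: Sequence[int], scores: Sequence[float],
                         temperature: float = 1.0,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Sample a full ranking; P(next = d) is proportional to exp(score_d / temperature)."""
    rng = rng or random.Random()
    remaining = list(range(len(docids)))
    ranking = []
    while remaining:
        # Subtract the max score before exponentiating for numerical stability.
        top = max(scores[i] for i in remaining)
        weights = [math.exp((scores[i] - top) / temperature) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(docids[pick])
        remaining.remove(pick)
    return ranking


print(sample_plackett_luce([7, 8, 9], [0.2, 0.9, 0.5], rng=random.Random(0)))
```

Lowering the temperature concentrates probability on the highest-scoring documents, so the sample approaches the deterministic descending-score order. Raising it spreads exposure across lower-ranked documents, which is the usual reason to randomize in online learning.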

Module contents

ultra.input_layer.list_available()