abpytools.core package

Submodules

abpytools.core.base module

class abpytools.core.base.CollectionBase[source]

Bases: object

CollectionBase is the abpytools base class on which the collection APIs are built.

classmethod load_from_fasta(path, numbering_scheme='chothia', n_threads=20, verbose=True, show_progressbar=True)[source]
classmethod load_from_file(path, n_threads=20, verbose=True, show_progressbar=True, **kwargs)[source]
Args:
  path:
  n_threads: int to specify number of threads to use in loading process
  verbose: bool controls the level of verbosity
  show_progressbar: bool whether to display the progressbar
  kwargs:

Returns:

classmethod load_from_json(path, n_threads=20, verbose=True, show_progressbar=True)[source]
classmethod load_from_pb2(path, n_threads=20, verbose=True, show_progressbar=True)[source]
save(file_format, path, update=True)[source]
Args:
  file_format:
  path:
  update:

Returns:

save_to_fasta(path, update=True)[source]
save_to_json(path, update=True)[source]
save_to_pb2(path, update=True)[source]

abpytools.core.cache module

class abpytools.core.cache.Cache(max_cache_size=10)[source]

Bases: object

add(key, data)[source]
empty_cache()[source]
remove(key)[source]
update(key, data, override=True)[source]
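
The Cache methods above carry no docstrings. Purely as an illustration, a bounded key-value cache with the same method names could be sketched as follows. The class name SketchCache is hypothetical, and the eviction rule (drop the oldest entry once max_cache_size is exceeded), the duplicate-key error and the meaning of override are all assumptions, not documented abpytools behaviour.

```python
from collections import OrderedDict


class SketchCache:
    """Hypothetical bounded cache; not the abpytools implementation."""

    def __init__(self, max_cache_size=10):
        self.max_cache_size = max_cache_size
        self._data = OrderedDict()

    def add(self, key, data):
        # Assumption: adding an existing key is an error; update() replaces entries.
        if key in self._data:
            raise KeyError(f"{key!r} is already cached")
        self._data[key] = data
        # Assumption: evict the oldest entries while the cache is over capacity.
        while len(self._data) > self.max_cache_size:
            self._data.popitem(last=False)

    def update(self, key, data, override=True):
        # Assumption: override=False leaves an existing entry untouched.
        if key in self._data and not override:
            return
        self._data.pop(key, None)
        self.add(key, data)

    def remove(self, key):
        # Removing a missing key is silently ignored in this sketch.
        self._data.pop(key, None)

    def empty_cache(self):
        self._data.clear()

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)
```

With max_cache_size=2, adding a third key drops the first one, so memory use stays bounded regardless of how many results are cached.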

abpytools.core.chain module

class abpytools.core.chain.Chain(sequence, name='Chain1', numbering_scheme='chothia')[source]

Bases: object

The Chain object represents a single chain variable fragment (scFv) antibody.

An scFv can be part of either the heavy or light chain of an antibody. The nature of the chain is determined by submitting the sequence to the Abnum server; this is implemented in the Chain.ab_numbering() method.

Attributes:
  numbering (list): the name of each position occupied by amino acids in sequence
  mw (float): the cached molecular weight
  pI (float): the cached isoelectric point of the sequence
  cdr (tuple): tuple with two dictionaries for CDR and FR with the index of the amino acids in each region
  germline_identity (dict):
ab_charge(align=True, ph=7.4, pka_database='Wikipedia')[source]

Method to calculate the charges for each amino acid of the antibody.

Parameters:
  • pka_database
  • ph
  • align – if set to True an alignment will be performed, if it hasn’t been done already using the ab_numbering method
Returns: array with amino acid charges
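
A per-residue charge at a given pH is commonly estimated with the Henderson-Hasselbalch equation. The sketch below illustrates that idea only; it is not taken from the abpytools source. The function names are hypothetical, the pKa values are rounded illustrative numbers (not necessarily those in the 'Wikipedia' database), and terminal groups are ignored.

```python
# Rounded side-chain pKa values, for illustration only (assumption).
PKA_SKETCH = {"D": 3.9, "E": 4.1, "C": 8.2, "Y": 10.5,
              "H": 6.0, "K": 10.5, "R": 12.5}
# Side chains that lose a proton to become negatively charged.
ACIDIC = {"D", "E", "C", "Y"}


def residue_charge(amino_acid, ph, pka_values):
    """Fractional side-chain charge from the Henderson-Hasselbalch equation."""
    if amino_acid not in pka_values:
        return 0.0  # non-ionisable side chain
    pka = pka_values[amino_acid]
    if amino_acid in ACIDIC:
        # Fraction deprotonated, carrying a charge of -1.
        return -1.0 / (1.0 + 10 ** (pka - ph))
    # Basic side chain: fraction protonated, carrying a charge of +1.
    return 1.0 / (1.0 + 10 ** (ph - pka))


def sequence_charges(sequence, ph=7.4, pka_values=PKA_SKETCH):
    """List with one fractional charge per residue of the sequence."""
    return [residue_charge(aa, ph, pka_values) for aa in sequence]
```

Summing the list gives a total side-chain charge in the spirit of ab_total_charge(ph=7.4); a histidine at pH 6.0 with pKa 6.0 contributes exactly +0.5.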
ab_ec(extinction_coefficient_database='Standard', reduced=False, normalise=False, **kwargs)[source]
ab_format()[source]
ab_hydrophobicity_matrix(hydrophobicity_scores='ew')[source]
ab_molecular_weight(monoisotopic=False)[source]
ab_numbering(server='abysis', **kwargs)[source]

Returns:
  list: the numbering of the sequence
ab_numbering_table(as_array=False, replacement='-', region='all')[source]
Parameters:
  • region
  • as_array – if True returns numpy.array object, if False returns a pandas.DataFrame
  • replacement – value to replace empty positions
Returns:

ab_pi(pi_database='Wikipedia')[source]
ab_regions()[source]

Method to determine the Chain region (CDR or Framework) of each amino acid in the sequence.

Returns:
ab_total_charge(ph=7.4, pka_database='Wikipedia')[source]
aligned_sequence
chain
static determine_chain_type(numbering)[source]
load()[source]

Generates all the data:

  • Chain numbering
  • Hydrophobicity matrix
  • Molecular weight
  • pI

All the data is then stored in its respective attributes.

Returns:
classmethod load_from_string(sequence, name='Chain1', numbering_scheme='chothia')[source]

Returns an instantiated Chain object from a sequence.

Args:
  sequence:
  name:
  numbering_scheme:

Returns:

name
numbering_scheme
sequence
set_name(name)[source]
status
abpytools.core.chain.amino_acid_charge(amino_acid, ph, pka_values)[source]
abpytools.core.chain.calculate_cdr(numbering, cdr_positions, framework_positions)[source]
Parameters:
  • numbering
  • cdr_positions
  • framework_positions
Returns:

abpytools.core.chain.calculate_charge(sequence, ph, pka_values)[source]
abpytools.core.chain.calculate_ec(sequence, ec_data)[source]
abpytools.core.chain.calculate_hydrophobicity_matrix(whole_sequence, numbering, aa_hydrophobicity_scores, sequence)[source]
abpytools.core.chain.calculate_mw(sequence, mw_data)[source]
abpytools.core.chain.calculate_pi(sequence, pi_data)[source]
abpytools.core.chain.get_ab_numbering(sequence, server, numbering_scheme, timeout=30)[source]
Return type: list

abpytools.core.chain_collection module

class abpytools.core.chain_collection.ChainCollection(antibody_objects=None, load=True, **kwargs)[source]

Bases: abpytools.core.base.CollectionBase

Object containing Chain objects, used to perform analysis on the ensemble.

ab_region_index()[source]

Method to determine the index of amino acids in CDR regions.

Returns: dictionary with names as keys, where each value is a dictionary with keys CDR and FR. The ‘CDR’ entry contains dictionaries with the CDR1, CDR2 and CDR3 regions; the ‘FR’ entry contains dictionaries with the FR1, FR2, FR3 and FR4 regions.

aligned_sequences
append(antibody_obj)[source]
chain
charge
composition(method='count')[source]

Amino acid composition of each sequence. Each resulting list is organised alphabetically (see composition.py).

Parameters:
  • method
Returns:
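
As an illustration of an alphabetically organised composition, the sketch below counts the 20 standard amino acids in one sequence. It is not the code in composition.py: the function name is hypothetical, ordering by one-letter code is an assumption, and only the 'count' method named in the signature is covered.

```python
from collections import Counter

# The 20 standard amino acids, ordered alphabetically by one-letter code.
AMINO_ACIDS = sorted("ACDEFGHIKLMNPQRSTVWY")


def composition_sketch(sequence, method="count"):
    """Return the per-residue counts of a sequence in alphabetical order."""
    if method != "count":
        # Other methods are not described in the source, so they are not sketched.
        raise ValueError(f"unsupported method: {method!r}")
    counts = Counter(sequence)
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]
```

Because every sequence maps to a list of the same length and order, the resulting lists can be compared position by position across a collection.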

distance_matrix(feature=None, metric='cosine_similarity', multiprocessing=False)[source]

Returns the distance matrix using a given feature and distance metric.

Parameters:
  • feature – string with the name of the feature to use
  • metric – string with the name of the metric to use
  • multiprocessing – bool to turn multiprocessing on/off (True/False)
Returns: list of len(data) lists, each of len(data), with the distances between all pairs of sequences; when i == j, M_i,j = 0
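
The shape of the returned matrix can be illustrated with a small pure-Python sketch. It assumes that the distance for the cosine metric is 1 minus the cosine similarity, which is consistent with a zero diagonal but is not confirmed by the source; both function names are hypothetical and zero-length feature vectors are not handled.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, non-zero numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix_sketch(data, metric=cosine_distance):
    """Symmetric list of lists of pairwise distances with a zero diagonal."""
    n = len(data)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Only the upper triangle is computed; the result is mirrored.
        for j in range(i + 1, n):
            distance = metric(data[i], data[j])
            matrix[i][j] = distance
            matrix[j][i] = distance
    return matrix
```

Computing only the upper triangle halves the number of metric evaluations, which matters because the cost grows quadratically with the number of sequences.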
extinction_coefficients(extinction_coefficient_database='Standard', reduced=False)[source]
Parameters:
  • extinction_coefficient_database – string with the name of the database to use
  • reduced – bool whether to consider the cysteines to be reduced
Returns: list
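
To show what the reduced flag typically changes, the sketch below estimates a molar extinction coefficient at 280 nm from the widely used per-residue values of 5500 for tryptophan, 1490 for tyrosine and 125 for each cystine (in M^-1 cm^-1). Whether the 'Standard' database holds these values is an assumption, as is pairing every two cysteines into one cystine; the function name is hypothetical.

```python
# Per-residue contributions at 280 nm in M^-1 cm^-1 (assumed values).
EC_TRP = 5500
EC_TYR = 1490
EC_CYSTINE = 125


def extinction_coefficient_sketch(sequence, reduced=False):
    """Estimate the 280 nm extinction coefficient of a protein sequence."""
    n_trp = sequence.count("W")
    n_tyr = sequence.count("Y")
    # Reduced cysteines form no disulfide bonds and contribute nothing here.
    # Assumption: otherwise every pair of cysteines forms one cystine.
    n_cystine = 0 if reduced else sequence.count("C") // 2
    return EC_TRP * n_trp + EC_TYR * n_tyr + EC_CYSTINE * n_cystine
```

For a collection, applying such a function to each sequence yields one value per chain, matching the list return type documented above.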

germline
germline_identity
get_object(name='')[source]
Parameters: name – str
Returns:
hydrophobicity_matrix()[source]
igblast_local_query(file_path)[source]
igblast_server_query(chunk_size=50, show_progressbar=True, **kwargs)[source]
Parameters:
  • show_progressbar
  • chunk_size
  • kwargs – keyword arguments to pass to igblast_options
Returns:

load(show_progressbar=True, n_threads=4, verbose=True)[source]
classmethod load_from_fasta(path, numbering_scheme='chothia', n_threads=20, verbose=True, show_progressbar=True)[source]
classmethod load_from_json(path, n_threads=20, verbose=True, show_progressbar=True)[source]
classmethod load_from_pb2(path, n_threads=20, verbose=True, show_progressbar=True)[source]
loading_status()[source]
molecular_weights(monoisotopic=False)[source]
Parameters: monoisotopic – bool whether to use monoisotopic values
Returns: list
n_ab
names
numbering_scheme
numbering_table(as_array=False, region='all')[source]
pop(index=-1)[source]
save_to_fasta(path, update=True)[source]
save_to_json(path, update=True)[source]
save_to_pb2(path, update=True)[source]
sequences
set_numbering_scheme(numbering_scheme, realign=True)[source]
total_charge
abpytools.core.chain_collection.igblast_options(sequences, domain='imgt', germline_db_V='IG_DB/imgt.Homo_sapiens.V.f.orf.p', germline_db_D='IG_DB/imgt.Homo_sapiens.D.f.orf', germline_db_J='IG_DB / imgt.Homo_sapiens.J.f.orf', num_alignments_V=1, num_alignments_D=1, num_alignments_J=1)[source]
abpytools.core.chain_collection.load_antibody_object(antibody_object)[source]
abpytools.core.chain_collection.load_from_antibody_object(antibody_objects, show_progressbar=True, n_threads=20, verbose=True)[source]
Args:
  antibody_objects (list):
  show_progressbar (bool):
  n_threads (int):
  verbose (bool):

Returns:

abpytools.core.chain_collection.load_igblast_query(igblast_result, names)[source]
Parameters:
  • names
  • igblast_result
Returns:

abpytools.core.chain_collection.make_fasta(names, sequences)[source]
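
make_fasta has no docstring. A function with this signature would plausibly pair each name with its sequence in FASTA format, where a header line starting with ">" precedes the sequence. The sketch below is an assumption about that behaviour, not the abpytools code; the function name is hypothetical and sequences are written on a single line without wrapping.

```python
def make_fasta_sketch(names, sequences):
    """Build a FASTA-formatted string from parallel lists of names and sequences."""
    if len(names) != len(sequences):
        raise ValueError("names and sequences must have the same length")
    # One header line (">name") followed by one sequence line per record.
    return "".join(f">{name}\n{sequence}\n"
                   for name, sequence in zip(names, sequences))
```

A string in this form can be written to disk or submitted to a sequence-analysis service as a single query.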
abpytools.core.chain_collection.worker(q)[source]

abpytools.core.fab module

class abpytools.core.fab.Fab(heavy_chain=None, light_chain=None, load=True, name=None)[source]

Bases: object

aligned_sequence
charge(**kwargs)[source]
extinction_coefficient(reduced=False, normalise=False, **kwargs)[source]
germline_identity
hydrophobicity_matrix(**kwargs)[source]
load()[source]
molecular_weight(monoisotopic=False)[source]
name
numbering_table(as_array=False, region='all', chain='both')[source]
sequence
total_charge(ph=7.4, pka_database='Wikipedia')[source]

abpytools.core.fab_collection module

class abpytools.core.fab_collection.FabCollection(fab=None, heavy_chains=None, light_chains=None, names=None)[source]

Bases: abpytools.core.base.CollectionBase

aligned_sequences
charge()[source]
extinction_coefficients(extinction_coefficient_database='Standard', reduced=False, normalise=False, **kwargs)[source]
germline
germline_identity
get_object(name)[source]
Parameters: name – str
Returns:
hydrophobicity_matrix()[source]
igblast_local_query(file_path, chain)[source]
igblast_server_query(**kwargs)[source]
classmethod load_from_fasta(path, numbering_scheme='chothia', n_threads=20, verbose=True, show_progressbar=True)[source]
classmethod load_from_json(path, n_threads=20, verbose=True, show_progressbar=True)[source]
classmethod load_from_pb2(path, n_threads=20, verbose=True, show_progressbar=True)[source]
molecular_weights(monoisotopic=False)[source]
n_ab
names
numbering_table(as_array=False, region='all', chain='both', **kwargs)[source]
regions
save_to_fasta(path, update=True)[source]
save_to_json(path, update=True)[source]
save_to_pb2(path, update=True)[source]
sequences
total_charge(ph=7.4, pka_database='Wikipedia')[source]

abpytools.core.helper_functions module

abpytools.core.helper_functions.germline_identity_pd(heavy_identity, light_identity, internal_heavy, internal_light, names)[source]
abpytools.core.helper_functions.numbering_table_multiindex(region, whole_sequence_dict)[source]
abpytools.core.helper_functions.numbering_table_region(region)[source]
abpytools.core.helper_functions.numbering_table_sequences(region, numbering_scheme, chain)[source]
abpytools.core.helper_functions.to_numbering_table(as_array, region, chain, heavy_chains_numbering_table, light_chains_numbering_table, names, **kwargs)[source]

abpytools.core.utils module

abpytools.core.utils.add_Chain_to_protobuf(antibody_obj, proto_obj)[source]

Helper function to populate a ProtoChain message.

Args:
  antibody_obj (Chain):
  proto_obj (ChainProto):

Returns:

abpytools.core.utils.fasta_ChainCollection_parser(raw_fasta, numbering_scheme)[source]
abpytools.core.utils.json_ChainCollection_formatter(chain_objects)[source]

Internal function to serialise ChainCollection objects in JSON format.

Args:
  chain_objects (ChainCollection):

Returns:

abpytools.core.utils.json_ChainCollection_parser(raw_data)[source]
abpytools.core.utils.json_Chain_parser(antibody_dict, name)[source]
abpytools.core.utils.json_FabCollection_formatter(fab_object)[source]

Internal function to serialise FabCollection objects in JSON format.

Args:
  fab_object (FabCollection):

Returns:

abpytools.core.utils.json_FabCollection_parser(raw_data)[source]
abpytools.core.utils.pb2_ChainCollection_formatter(chain_objects, proto_parser, reset_status=True)[source]

Internal function to serialise a ChainCollection object to .pb2 format according to the definition in ‘format/chain.proto’.

Args:
  chain_objects (ChainCollection):
  proto_parser (ChainCollectionProto):
  reset_status (bool):

Returns:

abpytools.core.utils.pb2_ChainCollection_parser(proto_parser)[source]
abpytools.core.utils.pb2_Chain_parser(proto_chain)[source]

Populate Chain object from protobuf file

Args:
proto_chain (ChainProto):

Returns:

abpytools.core.utils.pb2_FabCollection_formatter(fab_object, proto_parser, reset_status=True)[source]

Internal function to serialise a FabCollection object to .pb2 format according to the definition in ‘format/fab.proto’.

Args:
  fab_object (FabCollection):
  proto_parser (FabCollectionProto):
  reset_status (bool):

Returns:

abpytools.core.utils.pb2_FabCollection_parser(proto_parser)[source]
abpytools.core.utils.pb2_add_chain(chain_object, proto_parser)[source]

Populates a protobuf ProtoChain message from a Chain object and adds it to ChainCollectionProto.

Args:
  chain_object (Chain):
  proto_parser (ChainCollectionProto):

Returns:

Module contents

The package namespace exposes the four main classes. Each is listed with the same signature and members as in its submodule section above.

class abpytools.core.Chain(sequence, name='Chain1', numbering_scheme='chothia')[source]

See abpytools.core.chain.Chain.

class abpytools.core.ChainCollection(antibody_objects=None, load=True, **kwargs)[source]

See abpytools.core.chain_collection.ChainCollection.

class abpytools.core.FabCollection(fab=None, heavy_chains=None, light_chains=None, names=None)[source]

See abpytools.core.fab_collection.FabCollection.

class abpytools.core.Fab(heavy_chain=None, light_chain=None, load=True, name=None)[source]

See abpytools.core.fab.Fab.