K-mer signatures
gambit.seq
Generic code for working with sequence data.
Note that all code in this package operates on DNA sequences as sequences of bytes containing ascii-encoded nucleotide codes.
- gambit.seq.NUCLEOTIDES
bytes corresponding to the four DNA nucleotides. ASCII-encoded upper-case letters ACGT. Note that the order, while arbitrary, is important in this variable as it defines how unique indices are assigned to k-mer sequences.
- gambit.seq.revcomp(seq: bytes) -> bytes
Get the reverse complement of a nucleotide sequence.
- Parameters:
seq (bytes) – ASCII-encoded nucleotide sequence. Case does not matter.
- Returns:
Reverse complement sequence. All characters in the input which are not valid nucleotide codes will appear unchanged in the corresponding reverse position.
- Return type:
bytes
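As a rough illustration of the behaviour described above, reverse complementing can be sketched in pure Python with a byte translation table. The name revcomp_sketch is hypothetical and this is not the package's implementation. Preserving the case of the input is an assumption: the description only states that case does not matter and that invalid codes pass through unchanged.

```python
# Translation table mapping each nucleotide code to its complement.
# Preserving letter case in the output is an assumption of this sketch.
_COMPLEMENT = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")


def revcomp_sketch(seq: bytes) -> bytes:
    """Complement each nucleotide, then reverse the sequence.

    Bytes that are not nucleotide codes are left unchanged by translate(),
    so they end up in the corresponding reversed position.
    """
    return seq.translate(_COMPLEMENT)[::-1]


print(revcomp_sketch(b"ATGAC"))  # b'GTCAT'
print(revcomp_sketch(b"AANT"))   # b'ANTT'
```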
- class gambit.seq.SequenceFile
Bases:
PathLike
A reference to a DNA sequence file stored in the file system.
Contains all the information needed to read and parse the file. Implements the os.PathLike interface, so it can be substituted for a str or pathlib.Path in most function arguments that take a file path to open.
- Parameters:
path (Union[os.PathLike, str]) – Value of path attribute. May be string or path-like object.
format (str) – Value of format attribute.
compression (Optional[str]) – Value of compression attribute.
- path
Path to the file.
- Type:
pathlib.Path
- format
String describing the file format as interpreted by Bio.SeqIO.parse(), e.g. 'fasta'.
- Type:
str
- compression
String describing compression method of the file, e.g. 'gzip'. None means no compression. See gambit.util.io.open_compressed().
- Type:
str | None
- __init__(path, format, compression=None)
Method generated by attrs for class SequenceFile.
- Parameters:
format (str) –
compression (str | None) –
- Return type:
None
- absolute()
Make a copy of the instance with an absolute path.
- Return type:
- classmethod from_paths(paths, format, compression=None)
Create many instances at once from a collection of paths and a single format and compression type.
- Parameters:
paths (Iterable[str | PathLike]) – Collection of paths as strings or path-like objects.
format (str) – Sequence file format of files.
compression (str | None) – Compression method of files.
- Return type:
List[SequenceFile]
- open(mode='r', **kwargs)
Open a stream to the file, with compression/decompression applied transparently.
- Parameters:
mode (str) – Same as the equivalent argument to the built-in open(). Some modes may not be supported by all compression types.
**kwargs – Additional text-mode-specific keyword arguments to pass to the opener. Equivalent to the following arguments of the built-in open(): encoding, errors, and newlines. May not be supported by all compression types.
- Returns:
Stream to file in given mode.
- Return type:
IO
- parse(**kwargs)
Open the file and lazily parse its contents.
Returns an iterator over the sequence data in the file. The file is parsed lazily, and so must be kept open. The returned iterator is of type gambit.util.io.ClosingIterator, so it will close the file stream automatically when it finishes. It may also be used as a context manager that closes the stream on exit. You may also close the stream explicitly using the iterator’s close method.
- Parameters:
**kwargs – Keyword arguments to open().
- Returns:
Iterator yielding Bio.SeqIO.SeqRecord instances for each sequence in the file.
- Return type:
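The closing behaviour described for parse() can be sketched with the standard library alone. ClosingIteratorSketch is a hypothetical stand-in written for illustration, not the actual gambit.util.io.ClosingIterator, and the in-memory stream stands in for an opened sequence file.

```python
import io
from typing import Iterable, Iterator


class ClosingIteratorSketch:
    """Wrap an iterable and close a stream once iteration finishes."""

    def __init__(self, iterable: Iterable, stream: io.IOBase):
        self._iterator: Iterator = iter(iterable)
        self.stream = stream

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._iterator)
        except StopIteration:
            # Exhausted: release the underlying stream automatically.
            self.close()
            raise

    def close(self):
        self.stream.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Leaving the "with" block closes the stream even if not exhausted.
        self.close()


stream = io.StringIO(">seq1\nACGT\n>seq2\nTTGA\n")
headers = (line.strip() for line in stream if line.startswith(">"))
records = ClosingIteratorSketch(headers, stream)
print(list(records))  # ['>seq1', '>seq2']
print(stream.closed)  # True
```

Because the data is read lazily, consuming the iterator after the stream has been closed would fail, which is why the three ways of closing (exhaustion, context manager exit, explicit close) all happen only once the caller is done.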
- gambit.seq.seq_to_bytes(seq)
Convert generic DNA sequence to byte string representation.
This is for passing sequence data to Cython functions.
- Parameters:
seq (str | bytes | bytearray | Seq) –
- Return type:
bytes | bytearray
- gambit.seq.validate_dna_seq_bytes(seq)
Check that a sequence contains only valid nucleotide codes (upper case).
- Parameters:
seq (bytes) – ASCII-encoded nucleotide sequence.
- Raises:
ValueError – If the sequence contains an invalid nucleotide.
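A minimal sketch of this kind of check, assuming that only the four upper-case codes in NUCLEOTIDES count as valid. The function name and the error message are made up for illustration.

```python
NUCLEOTIDES = b"ACGT"  # value described for gambit.seq.NUCLEOTIDES


def validate_dna_seq_bytes_sketch(seq: bytes) -> None:
    """Raise ValueError at the first byte that is not A, C, G or T."""
    for position, code in enumerate(seq):
        # Iterating over bytes yields integers; "in" accepts them directly.
        if code not in NUCLEOTIDES:
            raise ValueError(f"Invalid byte {code} at position {position}")


validate_dna_seq_bytes_sketch(b"ACGTTGCA")  # passes silently
```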
- gambit.seq.DNASeq
Union of DNA sequence types accepted for k-mer search / signature calculation.
alias of
Union
[str
,bytes
,bytearray
,Seq
]
- gambit.seq.DNASeqBytes
Sequence types accepted directly by native (Cython) code.
alias of
Union
[bytes
,bytearray
]
gambit.kmers
Core functions for searching for and working with k-mers.
- gambit.kmers.index_to_kmer(index: int, kmer: int) -> bytes
Convert k-mer index to sequence.
- class gambit.kmers.KmerMatch
Bases:
object
Represents a
- kmerspec
K-mer spec used for search.
- Type:
- seq
The sequence searched within.
- Type:
str | bytes | bytearray | Bio.Seq.Seq
- pos
Index of first nucleotide of prefix in seq.
- Type:
int
- reverse
If the match is on the reverse strand.
- Type:
bool
- __init__(kmerspec, seq, pos, reverse)
Method generated by attrs for class KmerMatch.
- Parameters:
kmerspec (KmerSpec) –
seq (str | bytes | bytearray | Seq) –
pos (int) –
reverse (bool) –
- Return type:
None
- full_indices()
Index range for prefix plus k-mer in sequence.
- Return type:
slice
- kmer()
Get matched k-mer sequence.
- Return type:
bytes
- kmer_index()
Get index of matched k-mer.
- Raises:
ValueError – If the k-mer contains invalid nucleotides.
- Return type:
int
- kmer_indices()
Index range for k-mer in sequence (without prefix).
- Return type:
slice
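The two index ranges can be illustrated for a forward-strand match. This sketch assumes the prefix starts at pos and the k-mer follows it directly; how the ranges are laid out for reverse-strand matches is not stated above and is not covered here. The function names are hypothetical.

```python
def full_indices_sketch(pos: int, prefix_len: int, k: int) -> slice:
    """Range covering the prefix plus the k-mer (forward strand only)."""
    return slice(pos, pos + prefix_len + k)


def kmer_indices_sketch(pos: int, prefix_len: int, k: int) -> slice:
    """Range covering only the k-mer that follows the prefix."""
    return slice(pos + prefix_len, pos + prefix_len + k)


seq = b"TTATGACGGCATT"
prefix = b"ATGAC"
pos = seq.find(prefix)  # 2

print(seq[full_indices_sketch(pos, len(prefix), 4)])  # b'ATGACGGCA'
print(seq[kmer_indices_sketch(pos, len(prefix), 4)])  # b'GGCA'
```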
- class gambit.kmers.KmerSpec
Bases:
Jsonable
Specifications for a k-mer search operation.
- k
Number of nucleotides in k-mer after prefix.
- Type:
int
- prefix
Constant prefix of k-mers to search for, upper-case nucleotide codes as ASCII-encoded bytes.
- Type:
bytes
- prefix_str
Prefix as string.
- Type:
str
- prefix_len
Number of nucleotides in prefix.
- Type:
int
- total_len
Sum of prefix_len and k.
- Type:
int
- idx_len
Maximum value (plus one) of integer needed to index one of the found k-mers. Also the number of possible k-mers fitting the spec. Equal to 4 ** k.
- index_dtype
Smallest unsigned integer dtype that can store k-mer indices.
- Type:
numpy.dtype
- gambit.kmers.find_kmers(kmerspec, seq)
Locate k-mers with the given prefix in a DNA sequence.
Searches sequence both backwards and forwards (reverse complement). The sequence may contain invalid characters (not one of the four nucleotide codes) which will simply not be matched.
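The search can be sketched in pure Python as two passes, one over the sequence and one over its reverse complement. This is an illustration rather than the package's (Cython) implementation: find_kmers_sketch is a hypothetical name, it returns plain (k-mer, is_reverse) tuples instead of KmerMatch objects, and upper-casing the input first is an assumption.

```python
_COMPLEMENT = bytes.maketrans(b"ACGT", b"TGCA")
_VALID = frozenset(b"ACGT")


def find_kmers_sketch(prefix: bytes, k: int, seq: bytes) -> list:
    """Return (kmer, is_reverse) pairs for prefix matches on either strand."""
    seq = seq.upper()  # assumption: treat lower-case input like upper-case
    strands = [(seq, False), (seq.translate(_COMPLEMENT)[::-1], True)]
    matches = []
    for strand, is_reverse in strands:
        start = strand.find(prefix)
        while start != -1:
            kmer_start = start + len(prefix)
            kmer = strand[kmer_start:kmer_start + k]
            # Skip matches cut off by the end of the sequence or
            # containing bytes other than A, C, G, T.
            if len(kmer) == k and _VALID.issuperset(kmer):
                matches.append((kmer, is_reverse))
            start = strand.find(prefix, start + 1)
    return matches


print(find_kmers_sketch(b"ATGAC", 4, b"TTATGACGGCATT"))  # [(b'GGCA', False)]
print(find_kmers_sketch(b"ATGAC", 4, b"AATGCCGTCATAA"))  # [(b'GGCA', True)]
```

The second call finds its match only on the reverse strand, since that input is the reverse complement of the first.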
- gambit.kmers.index_dtype(k)
Get the smallest unsigned integer dtype that can store k-mer indices for the given k.
- Parameters:
k (int) –
- Return type:
dtype
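Since there are 4 ** k possible k-mers, the largest index is 4 ** k - 1 and needs 2 * k bits. The sketch below picks a type width on that basis. It returns a dtype name as a string to stay free of NumPy, and raising an error for k above 32 is an assumption of this sketch, not documented behaviour.

```python
def index_dtype_name_sketch(k: int) -> str:
    """Name of the smallest unsigned integer type holding values below 4 ** k."""
    max_index = 4 ** k - 1
    for bits in (8, 16, 32, 64):
        if max_index < 2 ** bits:
            return f"uint{bits}"
    # Assumption: indices wider than 64 bits are treated as an error here.
    raise ValueError(f"k={k} needs more than 64 bits")


for k in (4, 5, 11, 16, 17):
    print(k, index_dtype_name_sketch(k))
# 4 uint8
# 5 uint16
# 11 uint32
# 16 uint32
# 17 uint64
```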
- gambit.kmers.kmer_to_index(kmer)
Convert a k-mer to its integer index.
- Raises:
ValueError – If an invalid nucleotide code is encountered.
- Parameters:
kmer (str | bytes | bytearray | Seq) –
- Return type:
int
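One way to picture the index is as a base-4 number whose digits are the positions of each nucleotide in NUCLEOTIDES (A=0, C=1, G=2, T=3). The sketch below assumes the first nucleotide is the most significant digit; that digit order is an assumption made for illustration, and both function names are hypothetical.

```python
NUCLEOTIDES = b"ACGT"  # order defines the digit assigned to each code


def kmer_to_index_sketch(kmer: bytes) -> int:
    """Read the k-mer as a base-4 number, first nucleotide most significant."""
    index = 0
    for code in kmer:
        digit = NUCLEOTIDES.find(code)
        if digit == -1:
            raise ValueError(f"Invalid nucleotide code: {code}")
        index = index * 4 + digit
    return index


def index_to_kmer_sketch(index: int, k: int) -> bytes:
    """Inverse of kmer_to_index_sketch for a k-mer of length k."""
    codes = []
    for _ in range(k):
        index, digit = divmod(index, 4)
        codes.append(NUCLEOTIDES[digit])
    # Digits were produced least significant first, so reverse them.
    return bytes(reversed(codes))


print(kmer_to_index_sketch(b"ACGT"))  # 27
print(index_to_kmer_sketch(27, 4))    # b'ACGT'
```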
- gambit.kmers.kmer_to_index_rc(kmer)
Get the integer index of a k-mer’s reverse complement.
- Raises:
ValueError – If an invalid nucleotide code is encountered.
- Parameters:
kmer (str | bytes | bytearray | Seq) –
- Return type:
int
- gambit.kmers.nkmers(k)
Get the number of possible distinct k-mers for a given value of k.
- Parameters:
k (int) –
- Return type:
int
- gambit.kmers.DEFAULT_KMERSPEC = KmerSpec(11, 'ATGAC')
Default settings for k-mer search
gambit.sigs
Calculate and store collections of k-mer signatures.
gambit.sigs.base
- class gambit.sigs.base.AbstractSignatureArray
Bases:
Sequence[KmerSignature]
Abstract base class for types which behave as a (non-mutable) sequence of k-mer signatures (k-mer sets in sparse coordinate format).
The signature data itself may already be present in memory or may be loaded lazily from the file system when the object is indexed.
Elements should be Numpy arrays with integer data type. Should implement numpy-style advanced indexing, see gambit.util.indexing.AdvancedIndexingMixin. Slicing and advanced indexing should return another instance of AbstractSignatureArray.
- kmerspec
K-mer spec used to calculate signatures.
- Type:
gambit.kmers.KmerSpec | None
- dtype
Numpy data type of signatures.
- Type:
numpy.dtype
- __eq__(other)
Compare two AbstractSignatureArray instances for equality.
Two instances are considered equal if they are equivalent as sequences (see sigarray_eq()) and have the same kmerspec.
- sizeof(index)
Get the size/length of the signature at the given index.
Should be the case that sigarray.sizeof(i) == len(sigarray[i]).
- Parameters:
index (int) – Index of signature in array.
- Return type:
int
- sizes()
Get the sizes of all signatures in the array.
- Return type:
Sequence[int]
- class gambit.sigs.base.AnnotatedSignatures
Bases:
ReferenceSignatures
Wrapper around a signature array which adds ids and meta attributes.
- __init__(signatures, ids=None, meta=None)
- Parameters:
signatures (AbstractSignatureArray) – Signature array to wrap.
ids (Sequence | None) – Unique IDs for signatures. Defaults to consecutive integers starting from zero.
meta (SignaturesMeta | None) – Additional metadata describing signatures.
- class gambit.sigs.base.ConcatenatedSignatureArray
Bases:
AdvancedIndexingMixin, AbstractSignatureArray
Base class for signature arrays which store signatures in a single data array.
- values
K-mer signatures concatenated into single numpy-like array.
- bounds
Numpy-like array storing indices bounding each individual k-mer signature in values. The i-th signature is at values[bounds[i]:bounds[i + 1]].
- sizeof(index)
Get the size/length of the signature at the given index.
Should be the case that sigarray.sizeof(i) == len(sigarray[i]).
- Parameters:
index – Index of signature in array.
- sizes()
Get the sizes of all signatures in the array.
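The values/bounds layout can be illustrated with plain Python lists standing in for the Numpy-like arrays. The helper names below are hypothetical; only the relationship values[bounds[i]:bounds[i + 1]] comes from the description above.

```python
from itertools import accumulate


def concatenate_sketch(signatures):
    """Build (values, bounds) from a list of signatures."""
    values = [index for sig in signatures for index in sig]
    # bounds has one more entry than there are signatures and starts at 0.
    bounds = [0, *accumulate(len(sig) for sig in signatures)]
    return values, bounds


def get_signature_sketch(values, bounds, i):
    """The i-th signature is the slice of values between consecutive bounds."""
    return values[bounds[i]:bounds[i + 1]]


def sizes_sketch(bounds):
    """Signature sizes are the differences between consecutive bounds."""
    return [end - start for start, end in zip(bounds, bounds[1:])]


values, bounds = concatenate_sketch([[1, 5, 9], [], [2, 3]])
print(values)                                   # [1, 5, 9, 2, 3]
print(bounds)                                   # [0, 3, 3, 5]
print(get_signature_sketch(values, bounds, 2))  # [2, 3]
print(sizes_sketch(bounds))                     # [3, 0, 2]
```

Storing everything in one array means a signature's size can be read from bounds alone, without touching values.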
- class gambit.sigs.base.KmerSignature
Type for k-mer signatures (k-mer sets in sparse coordinate format)
alias of
ndarray
- class gambit.sigs.base.ReferenceSignatures
Bases:
AbstractSignatureArray
Base class for an array of reference genome signatures plus metadata.
This contains the extra data needed for the signatures to be used for running queries.
- ids
Array of unique string or integer IDs for each signature. Length should be equal to length of ReferenceSignatures object.
- Type:
Sequence
- meta
Other metadata describing signatures.
- class gambit.sigs.base.SignatureArray
Bases:
ConcatenatedSignatureArray
Stores a collection of k-mer signatures in a single contiguous Numpy array.
This format enables the calculation of many Jaccard scores in parallel, see gambit.metric.jaccarddist_array().
Numpy-style indexing with an array of integers or bools is supported and will return another SignatureArray. If indexed with a contiguous slice the values of the returned array will be a view of the original instead of a copy.
- values
K-mer signatures concatenated into single Numpy array.
- Type:
numpy.ndarray
- bounds
Array storing indices bounding each individual k-mer signature in values. The i-th signature is at values[bounds[i]:bounds[i + 1]].
- Type:
numpy.ndarray
- __init__(signatures, kmerspec=None, dtype=None)
- Parameters:
signatures (Sequence[KmerSignature]) – Sequence of k-mer signatures.
kmerspec (KmerSpec | None) – K-mer spec used to calculate signatures. If None will take from signatures if it is an AbstractSignatureArray instance.
dtype (dtype | None) – Numpy dtype of values array. If None will use dtype of first element of signatures.
- classmethod from_arrays(values, bounds, kmerspec)
Create directly from values and bounds arrays.
- Parameters:
values (ndarray) –
bounds (ndarray) –
kmerspec (KmerSpec | None) –
- Return type:
- class gambit.sigs.base.SignatureList
Bases:
AdvancedIndexingMixin, AbstractSignatureArray, MutableSequence[KmerSignature]
Stores a collection of k-mer signatures in a standard Python list.
Compared to SignatureArray this isn’t as efficient to calculate Jaccard scores with, but supports mutation and won’t have to copy signatures to a new array on creation.
- __init__(signatures, kmerspec=None, dtype=None)
- Parameters:
signatures (Iterable[KmerSignature]) – Iterable of k-mer signatures.
kmerspec (KmerSpec | None) – K-mer spec used to calculate signatures. If None will take from signatures if it is an AbstractSignatureArray instance.
dtype (dtype | None) – Numpy dtype of signatures. If None will use dtype of first element of signatures.
- insert(i, sig)
S.insert(index, value) – insert value before index
- Parameters:
i (int) –
sig (KmerSignature) –
- class gambit.sigs.base.SignaturesMeta
Bases:
object
Metadata describing a set of k-mer signatures.
All attributes are optional.
- id
Any kind of string ID that can be used to uniquely identify the signature set.
- Type:
str | None
- version
Version string (ideally PEP 440-compliant).
- Type:
str | None
- name
Short human-readable name.
- Type:
str | None
- id_attr
Name of Genome attribute the IDs correspond to (see gambit.db.models.GENOME_ID_ATTRS). Optional, but signature set cannot be used as a reference for queries without it.
- Type:
str | None
- description
Human-readable description.
- Type:
str | None
- extra
Extra arbitrary metadata. Should be a dict or other mapping which can be converted to JSON.
- Type:
Mapping[str, Any]
- __init__(*, id=None, name=None, version=None, id_attr=None, description=None, extra=_Nothing.NOTHING)
Method generated by attrs for class SignaturesMeta.
- Parameters:
id (str | None) –
name (str | None) –
version (str | None) –
id_attr (str | None) –
description (str | None) –
extra (Mapping[str, Any]) –
- Return type:
None
- gambit.sigs.base.dump_signatures(path, signatures, format='hdf5', **kw)
Write k-mer signatures and associated metadata to a file.
- Parameters:
path (str | PathLike) – File to write to.
signatures (AbstractSignatureArray) – Array of signatures to store.
format (str) – Format to use. Currently the only valid value is ‘hdf5’.
**kw – Additional keyword arguments depending on format.
- gambit.sigs.base.load_signatures(path, **kw)
Load signatures from file.
Currently the only format used to store signatures is the one in gambit.sigs.hdf5, but there may be more in the future. The format should be determined automatically.
- Parameters:
path (str | PathLike) – File to open.
**kw – Additional keyword arguments to h5py.File().
- Return type:
- gambit.sigs.base.sigarray_eq(a1, a2)
Check two sequences of sparse k-mer signatures for equality.
Unlike AbstractSignatureArray.__eq__() this works on any sequence type containing signatures and does not use the AbstractSignatureArray.kmerspec attribute.
- Parameters:
a1 (Sequence[KmerSignature]) –
a2 (Sequence[KmerSignature]) –
- Return type:
bool
gambit.sigs.calc
Calculate k-mer signatures from sequence data.
- class gambit.sigs.calc.ArrayAccumulator
Bases:
KmerAccumulator
K-mer accumulator implemented as a dense boolean array.
This is pretty efficient for smaller values of k, but time and space requirements increase exponentially with larger values.
- __init__(k)
- Parameters:
k (int) –
- add(i)
Add an element.
- Parameters:
i (int) –
- clear()
This is slow (creates N new iterators!) but effective.
- discard(i)
Remove an element. Do not raise an exception if absent.
- Parameters:
i (int) –
- signature()
Get signature for accumulated k-mers.
- Return type:
- class gambit.sigs.calc.KmerAccumulator
Bases:
MutableSet[int]
Base class for data structures which track k-mers as they are found in sequences.
Implements the MutableSet interface for k-mer indices. Indices are added via the add() or add_kmer() methods; when finished, a sparse k-mer signature can be obtained from signature().
- add_kmer(kmer)
Add a k-mer by its sequence rather than its index.
Argument may contain invalid (non-nucleotide) bytes, in which case it is ignored.
- Parameters:
kmer (bytes) –
- abstract signature()
Get signature for accumulated k-mers.
- Return type:
- class gambit.sigs.calc.SetAccumulator
Bases:
KmerAccumulator
Accumulator which uses the builtin Python set class.
This has more overhead than the array version for smaller values of k but behaves much better asymptotically.
- __init__(k)
- Parameters:
k (int) –
- add(index)
Add an element.
- Parameters:
index (int) –
- clear()
This is slow (creates N new iterators!) but effective.
- discard(index)
Remove an element. Do not raise an exception if absent.
- Parameters:
index (int) –
- signature()
Get signature for accumulated k-mers.
- Return type:
- gambit.sigs.calc.accumulate_kmers(accumulator, kmerspec, seq)
Find k-mer matches in sequence and add their indices to an accumulator.
- Parameters:
accumulator (KmerAccumulator) –
kmerspec (KmerSpec) –
seq (str | bytes | bytearray | Seq) –
- gambit.sigs.calc.calc_file_signature(kspec, seqfile, *, accumulator=None)
Open a sequence file on disk and calculate its k-mer signature.
This works identically to calc_signature_parse() but takes a SequenceFile as input instead of a data stream.
- Parameters:
kspec (KmerSpec) – Spec for k-mer search.
seqfile (SequenceFile) – File to read.
accumulator (KmerAccumulator | None) – TODO
- Returns:
K-mer signature in sparse coordinate format (dtype will match gambit.kmers.dense_to_sparse()).
- Return type:
numpy.ndarray
- gambit.sigs.calc.calc_file_signatures(kspec, files, progress=None, concurrency='processes', max_workers=None, executor=None)
Parse and calculate k-mer signatures for multiple sequence files.
- Parameters:
kspec (KmerSpec) – Spec for k-mer search.
files (Sequence[SequenceFile]) – Files to read.
progress – Display a progress meter. See gambit.util.progress.get_progress() for allowed values.
concurrency (str | None) – Process files concurrently. "processes" for process-based (default), "threads" for threads-based, None for no concurrency.
max_workers (int | None) – Number of worker threads/processes to use if concurrency is not None.
executor (Executor | None) – Instance of concurrent.futures.Executor to use for concurrency. Overrides the concurrency and max_workers arguments.
- Return type:
- gambit.sigs.calc.calc_signature(kmerspec, seqs, *, accumulator=None)
Calculate the k-mer signature of a DNA sequence or set of sequences.
Searches sequences both backwards and forwards (reverse complement). Sequences may contain invalid characters (not one of the four nucleotide codes) which will simply not be matched.
- Parameters:
kmerspec (KmerSpec) – K-mer spec to use for search.
seqs (str | bytes | bytearray | Seq | Iterable[str | bytes | bytearray | Seq]) – Sequence or sequences to search within. Lowercase characters are OK.
accumulator (KmerAccumulator | None) – TODO
- Returns:
K-mer signature in sparse coordinate format. Data type will be kmerspec.index_dtype.
- Return type:
numpy.ndarray
- gambit.sigs.calc.default_accumulator(k)
Get a default k-mer accumulator instance for the given value of k.
Returns an ArrayAccumulator for k <= 11 and a SetAccumulator for k > 11.
- Parameters:
k (int) –
- Return type:
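The trade-off between the two accumulator types can be sketched with the standard library. The classes below are simplified stand-ins, not the package's classes: they skip the MutableSet interface, and returning the signature as a sorted list of indices is an assumption. Only the k <= 11 threshold comes from the description above.

```python
class ArrayAccumulatorSketch:
    """Dense flags: memory grows as 4 ** k regardless of how many k-mers are seen."""

    def __init__(self, k: int):
        self.present = bytearray(4 ** k)  # one flag per possible k-mer index

    def add(self, index: int) -> None:
        self.present[index] = 1

    def signature(self) -> list:
        return [i for i, flag in enumerate(self.present) if flag]


class SetAccumulatorSketch:
    """Sparse storage: memory grows with the number of distinct k-mers seen."""

    def __init__(self, k: int):
        self.indices = set()

    def add(self, index: int) -> None:
        self.indices.add(index)

    def signature(self) -> list:
        return sorted(self.indices)


def default_accumulator_sketch(k: int):
    """Dense storage for small k, set-based storage otherwise."""
    return ArrayAccumulatorSketch(k) if k <= 11 else SetAccumulatorSketch(k)


acc = default_accumulator_sketch(3)
for index in (5, 2, 5):
    acc.add(index)
print(acc.signature())  # [2, 5]
```

For k = 11 the dense version needs about four million flags, which is still cheap. Each increment of k multiplies that by four, which is why a set becomes the better choice for larger values.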
gambit.sigs.convert
Convert signatures between representations or from one KmerSpec to another.
- gambit.sigs.convert.can_convert(from_kspec, to_kspec)
Check if signatures from one KmerSpec can be converted to another.
Conversion is possible if to_kspec.prefix is equal to or starts with from_kspec.prefix and to_kspec.total_len <= from_kspec.total_len.
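The stated rule translates directly into two comparisons. KmerSpecSketch below is a hypothetical stand-in carrying only the fields the rule needs; it is not the real KmerSpec class.

```python
from typing import NamedTuple


class KmerSpecSketch(NamedTuple):
    """Minimal stand-in for a k-mer spec: k plus a constant prefix."""

    k: int
    prefix: bytes

    @property
    def total_len(self) -> int:
        return len(self.prefix) + self.k


def can_convert_sketch(from_kspec: KmerSpecSketch, to_kspec: KmerSpecSketch) -> bool:
    """True if the target prefix extends the source prefix and fits in its length."""
    return (
        to_kspec.prefix.startswith(from_kspec.prefix)
        and to_kspec.total_len <= from_kspec.total_len
    )


source = KmerSpecSketch(11, b"ATGAC")
print(can_convert_sketch(source, KmerSpecSketch(8, b"ATGACCT")))  # True
print(can_convert_sketch(source, KmerSpecSketch(12, b"ATGAC")))   # False
print(can_convert_sketch(source, KmerSpecSketch(8, b"ATG")))      # False
```

The second case fails because the target match is longer than the source match, and the third because a shorter prefix would match positions that the source search never recorded.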
- gambit.sigs.convert.check_can_convert(from_kspec, to_kspec)
Check that signatures can be converted from one KmerSpec to another or raise an error with an informative message.
- gambit.sigs.convert.convert_dense(from_kspec, to_kspec, vec)
Convert a k-mer signature in dense format from one KmerSpec to another.
In the ideal case, if vec is the result of calc_signature(from_kspec, seq, sparse=False) the output of this function should be identical to calc_signature(to_kspec, seq, sparse=False). In reality this may not hold if any potential matches of from_kspec in seq are discarded due to an invalid nucleotide which is not included in the corresponding to_kspec match.
- gambit.sigs.convert.convert_sparse(from_kspec, to_kspec, sig)
Convert a k-mer signature in sparse format from one KmerSpec to another.
In the ideal case, if sig is the result of calc_signature(from_kspec, seq) the output of this function should be identical to calc_signature(to_kspec, seq). In reality this may not hold if any potential matches of from_kspec in seq are discarded due to an invalid nucleotide which is not included in the corresponding to_kspec match.
- Parameters:
from_kspec (KmerSpec) –
to_kspec (KmerSpec) –
sig (KmerSignature) –
- Return type:
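One way such a conversion could work is sketched below, assuming k-mer indices are base-4 numbers with the first nucleotide most significant (A=0, C=1, G=2, T=3). A source k-mer contributes to the target signature only if it begins with the extra nucleotides of the longer target prefix; the next to_k nucleotides then form the target k-mer. The function names and the index encoding are assumptions, not the package's implementation.

```python
NUCLEOTIDES = b"ACGT"


def kmer_to_index_sketch(kmer: bytes) -> int:
    """Base-4 value of a k-mer, first nucleotide most significant (assumed)."""
    index = 0
    for code in kmer:
        index = index * 4 + NUCLEOTIDES.index(code)
    return index


def convert_sparse_sketch(from_prefix, from_k, to_prefix, to_k, sig):
    """Convert sparse indices for (from_prefix, from_k) to (to_prefix, to_k)."""
    extra = to_prefix[len(from_prefix):]  # part of the new prefix inside old k-mers
    if not to_prefix.startswith(from_prefix) or len(extra) + to_k > from_k:
        raise ValueError("Cannot convert between these k-mer specs")
    extra_index = kmer_to_index_sketch(extra)
    head_shift = 2 * (from_k - len(extra))         # bits after the extra prefix
    tail_shift = 2 * (from_k - len(extra) - to_k)  # bits of dropped trailing bases
    converted = set()
    for index in sig:
        if index >> head_shift == extra_index:
            converted.add((index >> tail_shift) % 4 ** to_k)
    return sorted(converted)


# Source spec: prefix AT, k=4. Target spec: prefix ATG, k=2.
sig = sorted(kmer_to_index_sketch(kmer) for kmer in (b"GCAT", b"CCCC"))
print(sig)                                              # [85, 147]
print(convert_sparse_sketch(b"AT", 4, b"ATG", 2, sig))  # [4]
```

Here GCAT starts with the extra prefix nucleotide G, so it yields the target k-mer CA (index 4), while CCCC is dropped.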
- gambit.sigs.convert.dense_to_sparse(vec)
Convert k-mer set from dense bit vector to sparse coordinate representation.
- Parameters:
vec (Sequence[bool]) – Boolean vector indicating which k-mers are present.
- Returns:
Sorted array of coordinates of k-mers present in vector. Data type will be numpy.intp.
- Return type:
numpy.ndarray
- gambit.sigs.convert.sparse_to_dense(k_or_kspec, coords)
Convert k-mer set from sparse coordinate representation back to dense bit vector.
- Parameters:
k_or_kspec (int | KmerSpec) – Value of k or a KmerSpec instance.
coords (KmerSignature) – Sparse coordinate array.
- Returns:
Dense k-mer bit vector.
- Return type:
numpy.ndarray
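The two representations and the round trip between them can be shown with plain lists in place of Numpy arrays. The function names are hypothetical, and this sketch accepts only an integer k rather than a KmerSpec.

```python
def dense_to_sparse_sketch(vec) -> list:
    """Sorted positions at which the dense vector is true."""
    return [i for i, present in enumerate(vec) if present]


def sparse_to_dense_sketch(k: int, coords) -> list:
    """Boolean vector of length 4 ** k with the given positions set."""
    vec = [False] * 4 ** k
    for i in coords:
        vec[i] = True
    return vec


dense = sparse_to_dense_sketch(2, [1, 6, 15])
print(len(dense))                     # 16
print(dense_to_sparse_sketch(dense))  # [1, 6, 15]
```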
gambit.sigs.hdf5
Store k-mer signature sets in HDF5 format.
- class gambit.sigs.hdf5.HDF5Signatures
Bases:
ConcatenatedSignatureArray, ReferenceSignatures
Stores a set of k-mer signatures and associated metadata in an HDF5 group.
Inherits from gambit.sigs.base.AbstractSignatureArray, so behaves as a sequence of k-mer signatures supporting Numpy-style advanced indexing.
Behaves as a context manager which yields itself on enter and closes the underlying HDF5 file object on exit. The __bool__() method can be used to check whether the file is currently open and valid.
- group
HDF5 group object data is read from.
- Type:
h5py._hl.group.Group
- format_version
Version of file format
- Type:
int
- Parameters:
group (h5py._hl.group.Group) – Open, readable h5py.Group or h5py.File object.
- __bool__()
Check whether the underlying HDF5 file object is open.
- __init__(group)
- Parameters:
group (Group) –
- close()
Close the underlying HDF5 file.
- classmethod create(group, signatures, *, compression=None, compression_opts=None)
Store k-mer signatures and associated metadata in an HDF5 group.
- Parameters:
group (Group) – HDF5 group to store data in.
signatures (AbstractSignatureArray) – Array of signatures to store. If an instance of gambit.sigs.base.ReferenceSignatures its metadata will be stored as well, otherwise default/empty values will be used.
compression (str | None) – Compression type for values array. One of ['gzip', 'lzf', 'szip']. See the section on compression filters in h5py’s documentation.
compression_opts – Sets compression level (0-9) for gzip compression, no effect for other types.
- Return type:
- gambit.sigs.hdf5.dump_signatures_hdf5(path, signatures, **kw)
Write k-mer signatures and associated metadata to an HDF5 file.
- Parameters:
path (str | PathLike) – File to write to.
signatures (AbstractSignatureArray) – Array of signatures to store.
**kw – Additional keyword arguments to HDF5Signatures.create().
- gambit.sigs.hdf5.empty_to_none(value)
Convert h5py.Empty instances to None, passing other types through.
- gambit.sigs.hdf5.load_signatures_hdf5(path, **kw)
Open HDF5 signature file.
- Parameters:
path (str | PathLike) – File to open.
**kw – Additional keyword arguments to h5py.File().
- Return type:
- gambit.sigs.hdf5.none_to_empty(value, dtype)
Convert None values to h5py.Empty, passing other types through.
- Parameters:
dtype (dtype) –
- gambit.sigs.hdf5.read_metadata(group)
Read signature set metadata from HDF5 group attributes.
- Parameters:
group (Group) –
- Return type:
- gambit.sigs.hdf5.write_metadata(group, meta)
Write signature set metadata to HDF5 group attributes.
- Parameters:
group (Group) –
meta (SignaturesMeta) –
- gambit.sigs.hdf5.CURRENT_FMT_VERSION = 1
Current version of the data format. Integer which should be incremented each time the format changes.
- gambit.sigs.hdf5.FMT_VERSION_ATTR = 'gambit_signatures_version'
Name of HDF5 group attribute which both stores the format version and also identifies the group as containing signature data.