pandas.api.extensions.
ExtensionArray
Abstract base class for custom 1-D array types.
pandas will recognize instances of this class as proper arrays with a custom type and will not attempt to coerce them to objects. They may be stored directly inside a DataFrame or Series.
DataFrame
Series
New in version 0.23.0.
Notes
The interface includes the following abstract methods that must be implemented by subclasses:
_from_sequence
_from_factorized
__getitem__
__len__
dtype
nbytes
isna
take
copy
_concat_same_type
A default repr displaying the type, (truncated) data, length, and dtype is provided. It can be customized or replaced by by overriding:
__repr__ : A default repr for the ExtensionArray.
_formatter : Print scalars inside a Series or DataFrame.
Some methods require casting the ExtensionArray to an ndarray of Python objects with self.astype(object), which may be expensive. When performance is a concern, we highly recommend overriding the following methods:
self.astype(object)
fillna
dropna
unique
factorize / _values_for_factorize
argsort / _values_for_argsort
searchsorted
The remaining methods implemented on this class should be performant, as they only compose abstract methods. Still, a more efficient implementation may be available, and these methods can be overridden.
One can implement methods to handle array reductions.
_reduce
One can implement methods to handle parsing from strings that will be used in methods such as pandas.io.parsers.read_csv.
pandas.io.parsers.read_csv
_from_sequence_of_strings
This class does not inherit from ‘abc.ABCMeta’ for performance reasons. Methods and properties required by the interface raise pandas.errors.AbstractMethodError and no register method is provided for registering virtual subclasses.
pandas.errors.AbstractMethodError
register
ExtensionArrays are limited to 1 dimension.
They may be backed by none, one, or many NumPy arrays. For example, pandas.Categorical is an extension array backed by two arrays, one for codes and one for categories. An array of IPv6 address may be backed by a NumPy structured array with two fields, one for the lower 64 bits and one for the upper 64 bits. Or they may be backed by some other storage type, like Python lists. Pandas makes no assumptions on how the data are stored, just that it can be converted to a NumPy array. The ExtensionArray interface does not impose any rules on how this data is stored. However, currently, the backing data cannot be stored in attributes called .values or ._values to ensure full compatibility with pandas internals. But other names as .data, ._data, ._items, … can be freely used.
pandas.Categorical
.values
._values
.data
._data
._items
If implementing NumPy’s __array_ufunc__ interface, pandas expects that
__array_ufunc__
You defer by returning NotImplemented when any Series are present in inputs. Pandas will extract the arrays and call the ufunc again.
NotImplemented
You define a _HANDLED_TYPES tuple as an attribute on the class. Pandas inspect this to determine whether the ufunc is valid for the types present.
_HANDLED_TYPES
See NumPy Universal Functions for more.
Attributes
An instance of ‘ExtensionDtype’.
The number of bytes needed to store this object in memory.
ndim
Extension Arrays are only allowed to be 1-dimensional.
shape
Return a tuple of the array dimensions.
Methods
argsort(self, ascending, kind, *args, **kwargs)
argsort
Return the indices that would sort this array.
astype(self, dtype[, copy])
astype
Cast to a NumPy array with ‘dtype’.
copy(self)
Return a copy of the array.
dropna(self)
Return ExtensionArray without NA values.
factorize(self, na_sentinel)
factorize
Encode the extension array as an enumerated type.
fillna(self[, value, method, limit])
Fill NA/NaN values using the specified method.
isna(self)
A 1-D array indicating if each value is missing.
ravel(self[, order])
ravel
Return a flattened view on this array.
repeat(self, repeats[, axis])
repeat
Repeat elements of a ExtensionArray.
searchsorted(self, value[, side, sorter])
Find indices where elements should be inserted to maintain order.
shift(self, periods, fill_value)
shift
Shift values by desired number.
take(self, indices, allow_fill, fill_value)
Take elements from an array.
unique(self)
Compute the ExtensionArray of unique values.
view(self[, dtype])
view
Return a view on the array.
_concat_same_type(to_concat)
Concatenate multiple array.
_formatter(self, boxed)
_formatter
Formatting function for scalar values.
_from_factorized(values, original)
Reconstruct an ExtensionArray after factorization.
_from_sequence(scalars[, dtype, copy])
Construct a new ExtensionArray from a sequence of scalars.
_from_sequence_of_strings(strings[, dtype, copy])
Construct a new ExtensionArray from a sequence of strings.
_ndarray_values
Internal pandas method for lossy conversion to a NumPy ndarray.
_reduce(self, name[, skipna])
Return a scalar result of performing the reduction operation.
_values_for_argsort(self)
_values_for_argsort
Return values for sorting.
_values_for_factorize(self)
_values_for_factorize
Return an array and missing value suitable for factorization.