For most data types, pandas uses NumPy arrays as the concrete objects contained with a Index, Series, or DataFrame.
Index
Series
DataFrame
For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.
Kind of Data
Pandas Data Type
Scalar
Array
TZ-aware datetime
DatetimeTZDtype
Timestamp
Datetime data
Timedeltas
(none)
Timedelta
Timedelta data
Period (time spans)
PeriodDtype
Period
Timespan data
Intervals
IntervalDtype
Interval
Interval data
Nullable Integer
Int64Dtype, …
Int64Dtype
Nullable integer
Categorical
CategoricalDtype
Categorical data
Sparse
SparseDtype
Sparse data
Strings
StringDtype
str
Text data
Boolean (with NA)
BooleanDtype
bool
Boolean data with missing values
Pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() method can be used to create a new array, which may be stored in a Series, Index, or as a column in a DataFrame.
array()
array(data, dtype, numpy.dtype, …)
array
Create an array.
NumPy cannot natively represent timezone-aware datetimes. Pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.
arrays.DatetimeArray
Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data.
datetime.datetime
Timestamp([ts_input, freq, tz, unit, year, …])
Pandas replacement for python datetime.datetime object.
Timestamp.asm8
Return numpy datetime64 format in nanoseconds.
Timestamp.day
Timestamp.dayofweek
Return day of the week.
Timestamp.dayofyear
Return the day of the year.
Timestamp.days_in_month
Return the number of days in the month.
Timestamp.daysinmonth
Timestamp.fold
Timestamp.hour
Timestamp.is_leap_year
Return True if year is a leap year.
Timestamp.is_month_end
Return True if date is last day of month.
Timestamp.is_month_start
Return True if date is first day of month.
Timestamp.is_quarter_end
Return True if date is last day of the quarter.
Timestamp.is_quarter_start
Return True if date is first day of the quarter.
Timestamp.is_year_end
Return True if date is last day of the year.
Timestamp.is_year_start
Return True if date is first day of the year.
Timestamp.max
Timestamp.microsecond
Timestamp.min
Timestamp.minute
Timestamp.month
Timestamp.nanosecond
Timestamp.quarter
Return the quarter of the year.
Timestamp.resolution
Timestamp.second
Timestamp.tz
Alias for tzinfo.
Timestamp.tzinfo
Timestamp.value
Timestamp.week
Return the week number of the year.
Timestamp.weekofyear
Timestamp.year
Timestamp.astimezone(self, tz)
Timestamp.astimezone
Convert tz-aware Timestamp to another time zone.
Timestamp.ceil(self, freq[, ambiguous, …])
Timestamp.ceil
return a new Timestamp ceiled to this resolution.
Timestamp.combine(date, time)
Timestamp.combine
date, time -> datetime with same date and time fields.
Timestamp.ctime()
Timestamp.ctime
Return ctime() style string.
Timestamp.date()
Timestamp.date
Return date object with same year, month and day.
Timestamp.day_name(self[, locale])
Timestamp.day_name
Return the day name of the Timestamp with specified locale.
Timestamp.dst()
Timestamp.dst
Return self.tzinfo.dst(self).
Timestamp.floor(self, freq[, ambiguous, …])
Timestamp.floor
return a new Timestamp floored to this resolution.
Timestamp.freq
Timestamp.freqstr
Return the total number of days in the month.
Timestamp.fromordinal(ordinal[, freq, tz])
Timestamp.fromordinal
Passed an ordinal, translate and convert to a ts.
Timestamp.fromtimestamp(ts)
Timestamp.fromtimestamp
timestamp[, tz] -> tz’s local time from POSIX timestamp.
Timestamp.isocalendar()
Timestamp.isocalendar
Return a 3-tuple containing ISO year, week number, and weekday.
Timestamp.isoformat(self[, sep])
Timestamp.isoformat
[sep] -> string in ISO 8601 format, YYYY-MM-DDT[HH[:MM[:SS[.mmm[uuu]]]]][+HH:MM].
Timestamp.isoweekday()
Timestamp.isoweekday
Return the day of the week represented by the date.
Timestamp.month_name(self[, locale])
Timestamp.month_name
Return the month name of the Timestamp with specified locale.
Timestamp.normalize(self)
Timestamp.normalize
Normalize Timestamp to midnight, preserving tz information.
Timestamp.now([tz])
Timestamp.now
Return new Timestamp object representing current time local to tz.
Timestamp.replace(self[, year, month, day, …])
Timestamp.replace
implements datetime.replace, handles nanoseconds.
Timestamp.round(self, freq[, ambiguous, …])
Timestamp.round
Round the Timestamp to the specified resolution.
Timestamp.strftime()
Timestamp.strftime
format -> strftime() style string.
Timestamp.strptime(string, format)
Timestamp.strptime
Function is not implemented.
Timestamp.time()
Timestamp.time
Return time object with same time but with tzinfo=None.
Timestamp.timestamp()
Timestamp.timestamp
Return POSIX timestamp as float.
Timestamp.timetuple()
Timestamp.timetuple
Return time tuple, compatible with time.localtime().
Timestamp.timetz()
Timestamp.timetz
Return time object with same time and tzinfo.
Timestamp.to_datetime64()
Timestamp.to_datetime64
Return a numpy.datetime64 object with ‘ns’ precision.
Timestamp.to_numpy()
Timestamp.to_numpy
Convert the Timestamp to a NumPy datetime64.
Timestamp.to_julian_date(self)
Timestamp.to_julian_date
Convert TimeStamp to a Julian Date.
Timestamp.to_period(self[, freq])
Timestamp.to_period
Return an period of which this timestamp is an observation.
Timestamp.to_pydatetime()
Timestamp.to_pydatetime
Convert a Timestamp object to a native Python datetime object.
Timestamp.today(cls[, tz])
Timestamp.today
Return the current time in the local timezone.
Timestamp.toordinal()
Timestamp.toordinal
Return proleptic Gregorian ordinal.
Timestamp.tz_convert(self, tz)
Timestamp.tz_convert
Timestamp.tz_localize(self, tz[, ambiguous, …])
Timestamp.tz_localize
Convert naive Timestamp to local time zone, or remove timezone from tz-aware Timestamp.
Timestamp.tzname()
Timestamp.tzname
Return self.tzinfo.tzname(self).
Timestamp.utcfromtimestamp(ts)
Timestamp.utcfromtimestamp
Construct a naive UTC datetime from a POSIX timestamp.
Timestamp.utcnow()
Timestamp.utcnow
Return a new Timestamp representing UTC day and time.
Timestamp.utcoffset()
Timestamp.utcoffset
Return self.tzinfo.utcoffset(self).
Timestamp.utctimetuple()
Timestamp.utctimetuple
Return UTC time tuple, compatible with time.localtime().
Timestamp.weekday()
Timestamp.weekday
A collection of timestamps may be stored in a arrays.DatetimeArray. For timezone-aware data, the .dtype of a DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.
.dtype
DatetimeArray
np.dtype("datetime64[ns]")
If the data are tz-aware, then every value in the array must have the same timezone.
arrays.DatetimeArray(values[, dtype, freq, copy])
Pandas ExtensionArray for tz-naive or tz-aware datetime data.
DatetimeTZDtype([unit, tz])
An ExtensionDtype for timezone-aware datetime data.
NumPy can natively represent timedeltas. Pandas provides Timedelta for symmetry with Timestamp.
Timedelta([value, unit])
Represents a duration, the difference between two dates or times.
Timedelta.asm8
Return a numpy timedelta64 array scalar view.
Timedelta.components
Return a components namedtuple-like.
Timedelta.days
Number of days.
Timedelta.delta
Return the timedelta in nanoseconds (ns), for internal compatibility.
Timedelta.freq
Timedelta.is_populated
Timedelta.max
Timedelta.microseconds
Number of microseconds (>= 0 and less than 1 second).
Timedelta.min
Timedelta.nanoseconds
Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.
Timedelta.resolution
Timedelta.seconds
Number of seconds (>= 0 and less than 1 day).
Timedelta.value
Timedelta.view()
Timedelta.view
Array view compatibility.
Timedelta.ceil(self, freq)
Timedelta.ceil
Return a new Timedelta ceiled to this resolution.
Timedelta.floor(self, freq)
Timedelta.floor
Return a new Timedelta floored to this resolution.
Timedelta.isoformat()
Timedelta.isoformat
Format Timedelta as ISO 8601 Duration like P[n]Y[n]M[n]DT[n]H[n]M[n]S, where the [n] s are replaced by the values.
P[n]Y[n]M[n]DT[n]H[n]M[n]S
[n]
Timedelta.round(self, freq)
Timedelta.round
Round the Timedelta to the specified resolution.
Timedelta.to_pytimedelta()
Timedelta.to_pytimedelta
Convert a pandas Timedelta object into a python timedelta object.
Timedelta.to_timedelta64()
Timedelta.to_timedelta64
Return a numpy.timedelta64 object with ‘ns’ precision.
Timedelta.to_numpy()
Timedelta.to_numpy
Convert the Timedelta to a NumPy timedelta64.
Timedelta.total_seconds()
Timedelta.total_seconds
Total duration of timedelta in seconds (to ns precision).
A collection of timedeltas may be stored in a TimedeltaArray.
TimedeltaArray
arrays.TimedeltaArray(values[, dtype, freq, …])
arrays.TimedeltaArray
Pandas ExtensionArray for timedelta data.
Pandas represents spans of times as Period objects.
Period([value, freq, ordinal, year, month, …])
Represents a period of time.
Period.day
Get day of the month that a Period falls on.
Period.dayofweek
Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.dayofyear
Period.days_in_month
Get the total number of days in the month that this period falls on.
Period.daysinmonth
Get the total number of days of the month that the Period falls in.
Period.end_time
Period.freq
Period.freqstr
Period.hour
Get the hour of the day component of the Period.
Period.is_leap_year
Period.minute
Get minute of the hour component of the Period.
Period.month
Period.ordinal
Period.quarter
Period.qyear
Fiscal year the Period lies in according to its starting-quarter.
Period.second
Get the second component of the Period.
Period.start_time
Get the Timestamp for the start of the period.
Period.week
Get the week of the year on the given Period.
Period.weekday
Period.weekofyear
Period.year
Period.asfreq()
Period.asfreq
Convert Period to desired frequency, at the start or end of the interval.
Period.now()
Period.now
Period.strftime()
Period.strftime
Returns the string representation of the Period, depending on the selected fmt.
fmt
Period.to_timestamp()
Period.to_timestamp
Return the Timestamp representation of the Period.
A collection of timedeltas may be stored in a arrays.PeriodArray. Every period in a PeriodArray must have the same freq.
arrays.PeriodArray
PeriodArray
freq
arrays.PeriodArray(values[, freq, dtype, copy])
Pandas ExtensionArray for storing Period data.
PeriodDtype([freq])
An ExtensionDtype for Period data.
Arbitrary intervals can be represented as Interval objects.
Immutable object implementing an Interval, a bounded slice-like interval.
Interval.closed
Whether the interval is closed on the left-side, right-side, both or neither.
Interval.closed_left
Check if the interval is closed on the left side.
Interval.closed_right
Check if the interval is closed on the right side.
Interval.is_empty
Indicates if an interval is empty, meaning it contains no points.
Interval.left
Left bound for the interval.
Interval.length
Return the length of the Interval.
Interval.mid
Return the midpoint of the Interval.
Interval.open_left
Check if the interval is open on the left side.
Interval.open_right
Check if the interval is open on the right side.
Interval.overlaps()
Interval.overlaps
Check whether two Interval objects overlap.
Interval.right
Right bound for the interval.
A collection of intervals may be stored in an arrays.IntervalArray.
arrays.IntervalArray
arrays.IntervalArray(data[, closed, dtype, …])
Pandas array for interval data that are closed on the same side.
IntervalDtype([subtype])
An ExtensionDtype for Interval data.
numpy.ndarray cannot natively represent integer-data with missing values. Pandas provides this through arrays.IntegerArray.
numpy.ndarray
arrays.IntegerArray
arrays.IntegerArray(values, mask[, copy])
Array of integer (optional missing) values.
Int8Dtype()
Int8Dtype
An ExtensionDtype for int8 integer data.
Int16Dtype()
Int16Dtype
An ExtensionDtype for int16 integer data.
Int32Dtype()
Int32Dtype
An ExtensionDtype for int32 integer data.
Int64Dtype()
An ExtensionDtype for int64 integer data.
UInt8Dtype()
UInt8Dtype
An ExtensionDtype for uint8 integer data.
UInt16Dtype()
UInt16Dtype
An ExtensionDtype for uint16 integer data.
UInt32Dtype()
UInt32Dtype
An ExtensionDtype for uint32 integer data.
UInt64Dtype()
UInt64Dtype
An ExtensionDtype for uint64 integer data.
Pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a pandas.api.types.CategoricalDtype.
pandas.api.types.CategoricalDtype
CategoricalDtype([categories])
Type for categorical data with the categories and orderedness.
CategoricalDtype.categories
An Index containing the unique categories allowed.
CategoricalDtype.ordered
Whether the categories have an ordered relationship.
Categorical data can be stored in a pandas.Categorical
pandas.Categorical
Categorical(values[, categories, ordered, …])
Represent a categorical variable in classic R / S-plus fashion.
The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:
Categorical.from_codes()
Categorical.from_codes(codes[, categories, …])
Categorical.from_codes
Make a Categorical type from codes and categories or dtype.
The dtype information is available on the Categorical
Categorical.dtype
The CategoricalDtype for this instance.
Categorical.categories
The categories of this categorical.
Categorical.ordered
Categorical.codes
The category codes of this categorical.
np.asarray(categorical) works by implementing the array interface. Be aware, that this converts the Categorical back to a NumPy array, so categories and order information is not preserved!
np.asarray(categorical)
Categorical.__array__(self[, dtype])
Categorical.__array__
The numpy array interface.
A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype) where dtype is either
category
cat = s.astype(dtype)
Series(..., dtype=dtype)
dtype
the string 'category'
'category'
an instance of CategoricalDtype.
If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
Series.cat
Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as a arrays.SparseArray.
0
NaN
arrays.SparseArray
arrays.SparseArray(data[, sparse_index, …])
An ExtensionArray for storing sparse data.
SparseDtype(dtype, numpy.dtype, …)
Dtype for data stored in SparseArray.
SparseArray
The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor for more.
Series.sparse
When working with text data, where each valid element is a string or missing, we recommend using StringDtype (with the alias "string").
"string"
arrays.StringArray(values[, copy])
arrays.StringArray
Extension array for string data.
StringDtype()
Extension dtype for string data.
The Series.str accessor is available for Series backed by a arrays.StringArray. See String handling for more.
Series.str
The boolean dtype (with the alias "boolean") provides support for storing boolean data (True, False values) with missing values, which is not possible with a bool numpy.ndarray.
"boolean"
arrays.BooleanArray(values, mask, copy)
arrays.BooleanArray
Array of boolean (True/False) data with missing values.
BooleanDtype()
Extension dtype for boolean data.