Pandas
Install
Quick Start
import pandas as pd
# Read a csv file
df = pd.read_csv('filename.csv')
# Write to csv
df.to_csv('filename.csv')
# Read an excel file
pd.read_excel("foo.xlsx", "Sheet1", index_col=None, na_values=["NA"])
# Write to an excel file
df.to_excel("foo.xlsx", sheet_name="Sheet1")
Basics
Series: Typically represents a column. A one-dimensional labeled array holding data of any type
# From a list
s = pd.Series([1, 2, 3, 4, 5, 6])
# Or from a dictionary
d = {"b": 1, "a": 0, "c": 2}
pd.Series(d)
DataFrame: A two-dimensional data structure that holds data like a two-dimension array or a table with rows and columns.
df = pd.DataFrame(
{
"A": 1.0, # Constant value for all rows
"B": pd.Timestamp("20130102"), # Timestamp 2013-01-02 for all rows
"C": pd.Series(1, index=list(range(4)), dtype="float32"), # Constant 1.0 for all rows
"D": np.array([3] * 4, dtype="int32"), # Constant 3 for all rows
"E": pd.Categorical(["test", "train", "test", "train"]), # List, one value for each row
"F": "foo", # Constant string for all rows
}
)
# Printing df:
A B C D E F
0 1.0 2013-01-02 1.0 3 test foo
1 1.0 2013-01-02 1.0 3 train foo
2 1.0 2013-01-02 1.0 3 test foo
3 1.0 2013-01-02 1.0 3 train foo
Previewing data:
df.columns # List of columns
df.head() # Initial few rows of data
df.tail(3) # Last 3 rows of data
df.describe() # Compute and show statistical summary like count, max, min, mean
df['A'] # Returns 'A' column
df[0:3] # Returns first 3 rows of data
pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"]) # Creates a pivot table