100 Days Of Python - Day 25
Day 25
Reading and writing CSV files in python
In python, we can use the csv
module to read and write CSV files.
- CSV stands for Comma Separated Values.
- CSV files are text files that store tabular data.
- CSV files are commonly used to store data in spreadsheets.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# weather_data.csv
day,temp,condition
Monday,12,Sunny
Tuesday,14,Rain
Wednesday,15,Rain
Thursday,14,Cloudy
Friday,21,Cloudy
Saturday,22,Sunny
Sunday,24,Sunny
# main.py
# open the weather_data.csv file and create a list named data that contains the data in the file
import csv
with open("weather_data.csv") as file:
data = csv.reader(file)
temperatures = []
for row in data:
if row[1] != "temp":
temperatures.append(int(row[1]))
print(temperatures)
# output
# [12, 14, 15, 14, 21, 22, 24]
- The
csv.reader()
function returns a reader object that iterates over lines in the CSV file. - In the above example our objective is to:
- read the CSV file
- extract the temperatures
- store the temperatures in a list
- print the list of temperatures
Pandas
Pandas is a python library that is used for data analysis.
- Pandas is a third-party library and therefore we need to install it before we can use it.
1
pip install pandas
- If we wish to use pandas to fullfil the objective of the previous example, we can do so as follows:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# main.py
# open the weather_data.csv file and create a list named data that contains the data in the file
import pandas
data = pandas.read_csv("weather_data.csv")
#print(data) # prints the data in the CSV file
print(data["temp"]) # prints the temp column in the CSV file
# output
# 0 12
# 1 14
# 2 15
# 3 14
# 4 21
# 5 22
# 6 24
# Name: temp, dtype: int64
- The
pandas.read_csv()
function returns a DataFrame object that contains the data in the CSV file. - A
DataFrame
is a2-dimensional
data structure that is similar to aspreadsheet
. - When we print just the
temp
column, we get aSeries object
. - A
Series
is a1-dimensional
data structure that is similar to alist
.
1
2
3
4
5
6
7
8
9
10
11
12
# main.py
# open the weather_data.csv file and create a list named data that contains the data in the file
import pandas
data = pandas.read_csv("weather_data.csv")
#print(data) # prints the data in the CSV file
print(type(data)) # prints the type of data
print(type(data["temp"])) # prints the type of data["temp"]
# output
# <class 'pandas.core.frame.DataFrame'> # data is a DataFrame
# <class 'pandas.core.series.Series'> # data["temp"] is a Series
- We can use the
pandas.DataFrame.to_dict()
function to convert a DataFrame to a dictionary.
1
2
3
4
5
6
7
8
9
10
# main.py
# open the weather_data.csv file and create a list named data that contains the data in the file
import pandas
data = pandas.read_csv("weather_data.csv")
#print(data) # prints the data in the CSV file
print(data.to_dict()) # prints the data in the CSV file as a dictionary
# output
# {'day': {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}, 'temp': {0: 12, 1: 14, 2: 15, 3: 14, 4: 21, 5: 22, 6: 24}, 'condition': {0: 'Sunny', 1: 'Rain', 2: 'Rain', 3: 'Cloudy', 4: 'Cloudy', 5: 'Sunny', 6: 'Sunny'}}
- We can use the
pandas.DataFrame[column_name]
syntax to access a column in a DataFrame.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# main.py
# open the weather_data.csv file and create a list named data that contains the data in the file
import pandas
data = pandas.read_csv("weather_data.csv")
#print(data) # prints the data in the CSV file
print(data["temp"]) # prints the temp column in the CSV file
# output
# 0 12
# 1 14
# 2 15
# 3 14
# 4 21
# 5 22
# 6 24
# Name: temp, dtype: int64
- We can use the
pandas.DataFrame[column_name].to_list()
function to convert a column in a DataFrame to a list.
1
2
3
4
5
6
7
8
9
10
# main.py
# open the weather_data.csv file and create a list named data that contains the data in the file
import pandas
data = pandas.read_csv("weather_data.csv")
#print(data) # prints the data in the CSV file
print(data["temp"].to_list()) # prints the temp column in the CSV file as a list
# output
# [12, 14, 15, 14, 21, 22, 24]
- We can use the
pandas.DataFrame[column_name].mean()
function to calculate the mean of a column in a DataFrame.
1
2
3
4
5
6
7
8
9
10
# main.py
# open the weather_data.csv file and create a list named data that contains the data in the file
import pandas
data = pandas.read_csv("weather_data.csv")
#print(data) # prints the data in the CSV file
print(data["temp"].mean()) # prints the mean of the temp column in the CSV file
# output
# 17.428571428571427
- We can use the
pandas.DataFrame[column_name].max()
function to calculate the maximum value of a column in a DataFrame.
1
2
3
4
5
6
7
8
9
10
# main.py
# open the weather_data.csv file and create a list named data that contains the data in the file
import pandas
data = pandas.read_csv("weather_data.csv")
#print(data) # prints the data in the CSV file
print(data["temp"].max()) # prints the maximum value of the temp column in the CSV file
# output
# 24
Get Data in Row
- We can use the
pandas.DataFrame[pandas.DataFrame[column_name] == value]
syntax to get a row in a DataFrame.
1
2
3
4
5
6
7
8
9
10
11
# main.py
# open the weather_data.csv file and create a list named data that contains the data in the file
import pandas
data = pandas.read_csv("weather_data.csv")
#print(data) # prints the data in the CSV file
print(data[data["day"] == "Monday"]) # prints the row in the CSV file where the day is Monday
# output
# day temp condition
# 0 Monday 12 Sunny
Create a DataFrame from scratch
- We can use the
pandas.DataFrame()
function to create a DataFrame from scratch.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# main.py
# create a DataFrame from scratch
import pandas
data_dict = {
"students": ["Amy", "James", "Angela"],
"scores": [76, 56, 65]
}
data = pandas.DataFrame(data_dict)
print(data)
# output
# students scores
# 0 Amy 76
# 1 James 56
# 2 Angela 65
Create a CSV file from a DataFrame
- We can use the
pandas.DataFrame.to_csv()
function to create a CSV file from a DataFrame.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# main.py
# create a CSV file from a DataFrame
import pandas
data_dict = {
"students": ["Amy", "James", "Angela"],
"scores": [76, 56, 65]
}
data = pandas.DataFrame(data_dict)
data.to_csv("new_data.csv")
# output
# new_data.csv
# ,students,scores
# 0,Amy,76
# 1,James,56
# 2,Angela,65
Squirrel Census Data Analysis
- We can use the
pandas.DataFrame[column_name].value_counts()
function to get the number of times a value appears in a column in a DataFrame.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# main.py
# analyze the squirrel census data
import pandas
data = pandas.read_csv("2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv")
#print(data) # prints the data in the CSV file
gray_squirrels_count = len(data[data["Primary Fur Color"] == "Gray"])
red_squirrels_count = len(data[data["Primary Fur Color"] == "Cinnamon"])
black_squirrels_count = len(data[data["Primary Fur Color"] == "Black"])
print(gray_squirrels_count) # prints the number of gray squirrels
print(red_squirrels_count) # prints the number of red squirrels
print(black_squirrels_count) # prints the number of black squirrels
data_dict = {
"Fur Color": ["Gray", "Cinnamon", "Black"],
"Count": [gray_squirrels_count, red_squirrels_count, black_squirrels_count]
}
data = pandas.DataFrame(data_dict)
data.to_csv("squirrel_count.csv")
# output
# 2473
# 392
# 103
Us States Game
This is a game that tests your knowledge of the US states. The game will prompt you to name a state. If you name a state correctly, the state will be marked on the map. If you name a state incorrectly, the game will end and a CSV file containing the names of the states you missed will be created.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
import turtle
import pandas
# Create a turtle
screen = turtle.Screen()
screen.title("U.S. States Game")
image = "blank_states_img.gif"
screen.addshape(image)
turtle.shape(image)
data = pandas.read_csv("50_states.csv")
all_states = data.state.to_list()
guessed_states = []
# Game loop
while len(guessed_states) < 50:
answer_state = screen.textinput(title=f"{len(guessed_states)}/50 States Correct",
prompt="What's another state's name?").title()
# Exit game
if answer_state == "Exit":
missing_states = [state for state in all_states if state not in guessed_states]
new_data = pandas.DataFrame(missing_states)
new_data.to_csv("states_to_learn.csv")
break
# Check answer
if answer_state in all_states:
guessed_states.append(answer_state)
t = turtle.Turtle()
t.hideturtle()
t.penup()
state_data = data[data.state == answer_state]
t.goto(int(state_data.x), int(state_data.y))
t.write(answer_state)
This post is licensed under CC BY 4.0 by the author.