Exploring the CSV Module in Python: Reading, Filtering, and Writing CSV Files
The CSV (Comma-Separated Values) format is widely used for data storage and exchange. Python’s built-in csv module makes it easy to read, write, and manipulate CSV files. In this blog post, we’ll explore how to use the csv module to handle CSV files: reading data, filtering it, and writing it back to a new CSV file.
Importing the CSV Module
The csv module is part of Python’s standard library, so there’s nothing extra to install. Start by importing it:
```python
import csv
```
Reading a CSV File
Let’s start by reading a CSV file. We’ll use a dataset that contains US postal codes along with the city, state, latitude, and longitude. The file is tab-separated, so we’ll specify the delimiter:
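```python
# Open the CSV file (newline='' is what the csv docs recommend for file objects)
with open('10_02_us.csv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    # Print each row; every row is a list of strings
    for row in reader:
        print(row)
```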
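To try csv.reader without the data file, you can wrap a string in io.StringIO, which behaves like an open text file. The sketch below assumes a small, hypothetical tab-separated sample with only three of the dataset’s columns; the place names are made up for illustration.

```python
import csv
import io

# Hypothetical tab-separated sample standing in for 10_02_us.csv
SAMPLE = (
    "postal code\tplace name\tstate code\n"
    "01001\tSampletown\tMA\n"
    "90210\tOthertown\tCA\n"
)

def read_rows(text):
    """Return every row (header included) as a list of strings."""
    reader = csv.reader(io.StringIO(text, newline=''), delimiter='\t')
    return list(reader)

rows = read_rows(SAMPLE)
for row in rows:
    print(row)
```

Note that csv.reader never converts types: the postal code stays the string '01001', leading zero included.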
Skipping the Header Row
CSV files often include a header row with column names. You can skip this row by calling the built-in next function on the reader once, before the loop:
```python
# Open the CSV file and skip the header
with open('10_02_us.csv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    next(reader)  # Skip the header row
    for row in reader:
        print(row)
```
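Instead of discarding the header, you can keep the value that next returns and use it later. A minimal sketch, again on a hypothetical in-memory sample; split_header is an illustrative helper, not part of the csv module. Passing a default to next avoids a StopIteration error when the input is empty.

```python
import csv
import io

# Hypothetical sample data; the real file has more columns
SAMPLE = "postal code\tplace name\tstate code\n01001\tSampletown\tMA\n"

def split_header(text):
    """Return (header, data_rows); header is None for empty input."""
    reader = csv.reader(io.StringIO(text, newline=''), delimiter='\t')
    # next() with a default returns None instead of raising on empty input
    header = next(reader, None)
    return header, list(reader)

header, data = split_header(SAMPLE)
print(header)  # ['postal code', 'place name', 'state code']
print(data)    # [['01001', 'Sampletown', 'MA']]

print(split_header(""))  # (None, [])
```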
Using DictReader for Easier Data Access
The csv.DictReader class reads each row into a dictionary whose keys come from the header row, so you can access fields by column name instead of by position. This is very handy for data manipulation:
```python
# Open the CSV file using DictReader
with open('10_02_us.csv', 'r', newline='') as file:
    reader = csv.DictReader(file, delimiter='\t')
    for row in reader:
        print(row)
```
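DictReader reads the header line itself and exposes it as its fieldnames attribute, so each row can be indexed by column name. The sketch below uses a hypothetical three-column sample in place of the real file.

```python
import csv
import io

# Hypothetical tab-separated sample standing in for 10_02_us.csv
SAMPLE = (
    "postal code\tplace name\tstate code\n"
    "01001\tSampletown\tMA\n"
    "90210\tOthertown\tCA\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE, newline=''), delimiter='\t')
# fieldnames is taken from the first line; no next() call is needed
print(reader.fieldnames)  # ['postal code', 'place name', 'state code']

# Each row is a dict keyed by the header names
places = {row['postal code']: row['place name'] for row in reader}
print(places)  # {'01001': 'Sampletown', '90210': 'Othertown'}
```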
Filtering Data
Suppose we want to find all postal codes in Massachusetts (MA) that are prime numbers. First, we’ll define a function to check for prime numbers and then filter the data accordingly:
```python
# Function to check for prime numbers (trial division up to sqrt(n))
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

# Set of prime numbers up to 99999 (a set makes "in" checks fast)
primes = {i for i in range(2, 100000) if is_prime(i)}

# Filter the data for prime postal codes in MA
filtered_data = []
with open('10_02_us.csv', 'r', newline='') as file:
    reader = csv.DictReader(file, delimiter='\t')
    # No next(reader) here: DictReader already consumes the header row,
    # so calling next() would silently drop the first data row
    for row in reader:
        postal_code = int(row['postal code'])  # int() accepts leading zeros
        if row['state code'] == 'MA' and postal_code in primes:
            filtered_data.append(row)

print(f"Found {len(filtered_data)} prime postal codes in MA.")
```
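Because DictReader reads the header row on its own, calling next(reader) on it skips the first data row rather than the header. Also, int() parses strings with leading zeros directly (int('01009') is 1009), so no lstrip('0') is needed. The sketch below runs the same filter on a hypothetical sample; the postal codes and place names are illustrative only, and prime_codes_in_state is a made-up helper name.

```python
import csv
import io

# Hypothetical sample: 1009, 2003 and 3001 are prime, 1002 is not
SAMPLE = (
    "postal code\tplace name\tstate code\n"
    "01009\tSampletown\tMA\n"
    "01002\tOthertown\tMA\n"
    "02003\tThirdtown\tMA\n"
    "03001\tBordertown\tNH\n"
)

def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def prime_codes_in_state(text, state):
    """Return postal codes in `state` whose integer value is prime."""
    reader = csv.DictReader(io.StringIO(text, newline=''), delimiter='\t')
    # Iterate directly: the header is already consumed by DictReader
    return [
        row['postal code']
        for row in reader
        if row['state code'] == state and is_prime(int(row['postal code']))
    ]

print(prime_codes_in_state(SAMPLE, 'MA'))  # ['01009', '02003']
```

With an extra next(reader) before the loop, '01009' would be lost, because it is the first data row.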
Writing to a New CSV File
Finally, let’s write the filtered data to a new CSV file:
```python
# Write the filtered data to a new CSV file (comma-separated by default)
with open('ma_prime.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    # Write the header
    writer.writerow(['postal code', 'place name', 'state code', 'county', 'latitude', 'longitude'])
    # Write the data rows
    for row in filtered_data:
        writer.writerow([row['postal code'], row['place name'], row['state code'],
                         row['county'], row['latitude'], row['longitude']])
```
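Since the filtered rows are already dictionaries, csv.DictWriter is an alternative that avoids listing each key twice. A minimal sketch, assuming a hypothetical row and a reduced set of columns; extrasaction='ignore' tells DictWriter to skip keys that are not in fieldnames instead of raising ValueError.

```python
import csv
import io

FIELDS = ['postal code', 'place name', 'state code']

# Hypothetical filtered row; its 'latitude' key is not in FIELDS
rows = [
    {'postal code': '01009', 'place name': 'Sampletown',
     'state code': 'MA', 'latitude': '42.1'},
]

def to_csv(rows, fieldnames):
    """Serialize dict rows to CSV text, keeping only `fieldnames`."""
    buffer = io.StringIO(newline='')
    # extrasaction='ignore' drops keys not listed in fieldnames
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(to_csv(rows, FIELDS))
```

To write to disk instead, pass a file opened with open('ma_prime.csv', 'w', newline='') in place of the StringIO buffer.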
Example Code
Here’s the complete example code for reading, filtering, and writing CSV files:
```python
import csv

# Function to check for prime numbers (trial division up to sqrt(n))
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

# Set of prime numbers up to 99999 (a set makes "in" checks fast)
primes = {i for i in range(2, 100000) if is_prime(i)}

# Filter the data for prime postal codes in MA
filtered_data = []
with open('10_02_us.csv', 'r', newline='') as file:
    reader = csv.DictReader(file, delimiter='\t')
    # DictReader consumes the header itself, so no next(reader) call
    for row in reader:
        postal_code = int(row['postal code'])  # int() accepts leading zeros
        if row['state code'] == 'MA' and postal_code in primes:
            filtered_data.append(row)

print(f"Found {len(filtered_data)} prime postal codes in MA.")

# Write the filtered data to a new CSV file (comma-separated by default)
with open('ma_prime.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    # Write the header
    writer.writerow(['postal code', 'place name', 'state code', 'county', 'latitude', 'longitude'])
    # Write the data rows
    for row in filtered_data:
        writer.writerow([row['postal code'], row['place name'], row['state code'],
                         row['county'], row['latitude'], row['longitude']])
```
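The example above builds its collection of primes by testing every number below 100,000 with trial division. If that precomputation ever feels slow, the Sieve of Eratosthenes is a common alternative; it is a swapped-in technique, not part of the original example. A minimal sketch, where primes_up_to is a hypothetical helper name:

```python
def primes_up_to(limit):
    """Return the set of primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return set()
    is_candidate = [True] * (limit + 1)
    is_candidate[0] = is_candidate[1] = False
    for i in range(2, int(limit**0.5) + 1):
        if is_candidate[i]:
            # Mark multiples of i, starting at i*i, as composite
            for multiple in range(i * i, limit + 1, i):
                is_candidate[multiple] = False
    return {n for n, flag in enumerate(is_candidate) if flag}

primes = primes_up_to(99999)
print(len(primes))  # 9592
```

The resulting set can replace the primes set in the filtering step unchanged, since the filter only uses it for membership tests.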
Conclusion
Python’s csv module covers the everyday tasks of working with CSV files: csv.reader and csv.DictReader read rows (with a configurable delimiter such as '\t'), and csv.writer writes them back out. Combined with ordinary Python logic for filtering, these tools are enough to load a dataset like the US postal-code file, select the rows you need, and save the result to a new CSV file.