51: Exploring the CSV Module in Python: Reading, Filtering, and Writing CSV Files

The CSV (Comma-Separated Values) format is widely used for data storage and exchange. Python’s built-in csv module makes it easy to read, write, and manipulate CSV files. In this blog post, we’ll explore how to use the csv module to handle CSV files, including reading data, filtering it, and writing it back to a new CSV file.

Importing the CSV Module

Python ships the csv module as part of its standard library, so there’s nothing extra to install. Start by importing the module:

				
import csv

				
			

Reading a CSV File

Let’s start by reading a CSV file. We’ll use a dataset that contains US postal codes along with the city, state, latitude, and longitude. The file is tab-separated, so we’ll specify the delimiter:

				
# Open the tab-separated file; the csv docs recommend newline='' for files passed to csv
with open('10_02_us.csv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    
    # Print each row
    for row in reader:
        print(row)
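
To try this without the dataset, here is a minimal sketch that parses a small tab-separated sample held in memory. The sample rows and the names `sample` and `rows` are made up for illustration; only csv.reader and its delimiter argument come from the example above.

```python
import csv
import io

# Hypothetical two-line sample standing in for the real file
sample = "02139\tCambridge\tMA\n10001\tNew York\tNY\n"

# io.StringIO behaves like an open text file, so csv.reader accepts it
reader = csv.reader(io.StringIO(sample), delimiter='\t')
rows = list(reader)

for row in rows:
    print(row)  # each row is a list of strings
```

Note that every field comes back as a string, including the postal code, so leading zeros are preserved until you convert the value yourself.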

				
			

Skipping the Header Row

CSV files often include a header row with column names. You can skip this row using the next function:

				
# Open the tab-separated file and skip the header
with open('10_02_us.csv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    next(reader)  # Skip the header row
    
    for row in reader:
        print(row)
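
If the column names are needed later, the value returned by next can be kept instead of discarded. The sketch below assumes an invented in-memory sample; passing None as the second argument to next is an optional safeguard that avoids a StopIteration error when the file is empty.

```python
import csv
import io

# Hypothetical sample with a header row
sample = "postal code\tplace name\n02139\tCambridge\n"

reader = csv.reader(io.StringIO(sample), delimiter='\t')
header = next(reader, None)  # None if the input is empty
rows = list(reader)          # remaining rows, header excluded

print(header)
print(rows)
```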

				
			

Using DictReader for Easier Data Access

The csv.DictReader class reads each row into a dictionary whose keys come from the header row. Because it consumes the header itself, there is no need to call next to skip it. This is handy when you want to refer to columns by name:

				
# Open the tab-separated file using DictReader
with open('10_02_us.csv', 'r', newline='') as file:
    reader = csv.DictReader(file, delimiter='\t')
    
    for row in reader:
        print(row)
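
As a small illustration of name-based access, the sketch below reads an invented in-memory sample and looks up values by column name. The sample data and the name `records` are assumptions; fieldnames is the DictReader attribute that holds the column names read from the first line.

```python
import csv
import io

# Hypothetical sample; the first line supplies the dictionary keys
sample = "postal code\tplace name\tstate code\n02139\tCambridge\tMA\n"

reader = csv.DictReader(io.StringIO(sample), delimiter='\t')
print(reader.fieldnames)  # column names taken from the header line

records = list(reader)
for row in records:
    # Columns are addressed by name rather than by position
    print(row['place name'], row['state code'])
```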

				
			

Filtering Data

Suppose we want to find all postal codes in Massachusetts (MA) that are prime numbers. First, we’ll define a function to check for prime numbers and then filter the data accordingly:

				
# Function to check for prime numbers
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

# Set of prime numbers up to 99999 (a set makes the membership test fast)
primes = {i for i in range(2, 100000) if is_prime(i)}

# Filter the data for prime postal codes in MA
filtered_data = []

with open('10_02_us.csv', 'r', newline='') as file:
    # DictReader consumes the header itself; calling next(reader) here
    # would silently drop the first data row
    reader = csv.DictReader(file, delimiter='\t')

    for row in reader:
        postal_code = int(row['postal code'])  # int() accepts leading zeros
        if row['state code'] == 'MA' and postal_code in primes:
            filtered_data.append(row)

print(f"Found {len(filtered_data)} prime postal codes in MA.")
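
As a quick check of the filter logic, the sketch below runs it on a three-row in-memory sample invented for illustration. It calls is_prime directly instead of precomputing the primes, and it relies on two facts: DictReader consumes the header line by itself, so the first row it yields is already data, and int() accepts strings with leading zeros.

```python
import csv
import io

def is_prime(n):
    """Trial division up to the square root of n."""
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

# Hypothetical sample: 2141 is prime, 2139 = 3 * 23 * 31, 10007 is prime but in NY
sample = (
    "postal code\tplace name\tstate code\n"
    "02141\tCambridge\tMA\n"
    "02139\tCambridge\tMA\n"
    "10007\tNew York\tNY\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter='\t')
matches = [
    row for row in reader
    if row['state code'] == 'MA' and is_prime(int(row['postal code']))
]

print([row['postal code'] for row in matches])
```

The matching row is deliberately placed first in the sample: an extra next(reader) call before the loop would skip it and the result would come back empty.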

				
			

Writing to a New CSV File

Finally, let’s write the filtered data to a new CSV file:

				
# Write the filtered data to a new CSV file
with open('ma_prime.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    
    # Write the header
    writer.writerow(['postal code', 'place name', 'state code', 'county', 'latitude', 'longitude'])
    
    # Write the data rows
    for row in filtered_data:
        writer.writerow([row['postal code'], row['place name'], row['state code'], row['county'], row['latitude'], row['longitude']])
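
Since the filtered rows are dictionaries, csv.DictWriter is an alternative that avoids listing every column twice. The sketch below writes to an in-memory buffer so it can run without touching the disk; the sample row, the extra 'accuracy' key, and the shortened field list are assumptions made for illustration. Setting extrasaction='ignore' tells the writer to drop keys that are not in fieldnames instead of raising an error.

```python
import csv
import io

# Hypothetical filtered row, shaped like a DictReader result
filtered_data = [
    {'postal code': '02141', 'place name': 'Cambridge',
     'state code': 'MA', 'accuracy': '4'},
]

# Columns to keep, in output order
fieldnames = ['postal code', 'place name', 'state code']

buffer = io.StringIO()  # stands in for open('ma_prime.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction='ignore')
writer.writeheader()             # header row built from fieldnames
writer.writerows(filtered_data)  # one output line per dictionary

print(buffer.getvalue())
```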

				
			

Example Code

Here’s the complete example code for reading, filtering, and writing CSV files:

				
import csv

# Function to check for prime numbers
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

# Set of prime numbers up to 99999 (a set makes the membership test fast)
primes = {i for i in range(2, 100000) if is_prime(i)}

# Filter the data for prime postal codes in MA
filtered_data = []

with open('10_02_us.csv', 'r', newline='') as file:
    # DictReader consumes the header itself; calling next(reader) here
    # would silently drop the first data row
    reader = csv.DictReader(file, delimiter='\t')

    for row in reader:
        postal_code = int(row['postal code'])  # int() accepts leading zeros
        if row['state code'] == 'MA' and postal_code in primes:
            filtered_data.append(row)

print(f"Found {len(filtered_data)} prime postal codes in MA.")

# Write the filtered data to a new CSV file
with open('ma_prime.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    
    # Write the header
    writer.writerow(['postal code', 'place name', 'state code', 'county', 'latitude', 'longitude'])
    
    # Write the data rows
    for row in filtered_data:
        writer.writerow([row['postal code'], row['place name'], row['state code'], row['county'], row['latitude'], row['longitude']])

				
			

Conclusion

Python’s csv module covers the common CSV tasks shown here: reading rows with csv.reader or csv.DictReader, filtering them with ordinary Python code, and writing the results with csv.writer. With these pieces you can load a delimited file, keep only the rows you need, and save them to a new file.