Iterating on rows in Pandas is a common practice and can be approached in several different ways. Usually, you need to iterate on rows to solve some specific problem within the rows themselves – for instance replacing a specific value with a new value or extracting values meeting a specific criteria for further analysis. Pandas DataFrames are fantastic at storing columnar data that is accessible through various methods which makes analysis of data stored in rows very easy.
In this tutorial, we’ll be exploring an open-source dataset from FSU as our sample dataset:
import pandas as pd
file_name = "https://people.sc.fsu.edu/~jburkardt/data/csv/homes.csv"
df = pd.read_csv(file_name)
After running the above commands, we should now have our data loaded into the df, DataFrame, variable.
DataFrame.index
The first approach we cover is to use the DataFrame.index function to iterate on rows. .index returns from a DataFrame the individual row identifier that can be looped through to return a column’s row contents. For more on how indexing and selecting data works, see the official Pandas documentation. In the example data we loop through the Sell variable and print out the contents.*
#use index to iterate over rows
#DataFrame.index returns the row label of each row
for i in df.index:
print(df['Sell'][i]) #In Python 2.7: print df['Sell'][i]
The result of running this loop is to iterate through the Sell column and to print each of the values in the Series.
DataFrame.iterrows()
Another way to iterate on rows in Pandas is to use the DataFrame.iterrows() function of Pandas. When accessing .iterrows() through a loop, both the index of a row and the row contents itself are returned as values. The values we actually want are the rows themselves which we will access in each iteration by accessing the single Sell column.
#iterate over rows using DataFrame.iterrows():
for index, row in df.iterrows():
print(row['Sell'])
Similar to our first approach, we see the same output from printing out each value in the Sell Series stored in our DataFrame.
Looping Using DataFrame.iloc
While the first two approaches focused on pure iteration and index functions that are covered in the official Pandas documentation, there are other methods of iterating rows. The first of these we will cover is DataFrame.iloc which allows for us to access the row number within a DataFrame.
for i in range(0, len(df)):
print(df.iloc[i]['Sell']) #Python 2.7: print df.iloc[i]['Sell']
Similar approaches can be taken to the above except loading DataFrame rows out to list, dictionary, or other data types. It is often required for someone to put a Series or DataFrame rows into a Python list format for use in some other analysis.
Summary
We’ve shown in the above that iterating over rows is not only simplistic but is achievable with several approaches. The ones we have covered here are:
- Looping using DataFrame.index
- Using DataFrame.iterrows()
- Looping with DataFrame.iloc
Some techniques we did not cover in this tutorial include itertuples() and iteritems(), both of which can be used similarly to iterrows(). Extracting data from Pandas column Series and columns is an extremely useful tactic if you need to perform other types of analysis with that data, or if you need to put it into some other data-structure.
For the code surfaced in the screenshots above, please visit our GitHub repository on Data Analysis. More information on common Pandas operations can be found in our detailed tutorials.
*Note that the print() function used from the Python standard library here is for Python 3.