Manipulating values in a Pandas DataFrame is not something you can avoid during the process of analyzing data. The steps you need to take to read, update or delete specific values can become quite complex very quickly. In this article, let’s take a look at how you can read, update and delete specific values in a Pandas DataFrame without too much hassle.
Creating a Pandas DataFrame
Let’s kick off by creating a DataFrame that you will use to walk through this article.
import pandas as pd import numpy as np names_dict ={ ‘Name’:[‘Ken’,’Jeff’,’John’,’Mike’,’Andrew’,’Ann’,’Sylvia’,’Dorothy’,’Emily’,’Loyford’], ‘Age’:[31,52,56,12,45,np.nan,78,85,46,135], ‘Phone’:[52,79,80,75,43,125,74,44,85,45], ‘Uni’:[‘One’,’Two’,’Three’,’One’,’Two’,’Three’,’One’,’Two’,’Three’,’One’] } |
You can also create an index for this dataset. This will be needed in a later part of this article.
index = [“Row_one”,”Row_two”,”Row_three”,”Row_four”,”Row_five”,”Row_six”,”Row_seven”,”Row_eight”,”Row_nine”,”Row_ten”] |
Let’s now create the Pandas DataFrame using the data and the index.
df = pd.DataFrame(names_dict,index=index) |
Accessing the data by the rows
The next step is to look at how you can access data in the above DataFrame using the row names. You can achieve this by using the loc
function and passing the label of the row. Passing the wrong label will result in a key error as shown below.
Pass the correct label name in order to access the data. For instance, let’s use Row_one
.
df.loc[“Row_one”] |
This returns the result as a Pandas Series.
That means that you can access the data in the series by their index. Let’s obtain the value at position 0.
df.loc[“Row_one”][0] |
You can also select the data between rows. For instance, you can select all the values from Row_one
to Row_two
.
Here is how you can do that.
df.loc[“Row_one”:”Row_two”] |
This returns the result as a Pandas DataFrame and not a Series like in the last illustration. You can confirm that by checking the type.
type(df.loc[“Row_one”:”Row_two”]) |
While making the selection, you can also indicate the columns you want to include in the final result. You will do this by passing their labels as shown below. This is a great option when you want to eliminate some columns from the final result. You can then save this new DataFrame in a new variable.
df.loc[“Row_one”:”Row_two”, “Name”:”Age”] |
Replacing values using loc
Let’s now take an example where you want to replace all the values in the DataFrame below.
You can do that by passing a list with the new values that you want. For example, the code below will replace the two rows in the above DataFrame with the values provided in the list.
df.loc[“Row_one”:”Row_two”] = [“Joe”,34,45,”Two”] |
You can now see that the two rows now contain the same information.
However, you might want to make sure that each row gets updated with different values. You can do this, by passing two lists instead of one.
This is illustrated below.
df.loc[“Row_one”:”Row_two”] = [[“Joe”,34,45,”Two”],[“Derrick”,56,46,”Ten”]] |
You can also select a single row and replace everything in that row. As an example, let’s replace everything in the row shown below.
This can be done by passing a list the new values.
df.loc[df[“Name”] == “Joe”] = [“Elijah”,37,89,”five”] |
Conditional select
In the process of your analysis, you might want to select some data based on two or more conditions. Let’s take a look at how you can do that using iloc
.
df.loc[(df[“Name”] == “Elijah”) | ( df[“Phone”] == 89)] |
For more details on selecting rows with Pandas, check out this detailed post.
Integer-location based indexing
In other situations, you might want to manipulate values based on their integer location. In that case, you have to pass the right integer index value while selecting the data. Doing otherwise will result in an error.
If you are not sure about the integer location, you can start by checking that. For instance, let’s check the integer location for Row_one
.
df.index.get_loc(“Row_one”) # Get integer location, slice or boolean mask for requested label. |
After that, you can access the data by passing the right integer to the `iloc` function. This returns the data as a Pandas Series. You can confirm that by checking its type.
df.iloc[0] |
Let’s take a look at how you can select data from rows 1 to 5 using this method.
df.iloc[0:5] |
This selects the data from index 0 up to but not including index 5.
You can also select a row and replace the values just like you have seen previously.
This canbe be achieved by passing the new values as a list.
df.iloc[0:1] = [“Kevin”,34,45,”Two”] |
You can also select a single value and replace it with a new value. To illustrate this let’s change the name Kevin to Eli.
That is done by selecting Kevin and assigning the new name.
df.iloc[0:1,0:1] = “Eli” |
Negative indexing with iloc
You can also use iloc
to select from the last rows in the DataFrame,i.e negative indexing. For instance, let’s pick the information at Row_seven
If you start counting from behind, that will give you -4. That’s the number to be passed to the iloc
function.
df.iloc[-4] |
You can also select everything from -4 to the end of the DataFrame.
df.iloc[-4:] |
iloc
also allows you to select a certain portion of the data using negative indexing.
df.iloc[-4:-2] |
Don’t forget that you can also select the columns you want to include in the final result.
df.iloc[-4:-2,0:3] |
Dropping data using iloc
Let’s take an instance where you want to drop everything in the DataFrame below and remain with the rest of the DataFrame.
You can achieve this by passing the above DataFrame to the drop
function. You will pass the axis as 1 so that it can drop the two columns selected above.
df.drop(df.iloc[0:, 0:2], axis = 1) |
Final Thoughts on DataFrame Values
In this article, you have learned how you can read and manipulate data using the loc
and iloc
functions. You will find these two functions quite useful in the course of your data analysis. Often times you’ll find erroneous or outlier entries within a dataset that need rapid remediation. Manupulating dataframe values will get you the results you need quickly.
You can find the notebook with the code here.
Happy analysing!