Manipulating Specific Values from a DataFrame – Read, Update, Delete

Manipulating values in a Pandas DataFrame is not something you can avoid during the process of analyzing data. The steps you need to take to read, update or delete specific values can become quite complex very quickly. In this article, let’s take a look at how you can read, update and delete specific values in a Pandas DataFrame without too much hassle. 

Creating a Pandas DataFrame 

Let’s kick off by creating a DataFrame that you will use to walk through this article. 

import pandas as pd
import numpy as np
names_dict ={
    ‘Name’:[‘Ken’,’Jeff’,’John’,’Mike’,’Andrew’,’Ann’,’Sylvia’,’Dorothy’,’Emily’,’Loyford’],
    ‘Age’:[31,52,56,12,45,np.nan,78,85,46,135],
    ‘Phone’:[52,79,80,75,43,125,74,44,85,45],
    ‘Uni’:[‘One’,’Two’,’Three’,’One’,’Two’,’Three’,’One’,’Two’,’Three’,’One’]
}

You can also create an index for this dataset. This will be needed in a later part of this article. 

index = [“Row_one”,”Row_two”,”Row_three”,”Row_four”,”Row_five”,”Row_six”,”Row_seven”,”Row_eight”,”Row_nine”,”Row_ten”]

Let’s now create the Pandas DataFrame using the data and the index.

df = pd.DataFrame(names_dict,index=index)

Accessing the data by the rows 

The next step is to look at how you can access data in the above DataFrame using the row names. You can achieve this by using the loc function and passing the label of the row. Passing the wrong label will result in a key error as shown below. 

Pass the correct label name in order to access the data. For instance, let’s use Row_one

df.loc[“Row_one”]

This returns the result as a Pandas Series.

That means that you can access the data in the series by their index. Let’s obtain the value at position 0. 

df.loc[“Row_one”][0]

You can also select the data between rows. For instance, you can select all the values from Row_one to Row_two

Here is how you can do that. 

df.loc[“Row_one”:”Row_two”]

This returns the result as a Pandas DataFrame and not a Series like in the last illustration. You can confirm that by checking the type. 

type(df.loc[“Row_one”:”Row_two”])

While making the selection, you can also indicate the columns you want to include in the final result. You will do this by passing their labels as shown below. This is a great option when you want to eliminate some columns from the final result. You can then save this new DataFrame in a new variable. 

df.loc[“Row_one”:”Row_two”, “Name”:”Age”]

Replacing values using loc

Let’s now take an example where you want to replace all the values in the DataFrame below.

You can do that by passing a list with the new values that you want. For example, the code below will replace the two rows in the above DataFrame with the values provided in the list. 

df.loc[“Row_one”:”Row_two”] = [“Joe”,34,45,”Two”]

You can now see that the two rows now contain the same information. 

However, you might want to make sure that each row gets updated with different values. You can do this, by passing two lists instead of one. 

This is illustrated below. 

df.loc[“Row_one”:”Row_two”] = [[“Joe”,34,45,”Two”],[“Derrick”,56,46,”Ten”]]

You can also select a single row and replace everything in that row. As an example, let’s replace everything in the row shown below. 

This can be done by passing a list the new values. 

df.loc[df[“Name”] == “Joe”] = [“Elijah”,37,89,”five”]

Conditional select

In the process of your analysis, you might want to select some data based on two or more conditions. Let’s take a look at how you can do that using iloc

df.loc[(df[“Name”] == “Elijah”) | ( df[“Phone”] == 89)]

For more details on selecting rows with Pandas, check out this detailed post.

Integer-location based indexing

In other situations, you might want to manipulate values based on their integer location. In that case, you have to pass the right integer index value while selecting the data. Doing otherwise will result in an error. 

If you are not sure about the integer location, you can start by checking that. For instance, let’s check the integer location for Row_one

df.index.get_loc(“Row_one”) # Get integer location, slice or boolean mask for requested label.

After that, you can access the data by passing the right integer to the `iloc` function. This returns the data as a Pandas Series. You can confirm that by checking its type. 

df.iloc[0]

Let’s take a look at how you can select data from rows 1 to 5 using this method. 

df.iloc[0:5]

This selects the data from index 0 up to but not including index 5. 

You can also select a row and replace the values just like you have seen previously. 

This canbe be achieved by passing the new values as a list. 

df.iloc[0:1] = [“Kevin”,34,45,”Two”]

You can also select a single value and replace it with a new value. To illustrate this let’s change the name Kevin to Eli. 

That is done by selecting Kevin and assigning the new name. 

df.iloc[0:1,0:1] = “Eli”

Negative indexing with iloc

You can also use iloc to select from the last rows in the DataFrame,i.e negative indexing. For instance, let’s pick the information at Row_seven

If you start counting from behind, that will give you -4. That’s the number to be passed to the iloc function. 

df.iloc[-4]

You can also select everything from -4 to the end of the DataFrame.

df.iloc[-4:]

iloc also allows you to select a certain portion of the data using negative indexing. 

df.iloc[-4:-2]

Don’t forget that you can also select the columns you want to include in the final result. 

df.iloc[-4:-2,0:3]

Dropping data using iloc

Let’s take an instance where you want to drop everything in the DataFrame below and remain with the rest of the DataFrame. 

You can achieve this by passing the above DataFrame to the drop function. You will pass the axis as 1 so that it can drop the two columns selected above.  

df.drop(df.iloc[0:, 0:2], axis = 1)

Final Thoughts on DataFrame Values

In this article, you have learned how you can read and manipulate data using the loc and iloc functions. You will find these two functions quite useful in the course of your data analysis. Often times you’ll find erroneous or outlier entries within a dataset that need rapid remediation. Manupulating dataframe values will get you the results you need quickly.

You can find the notebook with the code here

Happy analysing!