When working with Pandas, you will often run into a handful of common errors. Luckily, these errors are so prevalent that well-known solutions already exist for them. They can occur when reading files, when performing operations such as grouping, and when creating Pandas DataFrames, to mention a few. In this article, let’s take a look at some of these errors and their possible solutions.
ValueError: If using all scalar values, you must pass an index
This error occurs when you are trying to create a Pandas DataFrame using scalars while not passing an index. Let’s illustrate the error by creating a Pandas DataFrame using scalars.
import pandas as pd
name1 = 'Derrick'
name2 = 'John'
df = pd.DataFrame({'name1': name1, 'name2': name2})  # raises the ValueError
The solution is in the error message. The only thing you need to do is to pass an index.
df = pd.DataFrame({'name1': name1, 'name2': name2}, index=['names'])
Another alternative is not to use scalars at all. For instance, you can pass in each value wrapped in a list.
df = pd.DataFrame({'name1': [name1], 'name2': [name2]})
A third option is to use the `from_records` method when creating the DataFrame. You then pass in the data as a sequence of records.
df = pd.DataFrame.from_records([{'name1': name1, 'name2': name2}])
The function also allows you to set the index of the DataFrame.
df = pd.DataFrame.from_records([{'name1': name1, 'name2': name2}], index=['names'])
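The options above can be compared side by side. The following is a minimal sketch, assuming only that Pandas is installed; the variable names `with_index`, `from_lists`, and `from_records` are chosen here for illustration and are not from the original notebook.

```python
import pandas as pd

name1 = "Derrick"
name2 = "John"

# All values are scalars and no index is given, so pandas cannot
# tell how many rows to build and raises a ValueError.
try:
    pd.DataFrame({"name1": name1, "name2": name2})
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Three ways to get a one-row DataFrame from the same scalars.
with_index = pd.DataFrame({"name1": name1, "name2": name2}, index=["names"])
from_lists = pd.DataFrame({"name1": [name1], "name2": [name2]})
from_records = pd.DataFrame.from_records([{"name1": name1, "name2": name2}])

print(with_index.shape, from_lists.shape, from_records.shape)
```

All three results hold the same single row; only `with_index` carries a custom row label, while the other two get a default integer index.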
ValueError: cannot reindex from a duplicate axis
This error occurs when an operation has to reindex or align data along an axis whose labels are not unique, for example when a dataset has duplicate index labels. Recent Pandas versions word the message as `cannot reindex on an axis with duplicate labels`. Let’s create a dataset with some duplicate indices to illustrate this.
names_dict = {
    'Name': ['Ken', 'Jeff', 'John', 'Mike', 'Andrew', 'Ann', 'Sylvia', 'Dorothy', 'Emily', 'Loyford'],
    'Age': [31, 52, 56, 12, 45, 56, 78, 85, 46, 135],
    'Phone': [52, 79, 80, 75, 43, 125, 74, 44, 85, 45],
    'Uni': ['One', 'Two', 'Three', 'One', 'Two', 'Three', 'One', 'Two', 'Three', 'One'],
}
index = ['Row_one', 'Row_one', 'Row_three', 'Row_four', 'Row_five', 'Row_six', 'Row_seven', 'Row_eight', 'Row_nine', 'Row_ten']
Notice that the `Row_one` index label is repeated. Let’s use the data and the indices to create a Pandas DataFrame.
df = pd.DataFrame(names_dict, index=index)
The `duplicated` method of the index can be used to check whether a DataFrame contains duplicate index labels. It returns a boolean array that marks each repeated label.
df.index.duplicated()
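To see the check and the error together, here is a small sketch on a cut-down frame. It triggers the error with `reindex`, which is a different route from the grouping example in this article but fails for the same underlying reason. The three-row frame is made up for illustration.

```python
import pandas as pd

# A tiny frame with a repeated row label.
df = pd.DataFrame(
    {"Age": [31, 52, 56]},
    index=["Row_one", "Row_one", "Row_three"],
)

# Quick check: are all index labels unique?
print(df.index.is_unique)

# keep=False marks every occurrence of a repeated label, not just the later ones.
dup_mask = df.index.duplicated(keep=False)
print(df[dup_mask])

# Reindexing needs a one-to-one mapping from labels to rows, so it fails here.
try:
    df.reindex(["Row_one", "Row_three"])
    message = None
except ValueError as err:
    message = str(err)
    print(message)
```

The exact wording of the message depends on the installed Pandas version.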
One scenario that leads to this error is performing a `groupby` on such a dataset and passing the wrong axis. Let’s look at an example that groups the data above by the `Uni` column and averages the `Age` column. Because the grouping key is a column, the axis should be 0, which is also the default. Passing the axis as 1 means you are splitting along the columns instead of the rows. Since you are also trying to reset the index at the end of this operation, it fails with the `cannot reindex from a duplicate axis` error. Note that the `axis` argument of `groupby` is deprecated in recent Pandas releases, so the exact failure you see may differ by version.
df.groupby(axis=1, by='Uni')['Age'].mean().reset_index()
The solution here is to first set the correct axis, i.e. 0, and then either rename the index or drop the rows with duplicate index labels. Rows without duplicate index labels can be selected as shown below.
df[~df.index.duplicated()]
The alternative here is to pass a new index or to reset the index of the DataFrame. If you want the original DataFrame to be modified, pass `inplace=True` to the `reset_index` method. By default the old labels are kept as a new column; pass `drop=True` to discard them.
df.reset_index()
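Both fixes can be tried on a reduced version of the data. This is a sketch under the assumption that keeping the first row per label is acceptable; the four-row frame and the names `deduped`, `renumbered`, and `mean_age` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"Uni": ["One", "Two", "One", "Two"], "Age": [31, 52, 56, 12]},
    index=["Row_one", "Row_one", "Row_three", "Row_four"],
)

# Option 1: keep only the first row for each index label.
deduped = df[~df.index.duplicated(keep="first")]

# Option 2: keep every row but replace the labels with 0..n-1.
# drop=True discards the old labels instead of turning them into a column.
renumbered = df.reset_index(drop=True)

# With a unique index, grouping along the default axis (rows) works as expected.
mean_age = renumbered.groupby("Uni")["Age"].mean().reset_index()
print(mean_age)
```

Option 1 loses data (the second `Row_one` row), while option 2 keeps all rows but loses the original labels, so the right choice depends on whether the labels carry meaning.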
UnicodeDecodeError when reading CSV file in Pandas with Python
This error occurs when the file’s encoding does not match the one Pandas uses to decode it, which is UTF-8 by default. When you download a file or a colleague sends you one, you might not know which tool was used to create it, and different tools may encode data differently. There are many encodings in use. That said, this error is usually solved by trying the common encodings while loading the data.
df = pd.read_csv('data.csv', encoding='utf-8')
df = pd.read_csv('data.csv', encoding='utf-16')
Alternatively, you can try loading the data using a different engine. Pandas supports the `c` and `python` engines; newer releases also offer `pyarrow`.
df = pd.read_csv('data.csv', engine='python')
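Trying encodings one at a time can be wrapped in a small helper. The sketch below is one possible approach, not a Pandas feature: `read_csv_with_fallback` is a hypothetical function, the list of encodings is an assumption, and the sample file is written to a temporary path purely for demonstration.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings):
    """Hypothetical helper: return (DataFrame, encoding) for the first encoding that decodes."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeError as err:  # UnicodeDecodeError is a subclass of UnicodeError
            last_error = err
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error


# Write a small Latin-1 file; "caf\u00e9" contains a byte that is invalid as UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as handle:
    handle.write("name,city\nAna,caf\u00e9\n".encode("latin-1"))
    path = handle.name

try:
    # latin-1 goes last: it accepts any byte sequence, so it never fails
    # but may silently produce wrong characters for other encodings.
    df, used = read_csv_with_fallback(path, ["utf-8", "latin-1"])
finally:
    os.remove(path)

print(used)
print(df)
```

A fallback like this only finds an encoding that decodes without error, which is not always the correct one, so it is worth inspecting a few non-ASCII values after loading.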
pandas.parser.CParserError: Error tokenizing data
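This error usually occurs when the file being loaded doesn’t fit the format expected by the Pandas parser. Newer Pandas versions report it as `pandas.errors.ParserError`. One such scenario is having extra commas in a comma-separated file, which gives some rows more fields than the header. You can fix this by opening the file in a text editor and removing the extra commas. You can also decide to skip these lines while loading the data. Since Pandas 1.3 this is done with `on_bad_lines='skip'`; the older `error_bad_lines=False` flag was removed in Pandas 2.0.
data = pd.read_csv('data.csv', on_bad_lines='skip')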
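The effect of skipping malformed rows is easy to see on an in-memory example. This sketch assumes Pandas 1.3 or later (for `on_bad_lines`) and uses a made-up four-line CSV string in place of a real file.

```python
import io

import pandas as pd

# The third line has three fields, but the header declares only two.
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# The default behaviour is to stop with a tokenizing error.
try:
    pd.read_csv(io.StringIO(raw))
    parse_error = None
except pd.errors.ParserError as err:
    parse_error = str(err)
    print(parse_error)

# Skipping bad lines drops the malformed row and keeps the rest.
clean = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(clean)
```

Skipped rows are lost without further notice, so passing `on_bad_lines='warn'` instead can be a safer first step when you want to know how much data is being dropped.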
The error can also be caused by problems in the header of the file. You can open the file in a text editor and check for errors in the header. Alternatively, you can load the data without the header.
df = pd.read_csv('data.csv', header=None)
If the file has extra header or preamble rows above the real column names, you can skip those rows while loading the data. For example, the following skips the first three lines of the file.
df = pd.read_csv('data.csv', skiprows=3)
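A short in-memory example shows how preamble lines break parsing and how `skiprows` and `header=None` change the result. The CSV text below is invented for illustration; here two lines are skipped because the sample has two preamble lines.

```python
import io

import pandas as pd

# Two free-text preamble lines sit above the real header row.
raw = (
    "Quarterly export\n"
    "Generated by a hypothetical tool\n"
    "name,age\n"
    "Ann,31\n"
    "Bob,52\n"
)

# Read as-is, the first line is taken as a one-column header,
# so the later two-field lines cause a tokenizing error.
try:
    pd.read_csv(io.StringIO(raw))
    failed = False
except pd.errors.ParserError:
    failed = True

# Skipping the preamble lets pandas find the real header.
df = pd.read_csv(io.StringIO(raw), skiprows=2)

# header=None treats every remaining line as data, including "name,age".
no_header = pd.read_csv(io.StringIO(raw), skiprows=2, header=None)

print(df)
print(no_header)
```

With `header=None` the columns are numbered 0, 1, and so on, and the former header row becomes the first data row, which usually forces the affected columns to a string type.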
The error can also result from passing the wrong delimiter or from Pandas inferring the wrong delimiter. You can fix this by passing the correct delimiter. For example, here is how you can load data that is tab-separated.
data = pd.read_csv('data.csv', sep='\t')
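The sketch below illustrates how a wrong delimiter can surface as a tokenizing error: the data is tab-separated, but one field happens to contain a comma. The sample text is made up for this example.

```python
import io

import pandas as pd

# Tab-separated data where a later field contains a comma.
raw = "name\tnote\nAnn\thello\nBob\tone, two\n"

# With the default comma separator, the first lines look like a single column,
# and the comma in the last line then appears as an unexpected extra field.
try:
    pd.read_csv(io.StringIO(raw))
    failed = False
except pd.errors.ParserError:
    failed = True

# With the correct separator, the comma is just part of the text.
data = pd.read_csv(io.StringIO(raw), sep="\t")
print(data)
```

If no line had contained a comma, the default read would have succeeded silently with a single wide column, so checking the shape of the loaded frame is a useful habit.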
Final Thoughts
In this article, you have seen how to solve a few common problems in Pandas, along with several possible solutions for each. These are problems that you are very likely to encounter at some point in your analysis. Something to note is that, in most cases, the solution is hinted at in the error message itself. It therefore pays to get comfortable reading and interpreting error messages. When the message alone is not enough, copying and pasting it into Google will often lead to relevant solutions.
Check out the notebook used in this article here.