
In the first iloc example, Sofia_Grades = Report_Card.iloc[6:12, 2:], we’re using standard Python slicing syntax: iloc[a, b], where a, in this case, is 6:12, which indicates a range of rows from 6 to 11. When specifying a range with iloc, you always specify from the first row or column required (6) to the last row or column required plus one (12). As you can see in the original import of grades.csv, all the rows are numbered from 0 to 17, with rows 6 through 11 providing Sofia’s grades. This is the result we see in the DataFrame.

As for the b argument, instead of specifying the names of each of the columns we want as we did with loc, this time we are using their numerical positions. Hence we specify 2:, which indicates that we want all the columns starting from position 2 (i.e., Lectures, where column 0 is Name and column 1 is Class). As shown in the output DataFrame, we have the Lectures, Grades, Credits and Retake columns, which are located at positions 2, 3, 4 and 5.

Finally, iloc can also accept integer arrays as a and b, which is exactly why our second iloc example:

Sofia_Grades = Report_Card.iloc[[6, 7, 8, 9, 10, 11], [2, 3, 4, 5]]

produces the same DataFrame as the first example:

Sofia_Grades = Report_Card.iloc[6:12, 2:]

This method can be useful when creating arrays of indices via functions or receiving them as arguments.

NOTE: The order of the indices determines the order of rows and columns in the final DataFrame.
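To illustrate the note about index order, here is a minimal sketch on a small made-up DataFrame. The name toy and all of its values are hypothetical stand-ins, not the contents of grades.csv. It shows that the integer arrays can be built programmatically (here with range), and that reversing them reverses the rows and columns of the result.

```python
import pandas as pd

# Hypothetical stand-in data; NOT the real grades.csv.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Class": ["1a", "1a", "1b", "1b"],
    "Lectures": ["Math", "History", "Math", "History"],
    "Grades": [90, 75, 60, 85],
})

# Index arrays built programmatically rather than typed by hand.
rows = list(range(1, 3))  # positions 1 and 2
cols = list(range(2, 4))  # positions 2 and 3

forward = toy.iloc[rows, cols]

# Reversing the arrays reverses the order of rows and columns in the result.
backward = toy.iloc[rows[::-1], cols[::-1]]

print(forward)
print(backward)
```

Both selections contain the same cells; only the order of the rows and columns differs.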
We can also slice the DataFrame created with the grades.csv file using iloc, which takes integer positions (rather than labels) for the a and b values. In this case, we can examine Sofia’s grades by running:

Sofia_Grades = Report_Card.iloc[6:12, 2:]

or else:

Sofia_Grades = Report_Card.iloc[[6, 7, 8, 9, 10, 11], [2, 3, 4, 5]]

Both of the above code snippets result in the same DataFrame.
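The equivalence of the slice form and the list form can be checked on a small example. The sketch below uses a hypothetical frame named report with made-up values (it is not grades.csv), so the row and column positions differ from the ones used for Sofia.

```python
import pandas as pd

# Hypothetical stand-in for Report_Card; all values are made up.
report = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Bob", "Cy", "Cy"],
    "Class": ["2a"] * 6,
    "Lectures": ["Math", "Art"] * 3,
    "Grades": [88, 92, 71, 65, 80, 77],
})

# Slice form: rows 2 and 3 (the stop value 4 is excluded),
# and every column from position 2 onward.
by_slice = report.iloc[2:4, 2:]

# List form: the same positions spelled out explicitly.
by_list = report.iloc[[2, 3], [2, 3]]

# Compare the two selections cell by cell.
print(by_slice.equals(by_list))
```

The slice form is shorter for contiguous ranges, while the list form also handles positions that are not next to each other.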
Other types of data, such as Excel or JSON files, use their respective read functions and parameters.

#3 Creating a DataFrame

Besides creating a DataFrame by reading a file, you can also create one via a Pandas Series. Series are one-dimensional labeled Pandas arrays that can contain any kind of data, even NaNs (Not a Number), which are used to specify missing data. Let’s create a small DataFrame, consisting of the grades of a high schooler:

classes = pd.Series()
grades = pd.Series()
pd.DataFrame()

Apart from the fact that our example student has pretty bad grades for History and Geography classes, we can see that Pandas has automatically filled in the missing grade data for the German course with “NaN”.

Sometimes generating a simple Series doesn’t accomplish our goals. For more complex operations, Pandas provides DataFrame slicing using “loc” and “iloc”. For example, let’s say Benjamin’s parents wanted to learn more about their son’s performance at the school. They want to see their son’s lectures, the grades for these lectures, the number of credits earned, and finally whether their son will need to take a retake exam. We can simply slice the DataFrame created with the grades.csv file and extract the information we need. For example:

Grades = Report_Card.loc[(Report_Card["Name"] == "Benjamin Duran"), ["Lectures", "Grades", "Credits", "Retake"]]

This might look complicated at first glance, but it is rather simple. In this case, we are using loc in exactly the same manner in which we would normally slice a multidimensional Python array.

For the a value, we are comparing the contents of the Name column of Report_Card with Benjamin Duran, which returns a Series object of Boolean values. We are able to use a Series with Boolean values to index a DataFrame, where indices having the value “True” will be picked and those having “False” will be ignored.

For the b value, we accept only the column names listed.
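The two ideas in this section can be sketched on made-up data. Everything below is an assumption for illustration: the values, the second student name and the exact way the Series are built are hypothetical, and only the column names and the name Benjamin Duran come from the text. The first part builds a DataFrame from two Series of unequal length so that a NaN appears; the second part applies a Boolean mask with loc.

```python
import pandas as pd

# Part 1: a DataFrame from two Series (hypothetical values).
# grades is one entry short, so pandas fills the gap with NaN.
classes = pd.Series(["History", "Geography", "German"])
grades = pd.Series([55, 48])
small = pd.DataFrame({"Classes": classes, "Grades": grades})
print(small)

# Part 2: Boolean-mask selection with loc on a made-up report card.
report = pd.DataFrame({
    "Name": ["Benjamin Duran", "Benjamin Duran", "Someone Else"],
    "Class": ["3a", "3a", "3b"],
    "Lectures": ["Math", "Physics", "Math"],
    "Grades": [72, 64, 91],
    "Credits": [5, 4, 5],
    "Retake": ["No", "Yes", "No"],
})

# Comparing a column to a value yields a Series of True/False values.
mask = report["Name"] == "Benjamin Duran"

# Rows where the mask is True are kept; only the listed columns are returned.
selected = report.loc[mask, ["Lectures", "Grades", "Credits", "Retake"]]
print(selected)
```

Storing the mask in its own variable is optional, but it makes the a value easier to inspect before it is passed to loc.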
One of the most common operations that people use with Pandas is to read some kind of data, like a CSV file, Excel file, SQL table or a JSON file. For example, to read a CSV file you would enter the following:

data_frame = pd.read_csv("name_of_the_file.csv")

For our example, we’ll read in a CSV file (grades.csv) that contains school grade information in order to create a Report_Card DataFrame:

Report_Card = pd.read_csv("grades.csv")
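As a self-contained sketch of the read step, the snippet below feeds read_csv a string instead of a file on disk. The CSV text and its columns are hypothetical and only stand in for a real file such as grades.csv.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """Name,Class,Lectures,Grades
Ann,2a,Math,88
Bob,2a,Art,71
"""

# pd.read_csv accepts a path or a file-like object, so StringIO works here.
frame = pd.read_csv(io.StringIO(csv_text))

# The header row becomes the column names; rows are numbered from 0.
print(frame.shape)
print(frame.dtypes)
```

With a real file you would pass its path instead of the StringIO object. Pandas also provides pd.read_excel and pd.read_json for those formats.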

You should see something similar to:

> 0.22.0

#2 Importing a Data Set into Python

To see if Python and Pandas are installed correctly, open a Python interpreter and type the following:

> import pandas as pd
> pd.__version__
