This numpy tutorial will cover all the basics of the numpy python library. Mastering this library is key if you wish to do meaningful data analysis in python. It’s heavily used for data analysis of all types including analyzing financial data.

The numpy library provides the ndarray object (an n-dimensional array) for holding data. A big reason for numpy’s popularity is the set of advantages that ndarray objects have over ordinary Python lists.

As you’ll see, key to the ndarray is that it allows vectorized operations. We’ll get into vectorized operations in this tutorial, but for now just know that it’s an important part of why we use numpy for data analysis.

### Numpy tutorial table of contents

For quick reference, here are the areas we will cover in this tutorial:

- Creating numpy arrays
- Shape and size of numpy arrays
- Extracting items and slicing numpy arrays
- Vectorized operations on numpy arrays
- Statistical operations such as mean, min and max
- Conditional operations using where and take
- Loading data from a file
- Joining multiple numpy arrays
- Sorting numpy arrays
- Working with dates in numpy arrays

### Creating numpy arrays

There are a few different ways you can create a numpy array.

One of the most common ways is to create a numpy array from a Python list. You can use the numpy.array function (usually written np.array, since numpy is conventionally imported as np).

#import numpy (conventionally aliased as np)

import numpy as np

#create the array

mylist = [0,1,2,3]

myarr = np.array(mylist)

#print the array

print(myarr)

[0 1 2 3]

#print the type

print(type(myarr))

<class 'numpy.ndarray'>

It’s worth noting that even a 1-d array like the one above is of type ndarray. np.array is just a function that builds and returns an ndarray; numpy has no separate array class.

You can also create a 2-d array from a Python list of lists.

#create the array

mylist = [[0,1,2,3],[4,5,6,7]]

myarr = np.array(mylist)

#print the array

print(myarr)

[[0 1 2 3]

[4 5 6 7]]

Another consideration when creating numpy arrays is specifying the data type. Unlike lists in Python, **all elements of a numpy array must be of the same data type.**

You can specify the data type when creating the array as follows:

#create the array

mylist = [[0,1,2,3],[4,5,6,7]]

myarr = np.array(mylist, dtype='float')

#print the array

print(myarr)

[[0. 1. 2. 3.]

[4. 5. 6. 7.]]

*Note that the above decimal point indicates that the numbers are of float data type.*
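To see the single-data-type rule in action, here is a small sketch of what happens when a list mixes types and no dtype is given. The variable names are only illustrative; the point is that numpy quietly picks one common type for every element.

```python
import numpy as np

# Mixing ints and a float: numpy upcasts every element to float64
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers.dtype)

# Mixing numbers and a string: every element is stored as a string
mixed_text = np.array([1, 2, "hello"])
print(mixed_text.dtype.kind)  # 'U' means unicode string
print(mixed_text[0] == "1")   # the integer 1 became the string "1"
```

Checking the dtype attribute after creating an array is a cheap way to catch this kind of silent conversion.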

You can convert an existing array to a different data type by using the astype method. It returns a new array and leaves the original unchanged.

#create the array

mylist = [[0,1,2,3],[4,5,6,7]]

myarr = np.array(mylist, dtype='float')

myarr2 = myarr.astype('int')

#print the converted array

print(myarr2)

[[0 1 2 3]

[4 5 6 7]]
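One thing to keep in mind when converting floats to integers is that astype drops the fractional part rather than rounding. The sketch below uses made-up values to show the difference; rounding first with np.rint is one option, not the only one.

```python
import numpy as np

values = np.array([1.7, 2.2, -1.7])

# astype('int') truncates toward zero
truncated = values.astype('int')
print(truncated)   # [ 1  2 -1]

# round to the nearest whole number first if that is what you want
rounded = np.rint(values).astype('int')
print(rounded)     # [ 2  2 -2]

# the original array is left untouched
print(values.dtype)
```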

Note that if you aren’t sure about the data type or want to hold a variety of data types in the array, you can use dtype='object' as follows:

#create the array

mylist = [0,1,'hello']

myarr = np.array(mylist, dtype='object')

#print the array

print(myarr)

[0 1 'hello']

Since we’ve talked about lists and the differences between them and numpy arrays, it’s also worth noting that you can convert a numpy array back to a list using the tolist() method.

#create the array

mylist = [0,1,2]

myarr = np.array(mylist)

mylist2 = myarr.tolist()

#print the list

print(mylist2)

[0, 1, 2]

#print the type of the list

print(type(mylist2))

<class 'list'>

You can also create numpy arrays of a specific size, filled with a default value, quite easily using some numpy functions.

#create a 3×3 array with all zeroes

myarr = np.zeros([3,3])

#print the array

print(myarr)

[[0. 0. 0.]

[0. 0. 0.]

[0. 0. 0.]]

#create a 3×3 array with all ones

myarr = np.ones([3,3])

#print the array

print(myarr)

[[1. 1. 1.]

[1. 1. 1.]

[1. 1. 1.]]
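As the decimal points above show, np.zeros and np.ones produce floats by default. Here is a short sketch of two related options: passing a dtype, and using np.full for a fill value other than 0 or 1. The variable names are just for illustration.

```python
import numpy as np

# zeros with an integer data type instead of the default float
counts = np.zeros([2, 3], dtype='int')
print(counts)

# np.full fills the array with any value you choose
filled = np.full([2, 2], 7.5)
print(filled)
```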

Lastly, you can create numpy arrays with random values quickly as well.

# Create a 3×3 array with random numbers between [0,1)

print(np.random.rand(3,3))

[[0.5035 0.0745 0.3731]

[0.6996 0.1923 0.8273]

[0.3800 0.8746 0.7461]]

# Create a 3×3 array with random integers between [0, 10)

print(np.random.randint(0, 10, size=[3,3]))

[[5 3 9]

[7 7 8]

[3 6 3]]
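Random values change on every run, which can make results hard to reproduce. One way around that, sketched below, is numpy’s Generator interface with a fixed seed. The seed value 42 is arbitrary and chosen only for this example.

```python
import numpy as np

# a seeded generator produces the same sequence every time
rng = np.random.default_rng(seed=42)

floats = rng.random((3, 3))               # floats in [0, 1)
ints = rng.integers(0, 10, size=(3, 3))   # integers in [0, 10)

# a second generator with the same seed repeats the first draw
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(floats, rng_again.random((3, 3))))  # True
```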

### Shape and size of numpy arrays

It is often important to know the shape and size of your numpy arrays. There are some built-in attributes to help you do this.

How to determine the number of elements in a numpy array:

#create the array

mylist = [[0,1,2],[3,4,5]]

myarr = np.array(mylist)

#print the number of elements

print(myarr.size)

6

How to determine the shape of a numpy array, and how to get the number of rows and columns of a 2d-array:

#create the array

mylist = [[0,1,2],[3,4,5]]

myarr = np.array(mylist)

#print the shape

print(myarr.shape)

(2, 3)

#print the number of dimensions

print(myarr.ndim)

2

#print the number of rows

print(myarr.shape[0])

2

#print the number of columns

print(myarr.shape[1])

3
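The attributes above are tied together: the size is the product of the entries in shape, and ndim is the length of shape. A quick sketch to confirm this on an array of a different shape (the 4×5 shape is arbitrary):

```python
import numpy as np

myarr = np.zeros([4, 5])

# unpack the shape tuple into rows and columns
rows, cols = myarr.shape

print(myarr.size == rows * cols)      # True: 20 elements
print(myarr.ndim == len(myarr.shape)) # True: 2 dimensions
print(len(myarr) == rows)             # True: len() gives the first dimension
```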

### Extracting items and slicing numpy arrays

Numpy arrays use zero-based indexing, so your indexing always begins at zero.

You can access a single element of an array using square brackets and indicating the index for each dimension:

#create and print the array

mylist = [[0,1,2],[3,4,5]]

myarr = np.array(mylist)

print(myarr)

[[0 1 2]

[3 4 5]]

#print the element at 1,1

print(myarr[1,1])

4

You can also take subsets of numpy arrays, often called slices, by providing a start and stop value for each dimension, separated by a colon (:). Note that the start value of the slice is included in the subset, but the stop value is not.

#create and print the array

mylist = [[0,1,2],[3,4,5],[6,7,8]]

myarr = np.array(mylist)

print(myarr)

[[0 1 2]

[3 4 5]

[6 7 8]]

#Take a 2×2 subset of the array starting with the 1,1 element

print(myarr[1:3,1:3])

[[4 5]

[7 8]]
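The start-included, stop-excluded rule is easiest to see on a 1-d array. In this small sketch the array holds the numbers 0 through 9, so each value equals its own index.

```python
import numpy as np

values = np.arange(10)   # [0 1 2 3 4 5 6 7 8 9]

# start index 2 is included, stop index 5 is not
subset = values[2:5]
print(subset)            # [2 3 4]

# the length of a slice is simply stop minus start
print(len(subset) == 5 - 2)   # True
```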

Similarly, you can leave off the start value to slice from index zero, or leave off the stop value to slice through to the end of that dimension:

#create and print the array

mylist = [[0,1,2],[3,4,5],[6,7,8]]

myarr = np.array(mylist)

print(myarr)

[[0 1 2]

[3 4 5]

[6 7 8]]

#Take the same 2×2 as in the previous example

print(myarr[1:,1:])

[[4 5]

[7 8]]

You can grab sets of rows or columns of a 2d-array very easily using this slicing convention. This lets you perform operations on rows and columns quite easily and will be something you do quite often in data analysis.

#create and print the array

mylist = [[0,1,2],[3,4,5],[6,7,8]]

myarr = np.array(mylist)

print(myarr)

[[0 1 2]

[3 4 5]

[6 7 8]]

#Take 2nd row (index 1)

print(myarr[1,:])

[3 4 5]

#Take the last two columns

print(myarr[:,1:])

[[1 2]

[4 5]

[7 8]]

### Vectorized operations on numpy arrays

Vectorized operations are a big reason why numpy arrays are so useful. The ability to use these operations is also a key difference between numpy arrays and Python lists. Vectorized operations let you apply a function on each element in a vector very quickly and with minimal lines of code. Whereas performing a calculation on each element in a list would typically require some sort of loop, the vectorized calculations push this looping mechanism into the compiled layer where it can execute much faster. As such, these operations are much less costly computationally.

In this simple example, we have a 2d-array and want to add 3 to each element.

#create the array, and add 3 to each element

mylist = [[0,1,2],[3,4,5],[6,7,8]]

myarr = np.array(mylist)

myarr2 = myarr + 3

#print the array

print(myarr2)

[[ 3  4  5]

[ 6  7  8]

[ 9 10 11]]

Operations between arrays, such as adding two arrays together element by element, are just as easy.

#create two arrays

myarr = np.array([[0,1],[2,3]])

myarr2 = np.array([[4,5],[6,7]])

#print the element-wise sum

print(myarr + myarr2)

[[ 4  6]

[ 8 10]]
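To make the contrast with Python lists concrete, here is a sketch that doubles a set of values both ways. The prices are made-up numbers. Note that the * operator means something entirely different for a list.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# with a list you need an explicit loop (here a list comprehension)
doubled_list = [p * 2 for p in prices]

# with an array the multiplication is applied to every element
doubled_arr = np.array(prices) * 2
print(doubled_arr)      # [20. 40. 60.]

# multiplying the list itself repeats it instead of scaling it
repeated = prices * 2
print(repeated)         # [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
```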

### Statistical operations such as mean, min and max

Statistical analysis of numpy arrays is quite common and there are a number of functions built in to the numpy library.

The mean(), max() and min() methods can be called on a numpy array directly to get the mean, max and min values of the entire array.

#create an array

myarr = np.array([[0,1],[2,3]])

#print the mean, min, max of the array

print(myarr.mean())

1.5

print(myarr.min())

0

print(myarr.max())

3

Similarly, these functions can be run on slices of the array as well.

#create an array

myarr = np.array([[0,1],[2,3]])

#print the mean, min, max of the first column of the array

print(myarr[:,0].mean())

1.0

print(myarr[:,0].min())

0

print(myarr[:,0].max())

2

Quite often you might want to calculate the mean of each column, or the max value of each row. Numpy also makes this very easy. The axis parameter is common in numpy. Remember that axis=0 works down the rows and gives one result per column, while axis=1 works across the columns and gives one result per row.

Note that when an axis is given, these functions return an ndarray object rather than a single value.

#create an array

myarr = np.array([[0,1,2],[3,4,5]])

#print the mean, min, max of each column

print(np.mean(myarr, axis=0))

[1.5 2.5 3.5]

print(np.amin(myarr, axis=0))

[0 1 2]

print(np.amax(myarr, axis=0))

[3 4 5]

#print the mean, min, max of each row

print(np.mean(myarr, axis=1))

[1. 4.]

print(np.amin(myarr, axis=1))

[0 3]

print(np.amax(myarr, axis=1))

[2 5]
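If the axis numbers are hard to remember, it can help to look at the shape of the result: the axis you name is the one that disappears. A small sketch using sums on the same 2×3 array:

```python
import numpy as np

myarr = np.array([[0, 1, 2], [3, 4, 5]])   # shape (2, 3)

# axis=0 collapses the 2 rows, leaving one value per column
col_sums = myarr.sum(axis=0)
print(col_sums, col_sums.shape)   # [3 5 7] (3,)

# axis=1 collapses the 3 columns, leaving one value per row
row_sums = myarr.sum(axis=1)
print(row_sums, row_sums.shape)   # [ 3 12] (2,)
```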

When you need the index of a minimum or a maximum value, rather than the value itself, the argmin() and argmax() functions can be used.

These functions can be used with or without an axis parameter. If no axis is used, then the array is flattened into a 1-dimensional array, and the index of the flattened array is returned.

#create an array

myarr = np.array([[0,1,2],[3,4,5]])

#print the index of the max value in the array

print(np.argmax(myarr))

5

#print the index of the min value in each row

print(np.argmin(myarr, axis=1))

[0 0]
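A flattened index is not always convenient when you want to know which row and column the value sits in. One way to convert it back, sketched here, is np.unravel_index together with the shape of the array.

```python
import numpy as np

myarr = np.array([[0, 1, 2], [3, 4, 5]])

# index of the max value in the flattened array
flat_index = np.argmax(myarr)

# convert the flat index into a (row, column) pair
row, col = np.unravel_index(flat_index, myarr.shape)
print(row, col)            # 1 2
print(myarr[row, col])     # 5
```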

### Conditional operations using where and take

On occasion, you may want to grab the elements of a numpy array that satisfy a particular condition. The np.where function is very useful for this: given only a condition, it returns the index values of the elements that satisfy it.

Similarly, you can use the take method to grab the values of the array at a provided set of index values.

#create an array

myarr = np.array([0,1,5,2,3,4,5,1])

#get the index of the elements that satisfy the condition of the element being larger than 2

i = np.where(myarr > 2)

print(i)

(array([2, 4, 5, 6]),)

#take the values of the array based on the given index values

v = myarr.take(i)

print(v)

[[5 3 4 5]]
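Two related patterns are worth knowing alongside where and take. The sketch below, on the same array, selects values directly with a boolean mask, and uses the three-argument form of np.where to pick between two values for each element. The 1 and 0 flags are arbitrary example values.

```python
import numpy as np

myarr = np.array([0, 1, 5, 2, 3, 4, 5, 1])

# a boolean mask selects the matching values in one step
mask = myarr > 2
selected = myarr[mask]
print(selected)    # [5 3 4 5]

# np.where(condition, x, y) returns x where the condition holds, else y
flags = np.where(myarr > 2, 1, 0)
print(flags)       # [0 0 1 0 1 1 1 0]
```

Unlike the take example above, the mask returns a flat 1-d array rather than a nested one.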

### Loading data from a file into a numpy array

The np.genfromtxt function is quite useful for loading a csv file or other delimited data files. For this example, we’ll load in the data from this csv file. The np.genfromtxt function accepts a local file path, a web address and more.

#load the data from web URL

url = 'https://nextlevel.finance/wp-content/uploads/2018/09/test.csv'

data = np.genfromtxt(url, delimiter=',', skip_header=0, filling_values=-1, dtype='int')

#Let’s look at the resulting array

print(data)

[[2 1 3 0 0]

[7 1 3 5 9]

[2 5 3 3 4]

[2 1 3 9 8]

[8 1 3 0 7]

[2 3 2 3 3]

[1 1 3 4 4]

[8 1 3 3 4]

[3 1 1 5 9]

[2 1 6 6 4]]

print(data.shape)

(10, 5)
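If you want to try genfromtxt without downloading anything, it also accepts a file-like object. The sketch below feeds it a small made-up csv string with one empty field, to show what filling_values does; the data here is invented and is not the contents of the file above.

```python
import numpy as np
from io import StringIO

# made-up csv text; the middle field of the second row is empty
csv_text = "1,2,3\n4,,6\n7,8,9"

data = np.genfromtxt(StringIO(csv_text), delimiter=',', filling_values=-1, dtype='int')

print(data)
print(data.shape)
```

The empty field should come back as -1, the value passed as filling_values.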

### Joining multiple numpy arrays

There are a few different ways you can concatenate numpy arrays. The numpy functions most often used are np.concatenate, np.vstack and np.hstack.

#create two arrays

myarr1 = np.zeros([2,2])

myarr2 = np.ones([2,2])

print(myarr1)

[[0. 0.]

[0. 0.]]

print(myarr2)

[[1. 1.]

[1. 1.]]

#Concat vertically using concatenate

myarr3 = np.concatenate([myarr1, myarr2], axis=0)

print(myarr3)

[[0. 0.]

[0. 0.]

[1. 1.]

[1. 1.]]

#Concat vertically using vstack

myarr4 = np.vstack([myarr1,myarr2])

print(myarr4)

[[0. 0.]

[0. 0.]

[1. 1.]

[1. 1.]]

#Concat horizontally using concatenate

myarr5 = np.concatenate([myarr1, myarr2], axis=1)

print(myarr5)

[[0. 0. 1. 1.]

[0. 0. 1. 1.]]

#Concat horizontally using hstack

myarr6 = np.hstack([myarr1,myarr2])

print(myarr6)

[[0. 0. 1. 1.]

[0. 0. 1. 1.]]
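The examples above work in both directions because the two arrays have the same shape. In general, the arrays only need to match along the dimensions that are not being joined. A sketch with a 2×2 and a 1×2 array:

```python
import numpy as np

top = np.zeros([2, 2])
bottom = np.ones([1, 2])

# joining vertically works: both arrays have 2 columns
stacked = np.concatenate([top, bottom], axis=0)
print(stacked.shape)    # (3, 2)

# joining horizontally fails: 2 rows versus 1 row
try:
    np.concatenate([top, bottom], axis=1)
    joined_ok = True
except ValueError as err:
    joined_ok = False
    print("Could not join:", err)
```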

### Sorting numpy arrays

When sorting numpy arrays, typically you want to maintain the integrity of the rows while sorting on a specific column. To do so, you want to use a two-step process using the argsort method rather than the sort method. The sort method will sort every column (or row) and thus corrupt the row (or column) integrity. We’ll look at the more complex case using argsort first since it’s the more common use case.

#create the array

myarr = np.array([[9,5,4],[2,1,6],[4,8,2]])

print(myarr)

[[9 5 4]

[2 1 6]

[4 8 2]]

#Let’s argsort the first column, and see what happens

sorted_index_col1 = myarr[:, 0].argsort()

#The output is the index numbers of the sorted values, so the 1st item is smallest, then the 2nd item, then the 0th item.

print(sorted_index_col1)

[1 2 0]

#Now we can use the index information to sort the array

sorted_arr = myarr[sorted_index_col1]

print(sorted_arr)

[[2 1 6]

[4 8 2]

[9 5 4]]

#We can also sort using the index information in a reversed or descending order

sorted_arr = myarr[sorted_index_col1[::-1]]

print(sorted_arr)

[[9 5 4]

[4 8 2]

[2 1 6]]
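Since this two-step pattern comes up so often, you may want to wrap it in a small function. The helper below, sort_rows_by_column, is a hypothetical name and not part of numpy; it is just one way to package the argsort steps shown above.

```python
import numpy as np

def sort_rows_by_column(arr, col, descending=False):
    """Return a copy of arr with whole rows reordered by the values in one column."""
    # argsort gives the row order that sorts the chosen column
    order = arr[:, col].argsort(kind='stable')
    if descending:
        # reversing the order gives largest first
        order = order[::-1]
    return arr[order]

myarr = np.array([[9, 5, 4], [2, 1, 6], [4, 8, 2]])

by_second_col = sort_rows_by_column(myarr, 1)
print(by_second_col)

by_second_col_desc = sort_rows_by_column(myarr, 1, descending=True)
print(by_second_col_desc)
```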

Lastly, let’s quickly look at the np.sort function, which does not maintain row/record integrity.

#create the array

myarr = np.array([[9,5,4],[2,1,6],[4,8,2]])

print(myarr)

[[9 5 4]

[2 1 6]

[4 8 2]]

#Let’s sort the columns of the numpy array

sorted_arr = np.sort(myarr, axis=0)

print(sorted_arr)

[[2 1 2]

[4 5 4]

[9 8 6]]

#Let’s sort the rows of the numpy array

sorted_arr = np.sort(myarr, axis=1)

print(sorted_arr)

[[4 5 9]

[1 2 6]

[2 4 8]]
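There are two ways to call sort, and they behave differently. The sketch below shows that np.sort returns a sorted copy, while the sort method on the array itself sorts in place and returns nothing.

```python
import numpy as np

myarr = np.array([3, 1, 2])

# np.sort returns a new sorted array; myarr is unchanged
sorted_copy = np.sort(myarr)
print(sorted_copy)   # [1 2 3]
print(myarr)         # [3 1 2]

# the sort method reorders myarr itself and returns None
result = myarr.sort()
print(result)        # None
print(myarr)         # [1 2 3]
```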

### Working with dates in numpy arrays

Numpy uses the datetime64 object to implement dates. The object comes with a number of functions for manipulating dates.

#create a date

mydate = np.datetime64('2018-08-25 15:30:30')

print(mydate)

2018-08-25T15:30:30

#create a range of dates

mydaterange = np.arange(np.datetime64('2018-08-01'), np.datetime64('2018-08-10'))

print(mydaterange)

['2018-08-01' '2018-08-02' '2018-08-03' '2018-08-04' '2018-08-05' '2018-08-06' '2018-08-07' '2018-08-08' '2018-08-09']
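Dates can also be used in arithmetic. The sketch below uses np.timedelta64, numpy’s companion type for durations, to measure the gap between two dates and to shift a date forward. The specific dates are just examples.

```python
import numpy as np

start = np.datetime64('2018-08-01')
end = np.datetime64('2018-08-25')

# subtracting two dates gives a timedelta64
elapsed = end - start
print(elapsed)        # 24 days

# adding a timedelta64 shifts a date forward
next_week = start + np.timedelta64(7, 'D')
print(next_week)      # 2018-08-08

# adding an array of day offsets gives an array of dates
offsets = np.arange(0, 9, 2).astype('timedelta64[D]')
every_other_day = start + offsets
print(every_other_day)
```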