Numpy

Song Jiaming | 16 May 2020

Disclaimer: The structure and content of this post are adapted from the GitHub repository Data-Science-Notes. I wrote this post as my summary notes, and it is not for any business purpose.

To start using numpy, we import numpy as np

1. Basic Operations

Check the number of dimensions: array.ndim
Check the shape: array.shape
Check the total number of elements in the array: array.size
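As a quick illustration of the three attributes above, here is a minimal sketch; the array arr is a hypothetical 2x3 example, not something from the original notes.

```python
import numpy as np

# Hypothetical 2x3 array used only for illustration.
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.ndim)   # 2: number of axes
print(arr.shape)  # (2, 3): 2 rows, 3 columns
print(arr.size)   # 6: total number of elements
```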

2. Array type

Create array from list: np.array(any_dimension_list, dtype=np.int32)
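- dtype is the data type of the elements in the array
- If dtype is omitted, NumPy infers it from the data; a list of Python integers gets the platform's default integer type (int64 on most systems)
- See the NumPy documentation for more data types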
Create an array of zeros: np.zeros((shape), dtype=atype)
Create an array of ones: np.ones((3,4))
Create an uninitialized array (its contents are arbitrary, not zeros): np.empty((shape))
Change an array to a specified shape: arr.reshape((shape))
Create a series: np.arange(start, end+1, step)
- e.g. np.arange(0,11,2) => [0,2,4,6,8,10]
- np.arange(n) => from 0 to n-1, incrementing by 1 at each step
- np.arange(end, start-1, -1) => decreases by 1 from end down to start
Create evenly spaced values over a range: np.linspace(start, stop, num), where num is the total number of values and both endpoints are included
- e.g. np.linspace(1,10,20) => [ 1 1.47368421 1.94736842 2.42105263 2.89473684 3.36842105 3.84210526 4.31578947 4.78947368 5.26315789 5.73684211 6.21052632 6.68421053 7.15789474 7.63157895 8.10526316 8.57894737 9.05263158 9.52631579 10. ]
Create an array by repeating an array: np.tile(starting_array, (x,y))
- repeat the starting_array x times along the rows and y times along the columns
- e.g. np.tile([0,1],(3,2)) results in [[0,1,0,1],[0,1,0,1],[0,1,0,1]]
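The creation functions listed above can be tried together in one short sketch. The shapes and values here are chosen only for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3), dtype=np.int32)  # 2x3 array of integer zeros
ones = np.ones((3, 4))                    # 3x4 array of float ones
evens = np.arange(0, 11, 2)               # 0, 2, 4, 6, 8, 10 (the stop value 11 is excluded)
countdown = np.arange(5, -1, -1)          # 5, 4, 3, 2, 1, 0
grid = np.arange(12).reshape((3, 4))      # 0..11 laid out as 3 rows by 4 columns
samples = np.linspace(1, 10, 4)           # 1, 4, 7, 10 (both endpoints included)
tiled = np.tile([0, 1], (3, 2))           # 3 rows, each row is 0, 1, 0, 1

print(evens)
print(samples)
print(tiled)
```

Note that np.arange excludes its stop value while np.linspace includes it by default, which is an easy source of off-by-one mistakes.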

Computation

a = np.array([10,20,30,40]) # [10,20,30,40]
b = np.arange(4) # [0,1,2,3]

a-b, a+b, a*b: element-wise arithmetic, e.g. a*b => [0,20,60,120]
a ** n: each element in a to the power of n
np.sin(a): sin(x) for each x in a
b < 2 => [True, True, False, False]
a == b: element-wise check of equality => [False, False, False, False]
Dot product: a.dot(b) or np.dot(a,b)
Minimum and maximum: np.min(a), np.max(a) (or a.min(), a.max())
Axis:
- 0: find/sum according to each column
- 1: find/sum according to each row
Sum of each row and each column, given arr = [[0,0,0],[1,1,1]]:
- np.sum(arr, axis=1) => [0,3] (one sum per row)
- np.sum(arr, axis=0) => [1,1,1] (one sum per column)
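The element-wise operations and the axis argument can be checked with a minimal sketch that reuses a and b from above, plus a small 2x3 array arr for the axis sums.

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.arange(4)             # 0, 1, 2, 3

print(a - b)                 # 10, 19, 28, 37
print(a * b)                 # 0, 20, 60, 120 (element-wise, not a matrix product)
print(b ** 2)                # 0, 1, 4, 9
print(b < 2)                 # True, True, False, False
print(a.dot(b))              # 200 = 0 + 20 + 60 + 120

arr = np.array([[0, 0, 0], [1, 1, 1]])
col_sums = np.sum(arr, axis=0)   # collapses the rows: one sum per column
row_sums = np.sum(arr, axis=1)   # collapses the columns: one sum per row
print(col_sums)              # 1, 1, 1
print(row_sums)              # 0, 3
```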
Get the index of the largest/smallest element in a matrix A: np.argmax(A), np.argmin(A) (the index refers to the flattened array unless axis is given)
Mean of the whole matrix: np.mean(A) or np.average(A)
Median: np.median(A)
Cumulative sum of elements in A: np.cumsum(A)
- i.e. A = [1,2,3,4] => np.cumsum(A) = [1,3,6,10]
Difference between 2 consecutive elements: np.diff(B)
- B = [[1,2,5],[6,8,9]], diff(B)=> [[1,3],[2,1]]
Get index of nonzero element in an array: np.nonzero(A)
- A = [[3, 0, 0], [0, 4, 0], [5, 6, 0]]
- Returns (array([0, 1, 2, 2]), array([0, 1, 0, 1]))
- which means position(0,0),(1,1), (2,0),(2,1) contains nonzero elements
- Further usage: A[np.nonzero(A)] => array([3,4,5,6]). However, A[A != 0] is preferred
Sort each row in ascending order: np.sort(A)
Transpose: np.transpose(A) or A.T
Change every element outside a range to the boundary value: np.clip(A, min, max)
- If min <= element <= max, keep the original value
- If element > max, use max instead
- If element < min, use min instead
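Several of the functions in this section are sketched below on the matrix A from the nonzero example. The call to np.unravel_index is an addition not covered in the notes; it is included only to show one way of turning the flattened index from np.argmax back into a (row, column) pair.

```python
import numpy as np

A = np.array([[3, 0, 0], [0, 4, 0], [5, 6, 0]])

flat_index = np.argmax(A)                         # 7: position of the value 6 in the flattened array
row, col = np.unravel_index(flat_index, A.shape)  # (2, 1): the same position as row and column
rows, cols = np.nonzero(A)                        # rows 0, 1, 2, 2 and columns 0, 1, 0, 1
nonzero_values = A[A != 0]                        # 3, 4, 5, 6
clipped = np.clip(A, 1, 4)                        # values below 1 become 1, values above 4 become 4
running_total = np.cumsum([1, 2, 3, 4])           # 1, 3, 6, 10
steps = np.diff([[1, 2, 5], [6, 8, 9]])           # differences within each row: [1, 3] and [2, 1]

print(flat_index, (row, col))
print(clipped)
```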

Index

A[index], starts from index 0
For 2D array: A[row_index][col_index] or A[row_index, col_index]
To get a block of rows and columns: A[row_start:row_end, col_start:col_end] (chaining A[row_start:row_end][col_start:col_end] would slice the rows twice instead)
To get elements at multiple positions: A[(0,1,2),(4,5,6)]. This returns the elements at (0,4), (1,5), (2,6)
Get elements using a mask: a mask is a bool array that marks which positions you want to get
- e.g. mask = np.array([1,0,1,0], dtype=bool) selects the elements at index 0 and 2 (as 1 == True)
- Given a = np.array(['a','b','c','d']), a[mask] gives the array ['a','c']
Flatten a multidimensional array to 1D: A.flatten()
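The indexing forms above can be compared side by side in a small sketch. The 3x7 array A is a hypothetical example, sized so that the positions (0,4), (1,5), (2,6) exist.

```python
import numpy as np

A = np.arange(21).reshape((3, 7))       # hypothetical 3x7 array holding 0..20

single = A[1, 2]                        # 9, the same element as A[1][2]
block = A[0:2, 1:3]                     # rows 0-1 and columns 1-2: [1, 2] and [8, 9]
picked = A[(0, 1, 2), (4, 5, 6)]        # elements at (0,4), (1,5), (2,6): 4, 12, 20

mask = np.array([1, 0, 1, 0], dtype=bool)
letters = np.array(['a', 'b', 'c', 'd'])
selected = letters[mask]                # 'a' and 'c'

flat = A.flatten()                      # 1D copy with 21 elements

print(single, picked, selected, flat.shape)
```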

Merge

Merge A and B with A on top: np.vstack((A,B))
Merge A and B with A on left: np.hstack((A,B))
or np.concatenate((A,B), axis=0), where axis 0 is vertical and 1 is horizontal
Transform to matrix:
- if an array has shape (n,), it is a 1D sequence, not a matrix. To add a dimension, we index it with np.newaxis
- Let A = [1,1,1]; A[np.newaxis, :] changes it to [[1,1,1]]. np.newaxis at the row index means it creates a new row axis, while : means all elements in A
- Similarly, A[:, np.newaxis] changes it to [[1],[1],[1]]
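A minimal sketch of merging and of np.newaxis follows, using two 1D arrays chosen only for illustration. Checking the resulting shapes is a simple way to confirm which axis was added or joined.

```python
import numpy as np

A = np.array([1, 1, 1])
B = np.array([2, 2, 2])

stacked_rows = np.vstack((A, B))    # shape (2, 3): A on top of B
stacked_flat = np.hstack((A, B))    # shape (6,): A followed by B
row_vector = A[np.newaxis, :]       # shape (1, 3)
col_vector = A[:, np.newaxis]       # shape (3, 1)

# Joining two column vectors along axis 1 places them side by side.
side_by_side = np.concatenate((col_vector, B[:, np.newaxis]), axis=1)  # shape (3, 2)

print(stacked_rows.shape, stacked_flat.shape, row_vector.shape, col_vector.shape)
print(side_by_side)
```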

Slicing

Equal split - the length of A along the chosen axis is a multiple of n: np.split(A, n, axis=1) cuts A into n equal pieces along the columns (each piece keeps all rows)
Unequal split - the length of A along the chosen axis is not a multiple of n (e.g. 10 when n is 3): np.array_split(A, 3, axis=1)
Others: np.vsplit(A,3) == np.split(A,3, axis=0) and np.hsplit(A,2) == np.split(A,2,axis=1)
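The difference between the split functions can be seen on a small example. This sketch assumes a hypothetical 3x4 array, whose 4 columns divide evenly by 2 but not by 3.

```python
import numpy as np

A = np.arange(12).reshape((3, 4))

left, right = np.split(A, 2, axis=1)     # two 3x2 blocks
pieces = np.array_split(A, 3, axis=1)    # three blocks with 2, 1 and 1 columns
top, middle, bottom = np.vsplit(A, 3)    # three 1x4 blocks, one per row

print(left)
print([p.shape for p in pieces])
print(top)
```

Calling np.split(A, 3, axis=1) on this array would raise a ValueError, because 4 columns cannot be divided into 3 equal parts.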

Copying and assigning

Using = makes every assigned name point to the same object
- Let a = np.array([0,1,2,3]), b = a, c = b; then a, b and c are all [0,1,2,3]
- a[index] = a_number: set position index of a to a_number
- Set a[0] = 11 => a, b and c are all [11,1,2,3]
- print(c is a) or print(b is a) ==> True
- c[1:3] = [22,33] sets index 1 and 2 of c to 22 and 33; then print(a) ==> [11,22,33,3]
Using copy() gives a deep copy that is independent of the original array
- a = np.array([0,1,2,3]) and let b = a.copy()
- Set a[0] = 11; b remains [0,1,2,3]
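The steps above can be reproduced in a few lines. This is a minimal sketch that uses np.arange(4) as the starting array.

```python
import numpy as np

a = np.arange(4)     # 0, 1, 2, 3
b = a                # b is another name for the same array object
c = a.copy()         # c holds its own independent copy of the data

a[0] = 11            # visible through b, but not through c
b[1:3] = [22, 33]    # also changes a, since b and a are the same object

print(a)             # 11, 22, 33, 3
print(b is a)        # True
print(c)             # 0, 1, 2, 3 (unchanged)
```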

Broadcast

When two arrays have different shapes and we perform an element-wise operation on them, NumPy broadcasts the smaller array across the larger one, provided the shapes are compatible
- Let a = array([[0,0,0],[1,1,1],[2,2,2]]), b = array([10,10,10])
- a+b == [[10,10,10],[11,11,11],[12,12,12]]
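Broadcasting can be sketched as follows, assuming a is a 3x3 array and b has one value per column. The second part, with the hypothetical array column, is an extra illustration of broadcasting in the other direction.

```python
import numpy as np

a = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2]])   # shape (3, 3)
b = np.array([10, 10, 10])                        # shape (3,)

total = a + b            # b behaves as if it were repeated for every row of a
print(total)

column = np.array([[100], [200], [300]])          # shape (3, 1), hypothetical example
shifted = a + column     # column behaves as if it were repeated for every column of a
print(shifted)
```

Two shapes are compatible when, compared from the last axis backwards, each pair of dimensions is either equal or one of them is 1.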

Useful functions:

np.bincount(): counts how often each non-negative integer value occurs; the count of value i is stored at index i
- x=np.array([1,2,0,1,4,1]) => np.bincount(x) == [1,3,1,0,1]
- Interpretation: in x there are one 0, three 1s, one 2, zero 3s, one 4.
- Thus for bincount, the element at index 0 is the count of 0, thus 1; the element at index 1 is the count of 1, thus 3, etc.
- The length of bincount(x) is the biggest element in x plus 1, e.g. if the biggest element of x is 100, then the length of bincount(x) is 101
- If we want to extend the length by appending 0s, pass minlength: np.bincount(x, minlength=7) ==> [1,3,1,0,1,0,0]
np.bincount() with weights:
- x=np.array([1,2,0,1,4,1]), w = [0.3,0.5,0.7,0.6,0.1,-0.9]
- Then np.bincount(x, weights=w) outputs [0.7,0,0.5,0,0.1]
- Interpretation: 0 is at index 2 of x => w[2] == 0.7, thus the weight at index 0 is 0.7
- 1 is at index 0,3,5 of x; index 0,3,5 of w are 0.3, 0.6, -0.9, which sum to 0
- there is no 3 in x, thus its weight is 0
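The bincount examples above can be run as a short sketch with the same x and w. The weighted result is compared with np.allclose rather than ==, since summing 0.3, 0.6 and -0.9 in floating point may leave a tiny rounding error instead of an exact 0.

```python
import numpy as np

x = np.array([1, 2, 0, 1, 4, 1])
w = np.array([0.3, 0.5, 0.7, 0.6, 0.1, -0.9])

counts = np.bincount(x)                 # 1, 3, 1, 0, 1
padded = np.bincount(x, minlength=7)    # 1, 3, 1, 0, 1, 0, 0
weighted = np.bincount(x, weights=w)    # approximately 0.7, 0, 0.5, 0, 0.1

print(counts)
print(padded)
print(np.allclose(weighted, [0.7, 0, 0.5, 0, 0.1]))
```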

References

Data-Science-Notes