Numpy

Song Jiaming | 16 May 2020


Disclaimer: The structure and content are adapted from the GitHub repository Data-Science-Notes. I wrote this post as my own summary notes, and it is not for any business purpose.


To start using NumPy, we import it under the alias np: import numpy as np

1. Basic Operations

  • check the dimensions: array.ndim
  • check the shape: array.shape
  • check the total number of elements in array: array.size
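
As a quick illustration of the three attributes above, here is a minimal sketch. The array name arr and its values are made up for this example.

```python
import numpy as np

# A hypothetical 2 x 3 array, used only for illustration
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

n_dims = arr.ndim    # number of dimensions: 2
shape = arr.shape    # length along each dimension: (2, 3)
n_items = arr.size   # total number of elements: 6
```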


2. Array type

  • Create array from list: np.array(any_dimension_list, dtype=np.int32)
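    • dtype is the data type of the elements in the array
    • If dtype is omitted, NumPy infers it from the data; the default integer type depends on the platform (np.int64 on most 64-bit systems)
    • See the NumPy documentation on data types for more types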
  • Create an array of zeros: np.zeros((shape), dtype=atype)
  • Create an array of ones: np.ones((3,4))
  • Create an empty (uninitialized) array: np.empty((shape)); its initial values are arbitrary
  • Change an array's shape to a specified shape: arr.reshape((shape))
  • Create a series: np.arange(start, end+1, step)
    • e.g. np.arange(0,11,2) => [0,2,4,6,8,10]
    • np.arange(n) => starts from 0 and stops at n-1, incrementing by 1 at each step
    • np.arange(end, start-1, -1) => decreases by 1 from end down to start
  • Create evenly spaced values over a range: np.linspace(start_range, end_range, num_items)
    • num_items is the total number of values, and both endpoints are included
    • e.g. np.linspace(1,10,20) => [ 1 1.47368421 1.94736842 2.42105263 2.89473684 3.36842105 3.84210526 4.31578947 4.78947368 5.26315789 5.73684211 6.21052632 6.68421053 7.15789474 7.63157895 8.10526316 8.57894737 9.05263158 9.52631579 10. ]
  • Create an array by repeating an array: np.tile(starting_array, (x,y))
    • repeat the starting_array x times along the rows and y times along the columns
    • e.g. np.tile([0,1],(3,2)) results in [[0,1,0,1],[0,1,0,1],[0,1,0,1]]
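
The creation functions listed above can be tried together in a short sketch. The variable names and shapes below are chosen only for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3), dtype=np.float64)  # 2 x 3 array filled with 0.0
ones = np.ones((3, 4))                      # 3 x 4 array filled with 1.0
reshaped = np.arange(12).reshape((3, 4))    # 0..11 laid out in 3 rows, 4 columns

evens = np.arange(0, 11, 2)       # [0, 2, 4, 6, 8, 10]
countdown = np.arange(5, -1, -1)  # [5, 4, 3, 2, 1, 0]

# linspace includes both endpoints; the third argument is the total count
points = np.linspace(1, 10, 20)

# tile repeats the input 3 times along the rows and 2 times along the columns
tiled = np.tile([0, 1], (3, 2))   # 3 rows, each [0, 1, 0, 1]
```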

Computation

a = np.array([10,20,30,40]) # [10,20,30,40]
b = np.arange(4) # [0,1,2,3]
  • a-b => [10,19,28,37], a+b => [10,21,32,43], a * b => [0,20,60,120]
  • a ** n: each element of a raised to the power n
  • np.sin(a): sin(x) for each x in a
  • b < 2 => [True, True, False, False]
  • a==b: element-wise check of whether a==b => [False, False, False, False]
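
The element-wise operations above can be checked with the same a and b. This is a minimal sketch, and the result names are made up for the example.

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.arange(4)           # [0, 1, 2, 3]

difference = a - b         # [10, 19, 28, 37]
total = a + b              # [10, 21, 32, 43]
product = a * b            # [0, 20, 60, 120], element-wise, not a matrix product
squares = a ** 2           # [100, 400, 900, 1600]
small = b < 2              # [True, True, False, False]
equal = a == b             # [False, False, False, False]
```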

  • Matrix dot product: a.dot(b) or np.dot(a,b)
  • Minimum and maximum: np.min(a), np.max(a) or a.min(), a.max(); the built-in min(a) and max(a) are only reliable for 1D arrays
  • Axis:
    • 0: find/sum according to each column
    • 1: find/sum according to each row
  • Sum of each row or column, for arr:
    • 0 0 0
      1 1 1
    • np.sum(arr, axis=1) => [0,3] (one sum per row)
    • np.sum(arr, axis=0) => [1,1,1] (one sum per column)
  • Get the index of the largest/smallest element in a matrix A: np.argmax(A), np.argmin(A); without an axis argument, the index refers to the flattened array
  • Mean of the whole matrix: np.mean(A) or np.average(A)
  • Median: np.median(A)
  • Cumulative sum of elements in A: np.cumsum(A)
    • i.e. A = [1,2,3,4] => np.cumsum(A) = [1,3,6,10]
  • Difference between 2 consecutive elements: np.diff(B)
    • B = [[1,2,5],[6,8,9]], diff(B)=> [[1,3],[2,1]]
  • Get index of nonzero element in an array: np.nonzero(A)
    • A = [[3, 0, 0], [0, 4, 0], [5, 6, 0]]
    • Returns (array([0, 1, 2, 2]), array([0, 1, 0, 1]))
    • which means position(0,0),(1,1), (2,0),(2,1) contains nonzero elements
    • Further usage: A[np.nonzero(A)] => array([3,4,5,6]). However, A[A != 0] is preferred
  • Sort each row in ascending order: np.sort(A)
  • Transpose: np.transpose(A) or A.T
  • Change every element outside a range to the boundary value: np.clip(A, min, max)
    • If min <= element <= max, keep the original value
    • If element > max, use max instead
    • If element < min, use min instead
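
The aggregation and lookup functions above can be sketched as follows. The arrays reuse the examples from the list, np.unravel_index is added here to turn a flat index back into a (row, column) pair, and the clip bounds 2 and 8 are arbitrary.

```python
import numpy as np

arr = np.array([[0, 0, 0],
                [1, 1, 1]])
col_sums = np.sum(arr, axis=0)  # one sum per column: [1, 1, 1]
row_sums = np.sum(arr, axis=1)  # one sum per row: [0, 3]

A = np.array([[3, 0, 0],
              [0, 4, 0],
              [5, 6, 0]])
flat_idx = np.argmax(A)                        # 7, an index into the flattened array
row_col = np.unravel_index(flat_idx, A.shape)  # (2, 1), the position of the value 6
rows, cols = np.nonzero(A)                     # rows [0, 1, 2, 2], cols [0, 1, 0, 1]
nonzero_values = A[A != 0]                     # [3, 4, 5, 6]

B = np.array([[1, 2, 5],
              [6, 8, 9]])
running = np.cumsum([1, 2, 3, 4])  # [1, 3, 6, 10]
steps = np.diff(B)                 # [[1, 3], [2, 1]], differences within each row
clipped = np.clip(B, 2, 8)         # [[2, 2, 5], [6, 8, 8]]
```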

Index

  • A[index], starts from index 0
  • For 2D array: A[row_index][col_index] or A[row_index, col_index]
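  • To get a block of rows and columns: A[row_start:row_end, col_start:col_end]. Note that A[row_start:row_end][col_start:col_end] does not work, because the second slice is applied to the rows again
  • To get elements at multiple positions: A[(0,1,2),(4,5,6)]. This will get you the elements at (0,4),(1,5),(2,6)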
  • Get elements using mask: mask is actually a bool array which indicates which position you want to get
    • e.g. mask = np.array([1,0,1,0], dtype=bool), which means we want the elements at index 0 and 2 (as 1 == True)
    • Given a = np.array(['a','b','c','d']), a[mask] results in the array ['a','c']
  • Flatten a multidimensional array to 1D: A.flatten()
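
A minimal sketch of the indexing forms above, assuming a small 3 x 4 array. The positions differ from the ones in the list because this example array has only 4 columns.

```python
import numpy as np

A = np.arange(12).reshape((3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

single = A[1, 2]                  # 6, the same element as A[1][2]
block = A[0:2, 1:3]               # rows 0-1 and columns 1-2: [[1, 2], [5, 6]]
picked = A[(0, 1, 2), (1, 2, 3)]  # elements at (0,1), (1,2), (2,3): [1, 6, 11]

mask = np.array([1, 0, 1, 0], dtype=bool)
letters = np.array(['a', 'b', 'c', 'd'])
selected = letters[mask]          # ['a', 'c']

flat = A.flatten()                # 1D copy holding all 12 elements
```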

Merge

  • Merge A and B with A on top: np.vstack((A,B))
  • Merge A and B with A on left: np.hstack((A,B))
  • Or np.concatenate((A,B), axis=0), where axis 0 is vertical and axis 1 is horizontal
  • Transform to matrix:
    • if an array has shape (n,), it is a 1D series, not a matrix. To transform it, we index with np.newaxis
    • Let A = [1,1,1]; A[np.newaxis, :] changes it to [[1,1,1]]. np.newaxis in the row position creates a new row axis, while : means all elements of A
    • Similarly, A[:, np.newaxis] changes it to [[1],[1],[1]]
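
The merging functions and np.newaxis can be combined as in the sketch below. The arrays A and B are made-up 1D examples.

```python
import numpy as np

A = np.array([1, 1, 1])
B = np.array([2, 2, 2])

stacked_v = np.vstack((A, B))  # shape (2, 3): A on top of B
stacked_h = np.hstack((A, B))  # shape (6,): A followed by B

row = A[np.newaxis, :]         # shape (1, 3): [[1, 1, 1]]
col = A[:, np.newaxis]         # shape (3, 1): [[1], [1], [1]]

# concatenate joins along an existing axis, so the 1D inputs get a column axis first
side_by_side = np.concatenate((A[:, np.newaxis], B[:, np.newaxis]), axis=1)  # shape (3, 2)
```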

Slicing

  • Equal split - the size of A along the axis is a multiple of n: np.split(A, n, axis=1) cuts A into n equal pieces along the columns (axis=0 cuts along the rows)
  • Unequal split - the size of A along the axis is not a multiple of n, e.g. a size of 10 split into 3 pieces: np.array_split(A, 3, axis=1)
  • Others: np.vsplit(A,3) == np.split(A,3, axis=0) and np.hsplit(A,2) == np.split(A,2,axis=1)
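
A short sketch of the split functions, assuming a 3 x 4 array so that the column count is divisible by 2 but not by 3.

```python
import numpy as np

A = np.arange(12).reshape((3, 4))

left, right = np.split(A, 2, axis=1)   # two pieces of shape (3, 2)
pieces = np.array_split(A, 3, axis=1)  # three pieces with 2, 1 and 1 columns
rows = np.vsplit(A, 3)                 # three pieces of shape (1, 4)
halves = np.hsplit(A, 2)               # same result as np.split(A, 2, axis=1)
```

With this array, np.split(A, 3, axis=1) would raise a ValueError, because 4 columns cannot be divided into 3 equal parts.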

Copying and assigning

  • Using = makes every assigned name point to the same object
    • Let a = [0,1,2,3], b=a, c=b; then both b and c == [0,1,2,3]
    • a[index] = a_number: set position index of a to a_number
    • Set a[0]=11 => all of a, b, c == [11,1,2,3]
    • print(c is a) or print(b is a) ==> True
    • c[1:3] = [22,33] sets index 1 and 2 of c to 22 and 33, then print(a) ==> [11,22,33,3]
  • Using copy() gives a deep copy, which is independent of the original array
    • a = [0,1,2,3] and let b=a.copy()
    • Set a[0] = 11, b remains [0,1,2,3]
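
The difference between assignment and copy() can be checked directly. This is a minimal sketch; np.shares_memory is used here only to confirm whether two names refer to the same data.

```python
import numpy as np

a = np.array([0, 1, 2, 3])
b = a         # b is another name for the same array object
c = a.copy()  # c owns its own data

a[0] = 11
b[1:3] = [22, 33]  # writing through b also changes a

same_object = b is a                   # True
shares_with_b = np.shares_memory(a, b) # True
shares_with_c = np.shares_memory(a, c) # False

print(a)  # a now holds 11, 22, 33, 3
print(c)  # c still holds 0, 1, 2, 3
```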

Broadcast

  • When 2 arrays have different but compatible shapes and we perform a basic operation on them, broadcasting happens
    • Let a = array([[0,0,0],[1,1,1],[2,2,2]]), b = array([10,10,10])
    • a+b == [[10,10,10],[11,11,11],[12,12,12]]
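
Broadcasting can be sketched with a 3 x 3 array and a length-3 vector. The exact arrays are an assumption made for this example, and the column vector col is added to show the second common case.

```python
import numpy as np

a = np.array([[0, 0, 0],
              [1, 1, 1],
              [2, 2, 2]])   # shape (3, 3)
b = np.array([10, 10, 10])  # shape (3,)

# b is treated as if it were repeated for every row of a
result = a + b
# [[10 10 10]
#  [11 11 11]
#  [12 12 12]]

col = np.array([[100], [200], [300]])  # shape (3, 1)
result_col = a + col                   # col is stretched across the columns of a
```

Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.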

Useful functions:

  • np.bincount(): counts how often each non-negative integer occurs; the element at index i of the result is the number of times i appears in the input
    • x=np.array([1,2,0,1,4,1]) => np.bincount(x) == [1,3,1,0,1]
    • Interpretation: in x there are one 0, three 1s, one 2, zero 3s, one 4.
    • Thus for bincount, the element at index 0 is the count of 0, thus 1, the element at index 1 is the count of 1, thus 3, etc
    • The length of bincount(x) is the biggest element in x plus 1, i.e. if the biggest element of x is 100, then the length of bincount(x) is 101
    • If we want to extend the length by appending 0s, simply add minlength: np.bincount(x, minlength=7) ==> [1,3,1,0,1,0,0]
  • np.bincount() with weights:
    • x=np.array([1,2,0,1,4,1]), w = [0.3,0.5,0.7,0.6,0.1,-0.9]
    • Then np.bincount(x, weights=w) outputs [0.7,0,0.5,0,0.1] (up to floating-point rounding)
    • Interpretation: 0 is at index 2 of x => w[2] == 0.7, thus the weight at index 0 is 0.7
    • 1 is at index 0,3,5 of x, and index 0,3,5 of w are 0.3, 0.6, -0.9, which sum to 0
    • there is no 3 in x, thus its weight is 0
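
Both uses of np.bincount can be reproduced with the x and w from the list above. This is a minimal sketch, and the weighted result is compared approximately because the sum 0.3 + 0.6 - 0.9 is subject to floating-point rounding.

```python
import numpy as np

x = np.array([1, 2, 0, 1, 4, 1])
w = np.array([0.3, 0.5, 0.7, 0.6, 0.1, -0.9])

counts = np.bincount(x)               # [1, 3, 1, 0, 1]
padded = np.bincount(x, minlength=7)  # [1, 3, 1, 0, 1, 0, 0]
weighted = np.bincount(x, weights=w)  # approximately [0.7, 0, 0.5, 0, 0.1]

# The result has one slot per integer from 0 up to the largest value in x
assert len(counts) == x.max() + 1
assert np.allclose(weighted, [0.7, 0.0, 0.5, 0.0, 0.1])
```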

References

  1. Data-Science-Notes