NumPy Basics for Beginners - Python lists and arrays data generation (sequential data)

 


NumPy is a Python library that is useful for generating random data and performing mathematical operations. (pandas is a separate library that is built on top of NumPy.)

NumPy stands for "Numerical Python". NumPy is all about multi-dimensional arrays and distributions of data. It has many built-in functions and capabilities, including vectors, arrays, matrices, and number generation.

What is a list?

A list is a collection of items that can have different data types. The list holds references to its items, and the items themselves are separate Python objects that need not sit next to each other in memory. Because the elements can have different data types, Python has to keep the type information of every element. For this reason, lists occupy more space in memory and are less efficient for numerical work.

Ex: [1,  'a', 4.2, 'hello']

What is an array?

An array is a data structure consisting of a collection of items of the same data type. These items are stored in a contiguous block of memory, so any item can be reached quickly through its index. Because all elements share one data type, the type is stored only once for the whole array. Arrays therefore occupy less space in memory and are more efficient.

Ex: [1, 2, 3, 4, 5, 8, 9]
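The difference can be checked directly. The following is a minimal sketch, using the two example collections above; the variable names are made up for this illustration.

```python
import numpy as np

# A list can mix data types; every element keeps its own type.
mixed_list = [1, 'a', 4.2, 'hello']
element_types = [type(item).__name__ for item in mixed_list]
print(element_types)          # ['int', 'str', 'float', 'str']

# A NumPy array stores a single data type (dtype) for all of its elements.
int_array = np.array([1, 2, 3, 4, 5, 8, 9])
print(int_array.dtype.kind)   # i  (a signed integer type)

# Memory used by the data = number of elements x bytes per element.
print(int_array.nbytes == int_array.size * int_array.itemsize)   # True
```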

Before using the NumPy library, we need to import it:

import numpy as np


1. Generating a list of items or elements in Python

  Syntax:

   list_variable = [elements separated by commas]

 Ex1: Generating a list of 6 elements

Here a variable called num_list is created with 6 elements. (Avoid naming a variable list, because that hides Python's built-in list type.)

>> num_list = [1,2,3,4,5,6]

>> num_list

It is good practice to assign the elements to a variable so that they can be accessed throughout the program.

To check the elements in the list variable, print it. In a Jupyter notebook (Anaconda), simply type num_list and run the cell.

   Output:

[1, 2, 3, 4, 5, 6]


Ex2: Generating a list of letters

>> str_list= ['a', 'b', 'c', 'd', 'e', 'f']

>> str_list

   Output:

    ['a', 'b', 'c', 'd', 'e', 'f']


Note: Strings must be enclosed in quotes.


Ex 3: Generating a list of fruits

>> fruits= ['Mango','Apple','Orange','Pine','grapes','watermelon','strawberry']

>> fruits

   Output:

   ['Mango', 'Apple', 'Orange', 'Pine', 'grapes', 'watermelon', 'strawberry']


2. Generating a matrix using a Python list

  Syntax:

[ [ elements of row 1], [elements of row 2], [elements of row 3], .... [elements of row n] ]

In a list, we pass individual elements inside square brackets, but to create a matrix we need to pass multiple lists inside the outer square brackets ([ ]).

Here the number of inner square brackets ( [ ] ) inside the outer list indicates the number of rows in the matrix, and the elements in each inner pair of square brackets are the elements of the corresponding row. The number of elements in each row indicates the number of columns.

 Ex1: Creating a matrix of  3 rows and 3 columns

>> matrix1= [[1,2,3],[4,5,6],[7,8,9]]

 >> matrix1

   Output:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Here 1, 2, 3 are the elements of row1
Here 4, 5, 6 are the elements of row2
Here 7, 8, 9 are the elements of row3
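As a small sketch of how the rows and columns of such a list-based matrix can be reached with plain Python indexing (indexes start at 0; the variable names are chosen only for this illustration):

```python
matrix1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

n_rows = len(matrix1)       # number of inner lists
n_cols = len(matrix1[0])    # number of elements in the first row
print(n_rows, n_cols)       # 3 3

second_row = matrix1[1]     # index 1 is row2
print(second_row)           # [4, 5, 6]

element = matrix1[1][2]     # row2, third column
print(element)              # 6
```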

 Ex2: Creating a list of lists

>> matrix2= [[1,2,3],[4,5,6],[7,8]]

 >> matrix2

   Output:

[[1, 2, 3], [4, 5, 6], [7, 8]]

What are the differences between lists and arrays?


a) In Python, a list is a collection of items that can be of multiple data types (integer, float, string), but an array is a collection of items of the same data type.

b) A list does not support element-wise mathematical operations directly (for example, multiplying a list by 2 repeats the list), but arrays do.

c) Lists have no built-in functions for generating a large sequence of random data, whereas NumPy provides such functions for arrays.

d) Lists occupy more space in memory and are less efficient, but arrays occupy less space in memory and are more efficient.
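The point about mathematical operations can be illustrated with a short sketch. The variable names are only for this example.

```python
import numpy as np

numbers = [1, 2, 3]

# Multiplying a list repeats it; it does not multiply the elements.
repeated = numbers * 2
print(repeated)                  # [1, 2, 3, 1, 2, 3]

# With a list, element-wise arithmetic needs an explicit loop.
doubled_list = [n * 2 for n in numbers]
print(doubled_list)              # [2, 4, 6]

# With an array, the operation is applied to every element.
doubled_array = np.array(numbers) * 2
print(doubled_array.tolist())    # [2, 4, 6]
```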


3. Creating arrays from a Python list

  Syntax:

  np.array(listvariable)

Ex1: Creating an array from a list

>> num_list = [1,2,3,4,5,6]

>> np.array(num_list)

   Output:

array([1, 2, 3, 4, 5, 6])

 Ex2: Creating a matrix of 3 rows and 3 columns

    >> matrix1= [[1,2,3],[4,5,6],[7,8,9]]

    >> np.array(matrix1)

   Output:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Remember that every row must have the same number of elements; otherwise a matrix cannot be created and the data remains a collection of separate lists, as Ex3 shows.

 Ex3: Creating an array from a list of lists with unequal rows

   >> list_lists= [[1,2,3],[4,5,6],[7,8]]

   >> np.array(list_lists, dtype=object)

     Output:

array([list([1, 2, 3]), list([4, 5, 6]), list([7, 8])], dtype=object)

Here the rows do not have the same number of elements. The first and second rows have 3 elements, but the third row has only 2. So the result is not a matrix but a one-dimensional array whose elements are Python lists. Note that recent NumPy versions (1.24 and later) raise a ValueError for such input unless dtype=object is passed explicitly, as done here; older versions produced this output without it.
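One way to tell a real matrix from an array of lists is to look at the shape. The sketch below passes dtype=object explicitly for the unequal rows, because recent NumPy releases refuse to build such an array otherwise; the variable names are illustrative.

```python
import numpy as np

# Rows of equal length give a true 2D array (a matrix).
regular = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(regular.shape)   # (3, 3)
print(regular.ndim)    # 2

# Rows of unequal length give a 1D array that holds three Python lists.
ragged = np.array([[1, 2, 3], [4, 5, 6], [7, 8]], dtype=object)
print(ragged.shape)    # (3,)
print(ragged.ndim)     # 1
```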


Numpy built-in methods to generate data:

  • In NumPy, one-dimensional (1D) arrays are called vectors and two-dimensional (2D) arrays are called matrices.
  • The NumPy library has many built-in methods to generate arrays, such as arange, linspace, rand, randn, etc.


Let's see each of these methods:


a) arange( )


The arange function returns evenly spaced values within the given interval, using a fixed step size.


 Syntax:

  np.arange(start_number, stop_number, step_size)

Here the default step size is 1. The step size is the difference between two consecutive values in the array. In NumPy itself these three parameters are named start, stop and step.

   Ex1: Generating values in the range 0 to 9

>> np.arange(start=0, stop=9)

More simply, we can write the same thing as

>>np.arange(0,9)

   Output:

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

Remember that stop_number itself is excluded, so with a step size of 1 the last number in the resulting array is 1 less than stop_number. To display the numbers from 0 to 9, including 9, we need to specify stop_number as 10 (9 + 1 = 10).
In this example, step_size was not defined, so the default step size of 1 was used.

Ex2: Generating values in the range 0 to 9 with step size of 2

>>np.arange(start=0, stop=9, step=2)

or simply we can pass the numbers by position as

>>np.arange(0, 9, 2)

   Output:

array([0, 2, 4, 6, 8])

In this example, a step size of 2 is defined, so np.arange() returns values from 0 up to (but not including) 9 with a difference of 2 between any two consecutive values in the array.

   Ex3: Generating values in the range 0 to 9 with a step size of 3

>>np.arange(0, 9, 3)

   Output:

array([0, 3, 6])

In this example a step_size of 3 is defined, so np.arange() returns values from 0 to 9 with a difference of 3 between any two consecutive numbers in the array. As stop_number is 9 and is excluded, arange() can return values only up to 8. The next value after 6 would be 9, so the resulting array has only the values 0, 3, 6.

  Ex4: Generating values in the range 0 to 10 with step size of 3

>>np.arange(0, 10, 3)

   Output:

array([0, 3, 6, 9])

In this example a step_size of 3 is defined, so np.arange() returns values from 0 to 10 with a difference of 3 between any two consecutive numbers in the array. As stop_number is 10 and is excluded, arange() can return values only up to 9. Therefore the resulting array has the values 0, 3, 6, 9.
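The four examples above follow one rule: for a positive step, the number of values is (stop - start) / step, rounded up. The sketch below checks this; arange_length is a hypothetical helper written for this illustration, not a NumPy function.

```python
import math
import numpy as np

def arange_length(start, stop, step=1):
    """Hypothetical helper: how many values np.arange returns for a positive step."""
    return max(0, math.ceil((stop - start) / step))

# The same (start, stop, step) combinations as in Ex1 to Ex4.
cases = [(0, 9, 1), (0, 9, 2), (0, 9, 3), (0, 10, 3)]
lengths = []
for start, stop, step in cases:
    values = np.arange(start, stop, step)
    lengths.append(len(values))
    print(values.tolist(), len(values) == arange_length(start, stop, step))

print(lengths)   # [9, 5, 3, 4]
```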


b) linspace()


The linspace function returns a chosen number of evenly spaced values within the given interval.


  Syntax:

np.linspace(start_number,stop_number,num=50,endpoint=True,retstep=False,dtype=None, axis=0)


  • start_number is the start value of the sequence and stop_number is the end value of the sequence. num is the number of values or samples to be generated. By default num is set to 50. (In NumPy itself the first two parameters are named start and stop.)
  • endpoint decides whether stop_number is included in the array. If endpoint=True, stop_number is included in the returned array. If endpoint=False, stop_number is not included in the returned array. By default endpoint is set to True.
  • retstep decides whether to return the step size along with the array. If retstep=True, linspace returns a tuple containing both the array and the step size, i.e., (array, stepsize). If retstep=False, linspace returns only the array. By default retstep is set to False.
  • dtype defines the data type of the returned array. Give dtype=int to get integer values within the specified range. Give dtype=float to get float values within the specified range. If dtype is not given (dtype=None), the returned array has float values even when the values are whole numbers.
  • axis is relevant only when start_number and stop_number are arrays themselves; it selects the axis of the result along which the samples are stored. For plain numbers, as in the examples here, it can be left at its default.

In general, among these arguments, start_number, stop_number and num are the most commonly used.

   Ex1: Generating 5 values in the range 0 to 9 

    >> np.linspace(0, 9, 5)

   Output:

array([0.  , 2.25, 4.5 , 6.75, 9.  ])

In this example we have given start_number=0, stop_number=9 and num=5. As the other arguments were not given, linspace uses their default values. The result includes 9 because endpoint is True by default, and the values are floats because we have not specified the data type of the array.


Now check the same example without the endpoint (endpoint=False).
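To see the effect of the data type on this example, here is a small sketch that repeats it with dtype=int. The fractional parts are dropped, so the integer values are no longer evenly spaced.

```python
import numpy as np

default_values = np.linspace(0, 9, 5)           # floats by default
int_values = np.linspace(0, 9, 5, dtype=int)    # converted to integers

print(default_values.tolist())   # [0.0, 2.25, 4.5, 6.75, 9.0]
print(int_values.tolist())       # [0, 2, 4, 6, 9]
```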

   Ex2: Generating 5 values in the range 0 to 9 with endpoint as False

    >>np.linspace(0, 9, 5, endpoint=False)

   Output:

array([0. , 1.8, 3.6, 5.4, 7.2])

In both Ex1 and Ex2, linspace generates 5 values in the interval from 0 to 9. The outputs differ because in Ex1 endpoint=True by default, so 9 is included in the array, whereas in Ex2 we set endpoint to False, so 9 is excluded. As a result, the interval is divided into 5 equal steps of 1.8 and the last value is 7.2.


   Ex3: Generating 5 values in the range 0 to 9 without the endpoint and also returning the step size

    >> np.linspace(0, 9, 5, endpoint=False,retstep=True)

   Output:

(array([0. , 1.8, 3.6, 5.4, 7.2]), 1.8)

This result is the same as the result of Ex2 (just above this example), with the step size added. Here 1.8 is the step size, that is, the difference between two consecutive values in the array.
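The step size can also be worked out by hand. With the endpoint included, num values leave num - 1 gaps; with the endpoint excluded, there are num gaps. A minimal sketch with the numbers from these examples:

```python
import math
import numpy as np

start, stop, num = 0, 9, 5

# endpoint=True (default): 5 values span the whole interval, so 4 gaps.
_, step_with_end = np.linspace(start, stop, num, retstep=True)

# endpoint=False: stop is left out, so the interval is split into 5 gaps.
_, step_without_end = np.linspace(start, stop, num, endpoint=False, retstep=True)

print(math.isclose(step_with_end, (stop - start) / (num - 1)))   # True (2.25)
print(math.isclose(step_without_end, (stop - start) / num))      # True (1.8)
```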

Ex4: Generating values in the range 0 to 9 with the default number of values

 >>np.linspace(0, 9)

Output:

(an array of 50 evenly spaced float values, starting at 0. and ending at 9.)

Here the number of elements to be displayed (num) is not specified, so by default linspace returns 50 values within the specified interval. In this example 50 values were created from 0 to 9. As endpoint is not specified as either True or False, it is set to True by default, so stop_number, which is 9, is included.
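The defaults can be confirmed without printing all the values. This is a small sketch; the variable names are illustrative.

```python
import numpy as np

values = np.linspace(0, 9)            # num defaults to 50
print(values.size)                    # 50
print(values[0], values[-1])          # 0.0 9.0

# 50 values leave 49 equal gaps between 0 and 9.
gaps = np.diff(values)
print(np.allclose(gaps, 9 / 49))      # True
```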

c) zeros()

This method generates an array of zeros.

Syntax:
a) For one dimensional array:

  np.zeros(number of elements)

b) For two dimensional array:

  np.zeros((rows, columns))  

Ex1: Generate an array of 6 zeros.


    >>np.zeros(6)

  Output:
array([0., 0., 0., 0., 0., 0.])

Ex2: Generate a matrix of zeros with 3 rows and 4 columns

   >>np.zeros((3,4))
  Output:
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])


d) ones()

This method generates an array of ones.

Syntax:
a) For one dimensional array:

np.ones(number of elements)


b) For two dimensional array:


  np.ones((rows, columns))  


Ex1: Generate an array of 5 ones.


 >> np.ones(5)

  Output:
array([1., 1., 1., 1., 1.])

Ex2: Generate a matrix of ones with 3 rows and 3 columns

 >> np.ones((3,3))
  Output:
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])




e) eye( )

This function creates an identity matrix.

Syntax:


np.eye(row or column size)


An identity matrix is a square matrix, so it has the same number of rows and columns. It has ones on the main diagonal and zeros everywhere else.
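The name comes from the fact that multiplying a matrix by the identity matrix leaves it unchanged. A short sketch, assuming a 3x3 example matrix chosen only for illustration:

```python
import numpy as np

identity = np.eye(3)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Matrix multiplication with the identity matrix returns the same values.
product = matrix @ identity
print(np.array_equal(product, matrix))   # True
```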


Ex1: Generate a 4x4 identity matrix.

>> np.eye(4)
  Output:
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])



By default, the functions np.zeros(), np.ones() and np.eye() generate float values. To generate integer values we need to specify the data type.
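The same dtype argument works for np.zeros() and np.ones() as well. A minimal sketch with illustrative variable names:

```python
import numpy as np

float_zeros = np.zeros(3)                # default: float values
int_zeros = np.zeros(3, dtype=int)       # integer zeros
int_ones = np.ones((2, 2), dtype=int)    # integer ones, 2 rows x 2 columns

print(float_zeros.dtype)      # float64
print(int_zeros.tolist())     # [0, 0, 0]
print(int_ones.tolist())      # [[1, 1], [1, 1]]
```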

Ex2: Generate a 4x4 identity matrix with integer values.


>> np.eye(4,dtype=int)
  Output:
array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]])



Hope this was helpful to create data and play with it...

