Python NumPy Library

NumPy is an open-source Python library for numerical computations. It stands for “Numerical Python.” NumPy provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently. It is a fundamental library for data science and scientific computing in Python.

Key features of NumPy include:

  1. Multi-dimensional arrays: NumPy’s main object is the `ndarray` (n-dimensional array), which allows you to store and manipulate homogeneous data efficiently.
  2. Broadcasting: NumPy allows arithmetic between arrays of different but compatible shapes by virtually stretching the smaller array to match the shape of the larger one, without copying its data.
  3. Mathematical functions: NumPy provides a wide range of mathematical functions such as sine, cosine, exponentiation, and logarithms.
  4. Array manipulation: NumPy provides functions for reshaping, stacking, and splitting arrays, as well as advanced indexing and slicing operations.
  5. Linear algebra: NumPy has built-in support for basic linear algebra operations such as matrix multiplication and solving linear systems.
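
Three of the features listed above (broadcasting, array manipulation, and linear algebra) can be sketched in a few lines. This is a minimal illustration, not part of the numbered examples that follow; the variable names and the small linear system are made up for demonstration.

```python
import numpy as np

# Broadcasting: add a length-3 row vector to every row of a 3x3 matrix.
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row = np.array([10, 20, 30])
broadcast_sum = matrix + row
print("Broadcast sum:")
print(broadcast_sum)

# Array manipulation: reshape the six values 0..5 into a 2x3 grid.
grid = np.arange(6).reshape(2, 3)
print("Reshaped grid:")
print(grid)

# Linear algebra: solve the system 2x + y = 5 and x + 3y = 10.
coefficients = np.array([[2.0, 1.0], [1.0, 3.0]])
constants = np.array([5.0, 10.0])
solution = np.linalg.solve(coefficients, constants)
print("Solution (x, y):", solution)
```

The system has the solution x = 1, y = 3, which you can check by substituting the values back into both equations.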

Let’s see some practical examples to understand how NumPy works with data.

Example 1

				
					import numpy as np

data = [10, 20, 30, 40, 50]

# Calculate the mean
mean_value = np.mean(data)
print("Mean:", mean_value)

# Calculate the median
median_value = np.median(data)
print("Median:", median_value)

# Calculate the standard deviation
std_dev = np.std(data)
print("Standard Deviation:", std_dev)

# Calculate the variance
variance = np.var(data)
print("Variance:", variance)


# Find the minimum value
min_value = np.min(data)
print("Minimum Value:", min_value)

# Find the maximum value
max_value = np.max(data)
print("Maximum Value:", max_value)


# Calculate the 25th percentile
percentile_25 = np.percentile(data, 25)
print("25th Percentile:", percentile_25)

# Calculate the 75th percentile
percentile_75 = np.percentile(data, 75)
print("75th Percentile:", percentile_75)

				
			

When you run this script, it will output:

				
					Mean: 30.0
Median: 30.0
Standard Deviation: 14.142135623730951
Variance: 200.0
Minimum Value: 10
Maximum Value: 50
25th Percentile: 20.0
75th Percentile: 40.0

				
			

These are just a few examples of basic statistics operations that can be performed using NumPy. The library offers a wide range of mathematical functions for more advanced statistical computations as well. NumPy is a foundational library for data manipulation and numerical computing in Python and is often used in combination with other libraries like Pandas, Matplotlib, and SciPy to perform comprehensive data analysis tasks.
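
As a small extension of Example 1, np.percentile() also accepts a list of percentiles, so the quartiles can be requested in a single call. The sketch below reuses the same data; the names q1, q2, q3, and iqr are illustrative.

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# Pass several percentiles at once; one value is returned per entry.
q1, q2, q3 = np.percentile(data, [25, 50, 75])
print("Quartiles:", q1, q2, q3)

# The interquartile range is the spread of the middle half of the data.
iqr = q3 - q1
print("Interquartile range:", iqr)
```

The 50th percentile is the same value that np.median() returns for this data.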

 

Example 2

Calculating the Mean of Test Scores and Finding the Median of Income Data

1. Calculating the Mean of Test Scores:

				
					import numpy as np

# Sample test scores as a list
test_scores = [85, 90, 78, 92, 88, 95]

# Convert the list to a NumPy array
scores_array = np.array(test_scores)

# Calculate the mean of the test scores
mean_score = np.mean(scores_array)

# Display the mean test score
print("Mean Test Score:", mean_score)

				
			

When you run this script, it will output:

				
					Mean Test Score: 88.0

				
			

2. Finding the Median from a Collection of Income Data:

				
					import numpy as np

# Sample income data as a list
income_data = [50000, 60000, 45000, 80000, 55000, 70000, 65000]

# Convert the list to a NumPy array
income_array = np.array(income_data)

# Calculate the median of the income data
median_income = np.median(income_array)

# Display the median income
print("Median Income:", median_income)

				
			

When you run this script, it will output:

				
					Median Income: 60000.0

				
			

In both cases, we first convert the given data (test scores and income data) into NumPy arrays using np.array(). Then, we calculate the mean using np.mean() and the median using np.median() functions. The resulting mean test score is 88.0, and the median income is 60000.0.

 

Example 3

				
					import numpy as np

# Create an array
arr = np.array([0, np.pi/4, np.pi/2, 3*np.pi/4, np.pi])

# Element-wise sine
result_sin = np.sin(arr)

# Element-wise cosine
result_cos = np.cos(arr)

# Element-wise exponential
result_exp = np.exp(arr)

print("Array:", arr)
print("Sine:", result_sin)
print("Cosine:", result_cos)
print("Exponential:", result_exp)

				
			

When you run this script, it will output:

				
					Array: [0.         0.78539816 1.57079633 2.35619449 3.14159265]
Sine: [0.00000000e+00 7.07106781e-01 1.00000000e+00 7.07106781e-01
 1.22464680e-16]
Cosine: [ 1.00000000e+00  7.07106781e-01  6.12323400e-17 -7.07106781e-01
 -1.00000000e+00]
Exponential: [ 1.          2.19328005  4.81047738 10.55072407 23.14069263]

				
			

In summary, the code demonstrates how to perform element-wise operations on a NumPy array. It calculates the sine, cosine, and exponential of each element in the array arr using the respective NumPy functions. Each function is applied to every element and returns a new array of the same shape. Note that the values 1.22464680e-16 and 6.12323400e-17 in the output are floating-point rounding errors: mathematically, sin(π) and cos(π/2) are exactly 0.
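
Because results such as sin(π) come out as tiny nonzero numbers, exact equality checks on floating-point arrays can be misleading. A minimal sketch of comparing with a tolerance instead, using NumPy's default tolerances (the array names are illustrative):

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])
sines = np.sin(angles)

# Exact comparison fails: sin(pi) is stored as roughly 1.2e-16, not 0.
print(sines[2] == 0.0)  # False

# np.isclose compares a single value within a small tolerance.
print(np.isclose(sines[2], 0.0))  # True

# np.allclose does the same for whole arrays.
print(np.allclose(sines, [0.0, 1.0, 0.0]))  # True
```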

 

Example 4

				
					import numpy as np

# Create a simple NumPy array
simple_array = np.array([1, 2, 3, 4, 5])

# Display the array
print("Simple NumPy array:")
print(simple_array)

# Perform basic operations on the array
sum_result = np.sum(simple_array)
mean_result = np.mean(simple_array)
max_result = np.max(simple_array)

# Display the results
print("\nSum:", sum_result)
print("Mean:", mean_result)
print("Max:", max_result)

# Create a 2D NumPy array
two_d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Display the 2D array
print("\n2D NumPy array:")
print(two_d_array)

# Perform element-wise operations on the 2D array
squared_array = two_d_array ** 2

# Display the squared array
print("\nSquared 2D array:")
print(squared_array)

				
			

When you run this script, it will output:

				
					Simple NumPy array:
[1 2 3 4 5]

Sum: 15
Mean: 3.0
Max: 5

2D NumPy array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Squared 2D array:
[[ 1  4  9]
 [16 25 36]
 [49 64 81]]

				
			

In this example, we created a simple 1D NumPy array and demonstrated basic array operations like sum, mean, and max. Then, we created a 2D NumPy array and performed element-wise operations (squaring the elements). NumPy simplifies and optimizes these operations, making it an efficient library for numerical computations in Python.
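
The same reduction functions also work along a single dimension of a 2D array through the axis argument. The following sketch reuses the 3x3 array from this example; the result names are illustrative.

```python
import numpy as np

two_d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 collapses the rows, giving one sum per column.
column_sums = np.sum(two_d_array, axis=0)
print("Column sums:", column_sums)  # [12 15 18]

# axis=1 collapses the columns, giving one value per row.
row_sums = np.sum(two_d_array, axis=1)
print("Row sums:", row_sums)  # [ 6 15 24]

row_means = np.mean(two_d_array, axis=1)
print("Row means:", row_means)  # [2. 5. 8.]
```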

 

Example 5

				
					import numpy as np

# Create two NumPy arrays
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([5, 4, 3, 2, 1])

# Element-wise addition
addition_result = arr1 + arr2
print("Element-wise Addition:")
print(addition_result)

# Element-wise multiplication
multiplication_result = arr1 * arr2
print("\nElement-wise Multiplication:")
print(multiplication_result)

# Dot product (inner product) of two arrays
dot_product = np.dot(arr1, arr2)
print("\nDot Product:")
print(dot_product)

				
			

When you run this script, it will output:

				
					Element-wise Addition:
[6 6 6 6 6]

Element-wise Multiplication:
[5 8 9 8 5]

Dot Product:
35

				
			

In this example, we imported NumPy as np, created two NumPy arrays (arr1 and arr2), and performed element-wise addition and multiplication with the + and * operators. We then computed the dot product of the two arrays with np.dot().
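
To make the relationship between these operations concrete, the sketch below checks that the dot product of two 1D arrays equals the sum of their element-wise products, and contrasts * with the @ operator on 2D arrays. The small matrices a and b are made up for illustration.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([5, 4, 3, 2, 1])

# For 1D arrays, the dot product is the sum of the element-wise products.
manual_dot = np.sum(arr1 * arr2)
print(manual_dot == np.dot(arr1, arr2))  # True

# The @ operator gives the same result for 1D arrays.
operator_dot = arr1 @ arr2
print(operator_dot)  # 35

# For 2D arrays, @ is matrix multiplication while * stays element-wise.
a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])
matrix_product = a @ b
elementwise_product = a * b
print(matrix_product)
print(elementwise_product)
```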

 

Example 6
Finding the Standard Deviation from Income Data

To find the standard deviation from a collection of income data, you can use NumPy’s std() function. First, convert the income data to a NumPy array, and then use the std() function to calculate the standard deviation.

Here’s a step-by-step example:

				
					import numpy as np

# Sample income data as a list
income_data = [50000, 60000, 45000, 80000, 55000, 70000, 65000]

# Convert the list to a NumPy array
income_array = np.array(income_data)

# Calculate the standard deviation of the income data
std_deviation = np.std(income_array)

# Display the standard deviation
print("Standard Deviation of Income Data:", std_deviation)

				
			

When you run this script, it will output:

				
					Standard Deviation of Income Data: 11157.499537009508

				
			

In this example, we converted the list income_data to a NumPy array using np.array(), and then calculated the standard deviation of the income data using np.std(). The result is approximately 11157.50. By default, np.std() computes the population standard deviation, which divides by the number of values N rather than N - 1.

The standard deviation measures the amount of variation or dispersion in a dataset. A higher standard deviation indicates that the data points are more spread out from the mean, and a lower standard deviation indicates that the data points are closer to the mean. It is an essential measure for understanding the distribution of data and assessing the level of variability in a dataset.
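
If the income values are a sample from a larger population, the sample standard deviation is usually the more appropriate measure. A minimal sketch of the difference, using the ddof ("delta degrees of freedom") parameter of np.std() on the same data:

```python
import numpy as np

income_array = np.array([50000, 60000, 45000, 80000, 55000, 70000, 65000])

# Default (ddof=0): population standard deviation, divides by N.
population_std = np.std(income_array)
print("Population standard deviation:", population_std)

# ddof=1: sample standard deviation, divides by N - 1.
sample_std = np.std(income_array, ddof=1)
print("Sample standard deviation:", sample_std)
```

The sample value is always somewhat larger than the population value, and the gap shrinks as the number of data points grows.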

 

Example 7

Computation Time for Large Matrices:

The computation time for large matrices can be significantly longer than for smaller ones, because the cost of matrix operations grows with matrix size. Multiplying two n × n matrices with the standard algorithm, for example, takes on the order of n³ operations, so doubling n makes the work roughly eight times larger.

NumPy is optimized to handle large matrices efficiently. It hands matrix multiplication to a compiled, optimized BLAS library, which may also use multiple CPU cores depending on how NumPy was installed. However, even with these optimizations, the computation time can still be substantial for very large matrices.

Here’s an example to illustrate the computation time for matrix multiplication: 

				
					import numpy as np
import time

# Create large matrices
size = 1000
matrix1 = np.random.rand(size, size)
matrix2 = np.random.rand(size, size)

# Record start time (perf_counter is a high-resolution clock intended for timing)
start_time = time.perf_counter()

# Perform matrix multiplication
result = np.dot(matrix1, matrix2)

# Record end time
end_time = time.perf_counter()

# Calculate computation time
computation_time = end_time - start_time

print(f"Matrix multiplication took {computation_time:.2f} seconds.")

				
			

When you run this script, it will output something like:

				
					Matrix multiplication took 0.04 seconds.


				
			

Keep in mind that the actual computation time varies with your machine’s CPU and memory, the BLAS library NumPy is linked against, and whatever else is running at the time. For even larger matrices, the computation time will increase further.
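
Since a single measurement can be noisy, one common approach is to repeat the operation several times and keep the fastest run. The sketch below is one way to do that; the helper best_time is a hypothetical name introduced here, and the matrix size of 500 is chosen only to keep the run short.

```python
import time

import numpy as np

# Seeded generator so the matrices are reproducible between runs.
rng = np.random.default_rng(seed=0)
size = 500
matrix1 = rng.random((size, size))
matrix2 = rng.random((size, size))


def best_time(func, repeats=5):
    """Run func several times and return the fastest run in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


elapsed = best_time(lambda: matrix1 @ matrix2)
print(f"Best of 5 runs: {elapsed:.4f} seconds")
```

Taking the minimum rather than the average reduces the influence of other processes that happen to slow down an individual run.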

 

Example 8

Skewed Data and Outliers in NumPy

Skewed data and outliers are common challenges in data analysis. Skewed data refers to data that is not evenly distributed around the mean, resulting in an asymmetrical distribution. Outliers are data points that deviate significantly from the rest of the data, potentially affecting the overall analysis and interpretation of the data.

1. Skewed Data: NumPy doesn’t have a built-in function to directly determine the skewness of data, but you can use other libraries like SciPy or apply transformations to analyze skewness. For example, the SciPy library’s skew() function can be used:

				
					import numpy as np
from scipy.stats import skew

# Sample skewed data
skewed_data = np.array([10, 20, 30, 40, 80, 1000])

# Calculate the skewness using SciPy's skew function
skewness = skew(skewed_data)

# Display the skewness
print("Skewness:", skewness)

				
			

Output will be:

				
					Skewness: 1.7739804342272496

				
			

A positive skewness value (as in the output) indicates a right-skewed distribution, where the tail of the distribution is stretched towards the higher values.
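
If SciPy is not available, the same quantity can be computed with NumPy alone. The sketch below assumes the moment-based definition of skewness (third central moment divided by the second central moment raised to the power 1.5), which is what skew() computes with its default settings; the names m2 and m3 are illustrative.

```python
import numpy as np

skewed_data = np.array([10, 20, 30, 40, 80, 1000])

# Deviations of each value from the mean.
deviations = skewed_data - np.mean(skewed_data)

# Second and third central moments (both divide by N).
m2 = np.mean(deviations ** 2)
m3 = np.mean(deviations ** 3)

# Moment-based skewness coefficient.
skewness = m3 / m2 ** 1.5
print("Skewness:", skewness)
```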

 

2. Outliers: NumPy can be used to detect outliers using statistical methods. A common approach is to use the interquartile range (IQR) method to identify potential outliers:

				
					import numpy as np

# Sample data with potential outliers
data_with_outliers = np.array([10, 15, 20, 25, 30, 200, 35, 40, 45, 50])

# Calculate the IQR
Q1 = np.percentile(data_with_outliers, 25)
Q3 = np.percentile(data_with_outliers, 75)
IQR = Q3 - Q1

# Define the threshold for outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Detect outliers
outliers = data_with_outliers[(data_with_outliers < lower_bound) | (data_with_outliers > upper_bound)]

# Display the outliers
print("Outliers:", outliers)

				
			

Output will be:

				
					Outliers: [200]

				
			

In this example, the IQR method is used to detect outliers that fall below lower_bound or above upper_bound. The value 200 is identified as an outlier.
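
To see why outliers matter, the following sketch keeps only the values inside the IQR bounds and compares the mean and median before and after. It is an illustration of the effect, not a recommendation to always drop outliers; the names inlier_mask and cleaned are illustrative.

```python
import numpy as np

data_with_outliers = np.array([10, 15, 20, 25, 30, 200, 35, 40, 45, 50])

# Same IQR bounds as in the example above.
q1, q3 = np.percentile(data_with_outliers, [25, 75])
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Boolean mask that is True for values inside the bounds.
inlier_mask = (data_with_outliers >= lower_bound) & (data_with_outliers <= upper_bound)
cleaned = data_with_outliers[inlier_mask]

print("Mean with outlier:", np.mean(data_with_outliers))  # 47.0
print("Mean without outlier:", np.mean(cleaned))  # 30.0
print("Median with outlier:", np.median(data_with_outliers))  # 32.5
print("Median without outlier:", np.median(cleaned))  # 30.0
```

The single value 200 shifts the mean from 30.0 to 47.0 but moves the median only from 30.0 to 32.5, which is why the median is often preferred for data with outliers.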

Remember that detecting and handling skewed data and outliers depend on the context of your analysis. It’s essential to understand the nature of your data and the impact of skewed data and outliers on your analysis outcomes.

To explore more libraries, you can refer to our other blogs on Python libraries.
