Introduction to using Machine Learning for pattern recognition in Python

This is an introductory example of Machine Learning and pattern recognition on a small dataset. A Python program is written that predicts the species of iris plants.

The Iris dataset is used for this, and a decision tree is used to classify the data. This tutorial uses Python 3.6; Python 3.5 or later is required. It shows how Machine Learning can teach a program to recognize patterns in existing data and calculate predictions from them.

What is Iris Dataset?

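The Iris dataset is a multivariate dataset containing 50 samples of each of three iris species (Iris setosa, Iris versicolor and Iris virginica), 150 samples in total. Each sample consists of four measurements: sepal length, sepal width, petal length and petal width (in cm). With the help of machine learning, certain patterns can be identified in this data. The dataset is often used by beginners for machine learning projects.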
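These numbers can be checked directly, because scikit-learn ships a copy of the dataset that is loaded with `sklearn.datasets.load_iris()`. A minimal sketch:

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the copy of the Iris dataset that ships with scikit-learn
iris = load_iris()

# One row per sample, one column per measurement
print(iris.data.shape)           # (150, 4)
print(iris.feature_names)

# Number of samples per species ID (0, 1, 2)
print(np.bincount(iris.target))  # [50 50 50]
```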

What is a Decision Tree?

A “decision tree” is used to make decisions. It is similar to a flowchart: it consists of nodes, and at each node a binary (yes or no) decision is made. Decision trees are well suited to data with few attributes and require little data preparation. For larger amounts of data, a different algorithm may make considerably more accurate predictions.
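To see these yes/no nodes in practice, scikit-learn can print a fitted tree as text with `sklearn.tree.export_text` (available since scikit-learn 0.21). The sketch below is only an illustration and not part of the article's program; the depth limit of 2 is an arbitrary choice to keep the printout short:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree (at most 2 levels of decisions) keeps the output readable
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Each "|---" line is a node asking a yes/no question about one feature
# (a threshold such as "petal length <= x"); leaves show the predicted class ID.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```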


The following packages must be installed:

  • NumPy (>= 1.11.0),
  • SciPy (>= 0.17.0),
  • joblib (>= 0.11) and
  • scikit-learn
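To check whether these packages are already installed, and in which versions, a short script like the following can be used. Note that scikit-learn is imported under the name `sklearn`; the script only prints the installed versions next to the minimums listed above and does not compare them:

```python
import importlib

# Minimum versions from the list above; None means no minimum is given
REQUIRED = {"numpy": "1.11.0", "scipy": "0.17.0", "joblib": "0.11", "sklearn": None}

installed = {}
for module_name, minimum in REQUIRED.items():
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        print(f"{module_name}: not installed")
        continue
    installed[module_name] = module.__version__
    print(f"{module_name} {module.__version__} (required: {minimum or 'any version'})")
```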

scikit-learn can be installed via the package manager pip.

Installation on the Windows CMD:

  pip install -U scikit-learn

Now a Python program is created that learns from the existing dataset and finds certain patterns in it. The package “numpy” is used to store the dataset in an array. NumPy is commonly used when working with datasets, e.g. in Machine Learning.
The package “scikit-learn” is used for the machine learning itself. From this package, the module “tree” (which provides the decision tree) and the function “accuracy_score” (from “sklearn.metrics”) are imported. The Iris dataset is loaded from the module “sklearn.datasets”.
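The article's program itself is not reproduced here. The following is a minimal sketch of what such a program might look like, assuming that 15 randomly chosen samples are held back for testing and the remaining 135 are used for training; the variable names and the way the data is split are assumptions, not the author's original code:

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# iris.data holds the measurements, iris.target the species IDs (0, 1, 2)
iris = load_iris()

# Assumption: the article's "15" is the number of held-back test samples
NUM_TEST = 15
indices = np.random.permutation(len(iris.target))  # random order, no fixed seed
test_idx, train_idx = indices[:NUM_TEST], indices[NUM_TEST:]

train_data, train_target = iris.data[train_idx], iris.target[train_idx]
test_data, test_target = iris.data[test_idx], iris.target[test_idx]

# Learn a decision tree from the training samples
clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)

# Predict the species of the held-back samples and compare with the truth
predictions = clf.predict(test_data)
print(predictions)                               # predicted species IDs
print(test_target)                               # actual species IDs
print(accuracy_score(test_target, predictions))  # fraction predicted correctly
```

Because the test samples are drawn without a fixed seed, the printed arrays and the accuracy change from run to run; a seeded generator such as `np.random.default_rng(0).permutation(...)` plus `random_state=0` on the classifier would make the output repeatable.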

When this program is executed in Python, it prints output like the following. The output changes with every run. The plant species are stored and printed as numeric IDs in an array.


The IDs of the iris species: 0 is Iris setosa, 1 is Iris versicolor, 2 is Iris virginica.
The first line contains the predictions calculated by the Machine Learning model.
The second line contains the actual species, which are used to check whether the predictions calculated by the algorithm are correct. As you can see here, about 93% of the plant species were predicted correctly. The accuracy can change from one run of the program to the next and depending on the amount of data used.
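With 15 test samples, an accuracy of about 93% corresponds to 14 correct predictions out of 15 (14 / 15 ≈ 0.933). The arrays below are made up for illustration and are not the article's actual output; they differ in exactly one position. The sketch also shows how the numeric IDs are translated back into species names via `iris.target_names`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

iris = load_iris()

# Hypothetical arrays, NOT the article's output: 15 samples, one mismatch at index 7
predicted = np.array([0, 1, 2, 2, 0, 1, 1, 2, 0, 0, 2, 1, 0, 2, 1])
actual    = np.array([0, 1, 2, 2, 0, 1, 1, 1, 0, 0, 2, 1, 0, 2, 1])

# iris.target_names[i] is the species name for ID i
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
print(iris.target_names[predicted[7]], "vs", iris.target_names[actual[7]])  # virginica vs versicolor

# accuracy_score returns the fraction of positions where both arrays agree
acc = accuracy_score(actual, predicted)
print(f"{acc:.3f}")  # 0.933
print((predicted == actual).sum(), "of", len(actual), "correct")  # 14 of 15 correct
```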

Also try this program with larger data sets than the “15” used here. The more data you supply, the better the program can recognize patterns in the data and make predictions from them. Machine Learning, as shown in this introductory example, is used in logistics, for instance, to forecast the quantity of goods required in the future from existing data on past orders.
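One way to experiment with the amount of data is to keep a fixed set of test samples and train the tree on progressively more samples. This is a sketch under assumptions: the 30 held-out test samples, the training sizes 15 to 120 and the fixed seeds are arbitrary choices for illustration, not values from the article. On a dataset as small and well separated as Iris, accuracy can already be high with few samples, so the trend is not guaranteed to rise at every step:

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

iris = load_iris()
rng = np.random.default_rng(0)  # fixed seed so the experiment is repeatable
indices = rng.permutation(len(iris.target))

# The same 30 samples are used for testing in every round; the rest form a pool
test_idx, pool_idx = indices[:30], indices[30:]

results = {}
for train_size in (15, 30, 60, 120):
    train_idx = pool_idx[:train_size]  # first train_size samples from the pool
    clf = tree.DecisionTreeClassifier(random_state=0)
    clf.fit(iris.data[train_idx], iris.target[train_idx])
    predictions = clf.predict(iris.data[test_idx])
    results[train_size] = accuracy_score(iris.target[test_idx], predictions)
    print(f"{train_size:>3} training samples -> accuracy {results[train_size]:.2f}")
```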
