Artificial intelligence is one of the hottest topics in today’s market. It is used in almost every industry, and since the rise of ChatGPT, more and more people have been exposed to the field and have become interested in learning it.
Artificial intelligence is a very broad topic. It is not just about ChatGPT or natural language understanding, which is only one of its subfields.
AI spans everything from basic search algorithms to machine learning and deep learning. The field is vast, so where you go with it is up to you.
You might have come here to understand how AI can be built using Python. So let’s dive in and see how we can build a simple artificial intelligence that predicts the price of a house.
We will look at the code quickly and understand just enough of it to see how an AI is built, without getting lost in too much information. First, let’s look at the building blocks of an AI, especially a machine learning model.

Building the AI
The example we are discussing is a machine learning problem. Of all the algorithms and fields in artificial intelligence, machine learning is the most widely used nowadays: ChatGPT and Vision Transformers, for example, are deep learning models, and deep learning is itself a part of machine learning.
In the house price prediction problem, we are given a dataset of house prices. Why do we need a dataset?
A dataset is a collection of examples that act as question-and-answer pairs: each one tells the AI what price a house has, given the features of that house. Once the AI has learned the relationship between the features of a house and its price, it will be able to predict the price of a house given just its features.
This is called supervised learning.
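To make the question-and-answer idea concrete, here is a minimal sketch using NumPy on a tiny, made-up dataset (the sizes and prices are invented for illustration and are not taken from the housing data). Each example pairs a feature (house size) with an answer (price); the model learns a straight line from these pairs and then predicts the price of a house it has never seen.

```python
import numpy as np

# Hypothetical labelled examples: each "question" is a house size in square metres,
# each "answer" is that house's price. The numbers are invented for illustration.
sizes = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
prices = np.array([150_000.0, 240_000.0, 300_000.0, 360_000.0, 450_000.0])

# Learn price ~ slope * size + intercept by least squares (a degree-1 polynomial fit)
slope, intercept = np.polyfit(sizes, prices, deg=1)

def predict_price(size):
    """Predict a price from the learned relationship, given only the feature."""
    return slope * size + intercept

# A house size that was not among the training examples
new_size = 90.0
print(f"Predicted price for {new_size} sq m: {predict_price(new_size):,.0f}")
```

The real model works the same way, only with many features per house instead of one.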
I will follow this flow to build the AI:
- Downloading the dataset
- Looking at the dataset
- Splitting the dataset into train and test sets
- Visualizing the data
- Preparing the data for model training
- Training the model
- Evaluating the model
Let’s walk through the code for each of these steps.
Code Example
We start by downloading the California housing dataset and loading it into a pandas DataFrame.
```python
from pathlib import Path
import tarfile
import urllib.request

import pandas as pd

def load_housing_data():
    tarball_path = Path("datasets/housing.tgz")
    # Download and extract the archive only if it is not already on disk
    if not tarball_path.is_file():
        Path("datasets").mkdir(parents=True, exist_ok=True)
        url = "https://github.com/ageron/data/raw/main/housing.tgz"
        urllib.request.urlretrieve(url, tarball_path)
        with tarfile.open(tarball_path) as housing_tarball:
            housing_tarball.extractall(path="datasets")
    return pd.read_csv(Path("datasets/housing/housing.csv"))

housing = load_housing_data()
```
Next, we look at the dataset to understand how many features we have, what our target is, and how the features relate to the target.
```python
housing.info()
```
```
# Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   longitude           20640 non-null  float64
 1   latitude            20640 non-null  float64
 2   housing_median_age  20640 non-null  float64
 3   total_rooms         20640 non-null  float64
 4   total_bedrooms      20433 non-null  float64
 5   population          20640 non-null  float64
 6   households          20640 non-null  float64
 7   median_income       20640 non-null  float64
 8   median_house_value  20640 non-null  float64
 9   ocean_proximity     20640 non-null  object
dtypes: float64(9), object(1)
memory usage: 1.6+ MB
```
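In this dataset, our target is median_house_value, which we will predict from the other features. Note that total_bedrooms has only 20,433 non-null values out of 20,640, so some entries are missing; we will deal with this when preparing the data. Also, ocean_proximity is the only non-numeric (object) column.
Next, we split the dataset into a training set and a test set. The model learns only from the training set, so at the end we can measure its accuracy/performance on the test set, which is data it has never seen.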
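Two quick checks confirm what info() shows: counting the missing values per column and listing the categories of the text column. The sketch below runs them on a tiny hypothetical DataFrame with some of the same column names (all values are invented) so that it is self-contained; on the real data you would call the same methods on housing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few rows of the housing data (values are invented)
sample = pd.DataFrame({
    "total_rooms": [1000.0, 2500.0, 1800.0, 3200.0],
    "total_bedrooms": [200.0, np.nan, 350.0, 600.0],
    "median_house_value": [150000.0, 320000.0, 210000.0, 400000.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "INLAND"],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# How many rows fall into each category of the text column
category_counts = sample["ocean_proximity"].value_counts()
print(category_counts)
```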
```python
import numpy as np

def shuffle_and_split_data(data, test_ratio):
    # Shuffle the row positions, then take the first test_ratio share as the test set
    shuffled_indices = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    return data.iloc[train_indices], data.iloc[test_indices]

# Hold out 20% of the rows as the test set
train_set, test_set = shuffle_and_split_data(housing, 0.2)
```
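One caveat with shuffle_and_split_data: np.random.permutation produces a different shuffle on every run, so over several runs the model would eventually see the whole dataset. A common fix is to seed the random generator. The sketch below shows a seeded variant (shuffle_and_split_data_seeded is a hypothetical name, and the seed 42 is an arbitrary choice) next to Scikit-Learn's train_test_split, which does the same job through its random_state parameter, on a small synthetic DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def shuffle_and_split_data_seeded(data, test_ratio, seed=42):
    """Hypothetical variant of shuffle_and_split_data with a seeded generator."""
    rng = np.random.default_rng(seed)
    shuffled_indices = rng.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    return data.iloc[train_indices], data.iloc[test_indices]

# Synthetic data: 100 rows with a single column
data = pd.DataFrame({"x": np.arange(100)})

# Same seed, same split on every call
train_a, test_a = shuffle_and_split_data_seeded(data, 0.2)
train_b, test_b = shuffle_and_split_data_seeded(data, 0.2)
print(test_a.index.equals(test_b.index))

# The equivalent one-liner in Scikit-Learn
train_sk, test_sk = train_test_split(data, test_size=0.2, random_state=42)
print(len(train_sk), len(test_sk))
```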
Visualizing the data can be tricky, but in this example we only want to check how some of the features relate to the house price and which of them might contribute most to predicting it.
```python
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Pairwise scatter plots of a few promising attributes
attributes = ["median_house_value", "median_income", "total_rooms",
              "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
plt.show()

# Zoom in on the most promising attribute: median income vs. house value
housing.plot(kind="scatter", x="median_income", y="median_house_value",
             alpha=0.1, grid=True)
plt.show()
```
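The scatter plots can be complemented by a number: the Pearson correlation coefficient between each attribute and the target, which pandas computes with DataFrame.corr(). Values close to 1 indicate a strong positive linear relationship, values near 0 indicate little linear relationship. On the real data the call would be housing.corr(numeric_only=True), where numeric_only skips the text column ocean_proximity. The sketch below uses synthetic data instead (generated with a fixed seed, with the target built to depend linearly on median_income only), so the numbers are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic data (not the real housing data): the target depends linearly on
# median_income plus noise, and not at all on housing_median_age
median_income = rng.uniform(1, 10, n)
housing_median_age = rng.uniform(1, 50, n)
median_house_value = 40_000 * median_income + rng.normal(0, 20_000, n)

df = pd.DataFrame({
    "median_income": median_income,
    "housing_median_age": housing_median_age,
    "median_house_value": median_house_value,
    "ocean_proximity": rng.choice(["INLAND", "NEAR BAY"], n),  # text column, skipped below
})

# Correlation of every numeric column with the target, strongest first
corr_matrix = df.corr(numeric_only=True)
target_corr = corr_matrix["median_house_value"].sort_values(ascending=False)
print(target_corr)
```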
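Now we prepare the dataset for the model: we separate the target from the features, fill in the missing values, scale the numeric features, and encode the categorical ocean_proximity column as numbers.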
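```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Separate the features from the target; both must come from the training set
housing = train_set.drop("median_house_value", axis=1)
housing_labels = train_set["median_house_value"].copy()

# Numeric-only view of the features (ocean_proximity is text)
housing_num = housing.drop("ocean_proximity", axis=1)

# Numeric columns: fill missing values (e.g. in total_bedrooms) with the median, then standardize
num_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("standardize", StandardScaler()),
])

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

# Run the numeric pipeline on the numeric columns and one-hot encode the text column
full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])

housing_prepared = full_pipeline.fit_transform(housing)
```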
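To see what a pipeline like full_pipeline actually produces, the sketch below builds the same kind of ColumnTransformer on a tiny hypothetical DataFrame (invented values) with one missing value and three ocean_proximity categories. The output has one column per numeric attribute plus one column per category, no missing values remain, and each numeric column is standardized to mean 0.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mini dataset: one NaN in total_bedrooms, three text categories
toy = pd.DataFrame({
    "total_rooms": [1000.0, 2000.0, 3000.0, 4000.0],
    "total_bedrooms": [200.0, np.nan, 600.0, 800.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "<1H OCEAN"],
})

num_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # NaN -> median of 200, 600, 800
    ("standardize", StandardScaler()),
])
toy_pipeline = ColumnTransformer([
    ("num", num_pipeline, ["total_rooms", "total_bedrooms"]),
    ("cat", OneHotEncoder(), ["ocean_proximity"]),
])

toy_prepared = toy_pipeline.fit_transform(toy)
# 2 standardized numeric columns + 3 one-hot columns (one per category)
print(toy_prepared.shape)
print(toy_pipeline.named_transformers_["cat"].categories_)
```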
Now that the data is prepared, we fit a linear regression model to it. The scatter plot showed a roughly linear upward trend between median_income and median_house_value, so linear regression is a reasonable first model to try.
```python
from sklearn.linear_model import LinearRegression

# Learn one weight per prepared feature plus an intercept
lin_reg = LinearRegression()
lin_reg.fit(housing_prepared, housing_labels)
```
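Linear regression predicts prediction = b + w1*x1 + w2*x2 + ... + wn*xn, choosing the intercept b and weights w1...wn that minimize the mean squared error on the training set. After fitting, Scikit-Learn stores them in intercept_ and coef_. The sketch below checks this on synthetic data (not the housing set) by comparing LinearRegression with an ordinary least-squares solve in NumPy, where a column of ones is prepended to X so that the first solved coefficient is the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Synthetic data: 200 samples, 3 features, known weights plus a little noise
X = rng.normal(size=(200, 3))
true_weights = np.array([2.0, -1.0, 0.5])
y = 4.0 + X @ true_weights + rng.normal(scale=0.1, size=200)

reg = LinearRegression()
reg.fit(X, y)

# Ordinary least squares by hand: the column of ones makes theta[0] the intercept
X_b = np.c_[np.ones(len(X)), X]
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print("intercept:", reg.intercept_, "vs", theta[0])
print("weights:  ", reg.coef_, "vs", theta[1:])
```

Both approaches solve the same least-squares problem, which is why the numbers match.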
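Now let’s try the model on the first five instances of the training set and compare its predictions with the actual values.
```python
some_data = housing.iloc[:5]
some_labels = housing_labels.iloc[:5]

# Use transform, not fit_transform: the pipeline must reuse the statistics
# it learned from the whole training set
some_data_prepared = full_pipeline.transform(some_data)

# Predict these five prices and compare them with the actual values
print("Predictions:", lin_reg.predict(some_data_prepared))
print("Labels:", list(some_labels))
```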
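Looking at five predictions is only a sanity check. A more reliable measure is the root mean squared error, RMSE = sqrt(mean((prediction - actual)^2)), on the test set we held out earlier. Below is a minimal sketch of a helper (evaluate_rmse is a hypothetical name) that applies an already-fitted preprocessor with transform, never fit_transform, to the held-out features. It is exercised on synthetic data so that it runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

def evaluate_rmse(model, preprocessor, data, target_column):
    """Hypothetical helper: RMSE of a fitted model on a held-out DataFrame."""
    features = data.drop(target_column, axis=1)
    labels = data[target_column]
    # transform only: reuse the statistics learned from the training set
    features_prepared = preprocessor.transform(features)
    predictions = model.predict(features_prepared)
    return np.sqrt(mean_squared_error(labels, predictions))

# Synthetic train/test data standing in for train_set / test_set
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0, 10, n)
    return pd.DataFrame({"x": x, "y": 3.0 * x + rng.normal(0, 1.0, n)})

train_df, test_df = make_data(800), make_data(200)

scaler = StandardScaler()
X_train = scaler.fit_transform(train_df.drop("y", axis=1))
model = LinearRegression().fit(X_train, train_df["y"])

rmse = evaluate_rmse(model, scaler, test_df, "y")
print(f"Test RMSE: {rmse:.3f}")
```

On the real data the call would be evaluate_rmse(lin_reg, full_pipeline, test_set, "median_house_value"). Since the synthetic noise has a standard deviation of 1, an RMSE close to 1 means the model has captured the underlying relationship.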
Congratulations! You have created an AI that predicts the price of a house from its features.
However, I have skipped a lot of explanation here because I did not want to overwhelm you with information; the goal was to show you what Python code for building an artificial intelligence looks like.
If you want to learn more about the theory and the code, refer to the book “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow” by Aurélien Géron.
Conclusion
This blog gives an overview of how artificial intelligence models are built in Python and of the general flow the code follows to build a model.
You can follow the same structure to build other models.
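As an illustration of reusing that structure, the sketch below chains the same preprocessing and a model into a single Pipeline, so that fitting and predicting always apply identical transformations, and then swaps LinearRegression for a DecisionTreeRegressor by changing only the regressor passed in. The helper name build_model, the synthetic data, and the choice of a decision tree are assumptions for illustration, not part of the walkthrough above; OneHotEncoder(handle_unknown="ignore") is used so that a category unseen during training does not raise an error at prediction time.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

def build_model(regressor, num_attribs, cat_attribs):
    """Hypothetical helper: the walkthrough's preprocessing followed by any regressor."""
    preprocessing = ColumnTransformer([
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("standardize", StandardScaler()),
        ]), num_attribs),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_attribs),
    ])
    return Pipeline([("preprocessing", preprocessing), ("model", regressor)])

# Synthetic stand-in data with one numeric feature and one category
rng = np.random.default_rng(1)
n = 300
data = pd.DataFrame({
    "median_income": rng.uniform(1, 10, n),
    "ocean_proximity": rng.choice(["INLAND", "NEAR BAY"], n),
})
labels = 40_000 * data["median_income"] + np.where(data["ocean_proximity"] == "INLAND", 0, 50_000)

scores = {}
for regressor in [LinearRegression(), DecisionTreeRegressor(random_state=42)]:
    model = build_model(regressor, ["median_income"], ["ocean_proximity"])
    model.fit(data, labels)
    # R^2 on the training data: a quick check, not a fair evaluation
    scores[type(regressor).__name__] = model.score(data, labels)
print(scores)
```

A fair comparison between the two models would still use a held-out test set, as in the evaluation step.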
References:
- Hands-On ML GitHub: ageron/handson-ml3, 02_end_to_end_machine_learning_project.ipynb
- Boston House Price Prediction: Arup3201/boston_house_price_prediction