Create a Full Machine Learning Pipeline Without Coding in AWS SageMaker Canvas

Xuebin Wei
Sep 19, 2024

Updated: Jun 10, 2025

Discover how to set up your SageMaker domain and user profile in AWS Academy, explore and clean data effortlessly using AWS SageMaker Canvas, and build a complete machine-learning pipeline without coding. Perfect for those who want to harness the power of machine learning without programming!

Data

Download the house price data from https://github.com/xbwei/machine_learning_in_python/blob/master/house_price_full.csv.
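The link above opens GitHub's file viewer rather than the CSV file itself. If you would rather fetch the file from a script than click Download in the browser, the sketch below converts the page link into a direct-download link. The helper name github_blob_to_raw is hypothetical, and the sketch assumes GitHub's usual raw.githubusercontent.com URL layout.

```python
from urllib.parse import urlparse


def github_blob_to_raw(blob_url: str) -> str:
    """Turn a github.com '/blob/' page URL into a raw.githubusercontent.com URL.

    Hypothetical helper for illustration; it only rewrites the string and
    does not download anything.
    """
    parts = urlparse(blob_url)
    # Expected path layout: /<user>/<repo>/blob/<branch>/<path to file>
    user, repo, marker, rest = parts.path.lstrip("/").split("/", 3)
    if parts.netloc != "github.com" or marker != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    return f"https://raw.githubusercontent.com/{user}/{repo}/{rest}"


BLOB_URL = (
    "https://github.com/xbwei/machine_learning_in_python"
    "/blob/master/house_price_full.csv"
)
RAW_URL = github_blob_to_raw(BLOB_URL)
print(RAW_URL)
```

The resulting link can be passed to any download tool before uploading the file to S3.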

Steps

Download the house price data and upload it to an S3 bucket. Create a bucket if needed.
Open the SageMaker service and create a SageMaker Domain by clicking Create domain:
- Choose Set up for organizations
- Provide a domain name
- In Step 2, use the existing LabRole
- In Step 3, choose the existing execution role as follows

arn:aws:iam::ACCOUNT-ID:role/LabRole

- In Step 5, choose public internet access and provide two subnets.

Set up a SageMaker user profile when the domain is ready:
- Open the created domain and select User Profiles
- Choose Add User and provide a username
- In Step 4, choose the existing execution role as follows

arn:aws:iam::ACCOUNT-ID:role/LabRole
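The execution role ARN used in both the domain and user profile setup follows the same pattern, with ACCOUNT-ID replaced by your own 12-digit AWS account ID. As a small sketch, the string can be built and sanity-checked like this; the function name lab_role_arn and the sample account ID are made up for illustration.

```python
import re


def lab_role_arn(account_id: str, role_name: str = "LabRole") -> str:
    """Build the IAM role ARN in the format shown above.

    Hypothetical helper; it only formats the string and does not check
    that the role actually exists in the account.
    """
    # AWS account IDs are 12-digit numbers.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("an AWS account ID must be exactly 12 digits")
    return f"arn:aws:iam::{account_id}:role/{role_name}"


# 123456789012 is a placeholder, not a real account.
print(lab_role_arn("123456789012"))
```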

Start Canvas with the created user profile.
In Data Wrangler, clean the house price data and split it into training and test datasets.
- Import the house price data from S3
- Convert ID and zip to string
- Filter out house type that is lot or land
- Drop columns with many missing values or columns you think will not contribute to the prediction.
- Remove rows with missing values.
- Clean lot_size with the following Python code, which multiplies values below 10 by 43,560, the number of square feet in an acre:

df.loc[df['lot_size'] < 10, 'lot_size'] = df['lot_size']*43560

- Remove outliers.
- Run the quality report and drop the columns with low prediction power.
- Split the data into training and test datasets, copy the S3 locations, and import the datasets into Canvas.
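For readers curious about what the Data Wrangler steps above do to the data, here is a rough pandas sketch of the same idea on a tiny made-up table. The column names house_type and price, the sample values, and the 80/20 split ratio are assumptions and may not match the real CSV or the settings you choose in Canvas. Outlier removal and the quality report are Canvas features and are left out.

```python
import pandas as pd

# Tiny made-up sample. Column names and values are assumptions,
# not taken from the real house price file.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "zip": [22801, 22801, 22802, 22802, 22801, 22802],
    "house_type": ["house", "lot", "townhouse", "house", "house", "land"],
    "lot_size": [0.5, 2.0, 4356.0, None, 8712.0, 1.0],
    "price": [250000, 60000, 180000, 300000, 320000, 40000],
})

# Convert ID and zip to strings so they are treated as labels, not numbers.
df = df.astype({"ID": str, "zip": str})

# Filter out lots and land, then remove rows with missing values.
df = df[~df["house_type"].isin(["lot", "land"])].dropna().copy()

# Same rule as the lot_size code above: values below 10 are multiplied
# by 43,560 (square feet per acre), presumably because they are in acres.
small = df["lot_size"] < 10
df.loc[small, "lot_size"] = df.loc[small, "lot_size"] * 43560

# Split into training and test sets (80/20 here is an assumed ratio).
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)

print(df)
print(len(train), "training rows,", len(test), "test rows")
```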

In My Models, train a model to predict the house price.
- Use the training dataset
- Uncheck the ID column
- Use the Auto training configuration and Quick Build
- Check model performance
Predict data
- Generate a manual prediction with the test dataset
- Upload the prediction result to Canvas
- Build a new data flow to join the prediction result and the test dataset.
- Calculate the Feature Correlation and create a scatter plot in Data analysis to evaluate the model performance.
Optional: Deploy the Model to use it outside of Canvas or Register it to share it with other users.
Log out of Canvas to save the credits.
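The evaluation step above (joining the prediction result with the test dataset and checking the correlation) can also be sketched outside Canvas. The example below uses only the Python standard library and made-up numbers; the column name predicted_price is an assumption, so check the header of your exported prediction file.

```python
import csv
import io
import math

# Made-up stand-ins for the test dataset and the exported prediction result.
test_csv = """ID,price
1,250000
2,180000
3,320000
"""
pred_csv = """ID,predicted_price
1,240000
2,200000
3,310000
"""


def read_by_id(text: str, value_col: str) -> dict:
    """Map each ID to the numeric value in value_col."""
    return {
        row["ID"]: float(row[value_col])
        for row in csv.DictReader(io.StringIO(text))
    }


actual = read_by_id(test_csv, "price")
predicted = read_by_id(pred_csv, "predicted_price")

# Join the two tables on ID, keeping only IDs present in both.
ids = sorted(actual.keys() & predicted.keys())
y = [actual[i] for i in ids]
y_hat = [predicted[i] for i in ids]


def pearson(a: list, b: list) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    return cov / math.sqrt(var_a * var_b)


r = pearson(y, y_hat)
print(f"correlation between actual and predicted price: {r:.3f}")
```

A correlation close to 1 means the predictions rise and fall with the actual prices, which is what the scatter plot in Canvas shows visually.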

https://www.youtube.com/watch?v=bKIQel2BzE4
