
LBSocial

Create a Full Machine Learning Pipeline Without Coding in AWS SageMaker Canvas

Updated: Jun 10


Discover how to set up your SageMaker domain and user profile in AWS Academy, explore and clean data effortlessly using AWS SageMaker Canvas, and build a complete machine-learning pipeline without coding. Perfect for those who want to harness the power of machine learning without programming!


Data

 

Steps

  • Download the house price data and upload it to an S3 bucket. Create a bucket if needed.

  • Open the SageMaker service and create a SageMaker Domain by clicking Create domain:

    • Choose Set up for organizations

    • Provide a domain name

    • In Step 2, use the existing LabRole

    • In Step 3, choose the existing execution role as follows

arn:aws:iam::ACCOUNT-ID:role/LabRole
    • In Step 5, choose public internet access and provide two subnets.

  • Set up a SageMaker user profile when the domain is ready

    • Open the created domain and select User Profiles

    • Choose Add User and provide a username

    • In Step 4, choose the existing execution role as follows

arn:aws:iam::ACCOUNT-ID:role/LabRole
  • Start Canvas with the created user profile.

  • In Data Wrangler, clean the house price data and split it into training and test datasets.

    • Import the house price data from S3

    • Convert ID and zip to string

    • Filter out rows whose house type is lot or land

    • Drop columns with many missing values or columns you think will not contribute to the prediction.

    • Remove rows with missing values.

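    • Clean the lot_size column. This step needs a custom Python (pandas) transform:

# Lot sizes under 10 are presumably in acres; convert them to square feet (1 acre = 43,560 sq ft)
df.loc[df['lot_size'] < 10, 'lot_size'] = df['lot_size'] * 43560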
    • Remove outliers

    • Run the quality report and drop the columns with low predictive power.

    • Split the data into training and test datasets, copy the S3 location of the output, and import the datasets into Canvas.

  • In My Models, train a model to predict the house price.

    • Use the training dataset

    • Uncheck the ID column

    • Use the Auto training configuration and Quick Build

    • Check model performance

  • Predict data

    • Generate a manual prediction with the test dataset

    • Upload the prediction result to Canvas

    • Build a new data flow to join the prediction result and the test dataset.

    • Calculate the Feature Correlation and create a scatter plot in Data analysis to evaluate the model performance.

  • Optional: Deploy the model to use it outside of Canvas, or register it to share it with other users.

  • Log out of Canvas when you are finished so the session stops consuming credits.
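
For readers curious about what the Data Wrangler cleaning steps amount to, here is a minimal pandas sketch of the same sequence. It is an illustration under assumptions, not what Canvas runs internally: the column names house_type and price, the 50% missing-value threshold, and the 1.5 × IQR outlier rule are all hypothetical choices. Only ID, zip, lot_size, and the 43560 factor come from the steps above.

```python
import pandas as pd


def clean_house_data(df: pd.DataFrame, max_missing_ratio: float = 0.5) -> pd.DataFrame:
    """Rough pandas equivalent of the cleaning steps (illustrative only)."""
    df = df.copy()

    # Treat identifiers as labels rather than numbers.
    df["ID"] = df["ID"].astype(str)
    df["zip"] = df["zip"].astype(str)

    # Drop lot/land listings. "house_type" is an assumed column name.
    df = df[~df["house_type"].str.lower().isin(["lot", "land"])]

    # Drop columns that are mostly empty (assumed threshold), then any
    # rows that still contain missing values.
    df = df.loc[:, df.isna().mean() <= max_missing_ratio]
    df = df.dropna().copy()

    # lot_size fix from the tutorial: values under 10 are scaled by 43560.
    small = df["lot_size"] < 10
    df.loc[small, "lot_size"] = df.loc[small, "lot_size"] * 43560

    # Remove price outliers with the 1.5 * IQR rule (one common choice).
    # "price" is an assumed column name for the prediction target.
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    return df.reset_index(drop=True)
```

Calling clean_house_data on a DataFrame loaded from the downloaded CSV returns a new, cleaned DataFrame and leaves the input untouched. Adjust the column names to match the actual file.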

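The training/test split can be pictured the same way. The sketch below assumes a simple random split with a hypothetical 20% test share and a fixed seed. The tutorial does not state the ratio used in Data Wrangler, so treat these numbers as placeholders.

```python
import pandas as pd


def split_train_test(df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 42):
    """Randomly split df into (train, test); the fraction and seed are assumptions."""
    # Sample the test rows first, then keep everything else for training.
    test = df.sample(frac=test_fraction, random_state=seed)
    train = df.drop(test.index)
    return train, test
```

Because the test rows are removed from the training set by index, the two outputs never overlap, which is the property the later evaluation step relies on.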

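The evaluation step (joining the prediction result with the test dataset and checking how well the two agree) can also be sketched in pandas. The column names price and predicted_price are assumptions, and the error metrics are an addition for illustration. The tutorial itself uses the Feature Correlation analysis and a scatter plot.

```python
import pandas as pd


def evaluate_predictions(
    test_df: pd.DataFrame,
    pred_df: pd.DataFrame,
    key: str = "ID",
    actual: str = "price",                # assumed name of the true-price column
    predicted: str = "predicted_price",   # assumed name of the prediction column
) -> dict:
    """Join predictions to the test set on `key` and summarize agreement."""
    joined = test_df.merge(pred_df[[key, predicted]], on=key, how="inner")
    errors = joined[predicted] - joined[actual]
    return {
        "rows": len(joined),
        # Pearson correlation between actual and predicted prices.
        "correlation": float(joined[actual].corr(joined[predicted])),
        "rmse": float((errors ** 2).mean() ** 0.5),
        "mae": float(errors.abs().mean()),
    }
```

A correlation close to 1 together with a small error relative to typical prices suggests the predictions track the actual values well. A scatter plot of actual versus predicted prices, as in the Canvas step, shows the same relationship visually.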

