
LBSocial

Create a Full Machine Learning Pipeline Without Coding in AWS SageMaker Canvas

 


Discover how to set up your SageMaker domain and user profile in AWS Academy, explore and clean data effortlessly using SageMaker Canvas, and build a complete machine-learning pipeline without coding. Perfect for those who want to harness the power of machine learning without programming!


Data

 

Steps

  • Download the house price data and upload it to an S3 bucket. Create a bucket if needed.

  • Open the SageMaker service and create a SageMaker Domain by clicking Create domain:

    • Choose Set up for organizations

    • Provide a domain name

    • In Step 2, use the existing LabRole

    • In Step 3, choose the existing execution role as follows

arn:aws:iam::ACCOUNT-ID:role/LabRole
    • In Step 5, choose public internet access and provide two subnets.

  • Set up a SageMaker user profile when the domain is ready

    • Open the created domain and select User Profiles

    • Choose Add User and provide a username

    • In Step 4, choose the existing execution role as follows

arn:aws:iam::ACCOUNT-ID:role/LabRole
  • Start Canvas with the created user profile.

  • In Data Wrangler, clean the house price data and split it into training and test datasets.

    • Import the house price data from S3

    • Convert ID and zip to string

    • Filter out rows whose house type is lot or land

    • Drop columns with many missing values or columns you think will not contribute to the prediction.

    • Remove rows with missing values.

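    • Clean lot_size using a custom transform with the Python code below. It treats values under 10 as presumably recorded in acres and converts them to square feet (1 acre = 43,560 sq ft):

# Values under 10 are assumed to be acres; convert them to square feet.
df.loc[df['lot_size'] < 10, 'lot_size'] = df['lot_size'] * 43560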
    • Remove outliers

    • Run the quality report and drop the columns with low predictive power.

    • Split the data into training and test datasets, copy the S3 location, and import the datasets into Canvas.

  • In My Models, train a model to predict the house price.

    • Use the training dataset

    • Uncheck the ID column

    • Use the Auto training configuration and Quick Build

    • Check model performance

  • Predict data

    • Generate a manual prediction with the test dataset

    • Upload the prediction result to Canvas

    • Build a new data flow to join the prediction result and the test dataset.

    • In Data analysis, calculate the feature correlation and create a scatter plot to evaluate model performance.

  • Optional: Deploy the model to use it outside of Canvas, or register it to share it with other users.

  • Log out of Canvas when you are finished to save credits.
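
For readers who want to see what the Data Wrangler cleaning steps above amount to, here is a minimal pandas sketch. It is not what Canvas runs internally. The column names ID, zip, and lot_size come from the steps above, while house_type, its "lot" and "land" labels, the helper names, and the 80/20 split ratio are assumptions made for illustration. Column dropping and outlier removal are left out because the steps do not specify which columns or which method to use.

```python
import pandas as pd

SQFT_PER_ACRE = 43560  # same constant as the lot_size transform in the steps


def clean_house_data(df: pd.DataFrame) -> pd.DataFrame:
    """Approximate the Data Wrangler cleaning steps with pandas."""
    df = df.copy()  # leave the caller's DataFrame untouched
    # Filter out listings that are bare lots or land
    # (hypothetical column name and labels).
    is_land = df["house_type"].str.lower().isin(["lot", "land"])
    # Remove rows with missing values. This runs before the string cast
    # because casting to str would turn missing values into text.
    df = df[~is_land].dropna().copy()
    # Treat identifiers as text so they are not handled as numbers.
    df[["ID", "zip"]] = df[["ID", "zip"]].astype(str)
    # Values under 10 are assumed to be acres; convert them to square feet.
    small = df["lot_size"] < 10
    df.loc[small, "lot_size"] = df.loc[small, "lot_size"] * SQFT_PER_ACRE
    return df


def split_train_test(df: pd.DataFrame, train_frac: float = 0.8, seed: int = 0):
    """Randomly split rows into training and test sets (assumed 80/20)."""
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)  # everything not sampled for training
    return train, test
```

Calling clean_house_data on the imported data and then split_train_test on the result mirrors the order of the steps, with the missing-value removal moved ahead of the string conversion for the reason noted in the comment.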

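
The evaluation step (joining the prediction result with the test dataset and checking the correlation) can be sketched the same way in pandas. This is only an illustration under stated assumptions: the column names price and predicted_price and the helper evaluate_predictions are hypothetical, and the actual column names in the Canvas prediction file may differ.

```python
import pandas as pd


def evaluate_predictions(
    test_df: pd.DataFrame,
    pred_df: pd.DataFrame,
    key: str = "ID",
    actual: str = "price",               # hypothetical target column name
    predicted: str = "predicted_price",  # hypothetical prediction column name
):
    """Join predictions to the test set and correlate actual vs. predicted."""
    # An inner join keeps only the rows present in both tables.
    joined = test_df.merge(pred_df[[key, predicted]], on=key, how="inner")
    # Pearson correlation between actual and predicted prices.
    corr = joined[actual].corr(joined[predicted])
    return joined, corr
```

A correlation close to 1 suggests the predictions track the actual prices closely. If matplotlib is installed, joined.plot.scatter(x="price", y="predicted_price") would draw the corresponding scatter plot.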


