Discover how to set up your SageMaker domain and user profile in AWS Academy, explore and clean data effortlessly using SageMaker Canvas, and build a complete machine-learning pipeline without coding. Perfect for those who want to harness the power of machine learning without programming!
Data
Download the house price data from https://github.com/xbwei/machine_learning_in_python/blob/master/house_price_full.csv.
Steps
Download the house price data and upload it to an S3 bucket. Create a bucket if needed.
Open the SageMaker service and create a SageMaker Domain by clicking Create domain:
Choose Set up for organizations
Provide a domain name
In Step 2, use the existing LabRole
In Step 3, choose the existing execution role as follows:
arn:aws:iam::ACCOUNT-ID:role/LabRole
In Step 5, choose public internet access and provide two subnets.
Set up a SageMaker user profile when the domain is ready
Open the created domain and select User Profiles
Choose Add User and provide a username
In Step 4, choose the existing execution role as follows:
arn:aws:iam::ACCOUNT-ID:role/LabRole
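For readers who want to see what the console choices above amount to, here is a minimal sketch of the same settings written as request parameters for boto3's SageMaker client (create_domain and create_user_profile). It is not part of the no-code workflow. The helper names domain_request and user_profile_request, the IAM authentication mode, and every ID in the usage comments are assumptions made for illustration, and the steps above do not confirm whether LabRole is allowed to make these API calls.

```python
# Sketch only: builds the request parameters, it does not call AWS.
ROLE_ARN = "arn:aws:iam::ACCOUNT-ID:role/LabRole"  # placeholder, as in the steps


def domain_request(domain_name, vpc_id, subnet_ids, role_arn=ROLE_ARN):
    """Keyword arguments for the SageMaker client's create_domain call."""
    return {
        "DomainName": domain_name,
        "AuthMode": "IAM",  # assumption: IAM sign-in rather than SSO
        "DefaultUserSettings": {"ExecutionRole": role_arn},
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),  # the two subnets from Step 5
        "AppNetworkAccessType": "PublicInternetOnly",  # public internet access
    }


def user_profile_request(domain_id, user_name, role_arn=ROLE_ARN):
    """Keyword arguments for the SageMaker client's create_user_profile call."""
    return {
        "DomainId": domain_id,
        "UserProfileName": user_name,
        "UserSettings": {"ExecutionRole": role_arn},
    }


# Usage (needs boto3 and valid credentials; all IDs below are hypothetical):
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_domain(**domain_request("house-price", "vpc-EXAMPLE", ["subnet-A", "subnet-B"]))
# sm.create_user_profile(**user_profile_request("d-EXAMPLE", "canvas-user"))
```

Keeping the parameters in plain dictionaries makes it easy to compare them with what you entered in the console before sending anything to AWS.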
Start Canvas with the created user profile.
In Data Wrangler, clean the house price data and split it into training and test datasets.
Import the house price data from S3
Convert ID and zip to string
Filter out rows whose house type is lot or land
Drop columns with many missing values or columns you think will not contribute to the prediction.
Remove rows with missing values.
Clean the lot_size column using the following Python code, which multiplies values below 10 by 43560 (the number of square feet in an acre):
df.loc[df['lot_size'] < 10, 'lot_size'] = df['lot_size']*43560
Remove outliers
Run the quality report and drop the columns with low predictive power.
Split the data into training and test datasets, copy the S3 location, and import the datasets into Canvas.
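If it helps to see what the Data Wrangler transforms above do to the table, the following pandas sketch approximates them outside Canvas. It is an illustration, not what Data Wrangler runs internally. The column names id, zip, type, and price are assumed (only lot_size appears in the steps above), and the 1.5 x IQR outlier rule and the 80/20 split are illustrative choices.

```python
import pandas as pd

SQFT_PER_ACRE = 43560


def clean_house_data(df: pd.DataFrame) -> pd.DataFrame:
    """Approximate the cleaning steps with pandas (column names are assumed)."""
    df = df.copy()
    # Identifiers are labels, not quantities, so store them as strings.
    df["id"] = df["id"].astype(str)
    df["zip"] = df["zip"].astype(str)
    # Drop listings whose type is lot or land, then rows with missing values.
    is_lot_or_land = df["type"].str.lower().isin(["lot", "land"])
    df = df[~is_lot_or_land].dropna().copy()
    # Same rule as the tutorial's snippet: scale lot sizes below 10 by 43560.
    small = df["lot_size"] < 10
    df.loc[small, "lot_size"] = df.loc[small, "lot_size"] * SQFT_PER_ACRE
    # Assumed outlier rule: keep prices within 1.5 * IQR of the quartiles.
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
    return df.reset_index(drop=True)


def split_train_test(df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 42):
    """Randomly hold out a fraction of rows for testing."""
    test = df.sample(frac=test_fraction, random_state=seed)
    train = df.drop(test.index)
    return train, test
```

Calling clean_house_data on the loaded CSV and passing the result to split_train_test yields two DataFrames that play the same role as the training and test datasets imported into Canvas.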
In My Models, train a model to predict the house price.
Use the training dataset
Uncheck the ID column
Use the Auto training configuration and Quick Build
Check model performance
Predict data
Generate a manual prediction with the test dataset
Upload the prediction result to Canvas
Build a new data flow to join the prediction result and the test dataset.
Calculate the Feature Correlation and create a scatter plot in Data analysis to evaluate the model performance.
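The join and correlation check above can also be reproduced in pandas as a sanity check on what the Canvas analysis shows. This is a minimal sketch under stated assumptions: the key column id, the actual column price, and the prediction column predicted_price are hypothetical names, so replace them with the columns in your exported prediction file. The RMSE is an extra metric added here for illustration and is not part of the steps above.

```python
import pandas as pd


def evaluate_predictions(test_df, pred_df, key="id",
                         actual="price", predicted="predicted_price"):
    """Join predictions to the test set and summarize how well they agree."""
    # Inner join keeps only rows present in both tables.
    joined = test_df.merge(pred_df[[key, predicted]], on=key, how="inner")
    # Pearson correlation between actual and predicted prices.
    corr = joined[actual].corr(joined[predicted])
    # Root mean squared error, in the same units as the price.
    rmse = ((joined[actual] - joined[predicted]) ** 2).mean() ** 0.5
    return joined, corr, rmse
```

A correlation close to 1 and a small RMSE relative to typical prices suggest the model tracks the test data well. With matplotlib installed, joined.plot.scatter(x="price", y="predicted_price") draws a scatter plot comparable to the one created in Data analysis.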
Optional: Deploy the Model to use it outside of Canvas or Register it to share it with other users.
Log out of Canvas to save the credits.