How to use AutoML-Matrix on the AWS SageMaker platform
How to prepare the dataset for AutoML-Matrix
- AutoML-Matrix expects the data as a comma-separated (CSV) file with no header row.
- Each row represents one data point; the columns represent the class label and the features (attributes).
- Feature values must be numeric (integer or floating-point values).
- The first column holds the class label and must contain only integer values.
- You do not need to split the dataset into training, validation, and test sets; the algorithm does this internally.
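Before uploading, it can help to check a file against these rules locally. The following is a minimal sketch using only the Python standard library, not a tool provided with AutoML-Matrix; the function name `validate_automl_matrix_rows` is hypothetical, and the equal-column-count check is an added assumption rather than a documented requirement.

```python
import csv
import io
from typing import Iterable, List, Optional


def validate_automl_matrix_rows(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found in CSV rows; an empty list means none were detected."""
    problems: List[str] = []
    n_cols: Optional[int] = None
    for row_num, row in enumerate(csv.reader(lines), start=1):
        if not row:
            problems.append(f"row {row_num}: empty row")
            continue
        # Assumption (not stated in the guide): every row should have the same number of columns.
        if n_cols is None:
            n_cols = len(row)
        elif len(row) != n_cols:
            problems.append(f"row {row_num}: expected {n_cols} columns, found {len(row)}")
        # Rule: the first column is the class label and must be an integer.
        # A header line usually fails here, which also flags the "no header" rule.
        label = row[0].strip()
        try:
            int(label)
        except ValueError:
            problems.append(f"row {row_num}: label {label!r} is not an integer")
        # Rule: every remaining column is a numeric feature (integer or floating point).
        for col_num, value in enumerate(row[1:], start=2):
            try:
                float(value)
            except ValueError:
                problems.append(f"row {row_num}, column {col_num}: {value!r} is not numeric")
    return problems


if __name__ == "__main__":
    # A file that starts with a header line: row 1 is reported as invalid.
    sample = io.StringIO("label,x1,x2\n1,0.5,3.2\n0,1.7,-2\n")
    for problem in validate_automl_matrix_rows(sample):
        print(problem)
```

For a real file, open it with `open(path, newline="")` and pass the file object to the function instead of the in-memory sample.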
How to train AutoML-Matrix models from the AWS SageMaker web console
The following steps describe how to train a model with the AutoML-Matrix algorithm on AWS:
- Log in to your AWS account.
- Go to AWS Marketplace and subscribe to the DeepElement AutoML-Matrix algorithm.
- Create a bucket (or a folder in an existing bucket) in Amazon S3 and upload the training data file to it.
- Open the Amazon SageMaker web console.
- In the left pane, click “Training Jobs”.
- Click “Create training job”.
- Enter a name for the training job.
- Under the “Algorithms Source” section, select “An algorithm subscription from AWS Marketplace”.
- Select the “DeepElement AutoML-Matrix” algorithm.
- Select an “Instance Type” from the list of suggested instance types.
- Specify the “Maximum Runtime”.
  The algorithm's runtime depends on several factors, including the number of data samples, the number of features, and the complexity of the problem itself. As a rough guide based only on the number of data samples, the Maximum Runtime can be set as follows:
  - Fewer than 100,000 samples: 2 to 4 hours
  - 100,000 to under 500,000 samples: 3 to 6 hours
  - 500,000 to under 1,000,000 samples: 6 to 10 hours
  - 1,000,000 to under 10,000,000 samples: 10 to 20 hours
- Channels: the algorithm expects the training data location in a channel named “train”. Configure that channel as follows:
  - Input Mode: select “File”
  - Content Type: enter “text/csv”
  - Compression Type: None
  - Record Wrapper: None
  - Data Source: select “S3”
  - S3 data type: S3Prefix
  - S3 data distribution type: FullyReplicated
  - S3 location: the S3 location of the training data CSV file
- S3 output path: the S3 location where the training job writes its output (the model artifacts), for example the folder (or bucket) that contains the training data file
- Click “Create training job”.
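The same training job can also be set up programmatically. The sketch below builds the keyword arguments for `create_training_job` in the boto3 SageMaker client, mirroring the console settings listed above. The helper names, the instance count, the volume size, and the network-isolation flag are assumptions, and `suggested_max_runtime_hours` simply picks the upper end of the rough runtime ranges given in this guide.

```python
from typing import Any, Dict


def suggested_max_runtime_hours(n_samples: int) -> int:
    """Upper end of the rough Maximum Runtime ranges given in the guide."""
    if n_samples < 100_000:
        return 4
    if n_samples < 500_000:
        return 6
    if n_samples < 1_000_000:
        return 10
    if n_samples < 10_000_000:
        return 20
    raise ValueError("the rough guide only covers datasets with fewer than 10,000,000 samples")


def build_training_job_request(
    job_name: str,
    algorithm_arn: str,
    role_arn: str,
    train_csv_s3_uri: str,
    output_s3_path: str,
    instance_type: str,
    n_samples: int,
) -> Dict[str, Any]:
    """Keyword arguments for boto3's SageMaker create_training_job (hypothetical helper)."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "AlgorithmName": algorithm_arn,  # ARN of your AWS Marketplace algorithm subscription
            "TrainingInputMode": "File",     # Input Mode: File
        },
        "RoleArn": role_arn,  # IAM role SageMaker assumes to read/write S3
        "InputDataConfig": [
            {
                "ChannelName": "train",  # the algorithm expects the "train" channel
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_csv_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                "ContentType": "text/csv",
                "CompressionType": "None",
                "RecordWrapperType": "None",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_path},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,    # assumption: single-instance training
            "VolumeSizeInGB": 30,  # assumption: size this to your dataset
        },
        "StoppingCondition": {
            "MaxRuntimeInSeconds": suggested_max_runtime_hours(n_samples) * 3600
        },
        "EnableNetworkIsolation": True,  # assumption: Marketplace algorithms typically run network-isolated
    }
```

After uploading the data (for example with `boto3.client("s3").upload_file(local_path, bucket, key)`), the dictionary could be passed as `boto3.client("sagemaker").create_training_job(**request)`. The algorithm ARN comes from your AWS Marketplace subscription and is not reproduced here.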
How to deploy AutoML-Matrix models from the AWS SageMaker web console
The following steps describe how to deploy a trained AutoML-Matrix model as a SageMaker endpoint:
- Once the training job has completed successfully, open the training job's page from the training jobs list.
- Click “Create model package”.
- Enter a name for the model package.
- Click “Next”.
- Under the “Validation and Scanning” section:
  - Select “No” for “Publish this model package on AWS Marketplace”.
  - Select “No” for “Validate this resource”.
- Click “Create Model Package”. This creates the new model package.
- In the left pane, click “Model Packages” to see the list of model packages created so far.
- Select the model package you just created.
- Select “Create endpoint”, then fill in the form:
  - Model Name: enter a name for the new model.
  - Under “Container input options”, select “Use a model package resource”.
  - Click “Next”.
  - Endpoint Name: enter a name for the endpoint.
  - Select “Create a new endpoint configuration”.
  - Endpoint configuration name: enter a name for the endpoint configuration.
  - Click “Create Endpoint Configuration”.
  - Click “Submit”.
This creates a new endpoint that can be called to make predictions.
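The deployment steps correspond to four boto3 SageMaker calls: `create_model_package`, `create_model`, `create_endpoint_config` and `create_endpoint`. The sketch below only builds their keyword arguments; the helper names, the variant name, the instance count and the network-isolation setting are assumptions. The CSV payload format in `rows_to_csv_body` is also an assumption, because this guide does not describe the endpoint's input format; check the product listing before relying on it.

```python
import csv
import io
from typing import Any, Dict, Iterable, Sequence, Tuple


def build_model_package_request(package_name: str, algorithm_arn: str, model_data_url: str) -> Dict[str, Any]:
    """Kwargs for create_model_package: ties the trained artifacts to the subscribed algorithm."""
    return {
        "ModelPackageName": package_name,
        "SourceAlgorithmSpecification": {
            "SourceAlgorithms": [
                {"ModelDataUrl": model_data_url, "AlgorithmName": algorithm_arn}
            ]
        },
        # Mirrors "No" for publishing on AWS Marketplace. Omitting ValidationSpecification
        # mirrors "No" for "Validate this resource".
        "CertifyForMarketplace": False,
    }


def build_model_request(model_name: str, model_package_arn: str, role_arn: str) -> Dict[str, Any]:
    """Kwargs for create_model, mirroring "Use a model package resource"."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {"ModelPackageName": model_package_arn},
        "ExecutionRoleArn": role_arn,
        "EnableNetworkIsolation": True,  # assumption: Marketplace-derived models run network-isolated
    }


def build_endpoint_requests(
    endpoint_name: str, config_name: str, model_name: str, instance_type: str
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Kwargs for create_endpoint_config and create_endpoint, in that order."""
    endpoint_config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # hypothetical variant name
                "ModelName": model_name,
                "InitialInstanceCount": 1,     # assumption: a single instance
                "InstanceType": instance_type,
            }
        ],
    }
    endpoint = {"EndpointName": endpoint_name, "EndpointConfigName": config_name}
    return endpoint_config, endpoint


def rows_to_csv_body(rows: Iterable[Sequence[float]]) -> str:
    """Serialize feature rows as CSV text (assumption: features only, no label column)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

The artifact location for `model_data_url` can be read from `describe_training_job(TrainingJobName=...)["ModelArtifacts"]["S3ModelArtifacts"]`, and `create_model_package` returns the `ModelPackageArn` used by `build_model_request`. Once the endpoint is in service, a prediction request might look like `boto3.client("sagemaker-runtime").invoke_endpoint(EndpointName=name, ContentType="text/csv", Body=rows_to_csv_body(rows))`.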