How to use AutoML-Matrix through the AWS SageMaker Platform?

How to prepare the dataset for AutoML-Matrix

  • AutoML Matrix expects the data in a CSV file (comma-separated) with no header row

  • Rows in the file represent data points and columns represent the features / attributes

  • Feature values can be only numeric (integer or floating point values)

  • First column represents the class label and it should contain only integer values

  • There is no need to split the dataset into train/validation/test sets; the algorithm does that automatically
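
The format rules above can be checked before uploading anything. The following is a minimal sketch, not part of the AutoML Matrix product: the helper names `validate_rows` and `to_csv_text` are hypothetical, and the check that every row has the same number of features is an assumption rather than a documented requirement.

```python
import csv
import io


def validate_rows(rows):
    """Check rows against the format described above; return the feature count."""
    n_features = None
    for i, row in enumerate(rows, start=1):
        if len(row) < 2:
            raise ValueError(f"row {i}: need a label plus at least one feature")
        label, features = row[0], row[1:]
        # The first column is the class label and must be an integer.
        try:
            int(str(label).strip())
        except ValueError:
            raise ValueError(f"row {i}: label {label!r} is not an integer") from None
        # Every remaining column must parse as a number.
        for value in features:
            try:
                float(str(value).strip())
            except ValueError:
                raise ValueError(f"row {i}: feature {value!r} is not numeric") from None
        # Assumption: all rows carry the same number of features.
        if n_features is None:
            n_features = len(features)
        elif len(features) != n_features:
            raise ValueError(
                f"row {i}: expected {n_features} features, got {len(features)}"
            )
    if n_features is None:
        raise ValueError("dataset is empty")
    return n_features


def to_csv_text(rows):
    """Serialize validated rows as comma-separated text with no header row."""
    validate_rows(rows)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Two made-up data points: label first, then two numeric features.
sample = [[0, 5.1, 3.5], [1, 6.2, 2.9]]
print(to_csv_text(sample), end="")
```

Writing the text returned by `to_csv_text` to a file gives a single CSV that can be uploaded as is, since no train/validation/test split is needed.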

How to train AutoML Matrix models from the AWS SageMaker Web Console

Below is a step-by-step description of how to train the AutoML Matrix algorithm on AWS:

  • Log in to your AWS account

  • Go to AWS Marketplace and subscribe to the Deep Element AutoML Matrix algorithm

  • Create a folder (bucket) in AWS S3 Storage and upload the data to that bucket

  • Open the web console page of AWS SageMaker

  • On the left pane, click the “Training Jobs” link

  • Click on the button “Create training job”

  • Enter a name for the training job

  • Under the “Algorithms Source” section, select “An algorithm subscription from AWS Marketplace”

  • Select “DeepElement AutoML-Matrix” algorithm

  • Select an “Instance Type” from the list of suggested instance types

  • Specify the value of “Maximum Runtime”

  • The runtime of the algorithm depends on several factors, including the number of data samples, the number of features, and the complexity of the problem itself. A rough approximation of Maximum Runtime based only on the number of data samples is:

    • If #Samples < 100,000, then Maximum Runtime can be in range (2, 4) hrs

    • If #Samples < 500,000, then Maximum Runtime can be in range (3, 6) hrs

    • If #Samples < 1,000,000, then Maximum Runtime can be in range (6, 10) hrs

    • If #Samples < 10,000,000, then Maximum Runtime can be in range (10, 20) hrs

  • Channels: The algorithm expects the training data location in a channel named “train”.

    • Input Mode: Select “File”

    • Content Type: Type “text/csv”

    • Compression Type: None

    • Record Wrapper: None

    • Data Source: Select “S3”

    • S3 data type: S3Prefix

    • S3 data distribution type: FullyReplicated

    • S3 location: Specify the S3 location of the training data CSV file

    • S3 output path: Specify the S3 location of the folder (or bucket) that contains the training data file

  • Click “Create training job”
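
The console settings above map onto the SageMaker `CreateTrainingJob` API, so the same job can be described in code. The sketch below only builds the request; the helper names, the ARNs, the bucket paths, the instance type, the instance count, and the volume size are all placeholders or assumptions, not values taken from this guide. Network isolation is enabled on the assumption that AWS Marketplace algorithms require it.

```python
def suggest_max_runtime_hours(n_samples):
    """Return the (low, high) hour range from the rough guidance above."""
    tiers = [
        (100_000, (2, 4)),
        (500_000, (3, 6)),
        (1_000_000, (6, 10)),
        (10_000_000, (10, 20)),
    ]
    for limit, hours in tiers:
        if n_samples < limit:
            return hours
    raise ValueError("no guidance is given for 10,000,000 samples or more")


def build_training_request(job_name, algorithm_arn, role_arn, train_s3_uri,
                           output_s3_uri, instance_type, max_runtime_hours):
    """Build keyword arguments for boto3's create_training_job call."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "AlgorithmName": algorithm_arn,   # the subscribed Marketplace algorithm
            "TrainingInputMode": "File",      # Input Mode: File
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "ContentType": "text/csv",
            "CompressionType": "None",
            "RecordWrapperType": "None",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,       # assumption: single instance
            "VolumeSizeInGB": 30,     # assumption: arbitrary volume size
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_hours * 3600},
        "EnableNetworkIsolation": True,  # assumption: needed for Marketplace algorithms
    }


def submit(request):
    """Send the request to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it
    return boto3.client("sagemaker").create_training_job(**request)


# Placeholder values for a dataset of 250,000 samples.
low, high = suggest_max_runtime_hours(250_000)
request = build_training_request(
    job_name="automl-matrix-demo",
    algorithm_arn="<algorithm-arn-from-your-subscription>",
    role_arn="<sagemaker-execution-role-arn>",
    train_s3_uri="s3://<your-bucket>/train.csv",
    output_s3_uri="s3://<your-bucket>/",
    instance_type="ml.m5.xlarge",  # placeholder: pick one of the suggested types
    max_runtime_hours=high,
)
print(request["StoppingCondition"])
```

Calling `submit(request)` would start the job. It is left out of the example run because it requires a real subscription, role, and bucket.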

How to deploy AutoML Matrix models from the AWS SageMaker Web Console

Below is a step-by-step description of how to deploy a trained AutoML Matrix model on AWS:

  • Once the training job is completed successfully, open the training jobs page

  • Click on “Create model package”

  • Specify the name of the model package

  • Click “Next”

  • Under the “Validation and Scanning” section:

    • Select “No” for “Publish this model package on AWS Marketplace”

    • Select “No” for “Validate this resource”

  • Click on “Create Model Package”. This should create a new model package

  • Click “Model Packages” on the left pane of the web page. This should show the list of model packages created so far

  • Select the model package you just created

  • Select “Create endpoint”

    • Model Name: Specify a name for this new model

    • Under “Container input options”, select “Use a model package resource”

    • Click “Next”

    • Endpoint Name: Specify a name for this endpoint

    • Select “Create a new endpoint configuration”

    • Endpoint configuration name: Specify a value

    • Click “Create Endpoint Configuration”

    • Click “Submit”

  • This should create a new endpoint that can be called to make predictions
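
Once the endpoint is in service, it can be called through the SageMaker runtime API. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the endpoint accepts “text/csv” rows containing feature values only (no label column). This guide does not document the request or response format, so the response is returned as raw text.

```python
import csv
import io


def rows_to_csv_payload(feature_rows):
    """Serialize feature rows as header-less CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(feature_rows)
    return buffer.getvalue()


def predict(endpoint_name, feature_rows):
    """Send rows to the endpoint (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the serializer works without it
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # assumption: same content type as training
        Body=rows_to_csv_payload(feature_rows).encode("utf-8"),
    )
    # The response layout is not documented here, so return it undecoded
    # beyond UTF-8 text and let the caller inspect it.
    return response["Body"].read().decode("utf-8")


# Two made-up data points with two features each.
payload = rows_to_csv_payload([[5.1, 3.5], [6.2, 2.9]])
print(payload, end="")
```

A call such as `predict("<your-endpoint-name>", [[5.1, 3.5]])` would return whatever text the model container sends back.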

© 2020 by Deep Element