With the current AI boom, incorporating this technology into your product can give you a competitive edge. To successfully integrate Machine Learning (ML) models into our product at ITONICS, our data science team needed to follow a few key steps. Here's how we did it, and the lessons I learnt as a senior data scientist that can help others implement ML models in their own businesses.
Why we’re integrating AI into our product at ITONICS
At ITONICS, we build innovation management software. What do we mean by innovation management? We give our customers the data they need to decide where to invest their research and development budget. The goal is to invest in the right things at the right time, and our product helps our customers research, analyse, evaluate and collaborate at scale.
Dealing with large amounts of data is a core part of our business: we have vast amounts of information we can provide to our customers. But how do you find the signal in all that noise? This is where ML comes in. It lets us identify the most useful patterns in a constant influx of data and make that data more digestible for customers.
With recent rapid developments in AI, we have more opportunities to make our product valuable to clients by helping them complete their tasks more effectively and in less time.
How we’re integrating AI into our product
Here are some of the ways we’re leveraging AI within our product:
- Recommender systems: Our customers use our product to put together content for research and development. We use ML models to recommend content that other people in their organisation have already created, which cuts down on duplicated work within teams. We also use these systems to point our clients to other relevant articles and documents during their research. It works much like the recommendation algorithm on a site such as YouTube, which suggests content based on your current interests.
- Competitive intelligence: We use ML models to alert our customers to new patents filed in their areas of interest, so they can quickly gather information on competitors.
- Generative AI: We're currently developing generative AI models for the text-based data we provide our customers. For example, if a customer wants to write about a topic, we can generate a brief synopsis of it as a starting point.
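The article doesn't say which models sit behind the recommender, so the following is only an illustrative sketch, not ITONICS's implementation. It shows one common content-based approach, assuming each document is plain text: represent documents as TF-IDF vectors and recommend the ones with the highest cosine similarity to the document the user is currently reading. The function names (`tfidf_vectors`, `recommend`) and the sample document IDs are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for every document."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors[doc_id] = {
            # Smoothed IDF: rare terms weigh more, common terms still count a little.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(docs: dict[str, str], current_id: str, k: int = 3) -> list[str]:
    """Return up to k IDs of the documents most similar to current_id, best first."""
    vectors = tfidf_vectors(docs)
    target = vectors[current_id]
    scored = [
        (cosine(target, vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id != current_id
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop candidates with no vocabulary overlap at all.
    return [doc_id for score, doc_id in scored[:k] if score > 0.0]


if __name__ == "__main__":
    docs = {
        "battery-roadmap": "Solid state battery roadmap for electric vehicles",
        "ev-charging": "Fast charging infrastructure for electric vehicles",
        "battery-patents": "Patent landscape for solid state battery chemistry",
        "hr-policy": "Updated travel expense policy for employees",
    }
    print(recommend(docs, "battery-roadmap", k=2))
    # → ['battery-patents', 'ev-charging']
```

TF-IDF is only a simple baseline. A production recommender would more likely use learned text embeddings or behavioural signals such as what colleagues viewed, but the ranking step (score every candidate, sort, take the top k) stays the same.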
Our mindset for implementing these models is to start with small units of work that can make our clients’ lives easier, instead of trying to build one large model over a long period.
We’ve learnt that it can be hard to identify exactly which use cases will add the most value to customers in a field that develops as rapidly as AI. By shipping smaller units of work, we can iterate quickly on feedback and find the most impactful use cases.
Steps for embarking on a machine learning journey as a business
When implementing ML models in your business, a few key steps can go a long way towards helping you succeed. Above all, you need a good data foundation; without one, it will be much harder to implement ML successfully.
Here’s what we learnt during our ML journey at ITONICS:
Step 1: Evaluate your data availability
First, you need to consider your available data and identify the areas where ML can provide the most value to your organisation. Prioritise the use cases that align with your organisation’s strategic goals and that have the potential to deliver significant business impact. To identify these use cases, start with a problem that the customer wants to solve and work backwards from there.
In our case, we had ten years' worth of data from a previous iteration of our product that we could apply machine learning techniques to. If you haven't gathered data of your own in the past, you will likely have to buy it. Take these costs into account, as they can be substantial depending on the kind of data you need.
Step 2: Clean up your data
Next, and most crucially, you need to clean the data you want to use for your ML models.
The cleaner the data, the more value you'll get. If you train these models on bad data, you'll get unreliable results that are of little use to customers.
Cleaning your data means removing duplicates and junk. For example, we had to remove a lot of foreign-language data, spam, and strings of random characters from our own dataset before we could start training our ML models.
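A minimal sketch of those cleaning steps in Python, assuming the data is a list of text documents. The heuristics are stand-ins chosen for illustration, not what ITONICS used: a letter-ratio check flags strings of random characters, a link count flags spam, a small hard-coded stopword list approximates English-language detection, and hashing the normalised text drops exact duplicates. All thresholds and function names are hypothetical.

```python
import hashlib
import re

# Tiny illustrative stopword list; a real pipeline would use a language-ID model.
ENGLISH_STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is",
    "for", "on", "with", "that", "this", "are", "as", "by",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def letter_ratio(text: str) -> float:
    """Share of non-space characters that are letters; random character strings score low."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalpha() for c in chars) / len(chars)


def looks_english(text: str, min_stopword_share: float = 0.15) -> bool:
    """Crude check: English prose contains a noticeable share of common stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return False
    return sum(w in ENGLISH_STOPWORDS for w in words) / len(words) >= min_stopword_share


def looks_spammy(text: str, max_links: int = 3) -> bool:
    """Flag link-stuffed text, a common spam pattern."""
    return text.count("http://") + text.count("https://") > max_links


def clean_corpus(texts: list[str], min_letter_ratio: float = 0.7, min_words: int = 5) -> list[str]:
    """Return the texts that survive the junk, spam, language and duplicate filters."""
    seen: set[str] = set()
    kept = []
    for raw in texts:
        text = normalize(raw)
        if len(text.split()) < min_words:
            continue  # too short to be useful
        if letter_ratio(text) < min_letter_ratio:
            continue  # mostly digits and symbols: likely junk
        if looks_spammy(text) or not looks_english(text):
            continue  # spam or (probably) not English
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate once case and whitespace are ignored
        seen.add(digest)
        kept.append(raw)
    return kept
```

These rules only catch the easy cases. In practice you would swap the stopword heuristic for a proper language-identification model and add near-duplicate detection (for example, MinHash).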
Keep in mind that training these models is expensive because of the hardware it requires. If you make a mistake in your training data, you may have to retrain, and those errors quickly add up to expensive compute time.
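One cheap safeguard, sketched here under our own assumptions rather than taken from ITONICS's process, is a pre-flight check that profiles the dataset and refuses to start an expensive training job if it looks broken. The thresholds and the `preflight_check` and `DataReport` names are hypothetical; tune them to your data.

```python
from dataclasses import dataclass


@dataclass
class DataReport:
    n_rows: int
    empty_share: float
    duplicate_share: float


def profile(texts: list[str]) -> DataReport:
    """Count rows, blank rows and exact duplicates (after stripping whitespace)."""
    n = len(texts)
    if n == 0:
        return DataReport(0, 1.0, 0.0)
    stripped = [t.strip() for t in texts]
    empty = sum(1 for t in stripped if not t)
    unique = len(set(stripped))
    return DataReport(n, empty / n, (n - unique) / n)


def preflight_check(
    texts: list[str],
    min_rows: int = 1000,
    max_empty_share: float = 0.01,
    max_duplicate_share: float = 0.05,
) -> DataReport:
    """Raise before training starts if the dataset looks broken; thresholds are illustrative."""
    report = profile(texts)
    problems = []
    if report.n_rows < min_rows:
        problems.append(f"only {report.n_rows} rows (< {min_rows})")
    if report.empty_share > max_empty_share:
        problems.append(f"{report.empty_share:.1%} empty rows")
    if report.duplicate_share > max_duplicate_share:
        problems.append(f"{report.duplicate_share:.1%} duplicate rows")
    if problems:
        raise ValueError("Dataset failed pre-flight check: " + "; ".join(problems))
    return report
```

A check like this typically runs in seconds, which is far less than a training run on bad data would cost.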
Step 3: Get your data infrastructure in place
You need solid data infrastructure in place for ML models. It's helpful to keep these considerations in mind:
- Having powerful hardware: ML often involves large amounts of data, which require large data stores and efficient ways to move that data to where the models are trained. The models themselves are usually larger than typical software applications, so you will need powerful hardware to run them.
- Data storage costs: Storing large datasets can be expensive, especially if you're acquiring them from scratch.
- Data security: Your data team needs to be able to experiment with your data securely, in an environment separate from your production database. This is especially important if you're working with sensitive data. Your team needs a safe space to work with these datasets before anything is switched over to the production database.
Tip: Getting your models into production can be a challenge, since they're typically much larger deployment packages than those of ordinary applications. You need to strike a balance between cost and performance. We've had good results using Elasticsearch as our database, Amazon's Elastic Container Service for deployment, and spot.io for cost optimisation.
Step 4: Appoint a dedicated data science team
Once your data infrastructure is set up, you should ideally have a dedicated data science team to implement your models. This team should include people with expertise in data science, ML algorithms, programming, and statistics.
At ITONICS, we have a small team that handles ML implementation end to end, from data modelling and data processing to back-end development and deployment. This lets us stay agile and ship small units of work quickly.
If you want to focus on being more data-driven in your organisation, it's necessary to have an in-house team that has thorough domain knowledge and experience of your unique challenges.
Having followed these steps, we're now set up to hit the ground running as we build out our ML initiatives at ITONICS, and to remain competitive in the rapidly developing field of AI.
Tim Terblanche is a senior data scientist at ITONICS. He loves problem-solving, and data science allows him to do this in a systematic way on real-world problems. At work, his goal is to create data products that deliver value to clients. Outside of work, he enjoys rock climbing, mountain biking and the occasional PC game.