I just read Chapter 1 of the excellent book Advances in Financial Machine Learning (AFML) by Prof. Marcos López de Prado (MLdP). If you are interested in finance, or in how to apply machine learning in general, you should buy this book.
I will share the main points I got from Chapter 1. Feel free to discuss them. The chapter is about two things: how to use machine learning responsibly, and how to build a great financial research framework. As a software engineer, I will give my view on these matters wearing my only hat. :)
Me and ML
This book changed my view of ML and AI in general. As people in my community know, I'm a bit skeptical about data science: it is currently overhyped and has gone nowhere in a lot of companies, because either the recruiter or the recruit is the wrong fit. Either way, it's very challenging in our situation now.
A lot of people nowadays have fallen in love with data science, machine learning and deep learning. That's all fun and awesome. But we need to get back to theory and get things into production, not just play around with new toys, throw some data into a library and blindly use it to predict something without knowing the theory behind it.
I don't like using ML that way. It's boring: training and waiting (with coffee in between, of course), tuning parameters and doing lots of things that could be automated. Where is the creative part in that? As an engineer I hate waiting. I would love to automate a lot of those tasks for the people doing this work, to make their lives easier.
Financial ML is Different
That style of using ML can work when the problem is easy for a human to do, like detecting a cat, recognizing handwriting or reading a book. What we currently call AI mostly mimics what humans are already really good at.
But finance is not the same. It's really different, because we need to build on a good theoretical foundation. We can't just treat ML as a black box; that's not a good way to use it in finance. If anyone could just grab OHLC or time series datasets from the web and write some code that fits them with an LSTM or a CNN, every engineer and data scientist would be rich.
We should use ML to build theory, a theory that can stand on its own feet with the help of ML. If your model predicts something, that only means it was right on the historical dataset; it might not be true in the future. If you have a theory, you don't depend on a prediction alone: a sound theory keeps holding beyond the data it was built on, and that is how you can make money. It is simple, but it's not easy. For sure one person can't do it alone.
The era when one person could sit in a bedroom, write some code and change the world is over. You will need a team of talented and driven people. That's where leadership comes into play. That's what Jim Simons did with Renaissance Technologies, and that's what I hope everyone will do: be a good team player and leader, and attract talent. Oh, and yes, you should have a great research infrastructure, whether you call it a big data or a data analytics platform.
OK, so now we should create a team. What would my team consist of? Computer scientists? Financial engineers? Data engineers?
Well, it would be a collaboration across disciplines. We don't need a superhero who can do anything, or a full-stack engineer in that sense. You need specialized people, each doing what they are great at in their own field. ML and AI will for sure change the landscape of employment, hiring and firing, so we need new skills and the ability to learn faster and collaborate with others. You don't need to suddenly change your job from engineer to data scientist, or the other way around!
There’s no need to envy each other. Let’s collaborate.
Everyone should benefit from advances in AI and technology. A lot of new jobs will be created, like data labelers, data curators, data quality engineers, etc. These are all good things for society. We get more work done with the help of machines and focus on what humans do best, which is creativity and empathy. ML is just a tool for us.
Everything that can be automated should be automated. That's one of the tenets of Continuous Integration, Continuous Delivery and Continuous Deployment. We engineers should thrive on those things, not be afraid of the ML/AI movement.
So let me break down what a research factory should look like, based on AFML.
I really believe the first station, the one AFML calls the data curators, is the job of data engineers: building the pipelines and the data platform, and arranging the data so that it is better organized and easier to access. They create a data catalog and good data governance. Further on, we can build a data platform where people can do self-service analytics and reason based on data. This will also push data literacy inside the organization.
In finance, this would be the person who knows how to connect to APIs, gateways and protocols, and who writes the code to prepare the data and insert it into the big data store. So engineering roles are needed here!
The next part, for sure, is automating this to create self-driving data governance. That's what data engineering should aim for, so that data engineers can be free from maintaining ETL and from boring jobs like hand-building data pipelines. If something can't be automated, then it should be self-service.
Here's where the data scientist gets to shine, in the role AFML calls the feature analyst. These should be people who know information theory and signal processing, and who know how to squeeze the data for insights that can't be seen in plain sight. Remember, we need to find an edge in finance.
To build that edge you need features, predictors and anomalies in the data that you can use to create your model. Feature engineering, feature extraction and the feature dictionary are a huge part of this. You can use ML, signal processing, etc. to find the features that are important. Then you need to push them into the feature catalog in your Research/Data Platform, the one created by the data engineers.
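To make the idea of a searchable feature catalog a bit more concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the class names FeatureRecord and FeatureCatalog, the fields, and the two example features are hypothetical and do not come from AFML or from any real platform. A real catalog would live in a database behind the Research/Data Platform, not in memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureRecord:
    """One catalog entry describing a feature (all fields are hypothetical)."""
    name: str
    description: str
    author: str
    tags: frozenset = field(default_factory=frozenset)


class FeatureCatalog:
    """In-memory stand-in for the feature catalog of a research platform."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Refuse silent overwrites so two analysts cannot clobber each other.
        if record.name in self._records:
            raise ValueError(f"feature {record.name!r} is already registered")
        self._records[record.name] = record

    def search(self, keyword):
        # Case-insensitive match against name, description and tags.
        needle = keyword.lower()
        return sorted(
            r.name
            for r in self._records.values()
            if needle in r.name.lower()
            or needle in r.description.lower()
            or any(needle in t.lower() for t in r.tags)
        )


# Feature analysts publish what they found (made-up example entries).
catalog = FeatureCatalog()
catalog.register(FeatureRecord(
    name="volume_imbalance",
    description="Signed buy/sell volume imbalance per bar",
    author="feature_analyst_a",
    tags=frozenset({"microstructure", "volume"}),
))
catalog.register(FeatureRecord(
    name="rolling_entropy",
    description="Entropy estimate of the return sign sequence",
    author="feature_analyst_b",
    tags=frozenset({"information-theory"}),
))

# Anyone else on the platform can then search and browse the catalog.
print(catalog.search("volume"))
print(catalog.search("entropy"))
```

The point of the sketch is the workflow, not the storage: one group registers features with a description and an owner, and another group discovers them by searching instead of asking around.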
In this division, the strategists pick up the features found by the feature analysts. They can find them in the Research/Data Platform: search and browse the features, and create a theory that has economic value or can be monetized. So theory comes first. A strategist is a data scientist with financial market knowledge. Remember, this is a specialized data scientist, not a generalist.
So the theory should make sense, and not just be the result of torturing the data all day long. This scientist should have the creativity, vision and imagination to come up with ideas worth pursuing. He or she should be able to code the algorithm for the strategy and make it available on the Research/Data Platform for the backtesters to judge.
This division, the backtesters, should be the transformation of Quality Assurance or software testers in general. They need to retrain for the role with data science skills and a bit of finance. Their job is absolutely to find bugs and errors and to work hard to prove that the strategy is wrong. What? How can this be teamwork?
Don't get me wrong. That's the job of a software tester: raise quality by finding more bugs. You don't run a backtest to make a strategy look great. You run a backtest to prove it wrong. If the backtester can't prove it wrong, then there is a chance it will work in production.
For sure there will be a learning curve for testers to get a grasp of how to conduct the experiments and other details. But that should be covered by the Research/Data Platform. See? Big data technology helps a lot, especially the data platform parts. It's like a glue that enables people to work together towards common goals. It breaks the isolation.
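As one small illustration of "backtest to prove it wrong", here is a sketch of a permutation test in Python. To be clear, this is a technique I picked for the example, not a method taken from AFML, and the function names, the simulated market returns and the two toy strategies are all made up. The idea: if shuffling the timing of a strategy's positions gives results as good as the real thing, the strategy's timing has no demonstrated skill.

```python
import random


def mean_strategy_return(positions, market_returns):
    """Average per-period return of holding positions[i] during period i."""
    return sum(p * r for p, r in zip(positions, market_returns)) / len(market_returns)


def permutation_p_value(positions, market_returns, n_permutations=1000, seed=0):
    """Share of random re-timings that do at least as well as the strategy.

    A high value means the backtester has broken the strategy: its timing
    is indistinguishable from luck. A low value is NOT proof that it works.
    """
    if not positions or len(positions) != len(market_returns):
        raise ValueError("positions and market_returns must be equal-length and non-empty")
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    observed = mean_strategy_return(positions, market_returns)
    shuffled = list(positions)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # destroy the timing, keep the same set of bets
        if mean_strategy_return(shuffled, market_returns) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed ordering itself as one permutation.
    return (at_least_as_good + 1) / (n_permutations + 1)


# Simulated daily returns (hypothetical data, not a real market).
sim = random.Random(42)
market = [sim.gauss(0.0, 0.01) for _ in range(250)]

# A strategy that cheats by peeking at the answer, and one with no timing at all.
clairvoyant = [1 if r > 0 else -1 for r in market]
always_long = [1] * len(market)

p_clairvoyant = permutation_p_value(clairvoyant, market)
p_always_long = permutation_p_value(always_long, market)
print("clairvoyant:", p_clairvoyant)
print("always long:", p_always_long)
```

The always-long strategy cannot be told apart from its shuffled versions, so the tester rejects any claim of timing skill. The clairvoyant one survives this particular attack, which only means the tester should move on to the next attack, such as checking for look-ahead bias, which is exactly what that toy strategy is guilty of.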
This division, the deployment team, consists of software engineers: backend, systems, networking, infrastructure and cloud engineers. They should have a background in computer science, networking and distributed systems. Beyond discrete math, they should also understand numerical algorithms.
They should be able to write high-performance code that makes good use of the hardware, so that it is correct, clean, fast and robust. Bugs can cost us a lot of money, so be careful with this one. Don't underestimate this role. See the Knight Capital story.
See? It's a serious job for software engineers!
Everyone has their part. You need good people, data and an analytics platform. That's how you should create your research factory. It's a great time to learn. Everyone should have a skill in something and focus on it. Do what you are great at, follow your passion and monetize it. And the great thing is that you shouldn't do it alone.
The world is more connected than ever during this COVID-19 pandemic. So keep your chin up, have hope and have fun learning!