What you need to know about industrial data scientists — as told by an industrial data scientist

Presented by AspenTech

With industrial organizations undergoing rapid, large-scale digital transformations, it is easy to miss the forest for the trees. When you get into the weeds on AI and machine learning, big data, analytics, the cloud, and the edge, you can forget that the goal of digital transformation is not to accumulate new technologies for their own sake. It’s to use these new solutions to empower employees, make their lives easier, and set them up to efficiently deliver new value and innovations for their organization. It’s about the people, not just the tech; the latter empowers the former to drive results.

The industrial data scientist is a living, breathing example of this — a relatively new role in the process engineering and industrial sector that has emerged to fulfill a growing need in our industry: marrying traditional data science with localized domain expertise at a time of great, generational change occurring in the industrial workforce.

As an industrial data scientist myself, I’d like to shed more light on what industrial data scientists are doing and can do for their organizations, including how our roles have evolved, how we complement and collaborate with traditional data scientists, and how leaders can help better set us (and their organizations) up for success.

The unique role of industrial data scientists

The industrial data scientist’s journey began with modernizing how industrial data is aggregated and collected. Limited data sets, varying levels of data quality, different storage formats, different security stages, even data written down on paper: many industrial organizations still have extreme variability in how they document and collect industrial data.

That variability makes collaboration, visibility, and the ability to turn industrial data into tangible, valuable outcomes unnecessarily hard, tedious, and time-consuming. Industrial data scientists are uniquely positioned to get involved in the end-to-end nuts and bolts, from installing the hardware, to programming and designing the algorithms, to establishing site-wide connectivity and cloud computing, to streamlining and consolidating data collection processes. This can entail anything from placing industrial data from different sources within the same storage or security formats to opening access to datasets across the organization, rather than keeping them siloed within separate teams or with individual team members.
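As a rough illustration of what placing data from different sources into one format can look like, here is a minimal Python sketch. Everything in it is hypothetical: the `Reading` schema, the column names, the tag `TI-101`, the units, and the assumption that all timestamps are already in UTC are invented for the example and do not describe any particular historian or product.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """Common record format shared by every source (hypothetical schema)."""
    timestamp: datetime  # assumed to be UTC
    tag: str             # sensor identifier
    value: float         # degrees Celsius
    source: str          # where the record came from


def from_historian_csv(text: str) -> list[Reading]:
    """Parse a hypothetical historian export: ISO timestamps, values in Fahrenheit."""
    readings = []
    for row in csv.DictReader(io.StringIO(text)):
        ts = datetime.fromisoformat(row["Timestamp"]).replace(tzinfo=timezone.utc)
        celsius = (float(row["Value"]) - 32.0) * 5.0 / 9.0
        readings.append(Reading(ts, row["Tag"], round(celsius, 2), "historian"))
    return readings


def from_paper_log(entries: list[dict]) -> list[Reading]:
    """Parse hand-transcribed log entries: day/month/year dates, values in Celsius."""
    readings = []
    for entry in entries:
        ts = datetime.strptime(f'{entry["date"]} {entry["time"]}', "%d/%m/%Y %H:%M")
        ts = ts.replace(tzinfo=timezone.utc)
        readings.append(Reading(ts, entry["sensor"], float(entry["reading_c"]), "paper"))
    return readings


# Two made-up sources describing the same sensor in different formats and units.
historian_export = "Timestamp,Tag,Value\n2023-01-05 14:30:00,TI-101,212.0\n"
paper_entries = [
    {"date": "05/01/2023", "time": "14:00", "sensor": "TI-101", "reading_c": "99.5"}
]

# One time-ordered list in one schema, regardless of where each record originated.
combined = sorted(
    from_historian_csv(historian_export) + from_paper_log(paper_entries),
    key=lambda r: r.timestamp,
)
```

Keeping a `source` field on every record is one simple way to preserve traceability once the data sits in a shared store.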

What about organizations that are already ahead of the curve on modernizing their data management practices? Industrial data scientists will continue to play a role here, and in fact a more expansive one. Even for organizations that have consolidated formatting and storage standards for their industrial data, or have eliminated the internal silos that kept datasets localized to specific teams or team members, there’s still a lot of work to do.

Data quality is always of paramount importance, and it only becomes more of a priority as access to a larger pool of data opens up. Industrial data scientists have direct contact with customers and use cases in the field, something that traditional data scientists typically don’t have, and they can use that direct line of feedback to refine data quality practices.

This could include anything from managing control areas, connecting independent devices, and deploying data historians more broadly, to programming homegrown Python-based setups tailored to domain-specific tools and leveraging fully automated machine learning solutions. Because industrial data scientists sit at the crossroads of domain expertise, industrial data management, and direct experience with use cases and customers, we can serve an even more fundamental role: advocating for better industrial data management practices through automation and Industrial AI.

Managing the relationship between traditional and industrial data scientists

Industrial data scientists are domain experts at heart. They have strong technical backgrounds and an understanding of technologies like machine learning and Industrial IoT tools. That technical expertise is leveraged to enrich their domain areas. Ultimately, we play the role of both expert and user. Traditional data scientists are more focused on improving toolchains and algorithms, with a strict focus on the technology side, regardless of specific domains. There’s a natural opportunity for tight collaboration between these two camps, if leadership is willing to help facilitate it.

A hypothetical organization could see its industrial data scientists focus on developing features that maximize impact and scalability; highlight critical tools or processes for automation; or deploy new capabilities for taking advantage of untapped industrial time-series data. Traditional data scientists, meanwhile, can improve generic algorithms and toolchains, while also providing expert input on what’s new in their space, such as specific hardware that can accelerate new capabilities and put them into action faster.

Leveraging the joint potential of traditional and industrial data scientists cannot be done purely organically. Industrial data scientists are a rare and still-emerging element in the industrial workforce, and they require active recognition and advocacy from leadership in order to continue growing into a more substantial and efficient part of an organization.

How leadership can facilitate the growth of industrial data scientists

Every company is different, every data landscape is different, and every industrial data scientist’s journey from emergence to maturation is going to be different. But there are certain common threads that the leadership of an industrial organization can pull on to nurture and evolve their industrial data scientists, and their role within the organization. Leadership has to make sure the environment is right for maximizing efficiency and enabling scalability.

From a culture perspective, this means freeing up domain experts’ limited bandwidth so that there are more opportunities for collaboration and, as a result, more room for industrial data scientists to grow. It’s not just about facilitating collaboration with domain experts, but with product managers and customers as well. More collaboration creates a broader, more holistic ecosystem for industrial data scientists to thrive and maximize value creation.

From a technology perspective, this means making sure that reusable, flexible, end-to-end toolchains are in place, along with automation for data cleaning. Custom toolchains limit collaboration and reuse, which slows down work and drives inefficiencies. Streamlining toolchains and making them reusable ensures that data scientists do not have to constantly redo everything from scratch, and can instead focus on creating scalable solutions and building sustainable industrial AI models.
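To make the idea of reusable, automated data cleaning a little more concrete, here is a minimal Python sketch in which small cleaning steps are composed into one pipeline that can be shared across projects. The step names, the temperature limits, and the gap length are assumptions chosen for illustration, not part of any specific toolchain.

```python
from functools import reduce
from typing import Callable, List, Optional

Series = List[Optional[float]]
Step = Callable[[Series], Series]


def mask_out_of_range(low: float, high: float) -> Step:
    """Return a step that replaces physically implausible values with None."""
    def step(series: Series) -> Series:
        return [v if v is not None and low <= v <= high else None for v in series]
    return step


def forward_fill(max_gap: int) -> Step:
    """Return a step that fills up to max_gap consecutive gaps with the last valid value."""
    def step(series: Series) -> Series:
        filled: Series = []
        last: Optional[float] = None
        gap = 0
        for v in series:
            if v is not None:
                last, gap = v, 0
                filled.append(v)
            else:
                gap += 1
                # Longer outages stay missing instead of being papered over.
                filled.append(last if last is not None and gap <= max_gap else None)
        return filled
    return step


def make_pipeline(*steps: Step) -> Step:
    """Compose cleaning steps into one reusable function, applied left to right."""
    return lambda series: reduce(lambda acc, step: step(acc), steps, series)


# A pipeline defined once (hypothetical limits) and reused for any temperature tag.
clean_temperature = make_pipeline(
    mask_out_of_range(-50.0, 500.0),
    forward_fill(max_gap=2),
)

raw = [20.1, None, 9999.0, 20.4, None, None, None, 20.9]
cleaned = clean_temperature(raw)
```

Because each step is just a function from a series to a series, a team can add, remove, or reorder steps without rewriting the surrounding code.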

My team and I are building and constantly refining such toolchains both for internal use and to further enable our customers’ own industrial data scientists. One toolchain we develop can provide a flexible Jupyter IDE-based environment that allows connectivity to historian plant data, data exploration, preprocessing, machine learning model deployment, visualization, and much more.

Another allows industrial data scientists not only to make use of recorded asset data but also to interface directly with simulators to easily execute domain-based data conditioning, utilize tools for model explainability and performance evaluation, and ensure that first-principles physical constraints are met.
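One way to picture a first-principles check is a steady-state mass balance applied to a model's predictions: whatever flows into a unit should match what flows out. The Python sketch below is purely illustrative; the function names, the 1% tolerance, and the proportional rescaling used as a correction are assumptions made for the example rather than a description of the toolchain itself.

```python
from typing import List


def mass_balance_residual(feed: float, outlets: List[float]) -> float:
    """Relative mismatch between the feed flow and the sum of predicted outlet flows."""
    return (sum(outlets) - feed) / feed


def enforce_mass_balance(
    feed: float, outlets: List[float], tolerance: float = 0.01
) -> List[float]:
    """Return predictions unchanged if they balance within tolerance;
    otherwise rescale them proportionally so they sum to the feed."""
    if feed <= 0:
        raise ValueError("feed flow must be positive")
    if any(flow < 0 for flow in outlets):
        raise ValueError("predicted flows must be non-negative")
    total = sum(outlets)
    if total <= 0:
        raise ValueError("predicted flows must sum to a positive value")
    if abs(mass_balance_residual(feed, outlets)) <= tolerance:
        return list(outlets)
    # Proportional rescaling is one simple correction; other projections are possible.
    return [flow * feed / total for flow in outlets]


# Hypothetical case: a model predicts two outlet flows that overshoot the feed by 5%.
feed_flow = 100.0
predicted = [60.0, 45.0]
corrected = enforce_mass_balance(feed_flow, predicted)
```

The residual on its own is also useful as a monitoring signal, since a persistently large value can point to a drifting sensor as easily as to a poor model.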

Industrial data scientists are a critical part of our industry’s future. Generational changes in the workforce are precipitating a major loss of historical knowledge and operational expertise. There’s a major need for industrial AI models that can utilize data science to capture and preserve that historical knowledge before it’s too late. Industrial data scientists are uniquely positioned to do just that thanks to their combination of domain expertise and data science savvy. But doing so requires a real push from leadership to facilitate their growth and open collaboration with all teams across the organization.

Want to learn more about industrial data scientists? Check out my recent roundtable discussion on The Rise of the Industrial Data Scientist, where I spoke with Peter Reynolds, Analyst at ARC Advisory Group; David Leitham, SVP and General Manager, Life Sciences, at AspenTech; and Jose Valls, CTO, Process Manufacturing, at Microsoft, about how industrial data is used in key industries, the role that industrial data scientists can play in improving your organization, and how to hire an industrial data scientist for your team.

Heiko Claussen is SVP of Artificial Intelligence & Industrial Data Scientist at AspenTech.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. Content produced by our editorial team is never influenced by advertisers or sponsors in any way. For more information, contact sales@venturebeat.com.