How data science is becoming available ‘for the good of all’ businesses
By Hugh Durkin
In his 2010 TED Talk "When Ideas Have Sex," Matt Ridley posits that human prosperity was caused by one thing and one thing only: our unique human ability to specialize and exchange ideas and tools.
Ridley's example of the invention of the reading light illustrates how far we've come. Thousands of years ago, making an hour of reading light required hunting an animal and killing it, before rendering it down to make a candle. Today, the average human earns an hour of reading light in less than half a second. The reclaimed time is spent relaxing, traveling, and working day to day in specialized industries for the benefit of other humans. Specialization and exchange create new technologies faster, and at an ever-decreasing cost.
'Data science as a service' is the latest way for humans to specialize and exchange data science ideas and tools, and it is rapidly accelerating a new wave of computing innovation at an ever-decreasing cost. At Zalando, Europe's leading online fashion platform, we've been 'all in' on data science almost since the start of our journey to 'reimagine fashion for the good of all', delivering customized experiences, quality search results, and contextually relevant recommendations through AI and machine learning. Today, we're betting on 'data science as a service' as a new way to democratize the previously specialized power of data science for teams across Zalando.
To democratize a technology, you must understand how the 'new' innovation is similar to and different from the legacy innovations people already use, which in this case means traditional 'as a service' API platforms.
First, most platform APIs enable users to do one of two things: perform CRUD-like operations to create, read, update, and delete information in a central source of truth (the platform), or ask questions of predefined, indexed datasets within the platform (what most people call 'analytics').
Data science platform APIs are different. Acronyms and terms like NLP and deep learning are used to describe data science, but what data scientists really do is help machines understand the evolving, unstructured world around us the way humans do. 'Data science as a service' APIs derive their power from adding structure to unstructured, random inputs and questions like:
In the examples above, "this" could refer to unstructured text, images, video, or audio, and what counts as "meaningful groups" or "unusual things" could be subjective. Humans can be biased when answering questions like these. Machines don't (yet) have human biases, so helping them understand unstructured inputs and create loosely structured outputs requires a different way for these machines to talk to each other, and to humans.
Second, most platform APIs deliver confident results in a binary way. For example, querying an API for the records created within a date range returns exactly the set of records created in that range, provided the underlying data is accurate. Similarly, when an API is used to read a single record from a database, it will confidently retrieve that record and its contents.
Again, data science APIs are different. Imagine someone stopping you on the street, showing you the photo above, and asking, "What do you see in this picture?" Your answers might begin with "I'm pretty sure I see..." (a boat on the Hudson River) or "I definitely see..." (the Empire State Building). You might also be asked clarifying questions like, "Where do you think you see it?" Because each of your answers carries a high, medium, or low level of confidence, 'data science as a service' APIs must also have a way to express their level of confidence.
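To make the 'binary' behavior concrete, here is a minimal Python sketch assuming a hypothetical in-memory store; `RECORDS`, `read_record`, and `created_between` are illustrative names, not part of any real platform API. A record either exists or it doesn't, and it is either inside the date range or it isn't, so the same query always returns the same set.

```python
from datetime import date

# Hypothetical in-memory "platform" records: record id -> creation date.
RECORDS = {
    "order-1": date(2018, 3, 2),
    "order-2": date(2018, 3, 15),
    "order-3": date(2018, 4, 1),
}

def read_record(record_id: str) -> date:
    """CRUD-style read: return the record, or raise KeyError if it doesn't exist."""
    return RECORDS[record_id]

def created_between(start: date, end: date) -> list[str]:
    """Date-range query with inclusive bounds: each record is simply in or out."""
    return sorted(rid for rid, created in RECORDS.items() if start <= created <= end)

print(created_between(date(2018, 3, 1), date(2018, 3, 31)))  # ['order-1', 'order-2']
print(read_record("order-3"))  # 2018-04-01
```

There is no notion of "how sure" the API is here: the answer is exact, given accurate data.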
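One way to picture this is a minimal Python sketch, assuming a hypothetical response in which each prediction carries a label, a score between 0 and 1, and a bounding box that answers "where do you think you see it?". The `Prediction` class, the `HIGH` and `MEDIUM` cut-offs, the scores, and the coordinates are all invented for illustration; a real service would calibrate its own thresholds.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str                      # what the model thinks it sees
    confidence: float               # model score between 0.0 and 1.0
    box: tuple[int, int, int, int]  # (x, y, width, height): where it thinks it sees it

# Assumed cut-offs, for illustration only.
HIGH, MEDIUM = 0.9, 0.6

def confidence_band(score: float) -> str:
    """Bucket a raw score into the high / medium / low answers a human might give."""
    if score >= HIGH:
        return "high"
    if score >= MEDIUM:
        return "medium"
    return "low"

def describe(pred: Prediction) -> str:
    phrasing = {"high": "I definitely see", "medium": "I'm pretty sure I see", "low": "I might see"}
    return f"{phrasing[confidence_band(pred.confidence)]} {pred.label} at {pred.box}"

# Hypothetical predictions for the street photo.
preds = [
    Prediction("the Empire State Building", 0.97, (410, 40, 60, 300)),
    Prediction("a boat", 0.72, (120, 380, 90, 40)),
]
for p in preds:
    print(describe(p))
```

The score and the location travel with the label, so a client can decide for itself how much to trust each answer.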
At Zalando, we've developed a deep understanding of why 'data science as a service' APIs are different by building our Fashion Content Platform Team. Simply put, our team of data scientists, engineers, designers, and product managers develops fashion-focused AI models, capabilities, and APIs that let any team in Zalando integrate self-serve 'data science as a service' APIs when building relevant and immersive experiences for customers. Here are some key lessons we learned along the way.
'Data science as a service' APIs are different, and it's important for both technical and non-technical users to understand why by 'making it real'. For non-technical users, easy-to-use demo UIs and a 'Labs' environment let any member of any team understand what our deep learning and NLP models do, and how integrating them can help deliver unique customer experiences. They also take the mystery out of data science through familiar inputs, simple language, and visual responses with clear explanations. For technical users, 'making it real' happens through easy-to-use tools for calling APIs from within the documentation, with certain fields pre-filled with images and text to reduce the time to the first API call.
For image analysis deep learning APIs, developers must quickly understand how JSON responses (or features built with them) might surface to customers within their applications. We carry interactive and visual cues throughout the documentation, and keep JSON responses clear and in context. Where relevant, taxonomies are visual, using imagery to quickly convey what 'A-line', 'Cropped', and 'Paisley' mean. For NLP text analysis APIs like Entity Relations, JSON responses are structured for easy interpretation, and demo UIs let users see visually how entities relate to each other.
Many developers 'fail first time' with 'data science as a service' APIs, because they feel they've integrated incorrectly or aren't getting back the results they need. Like humans, machines are limited to what they know based on what they've seen before, and simple, visual explanations within the documentation help developers understand what the machine knows now and what it might be learning soon (the AI product roadmap). Providing examples of inputs that work, and inputs that don't, helps set expectations and will likely feed your AI product roadmap with new feature requests. Product limitations are always an opportunity to prioritize feature requests faster.
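As an illustration of what a clear, in-context response might look like, here is a minimal Python sketch that parses a hypothetical image analysis response and keeps only attributes above a confidence cut-off. The field names (`attributes`, `taxonomy`, `value`, `confidence`), the scores, and the 0.5 default are assumptions for this example, not the schema of any Zalando API.

```python
import json

# Hypothetical response body; fields and values are illustrative only.
RAW = """
{
  "image_id": "example-123",
  "attributes": [
    {"taxonomy": "silhouette", "value": "A-line", "confidence": 0.93},
    {"taxonomy": "length", "value": "Cropped", "confidence": 0.81},
    {"taxonomy": "pattern", "value": "Paisley", "confidence": 0.42}
  ]
}
"""

def summarize(raw: str, min_confidence: float = 0.5) -> dict[str, str]:
    """Map each taxonomy to its predicted value, keeping only confident attributes."""
    body = json.loads(raw)
    return {
        attr["taxonomy"]: attr["value"]
        for attr in body["attributes"]
        if attr["confidence"] >= min_confidence
    }

print(summarize(RAW))  # {'silhouette': 'A-line', 'length': 'Cropped'}
```

Lowering `min_confidence` would surface 'Paisley' as well, which is exactly the kind of trade-off a developer needs to understand before showing attributes to customers.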
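A minimal Python sketch of how a client might separate a genuine integration problem from an input the model simply doesn't know yet. The response shape, the `IntegrationError` class, and the 0.5 threshold are hypothetical; the point is that an empty or low-confidence answer is treated as a valid result rather than a bug.

```python
from typing import Optional

class IntegrationError(Exception):
    """Raised when the response itself is malformed: a real integration problem."""

def best_label(response: dict, threshold: float = 0.5) -> Optional[str]:
    """Return the most confident label, or None if nothing clears the threshold.

    None is an expected answer ("the model hasn't learned this yet"),
    not a sign that the integration is broken.
    """
    if "predictions" not in response:
        raise IntegrationError("response has no 'predictions' field")
    confident = [p for p in response["predictions"] if p["confidence"] >= threshold]
    if not confident:
        return None
    return max(confident, key=lambda p: p["confidence"])["label"]

# An input the model knows well vs. one it has rarely seen (hypothetical responses).
print(best_label({"predictions": [{"label": "dress", "confidence": 0.88}]}))  # dress
print(best_label({"predictions": [{"label": "dress", "confidence": 0.12}]}))  # None
```

Documenting both cases side by side, with example inputs that produce each, is one way to set expectations before a developer concludes their integration is wrong.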
While terms like 'confidence score' and 'features' are part of day-to-day conversation among data science teams, it's easy to forget that developers newer to data science may not understand what they mean or what their JSON output represents. Stating the seemingly obvious not only helps developers adopt and integrate APIs more quickly; it also gives all types of developers an opportunity to skill up, learn about new technologies, and hopefully spark some ideas of their own.
"Data science" in all its various forms has existed for more than 30 years, but the majority of businesses in the world today don't understand what it is or the benefits it can deliver for their business. 'Data science as a service' will address that knowledge and tools gap, enabling businesses everywhere to understand large datasets, automate manual processes, and deliver relevant customer experiences.
Originally published by Hugh Durkin from developerecosystem.com on the Zalando tech blog in April 2018.
© 2020 Developer Ecosystem