What is Cognitive Computing?

Most probably, anyone who is even remotely aware of the contemporary Data Science landscape will recognize the truth of the following two statements: (a) Data Wrangling is necessary in almost every new project, and (b) Data Wrangling is difficult and tedious. After all the investment and enthusiasm that you have put into your education until now (that long, challenging, and often painful road to becoming a Data Scientist…), could it really be that data cleaning, formatting, fusing, and restructuring takes up as much as 80% of your working hours? Wasn’t it all meant to be about statistical modeling, graphs, and applied artificial intelligence? Of all those beautiful R and Python packages that you have studied, is it true that {dplyr}, {tidyr}, and pandas are your best friends at the end of the day? No. You did not sign up for this job.

This blog post, from one Data Scientist to another, might help you rediscover the initial beauty that you thought would always be inherent to the landscape. I want to ask the following: why is Data Wrangling so difficult and time-consuming? With all the automation that we write to let our computers do the job by themselves, what exactly is so difficult about Data Wrangling operations that prevents us from automating them too? Here’s a working example: say we have a number of tables in some RDBMS, a lot of data distributed across them, and we want to build a discrete choice model (say, a multinomial logistic regression) on these data. However, not all data in our database are useful. Moreover, not all data make sense as input to such a model. Imagine that we have some transactional, timestamped data there: millisecond resolution is certainly not what we’re looking for, but most probably something like the day, week, or month of the transaction could play the role of a predictor (strsplit()… enjoy). We need the data from the last three years, so select … where 2015, 2016, 2017. Assume we are mixing categorical and continuous predictors: for example, would a primary key (ID int NOT NULL AUTO_INCREMENT) do well as a categorical predictor? Not really. This is easy: tell R not to select the keys. Good. Consumer age and gender would do if and when available, right. Wait, what’s this: “… the column contains any of the following: 0, 2, 6, 7, referring to type of most frequent item type purchase made before automatic user re-registration in 2012; not in use since…” - no, no, we don’t want this... You know the drill.
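The manual steps in this example can be sketched in a few lines. The table and column names below (transactions, ts, amount, item_type) are of course invented for illustration, with Python’s built-in sqlite3 standing in for the RDBMS:

```python
import sqlite3

# Hypothetical transactions table mimicking the example above; all names
# (transactions, ts, amount, item_type) are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE transactions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts TEXT, amount REAL, item_type INTEGER)""")
conn.executemany(
    "INSERT INTO transactions (ts, amount, item_type) VALUES (?, ?, ?)",
    [("2016-03-14 09:26:53.589", 12.5, 2),
     ("2011-07-01 00:00:00.000", 3.0, 7),   # too old, should be filtered out
     ("2017-11-30 18:00:00.000", 8.2, 0)])

# Keep only 2015-2017 and collapse millisecond timestamps to a monthly
# resolution predictor; the auto-increment key is deliberately not selected.
rows = conn.execute("""
    SELECT strftime('%Y-%m', ts) AS month, amount, item_type
    FROM transactions
    WHERE strftime('%Y', ts) IN ('2015', '2016', '2017')
""").fetchall()
print(rows)  # [('2016-03', 12.5, 2), ('2017-11', 8.2, 0)]
```

Even this toy version already encodes decisions (which time resolution, which years, which columns are keys) that the analyst made from the project specification, not from anything present in the data themselves.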

Why is automatic data wrangling so difficult?

Can’t we expect our super-smart algorithms to automatically infer this type of common knowledge and these expectations among data analysts upon being given a command of the type: build me a multinomial regression model with Y as the criterion and select all meaningful data as predictors; iterate model selection until the best model is found? It turns out that what is solved by mere project specification and some bare intuition in your mind – before it starts taking long hours of coding – presents a rather difficult riddle when posed as a computational, algorithmic problem. Why is that so?

Let’s assume that we want to solve the problem by imposing a set of formal constraints upon the eligible data types that can enter the model. In R, continuous predictors would fall under the double type (although sometimes an integer needs to be treated as continuous in regression); character, factor, and integer would do as categorical predictors; in a discrete model, the dependent is always categorical. This is extremely easy to automate, but it would help only for datasets where the variable semantics are all set; in other words, the problem of letting the algorithm decide which variables do and do not make sense as predictors is what really makes the automation of Data Wrangling difficult. Obviously, we would need to build a semantic model, a structured knowledge repository that our Data Wrangling automation would consult in order to inspect all variable names and descriptions and see which of them match some predefined schema: the schema that defines what is allowable and what is not in building a particular statistical model. Our task would then be to define the binding of all columns from our SQL tables to a set of abstract variables from our semantic model, perform the appropriate selection, and then easily build a statistical model in the desired programming language. We can probably solve this kind of Data Wrangling automation for a more or less wide class of Data Science projects; but can we solve the general case, one that would do for any given relational database and a wide class of statistical models? We are now well aware of the scope of the problem: its solution would almost amount to true artificial intelligence. That is what takes up to 80% of your daily work routine.
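Such purely type-based screening really is trivial to automate. Here is a minimal Python sketch, with invented column names and toy data, that partitions columns into continuous and categorical predictor candidates; note that it has no way of knowing that a column like item_type is a deprecated code, or when an integer should really be treated as continuous:

```python
# A minimal sketch of the type-based screening described above; the column
# names and toy data are invented for illustration.
def screen_predictors(columns, dependent, key_names=("id",)):
    """Partition columns into continuous and categorical predictor
    candidates by value type, excluding keys and the dependent."""
    continuous, categorical = [], []
    for name, values in columns.items():
        if name == dependent or name in key_names:
            continue  # keys and the criterion carry no predictor semantics
        if all(isinstance(v, float) for v in values):
            continuous.append(name)   # double -> continuous
        else:
            categorical.append(name)  # str/int -> categorical by default
    return continuous, categorical

toy = {
    "id": [1, 2, 3],            # auto-increment key: excluded
    "age": [34.0, 27.0, 51.0],  # continuous
    "gender": ["f", "m", "f"],  # categorical
    "item_type": [0, 2, 6],     # integer code; only semantics reveal it is dead
    "choice": ["a", "b", "a"],  # the dependent in a discrete choice model
}
print(screen_predictors(toy, dependent="choice"))
# (['age'], ['gender', 'item_type'])
```

The hard part is everything this function does not do: deciding that item_type should be dropped requires the semantic model, not the type system.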

The meaning of cognitive computing

I was motivated to write up this short summary of the Data Wrangling automation problem a long time ago, maybe because my background as a cognitive psychologist makes me think about similar problems in the cognitive ergonomics of computer programming more often than it sparks the imagination of my colleagues with backgrounds in software engineering and similar fields. But the motivation for this very blog post came as a consequence of reading some recent discussions on how to define cognitive computing properly. What is cognitive computing? On one hand, we are told that it is essentially programming computers to perform cognitive operations in a way similar to what natural minds do. But at least half of the typical Data Scientist’s toolkit comes from there: people with a background in the cognitive sciences could list a dozen fundamental research areas that have spawned mathematical models used by Data Scientists nowadays, but that were initially developed in order to understand the workings of the human mind. On the other hand, sometimes the explanation of what cognitive computing is seems to be tightly related to aspects of UX/UI design: cognitive computing means computers being able to react adaptively to our natural language or motor inputs and manage their outputs so as to match our original intentions. I guess the automation of Data Wrangling as I have discussed it falls close to this second connotation. I have sometimes encountered the claim that cognitive computing is not the same as AI because the former is of a probabilistic nature while the latter is not, which is really true only if you put an equality sign between AI and the old classic AI research program based on the idea of rule-guided behavior (cognitive psychologists started writing about the “probabilistic turn” in the study of human cognition more than ten years ago, not to mention the study of probabilistic causal networks, which has its roots back in the 80s).
The whole contemporary discourse on cognitive computing is obviously motivated by some recent developments that have created the need to redefine the meaning of the term, but the redefinition itself seems to be taking too much time and struggling with too many fine-grained distinctions from similar terms; it has become so contentious that rumors have even started appearing that cognitive computing is just another marketing hype.

A view from the perspective of cognitive ergonomics

Well, one take-home message is that cognitive computing is certainly not marketing hype in itself; as I have tried to illustrate above with the example of the Data Wrangling automation problem, the problems it may address are real, and many would benefit from their solution. A realistic research and development program (semantic modeling + you-name-it-probabilistic-learning-approach) is available to address a more or less wide class of typical problems of this type, and the application of such programs is well under way. The most likely source of so much uncertainty in the discussion of what cognitive computing is and what it is not is probably the natural relation of this term to the possibility of obtaining general solutions for wide classes of problems similar to my illustration. Perhaps, then, we should start making a distinction between cognitive computing in a general and in a narrow sense: the former addressing the typical fundamental questions of AI research (irrespective of whether the specific approach under discussion is deterministic or probabilistic), and the latter reserved for cognitive applications that solve a constrained class of problems which prevent the user from interacting with the computer in a cognitively ergonomic way.

One final remark, to keep us in line with the nature of the problem as exemplified at the beginning of this post. In the same way that we have tried to discover why Data Wrangling is difficult, we could ask and try to understand why coding in general is hard. Every cognitive psychologist can testify that the human mind does not exhibit much of a preference for abstraction. In our everyday lives, we rarely organize our thinking around general categories and abstract concepts. In our natural, intuitive mental operations, we look for a glass of water, we desire a conversation with a close friend, we like to have a bite of an apple; we do not chain large numbers of formal inferences leading from the abstract goal of “being fine”, across the many categories that define all that is encompassed by that general state, in order to arrive at particulars like “a glass of water”, “a friend nearby”, or “an apple”. Quite the contrary: these – the so-called “basic level categories” – seem to be readily accessible to our minds, and immediately active when a particular goal needs to be met and a particular action performed. But digital computers work exactly the other way around: at the very deepest level of their operation, they derive everything by accessing the more general properties and then inferring the particular results. You can thus think of indexing in SQL as a rudimentary step in cognitive engineering: it organizes a data structure in such a way as to be able to meet the most probable intentions of the users who need to fetch data from it. You can also think of developing a general semantic model for Data Wrangling in the same way: it needs to adaptively engineer a data structure in a way that matches the intended statistical modeling.
From this perspective, cognitive computing can be understood as really spanning both its general and narrow senses: connecting (a) the need to develop adaptive computing that reflects a semantic understanding of (narrowed-down) human needs with (b) the (general) algorithmic means of accomplishing that goal (by mimicking the operations of the human mind that it needs to adapt to).