Digital Humanities: What I learned about the humanities over a decade, as a scientist

January 25, 2021

On January 27th, the Islamicate Digital Humanities Network (IDHN) will organize a conference that will showcase a lot of my team’s research on understanding Hadiths as a narrative network. This is a special moment for all of us in the team, and definitely a milestone for me because I have been working on this meaty idea for close to a decade with my mentor/collaborator Mairaj Syed.

We have come together as a team (along with other colleagues) many times over the years to brainstorm, read, write code, visualize, and ponder the implications of our results. I have had the opportunity to attend and give talks at quite a few humanities workshops, conferences, and classes. I have also read many papers in religious studies and worked with a few humanities researchers on this and other projects, so I feel comfortable commenting on the best practices for fruitful collaborations. In this post, I will summarize my understanding, as a scientist, of the two very distinct cultures (sciences and humanities). This is based solely on my personal experience, but I hope it might help both sides of digital humanities to understand each other and manage expectations.

Background/Our Story

Hadiths are a major source of available narratives on early Middle Eastern history. They form a huge body of narrations about the sayings of the Islamic prophet (and others) on different events and phenomena, which transmitters passed down from one generation to the next (a claim that different lines of evidence both support and dispute) and which different generations recorded in different books. Hadiths have far-reaching implications to this day, as they often influence legal, cultural, political, and social decision making in the Muslim world, and they are also entangled with other cultures that are trying to understand Islam. The dynamics of these narratives reveal very important historical insights and provide a lens into the authenticity of the narratives and historical claims.

I was an undergraduate student at Bard College (NY) studying physics and computer science when I met Mairaj in 2011. He was teaching an advanced Islamic Legal Theory class that I took. Even though science and engineering students often complain about their humanities classes and I was no different initially, the liberal arts culture at Bard slowly changed my views. I was drawn towards the humanities and was increasingly taking advanced humanities classes to fulfill my requirements, instead of the usual introductory classes science students take to get by.

Mairaj was a young professor, fresh out of Princeton with a PhD in religious studies, and I would approach him after class to talk about the kind of research done in his field. Mairaj used to be a student of information management and knew some coding and databases, so unlike many other humanities professors he would enthusiastically discuss possibilities in digital humanities research. We explored a few things in 2011 and 2012 on Isnads (the chains of transmitters through which a Hadith was passed down) and Matns (the texts of the Hadiths themselves). I wrote some code to combine Matns and discussed the results with Mairaj, but that was pretty much it. The project seemed to die down.

After I left Bard to pursue a Master’s in Computing, I was increasingly drawn towards Network Science. In 2013, I reached back out to Mairaj with the idea that we could interpret the Hadith literature as a social network of transmitters. Mairaj had moved to UC Davis by then. We collaborated again to explore some preliminary results on centrality metrics in the Hadith network. Mairaj was not an expert in web scraping, data mining, or network science, but over the next few years he taught himself a lot of the tricks by studying my code and other people’s.
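To make the idea of a transmitter network concrete, here is a minimal sketch (not the project's actual code) of how chains of transmission might be turned into a graph and scored with degree centrality, one of the simplest centrality metrics. The toy chains, the placeholder names, and the helper functions `build_network` and `degree_centrality` are all hypothetical and only meant for illustration.

```python
# Hypothetical chains of transmission, listed from the earliest
# transmitter to the latest. The names are placeholders, not real data.
isnads = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "E", "D"],
]


def build_network(chains):
    """Return a directed graph: transmitter -> set of people they passed a narration to."""
    graph = {}
    for chain in chains:
        for teacher, student in zip(chain, chain[1:]):
            graph.setdefault(teacher, set()).add(student)
            graph.setdefault(student, set())  # make sure every transmitter is a node
    return graph


def degree_centrality(graph):
    """Degree centrality: distinct neighbours (incoming or outgoing) divided by (n - 1)."""
    neighbours = {node: set() for node in graph}
    for teacher, students in graph.items():
        for student in students:
            neighbours[teacher].add(student)
            neighbours[student].add(teacher)
    n = len(graph)
    return {node: len(neighbours[node]) / (n - 1) for node in graph}


network = build_network(isnads)
centrality = degree_centrality(network)

# Print transmitters from most to least connected.
for name, score in sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{name}: {score:.2f}")
```

In this toy example, transmitter B comes out as the most connected node because three different people are linked to B. A high score only says that a transmitter sits on many chains; what that means historically is still a question for interpretation.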

When I started my PhD at MIT, we made an informal habit of visiting each other at MIT or UC Davis for hackathons lasting a few days, where I would code non-stop and Mairaj would organize our data into better databases or interpret the computational results historically. We would discuss results and visualizations and talk about writing a book together. In fact, in 2015 I received a book offer from Princeton University Press, but we were not entirely sure we could commit the time required for such a project, so we passed on the offer after drafting some parts of the book.

From 2016, we doubled down on our efforts and made consistent progress. New members would join our team regularly. Some left and some stayed. Mairaj took care of all the management, with occasional support from me. We started writing and getting grants. We eventually had some excellent technical people on the team (Danny Halawi, Mohamed Alkaoud, and recently Shuaib Choudhry), so I did not have to do the heavy lifting of coding the analytics infrastructure anymore. We also had other humanities researchers give us feedback and discuss our work over long sessions. At this point, I shifted my attention and time to some of the theoretical questions that I had been exploring since the beginning, such as: what do the overall structure and dynamics of the Hadith network look like? What do they say about the current notions of Hadith transmission and disputes on early-Islamic history? I have attended history and religious studies workshops and conferences since 2014, presenting my research on these questions.

At the moment, our portfolio includes 1400 parsed books, with 50,000 authors and millions of links between them, and a collection of data analysis and visualization techniques to understand the narration dynamics in this huge network.

Note that this was not my PhD research. My main interests are Human-Computer Interaction (HCI), Data Science, Interactive Computing, Design, and Wearable Sensors. However, I have been very passionate about this project, and whenever I got a chance or needed a break from my PhD work I would work on it. Having read some other religious studies PhD theses, I feel comfortable saying that the impact of our team’s Hadith analytics work could be equated to several PhDs. I would not dare call it a second PhD for myself because I know my readings are not at the level of an actual humanities PhD. However, having done a lot of scattered reading over the years, I do entertain the idea at times that I should take a religion PhD qualifying exam to figure out where I really am in this regard.

With the background out of the way, let me summarize a few key differences between the sciences and humanities cultures. When speaking about science, I speak from experience in the CS, Physics, and Applied Math communities, but I have heard and read enough stories from other science PhDs that I feel comfortable making some general statements.

Cultural Differences

  1. Publication expectations: Probably the first thing a scientist (especially a computer scientist) would find surprising in the humanities land is the pace of publications. While quantity of papers doesn't reflect quality, computer scientists are used to submitting substantial new works and ideas to conferences every year. The pace at first may seem slow in a digital humanities collaboration.

One reason may simply be the nature of the research. Oftentimes writing 500 lines of code to deal with a dataset is much easier than reading through big volumes of text and understanding the nuances presented by their authors. Even though coding clever algorithms is no small task, humanities work is often the long and tedious kind. The other reasons for such prolific activity in the sciences include both good and bad causal forces. The bad force is how money and grants are set up in the current system. The outcomes of grants are usually publications, and there is more money allocated for science/engineering. Therefore, “garbage papers” are written in abundance in a “prolific” field like computer science to keep the money flowing. Even good scientists often feel pressured to churn out useless papers, and ignoring that mindset is a challenge when you are aiming for the job market.

Having said that, I think there is room for improvement in the humanities too, where the sciences are more or less doing it right. It boils down to team-based work.

  2. Team culture: Research in the humanities is often done by lone rangers, instead of the large international collaborations and teams that are frequently seen in the sciences. For no-BS prolific scientists, frequent publication is the way to disseminate knowledge asap and open it up for critique, and science appears to move faster because of that. The challenge for a reader is then to filter through the incremental works described in #1 and find the gems. Once you build a reputation as a no-BS scientist/team, it’s easier for readers to just find and follow your work.

This kind of international teamwork mindset could be adopted in the humanities too, with better means of knowledge distribution.

  3. Problem selection: When dealing with theory, I personally favor research in Pasteur’s Quadrant: fundamental research that has applied use cases. The search for a never-ending series of abstractions, ignoring the human meaning-making process, sometimes seems redundant to me. It is a view that I share with Sabine Hossenfelder, and the abstraction-chasing style of research seems to be prevalent among many theorists.

Interestingly, I think I have seen this trend in both the humanities and sciences. Without pointing out specific works, I would just say that the premature search for can-explain-it-all theories is quite abundant in both fields, whereas we should first look for provable hypotheses that may or may not generalize all too well. That is part of doing science.

  4. Superiority/Inferiority complex around technology: A recipe for disaster is to build an interdisciplinary team whose members don't know (or trust) each other’s strengths, or don't have the capability to understand the available strengths properly. Some scientists might think they are the genius here and doing others a favor. At the same time, humanities scholars may get excited at the prospect of applying technical methods without understanding the methods and their implications completely. Some of this is perhaps inevitable for a new interdisciplinary team, and I was fortunate that none of it happened in ours, apart from occasional tensions.

I have observed such trends in other collaborations though. I have often seen HCI collaborations where a more technical person (say, a machine learning theorist) would be the person holding back the less technical but more humanities/social science oriented researchers, who may be waiting for the technical person to apply some cool techniques to their hard-earned, curated datasets. However, applying a cool method may not answer the question at hand, yet neither party seems to care, as it might get them a new paper. To do good research, we need to understand each other’s work better. A corollary to this is that good team spirit in this realm means maintaining a delicate balance of trust and doubt, which brings me to my next point.

  5. Teacher and Colleague: In a good digital humanities collaboration, each person is both. As a teacher, you sometimes have to direct with certainty, but as a colleague you have to listen and trust. In our work, Mairaj’s subject matter expertise and mentorship guided my investigations. However, my emphasis on certain techniques paid off, and Mairaj had to trust my mathematical instincts about his subject matter.

  6. Presentation: This is a very minor but funny one. Let’s just say I was not impressed by slides with poor visual choices, such as cyan/purple backgrounds, full of yellow and red colored text, with no pictures, that some North American humanities professors were just reading out loud in some conference sessions. While I did see very good quality presentations in the same conferences, I didn’t realize slides that can instantly make you fall asleep could be part of the same sessions. You say I shouldn’t judge from a few conferences? I agree, but it was unexpected nonetheless. I am used to seeing demo videos with published papers, and well crafted slides for presentations in HCI. So naturally I would expect that professors who teach in US universities do not just read their slides that are full of text. Dissemination of knowledge in widely understandable and enjoyable formats should be prioritized by any researcher.

Philosophical Differences and Similarities

I am not entirely qualified to comment on such a huge topic, but I can talk about this philosophy from the perspective of cognitive science and embodied mathematics, a field that formed the backbone of my PhD dissertation. To begin to understand the philosophical differences between sciences and humanities, and to act on reducing our existing biases and assumptions, we perhaps need to understand the nature of mathematical abstractions first.

A. The nature of abstractions: Mathematics is feared by many, and a lot of work in the digital humanities requires some level of mathematical maturity (coding is a form of mathematics too). The reason most people feel lost in this land is that the gradual layers of abstractions and metaphors are often too many to form a coherent mental model. Let me explain with a diagram from my PhD dissertation.

This is a rough sequence of mathematical abstractions that we are introduced to over a general learning trajectory in STEM. Embodied math is the domain that most humans understand naturally (object collection, shapes, drawing, manipulating small collections). The progression of subsequent mathematical reasoning is often based on abstractions that are introduced as metaphors. This view is championed by cognitive scientists.

The abstractions are often composed of layers of metaphors. For example, to really understand an algebra equation, you have to understand common metaphors in algebra, such as "variables are boxes containing objects and each box can count how many objects are inside", or "+ and - are the equivalent of adding or removing objects from the box". Just like in a language, the metaphors can vary. Different metaphors may lead to the same outcome, and you can chain different metaphors together to get to an abstract concept. To put it simply, math is a collection of layers of metaphors. To understand math as a whole, we need to establish connections between the metaphors, and at some point the layers may become cognitively arbitrary for many people unless they are actively trained to keep the connections in mind, which happens very rarely.

This is not only true for math. This is in fact true for any spoken language (George Lakoff's book Metaphors We Live By shows excellent examples of how a language is a collection of grounded symbols and layers of metaphors on top of these symbols), music (check out Adam Neely's excellent video "what does music mean" to understand how musical compositions are metaphors), sketching and drawing (check out Part V of the book Symbol Formation (1963)), etc.

Sign game: mathematical rules can be arbitrary and context based. This does not only pose a cognitive challenge for beginners; experts can fall into this trap too. A veteran computer scientist might use the wrong method because the context of the problem is different from what she thought the rules suggested. Sometimes the rules create new context, especially if you are always relying on the rules to guide yourself in making meaning. Wittgenstein called this phenomenon a sign game. To put it simply, when you are relying on abstractions and not thinking through the metaphorical layers, there is a chance that the shortcuts you take in your reasoning will land you in the wrong region of interpretation.

B. Implications of abstractions: The way humans create meaning and form abstractions has profound implications in science, math, and humanities cultures. Any mathematical analysis of an event that is perceived by a human requires translating the event into symbolic abstractions, constructing the necessary layers of metaphors along the way. On your way back from the abstraction land to embodied meaning making, you may introduce some more metaphors and context that may or may not be consistent with the ones you used on your way in. The same could be said about the humanities when a scholar applies a certain framework (say, a form of exegesis, feminism, a political theory, etc.) to an event to make some nuanced meaning out of it.

When viewed from this perspective, both data scientists and humanities scholars have something to be humble about. However, the burden is often more on the number-crunching folks. Translating embodied, perceived versions of events to an abstraction through metaphors is a tricky task. A data scientist treating numbers and data analysis as the truth will often fall into the trap of wrong interpretation, and this happens so often that it is embarrassing, and our predictions may turn out to be wrong.

Does it mean we stop predicting using numbers? No. Mathematical abstractions are powerful tools for thinking and taking shortcuts in the process of "meaning making". What would be the lifetime work of thousands of Hadith scholars can potentially be turned into a few days' worth of work by data mining, machine learning, and analytics. However, this comes with the risk of wrong interpretations and predictions, or of missing nuances. A good approach to any problem is therefore to examine it from both perspectives.

There is no greater or smaller work here. Stop thinking you are special if you know how to crunch numbers and many others don't! You simply are an expert in connecting some metaphor layers, and others can connect other forms of metaphors. I say this because I am tired of seeing how we, the computer scientists, are over-praised, and are over-confident about what we do to "change the world" in the name of innovation. Computer Science has given us very powerful tools and metaphors to model the world's information, but it comes with its own baggage.

C. How do we get out of the abstractions trap? One possible way, out of a few, is to design better tools that take human cognition and cultures into account. Good tools would take away the headache of connecting the layers of metaphors, and/or make the connections between the layers clear for different contexts. This could eventually help both sides of digital humanities. Scientists use computational tools heavily, and the artifacts produced by these tools are their window into interpreting reality. Tools that retain explainability are therefore the key, and such design efforts require collaboration between designers, algorithmists, and humanities scholars. You can read about how I approach this problem in my research on embodied math.

Final Thoughts

The abstractions used in qualitative and quantitative research are fundamentally different. Each type of abstraction has its strengths and weaknesses. Some abstractions require more training than others, but I believe well-designed tools can eventually mitigate this problem. We also need a culture of appreciation for different abstractions. Just because I am connecting metaphor layers that others have not trained for does not mean I will right away understand the concepts, contexts, and metaphors that they are connecting. So no one job is essentially “easy” or “hard”; it boils down to which abstractions your culture made you think were superior. As for the cultural differences I stated here, I hope these kinds of discussions will happen more, so that the work and collaboration cultures become easier for both technical and humanities scholars.
