There is a rapidly growing debate around the use of AI for the development of systematic reviews. As someone fairly new to AI but curious to learn more about its applications to systematic reviews, I had the privilege to attend a two-day seminar led by internationally acclaimed author Dr Piers Steel and global meta-analysis expert and co-founder of the online meta-analytic platform HubMeta, Dr Hadi Fariborzi, on the use of ChatGPT and other AI tools for systematic reviews.

In this blog, co-written with fellow seminar participant Anthea Sutton, Research Fellow at Sheffield Centre for Health and Related Research (SCHARR), University of Sheffield, we share some reflections on how AI can be applied to the undertaking of systematic reviews.

Identifying the research question

The seminar was designed to illustrate the use of selected AI tools for each step of the systematic review process. Kicking off with identifying the research question, the course leaders tasked us with setting up our own ChatGPT Custom Instructions, which for us looked something like this:


Custom Instructions allow ChatGPT to provide more tailored answers based on, for example, our job or profession, interests and location, and in a style dictated by us (formal or informal, long or concise). It can also be asked to provide an opinion or to remain neutral. It might be worth remembering at this point that ChatGPT is not a search engine, where you can expect an answer from a single line of questioning, but a large language model (LLM), which works best if constantly trained, or rather prompted, just as though you were having a conversation with it (the hint is in the name, right?).

The next step was to conduct a scoping exercise to check the importance of our review question. We asked ChatGPT 3.5 “If I did a comprehensive meta-analysis [or systematic review] on [topic of our choice], what theories could I test or advance?” and “What would be the practical significance of these contributions?”. Though this can be a good starting point for gathering ideas and inspiration, it was stressed that ChatGPT content always needs to be verified and tested.

Refining the research question

The course leaders highlighted an inherent problem with sharpening the research question into something searchable. On one side, we are searching for something we don’t fully know yet; that is, our idea of the topic takes shape as we conduct the searching. On the other side, terminology may vary: the same word can mean different things, and two different words can mean the same thing. With this in mind, we were shown how to use ChatGPT to create a list of synonyms from an initial database search. ChatGPT needs to be trained to work best, so a conversation that starts along the lines of “here’s a list of terms that I have used to search for articles as an information specialist conducting a systematic review [enter terms]. Are there any other terms you would suggest related to [enter keyword]?”, and goes on with “from your list these are the ones I like, remembering the ones I didn’t like, would you suggest any other terms?”, might be a way of getting ChatGPT to know your topic better. Unlike using a search engine, this is an iterative process, and when ChatGPT stops giving you any new synonyms, this is as far as it goes (though it can take quite some time!).
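For readers who like to keep track of this back-and-forth outside the chat window, the loop can be sketched in a few lines of Python. This is only a rough illustration and not something shown at the seminar: the function names `review_round` and `build_followup_prompt`, and the `keep` callback standing in for our own judgement, are all hypothetical, and the suggestions are assumed to be pasted in by hand from ChatGPT's reply.

```python
def review_round(liked, disliked, suggestions, keep):
    """Sort one round of suggested terms into the liked/disliked sets.

    Returns the set of genuinely new terms; an empty set means the
    conversation has stopped producing new synonyms.
    """
    seen = {t.lower() for t in liked | disliked}
    new_terms = set()
    for suggestion in suggestions:
        term = suggestion.strip().lower()
        if term and term not in seen:
            new_terms.add(term)
            seen.add(term)
    for term in new_terms:
        # `keep` is a stand-in for the searcher's own decision on each term.
        (liked if keep(term) else disliked).add(term)
    return new_terms


def build_followup_prompt(keyword, liked, disliked):
    """Compose the next message, reminding ChatGPT of both lists."""
    liked_text = ", ".join(sorted(liked)) or "none yet"
    disliked_text = ", ".join(sorted(disliked)) or "none yet"
    return (
        f"From your list, these are the terms I like: {liked_text}. "
        f"Remembering the ones I didn't like ({disliked_text}), "
        f"would you suggest any other terms related to {keyword}?"
    )
```

You would call `review_round` after each reply, send the text from `build_followup_prompt` as the next message, and stop once `review_round` returns an empty set.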

Developing and translating search strategies

We had a go at some prompt engineering to see how ChatGPT performs, or could be trained, to develop a search strategy. Here’s our attempt at developing a PubMed search strategy using four-level prompt engineering for the review topic “Family therapy approaches for anorexia nervosa”:

There is a fascinating piece of research on using ChatGPT for constructing Boolean searches, including limitations (e.g. high level of variability and incorrect/missed subject headings), which can be found in Wang, S., Scells, H., Koopman, B. & Zuccon, G. 2023. Can ChatGPT write a good boolean query for systematic review literature search? arXiv preprint arXiv:2302.03495.
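To make the underlying Boolean logic concrete, here is a minimal sketch of how a search string is usually assembled: synonyms for one concept are joined with OR, and the concept blocks are then joined with AND. It is not the strategy ChatGPT produced for us. The helper names `or_block` and `build_query` are made up for illustration, the terms are examples rather than a validated strategy, and the sketch searches title and abstract only (PubMed's `[tiab]` tag), leaving out subject headings, which would still need to be checked by hand.

```python
def or_block(terms, field="tiab"):
    """Join the synonyms for one concept with OR, tagging each with a field."""
    tagged = []
    for term in terms:
        # Quote multi-word phrases so they are searched as a phrase.
        text = f'"{term}"' if " " in term else term
        tagged.append(f"{text}[{field}]")
    return "(" + " OR ".join(tagged) + ")"


def build_query(concepts):
    """Join the concept blocks with AND."""
    return " AND ".join(or_block(terms) for terms in concepts)


# Illustrative terms only, not a tested search strategy.
concepts = [
    ["family therapy", "family-based treatment"],
    ["anorexia nervosa", "anorexia"],
]
print(build_query(concepts))
```

The printed string can be pasted into the PubMed search box and refined from there.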

Similarly, we tested ChatGPT’s capability of translating search strategies. This was perhaps where the generative language model left us least persuaded, as it required a fair bit of human intervention to make the translated searches work across different databases.

Managing results

In addition to the more popular software packages that are used for managing citations in systematic reviews such as EndNote, Zotero, Covidence and Rayyan, we were introduced to HubMeta. HubMeta is a freely available online platform that enables researchers to create multiple projects, deal with thousands of citations and share them with teams. HubMeta performs automatic de-duplication and provides a summary of the de-duplicated references but it also allows for manual de-duplication review. Other functionalities include title and abstract screening, full-text screening, coding (e.g. tagging) as well as automatic language detection. It might be worth mentioning that HubMeta is integrated with Dropbox where all the PDFs need to be stored (due to copyright regulations PDFs cannot be uploaded directly in HubMeta). The course leaders also informed us that it does not work well with University email addresses, and recommended using a personal email account (such as Gmail). Further information including training on HubMeta is available on the HubMeta YouTube channel.
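We don't know how HubMeta detects duplicates internally, but the general idea behind automatic de-duplication with a manual check can be illustrated with a simple sketch. The assumption here is that two records count as duplicates when their titles match after lower-casing and stripping punctuation and their years agree; the record fields (`title`, `year`) and the function names are hypothetical, and real tools typically use more forgiving matching.

```python
import re


def normalise_title(title):
    """Lower-case a title and collapse punctuation and spacing differences."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Split records into unique ones and suspected duplicates.

    The duplicates are returned rather than discarded so that they can
    be reviewed manually.
    """
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = (normalise_title(record["title"]), record.get("year"))
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Made-up records for illustration only.
records = [
    {"title": "Family Therapy for Anorexia Nervosa.", "year": 2019},
    {"title": "family therapy for anorexia nervosa", "year": 2019},
    {"title": "Family therapy for anorexia nervosa", "year": 2021},
]
unique, duplicates = deduplicate(records)
print(len(unique), "unique;", len(duplicates), "to review")
```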

Citation chaining

Over the course of the seminar, we explored the use of automated tools for citation chaining (e.g. snowballing, backward and forward citation searching). Platforms such as Connected Papers and Research Rabbit may already be known to some of us, but we were also given a demo of Litmaps and Inciteful. Although powered by different (and as yet unclear) algorithms, all of these platforms have in common the feature of producing a visual map or network of citations: