Over the years, we have witnessed the issue of multiple peer-reviewed papers being recalled. A recent example, as reported in numerous places, appears in Reference 1: “The Dana-Farber Cancer Institute (DFCI), an affiliate of Harvard Medical School, is seeking to retract six scientific studies and correct 31 others that were published by the institute’s top researchers, including its CEO. The researchers are accused of manipulating data images with simple methods, primarily with copy-and-paste in image editing software, such as Adobe Photoshop.”
There have been allegations of data manipulation in 57 DFCI-led studies. [Ref. 2] There has also been an increase in the use of AI tools to check for fraudulent imagery. In an editorial in Science [Ref. 3], the editors state that they use Proofig to look for image duplication and other types of image modification. They also employ iThenticate for plagiarism detection.
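How can software flag a copied-and-pasted image region at all? Commercial tools like Proofig are proprietary, but one common building block in image forensics is a perceptual hash. The sketch below is a hypothetical, minimal difference hash (dHash) over grayscale pixel grids, offered only to illustrate the general idea: near-identical pixel data yields near-identical bit patterns, so duplicates stand out.

```python
# Minimal sketch of duplicate-region detection via a difference hash (dHash).
# This is an illustrative assumption about how such tools work, not the
# actual algorithm used by Proofig.

def dhash(pixels):
    """Hash a 2D grid of grayscale values: each bit records whether a pixel
    is brighter than its right-hand neighbor."""
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits

def hamming(a, b):
    """Count differing bits; 0 means the regions are likely duplicates."""
    return sum(x != y for x, y in zip(a, b))

# Two identical 4x4 "image regions" (as a copy-and-paste would produce)...
region_a = [[10, 20, 30, 40], [40, 30, 20, 10], [5, 5, 5, 5], [90, 10, 90, 10]]
region_b = [row[:] for row in region_a]
# ...and one unrelated region.
region_c = [[1, 99, 1, 99], [99, 1, 99, 1], [50, 50, 50, 50], [2, 4, 6, 8]]

print(hamming(dhash(region_a), dhash(region_b)))      # 0 -> flagged as duplicate
print(hamming(dhash(region_a), dhash(region_c)) > 0)  # True -> not a duplicate
```

Real forensic tools work on full-resolution images, tolerate rescaling and compression, and scan across panels, but the core signal is the same: copied pixels hash alike.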
In a related area, AI is running into copyright difficulties with its generated images. IEEE Spectrum magazine [Ref. 4] has an article on the potential for copyright violations. One example shows a generated article almost 90% identical, in words and sentences, to a New York Times article. While the article characterizes this type of result as plagiaristic output, it would simply be called plagiarism if a person did it. The tendency of AI-generated text to invent imaginary references has been described as hallucination. A key question raised was: is there any way for a user of generative AI to ensure there is no copyright infringement or plagiarism? It is a good question that will need to be answered. In the evaluation of images, the researchers found hundreds of instances where generated images differed very little from recognizable characters in video and games. This analysis was based on a very limited study of subjects (a few hundred).
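Why is "almost 90% identical" wording so detectable? Plagiarism checkers such as iThenticate are far more sophisticated than anything shown here, but a hypothetical minimal sketch using word n-gram overlap (Jaccard similarity) illustrates why near-verbatim text is easy to flag mechanically. The sentences and the 0.5 threshold below are illustrative assumptions.

```python
# Minimal sketch of near-verbatim text detection via word-trigram overlap.
# Illustrative only; not the algorithm used by iThenticate.

def ngrams(text, n=3):
    """Return the set of word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b, n=3):
    """Jaccard similarity of two texts' n-gram sets (0.0 to 1.0)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

original = "the quick brown fox jumps over the lazy dog near the river"
copied   = "the quick brown fox jumps over the lazy dog near the bridge"
fresh    = "completely different sentence about unrelated subject matter here"

print(jaccard(original, copied) > 0.5)   # True: one changed word, high overlap
print(jaccard(original, fresh) == 0.0)   # True: no shared trigrams
```

A one-word change leaves most trigrams intact, so a document that is 90% identical to a source scores extremely high against any indexed copy of that source.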
While the use of generative AI is becoming more widespread, even careful review of the data and pictures will not prevent misuse of the results. The April 2020 blog [Ref. 5] covered the topic of scientific integrity and COVID-19 in detail. The key point was that even with a solid research foundation, the results can be misinterpreted by people who are unfamiliar with the techniques used to analyze the data. Another point in that blog was that when the results of an analysis are reduced to a single number, the potential for creating inappropriate impressions is high. So, the construction of the model and its assumptions are very important.
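The single-number point can be made concrete with a small, hypothetical example: the two samples below share the same mean, yet describe very different situations, which is exactly why a headline number alone can mislead.

```python
# Illustrative example of the point above: two datasets with the same mean
# but very different behavior. The numbers are invented for demonstration.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance: average squared deviation from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

sample_a = [50, 50, 50, 50]    # stable readings
sample_b = [0, 0, 100, 100]    # wildly varying readings

print(mean(sample_a), mean(sample_b))          # 50.0 50.0 -> same headline number
print(variance(sample_a), variance(sample_b))  # 0.0 2500.0 -> very different data
```

Anyone shown only "the average is 50" would form the same impression of both datasets, even though one is perfectly stable and the other swings between extremes.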
This brings up another question: what are the underpinnings of artificial intelligence programs? What algorithms are being employed, and do these algorithms interact with each other? As described in earlier blogs on expert systems work in the 1980s, an expert system is based on the environment (the data analyzed) it was created for. The expert system then improved its performance based on the new data acquired through its operation. This is a problem of self-biasing. AI programs are built on a base of information. Sometimes the data absorbed is protected, e.g., the New York Times database, so all the data might not be available. If one were to focus on a single database and develop it for projecting future information, there would be a significant difference in news projection depending on whether the data were obtained from CNN or Fox News.
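The self-biasing problem described above can be sketched with a toy feedback loop: a system that retrains only on what it already accepted drifts toward its starting lean. Every number here, including the 0.5 acceptance threshold and the 0.8/0.2 update weights, is an illustrative assumption, not a model of any real system.

```python
# Toy simulation of a self-biasing feedback loop: the system only "sees"
# items it already accepts, then updates toward them, amplifying its lean.
# All parameters are invented for illustration.

def run_feedback_loop(initial_score, rounds):
    """Track a preference score over repeated retrain-on-own-output rounds."""
    score = initial_score
    history = [round(score, 3)]
    for _ in range(rounds):
        accepted = 1.0 if score > 0.5 else 0.0  # only sees what it accepted
        score = 0.8 * score + 0.2 * accepted    # retrains on that subset
        history.append(round(score, 3))
    return history

# A slight initial lean (0.55 vs 0.45) diverges to opposite extremes,
# mirroring how a CNN-only corpus and a Fox-News-only corpus would drift apart.
print(run_feedback_loop(0.55, 10))  # climbs toward 1.0
print(run_feedback_loop(0.45, 10))  # decays toward 0.0
```

The same starting data minus a tenth of a point ends at the opposite extreme, which is the essence of the self-biasing concern: the loop amplifies whatever was in the base of information.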
The applications, and even the development of new tools for creating reports and the complementary programs for evaluating the veracity of the information presented, are still in the very early stages. This year, 2024, should witness some interesting developments in the application of AI tools. Significant assistance in medicine is being provided already, and more should be coming. It just requires careful application of the programs and an understanding of the data.