Anat Cell Biol 2023; 56(1): 86-93
Published online March 31, 2023
Copyright © Korean Association of ANATOMISTS.
Phisamon Kengkard¹, Jirachaya Choovuthayakorn¹, Chollada Mahakkanukrauh¹, Nadee Chitapanarux¹, Pittayarat Intasuwan², Yanumart Malatong², Apichat Sinthubua²,³, Patison Palee⁴, Sakarat Na Lampang⁵, Pasuk Mahakkanukrauh²,³
¹Faculty of Medicine, Chiang Mai University, Chiang Mai, ²Department of Anatomy, Faculty of Medicine, Chiang Mai University, Chiang Mai, ³Excellence in Osteology Research and Training Center, Chiang Mai University, Chiang Mai, ⁴College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, ⁵Department of Oral Biology and Diagnostic Sciences, Faculty of Dentistry, Chiang Mai University, Chiang Mai, Thailand
Correspondence to: Pasuk Mahakkanukrauh
Department of Anatomy & Excellence in Osteology Research and Training Center, Chiang Mai University, Chiang Mai 50200, Thailand
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Age at death estimation has always been a crucial yet challenging part of the identification process in forensic science. The use of human skeletons has long been explored, based on the principle that bone macro- and micro-architecture change with increasing age. The clavicle is recommended as the best candidate for accurate age estimation because of its accessibility, its late maturation, and the minimal effect of weight bearing on it. Our study applies a pre-trained convolutional neural network (CNN) to develop an accurate and cost-effective age estimation model based on the clavicle. A total of 988 clavicles from a Thai population with known age and sex were radiographed using the Kodak 9000 Extra-oral Imaging System. The radiographs then went through a preprocessing protocol that included region of interest selection and quality assessment. Additional samples were generated using a generative adversarial network (GAN). In total, 3,999 clavicular images were used in this study; they were separated into training and test sets, and the test set was subsequently categorized into 7 age groups. GoogLeNet was modified at two layers and its parameters were fine-tuned. The highest validation accuracy was 89.02%, but the test set achieved only 30% accuracy. Our results show that medial clavicular radiographs have potential in the field of age at death estimation, and further study is therefore recommended.
Keywords: Clavicle, Age estimation, Radiography, Convolutional neural network
Over the past decades, crimes and pandemics have posed a real challenge for experts seeking to establish biological profiles of human remains. Over 1,800 bodies from the 2004 Indian Ocean tsunami remained unidentified even 7 months after the incident. Apart from that, a growing number of murder victims have yet to receive justice. According to the Thai Ministry of Justice, over 4,000 unnamed bodies were reported over the 17 years from 2002 to 2019. The European Union and the United States face the same problem: the EU has more than 1,500 accumulated bodies, while the US reports 4,000 bodies per year. In every country, money and resources are spent storing and managing these bodies, especially in Thailand, where they have to be kept for up to 20 years. Notably, a large proportion of these bodies belong to people who have been reported missing. To improve the situation, a more effective way of reconstructing a person's identity is necessary, and this is where age plays a crucial role.
Bone has long been recognized as one of the five key elements in the identification process. Its persistence makes it the part of the body most often preserved at a crime scene, and its structure changes with age. Various bones have been used by forensic anthropologists as age indicators, for example the pelvis, cranium, femur, and clavicle [6-8]. Because clavicles bear little weight, take the longest time to mature, and are easy to access, they are considered the best candidate for accurate age estimation. This is consistent with the study of Marera and Satyapal, which found that estimating age from the clavicle yields the most accurate result. However, current methods mainly rely on observing epiphyseal plate fusion of the clavicle, which can only predict the age of adolescents and young adults. In addition, these methods require specialists to interpret the result, which leads to problems of subjectivity and time consumption. Besides epiphyseal union, another widely used method of bone age estimation is bone mineral density (BMD). However, Botha et al. stated that BMD in Black individuals does not significantly correlate with age, which disagrees with previous research. Two other methods used in this field, cortical thickness and histomorphometry, are both impractical in real-life applications because of methodological inconsistency and sample variability. Moreover, histomorphometry is destructive to the specimen and should therefore serve only as an alternative approach to forensic human identification.
Imaging techniques used to estimate bone age include conventional radiography, digital X-ray radiograph (DXR), dual-energy X-ray absorptiometry (DXA), computed tomography (CT), and magnetic resonance imaging (MRI) [12, 13]. DXA, CT, and MRI are considered hard to obtain and not cost-efficient in our country. Consequently, digital radiography, which is easy to obtain and cost-efficient, was chosen. Furthermore, DXR provides clearer and more accurate images than conventional radiography. The concept of imaging-based age estimation rests on the knowledge that bone density deteriorates over time through bone modeling and remodeling. As we age, remodeling capacity diminishes; together with changes in hormonal response and reduced production of calcitriol, this results in increased bone porosity.
Artificial intelligence (AI) is the ability of computers or robots to perform tasks that are usually done by humans. Deep learning is a type of AI that imitates the way humans acquire certain types of knowledge. There are 2 main types of deep learning networks: the artificial neural network (ANN) and the convolutional neural network (CNN). An ANN resembles the human neural network: after receiving a signal from the data, each neuron weights its inputs and passes the result from one layer to the next. If the system detects an error, it changes the weights of that layer. A CNN learns mainly by pattern recognition and is widely used in image recognition tasks such as classification and object detection. CNNs have been used to analyze medical data in various fields, such as prediction of blood glucose level from the electrocardiogram, evaluation of lung nodules on chest CT, and classification of ear abnormalities.
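To make "pattern recognition" concrete, the basic operation of a convolutional layer is a small kernel sliding over a grey-level image and responding strongly where the local pattern matches. The sketch below is a minimal pure-Python illustration, assuming a toy 4×4 image and a hand-picked edge kernel; real CNNs such as GoogLeNet learn their kernels from data and stack many such layers.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    Like most deep learning libraries, this computes cross-correlation,
    which is what CNN layers call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses only (a common CNN activation)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy image: dark left half (0.0), bright right half (1.0).
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
# Hypothetical kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1.0, 1.0]]

feature_map = relu(conv2d_valid(image, edge_kernel))
print(feature_map)  # [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
```

The feature map lights up only in the column where the edge sits, which is the kind of local pattern a trained network combines, layer by layer, into higher-level features.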
Thus, we aim to use a CNN (GoogLeNet; Google Inc., Mountain View, CA, USA) with digital radiographs to create a more convenient, refined, and non-subjective method of estimating age from the medial clavicle in a Thai population, which has not yet been done. This would promote precise personal identification and help forensic teams bring justice to the dead.
Human dry-bone specimens of Thai nationals (Mongoloid) were provided by the Forensic Osteology Research Center, Faculty of Medicine, Chiang Mai University, Thailand. The study sample comprised 374 female and 641 male clavicles. All samples were collected between 2006 and 2018, with age at death ranging from 20 to 100 years. The cause of death was mostly natural. Individuals with health conditions affecting bone density, for example thalassemia, or with visible bone deformities such as fractures and erosions were excluded. Further biomedical information such as weight, lifestyle, and personal medication was not available. The study was approved by the Research Ethics Committee, Faculty of Medicine, Chiang Mai University, Thailand (Research ID: ANA2563-07823).
The samples were placed in an anteroposterior position, with their distal ends resting on fixed clay and their medial ends elevated 1 cm above the X-ray plate. Each radiograph contained 8 pairs of clavicles, with the main focus on the medial ends (Fig. 1). Radiographs were obtained using the Kodak 9000 Extraoral Imaging System (Eastman Kodak, Rochester, NY, USA) at the Oral Radiology Clinic, Faculty of Dentistry, Chiang Mai University, Thailand, with the following settings: 60 kV, 2 mA, DC 1 s, and a focus-film distance of 124 cm. The radiographs were then stored as bitmap files, organized by collection number, sex, and age.
The clavicular radiographs were processed using Adobe Photoshop 2019 (version 19.1.6, Adobe Systems Inc., San Jose, CA, USA). The region of interest was selected at the medial end of each clavicle using a 3×3 cm (261×261 pixel) template, with the reference point at the sternal border of the clavicle (Fig. 1). Images containing artifacts such as remnants of clay were excluded. In total, 625 male and 363 female samples were used, distributed across 8 age groups (Table 1).
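The template size fixes the image scale: 261 pixels across 3 cm is 87 pixels per cm, or roughly 0.115 mm per pixel. The sketch below reproduces that arithmetic and crops a fixed 261×261 window around a reference point. The placement rule (reference point at the middle of the window's medial edge, with the medial end facing left) is a hypothetical assumption; the text only states that the reference point lies at the sternal border, and the actual cropping was done in Photoshop.

```python
ROI_CM = 3.0                 # template side length reported in the text (cm)
ROI_PX = 261                 # template side length reported in the text (pixels)
PX_PER_CM = ROI_PX / ROI_CM  # implied scale: 87 pixels per cm


def roi_box(ref_x, ref_y, img_w, img_h, size=ROI_PX):
    """Return (left, top, right, bottom) of a size x size crop.

    Hypothetical placement: the sternal-border reference point sits at the
    middle of the box's left edge (medial end assumed to face left).
    `right` and `bottom` are exclusive, as in Python slicing.
    """
    left = ref_x
    top = ref_y - size // 2
    right, bottom = left + size, top + size
    if left < 0 or top < 0 or right > img_w or bottom > img_h:
        raise ValueError("ROI falls outside the radiograph")
    return left, top, right, bottom


def crop(pixels, box):
    """Crop a row-major 2D list of grey levels to `box`."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


box = roi_box(ref_x=100, ref_y=500, img_w=1000, img_h=1000)
print(PX_PER_CM, box)  # 87.0 (100, 370, 361, 631)
```

Using a fixed pixel size for every crop keeps the physical area constant across specimens, provided all radiographs share the same magnification.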
A CNN, a type of network that excels at image classification, was used in this study. A pre-trained network, GoogLeNet, was adopted to shorten training and reduce the need for a large amount of training data. The network's loss3-classifier layer and output classification layer were replaced with a new fully connected layer and a new final classification output layer. Other parameters were adjusted based on the characteristics of the dataset. MATLAB R2021b (MathWorks, Natick, MA, USA) was used as the main platform of this study.
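Replacing the two final layers swaps GoogLeNet's 1,000-class ImageNet classifier for one sized to the study's age groups. As a minimal sketch in plain Python, assuming GoogLeNet's 1,024-value pooled feature vector and the 7 age groups that remain after the 20–29 group was excluded, the new head is a fully connected layer followed by a softmax; in MATLAB this swap is usually made in Deep Network Designer or with `replaceLayer`. The random initialization is an assumption, and the weights here are untrained.

```python
import math
import random

FEATURES = 1024  # length of GoogLeNet's pooled feature vector
NUM_CLASSES = 7  # age groups 30-39 ... 90-100


class DenseSoftmaxHead:
    """Fully connected layer followed by softmax: a sketch of a new
    classification head replacing a 1000-class ImageNet classifier."""

    def __init__(self, in_features, num_classes, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_features)  # small random init (assumed)
        self.weights = [[rng.gauss(0.0, scale) for _ in range(in_features)]
                        for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def num_parameters(self):
        return len(self.weights) * len(self.weights[0]) + len(self.bias)

    def forward(self, features):
        """Map one feature vector to class probabilities."""
        logits = [sum(w * x for w, x in zip(row, features)) + b
                  for row, b in zip(self.weights, self.bias)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


head = DenseSoftmaxHead(FEATURES, NUM_CLASSES)
probs = head.forward([0.5] * FEATURES)
print(head.num_parameters())  # 7175
```

Only this small head (7 × 1,024 weights plus 7 biases) starts from scratch; the pre-trained convolutional layers below it are reused, which is why transfer learning needs far less data than training from random weights.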
To prevent overfitting due to the small sample size, all images were randomly augmented during training. Each sample could be reflected horizontally, rotated by –60 to 60 degrees, rescaled by a factor of 0.5 to 1.5, and translated horizontally and vertically by –10 to 10 pixels.
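Each training image receives a fresh random transform. The sketch below draws parameters from the reported ranges and composes them into a single 2D affine matrix. It assumes degrees for rotation and pixels for translation (the units used by MATLAB's `imageDataAugmenter`, which the text does not name explicitly) and only builds the transform rather than resampling pixels.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class AugmentParams:
    reflect: bool     # horizontal (left-right) reflection
    angle_deg: float  # rotation angle in degrees (assumed unit)
    scale: float      # isotropic scale factor
    tx: float         # horizontal translation in pixels (assumed unit)
    ty: float         # vertical translation in pixels (assumed unit)


def sample_params(rng):
    """Draw one random augmentation using the ranges reported in the text."""
    return AugmentParams(
        reflect=rng.random() < 0.5,
        angle_deg=rng.uniform(-60.0, 60.0),
        scale=rng.uniform(0.5, 1.5),
        tx=rng.uniform(-10.0, 10.0),
        ty=rng.uniform(-10.0, 10.0),
    )


def affine_matrix(p):
    """3x3 homogeneous matrix: reflect, then rotate and scale, then translate.

    All operations act about the origin for simplicity; image libraries
    typically transform about the image centre instead.
    """
    sx = -1.0 if p.reflect else 1.0
    a = math.radians(p.angle_deg)
    c, s = p.scale * math.cos(a), p.scale * math.sin(a)
    return [
        [c * sx, -s, p.tx],
        [s * sx, c, p.ty],
        [0.0, 0.0, 1.0],
    ]


rng = random.Random(42)
params = sample_params(rng)
matrix = affine_matrix(params)
```

Because new parameters are drawn every time an image is used, the network rarely sees exactly the same pixels twice, which is the mechanism by which augmentation discourages memorizing the training set.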
An initial experiment was run to evaluate the baseline performance of the GoogLeNet model. Two sets of experiments with different datasets were carried out using the same parameters (InitialLearnRate 0.001, Epochs 30, and MiniBatchSize 8) to evaluate the effect of sex on age estimation. The validation accuracies of the male-only dataset and the combined male-and-female dataset were 31.89% and 25.91%, respectively. These results suggest that sex-specific data yield a better outcome despite the smaller training set. The research focus was therefore shifted to developing a sex-specific network, and only male data were used in this research.
The initial experiment showed that the model was inadequately trained because of the small dataset and unequally distributed data. After careful consideration, the 20–29 category was excluded from the study because it had markedly fewer samples than the others. From each of the remaining 7 categories, 20% of the samples were randomly set aside as a test set. A generative adversarial network (GAN) was then used to generate 500 additional samples per category (Epochs 5,000, MiniBatchSize 16). Generated images that did not resemble clavicular structure were excluded. In total, 3,999 original and generated clavicular images were randomly allocated to training and validation sets in a 75:25 ratio (Table 2).
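The order of operations matters here: the 20% test images are removed from every age group before any images are generated or split, so they never influence training. A minimal sketch of that two-stage split, assuming one age-group label per image; the labels below are hypothetical toy values, and in the study the generated images would join the `kept` pool before the 75:25 split.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, frac, seed=0):
    """Hold out `frac` of the indices of every class (rounded down).

    Returns (kept_indices, held_out_indices).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    held, kept = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_held = int(len(idxs) * frac)
        held.extend(idxs[:n_held])
        kept.extend(idxs[n_held:])
    return kept, held


def random_split(indices, train_frac, seed=0):
    """Shuffle `indices` and split them train_frac : (1 - train_frac)."""
    rng = random.Random(seed)
    idxs = list(indices)
    rng.shuffle(idxs)
    n_train = round(len(idxs) * train_frac)
    return idxs[:n_train], idxs[n_train:]


# Hypothetical toy labels (not the study's counts).
labels = ["40-49"] * 50 + ["50-59"] * 30
kept, test_idx = stratified_holdout(labels, 0.2)      # per-group 20% test set
train_idx, val_idx = random_split(kept, 0.75)         # 75:25 train/validation
```

Holding out per group keeps every age group represented in the test set; a single random 20% draw could leave a small group with few or no test images.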
After training, the network with the highest validation accuracy and lowest training loss was used to assess the test set and evaluate the model's performance. The trained network was exported to the MATLAB workspace, and the 120 blind test images of clavicles, which had been set aside during sample preparation, were loaded into the current folder. The prediction code, placed below the transfer-learning code generated from Deep Network Designer in the Live Editor, then passed the test images to the network one by one to predict their age group.
After images that did not resemble clavicular structure were excluded from the 3,999 images, a total of 3,684 male images, with ages at death from 30 to 100 years, were used for this model. The training dataset was used to train the CNN model, and the validation dataset was used to guard against overfitting; the dataset was randomly divided into 75% for training and 25% for validation. Fig. 2 shows the confusion matrices for the training and validation datasets. The proportion of correct age predictions was 96.67%, 59.17%, 62.5%, 93.33%, and 88.33% for the 40–49, 50–59, 60–69, 70–79, and 80–89 classes, respectively. Because the 30–39 and 90–100 age groups had small sample sizes, the automatic random split excluded these two groups from the confusion matrix (Fig. 2).
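The per-class figures above are the diagonal of each confusion-matrix row divided by that row's total. A minimal sketch, assuming rows are true classes and columns are predicted classes; the 3×3 matrix is a hypothetical toy example, not data from Fig. 2.

```python
def per_class_accuracy(confusion):
    """Row-normalised diagonal of a confusion matrix.

    confusion[i][j] counts images whose true class is i and predicted class
    is j, so each value is the fraction of class-i images predicted
    correctly (per-class recall).
    """
    result = []
    for i, row in enumerate(confusion):
        total = sum(row)
        result.append(row[i] / total if total else float("nan"))
    return result


def overall_accuracy(confusion):
    """Fraction of all images on the diagonal (correctly classified)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)


# Hypothetical toy matrix: rows = true group, columns = predicted group.
toy = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 2, 8],
]
print(per_class_accuracy(toy))  # [0.8, 0.6, 0.8]
```

Reporting accuracy per class, rather than only overall, exposes groups the model handles poorly, such as the 50–59 and 60–69 classes here.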
Fig. 3 shows the training progress, accuracy, and loss of the final CNN model. The highest validation accuracy, 89.02%, was obtained with the following hyperparameters: InitialLearnRate 0.0002, L2Regularization 0.0002, MaxEpochs 50, MiniBatchSize 64, Momentum 0.99, and ValidationFrequency 25.
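The Momentum setting implies MATLAB's stochastic gradient descent with momentum (sgdm) solver, the solver to which that option applies; the text does not name the solver, so this is an assumption. The sketch below shows one such update with the reported learning rate, momentum, and L2 regularization, written in plain Python on a list of parameters rather than as MATLAB's implementation.

```python
def sgdm_step(theta, velocity, grad, lr=0.0002, momentum=0.99, l2=0.0002):
    """One SGD-with-momentum update including an L2 weight penalty.

    `velocity` holds the previous step (theta_l - theta_{l-1}).
    The penalty (l2 / 2) * ||theta||^2 adds l2 * theta to the gradient.
    """
    new_velocity = [momentum * v - lr * (g + l2 * t)
                    for t, v, g in zip(theta, velocity, grad)]
    new_theta = [t + v for t, v in zip(theta, new_velocity)]
    return new_theta, new_velocity


theta, velocity = [1.0], [0.0]
theta, velocity = sgdm_step(theta, velocity, grad=[0.5])
```

With Momentum 0.99 and a constant gradient, the velocity approaches lr × g / (1 − 0.99), so steps grow to about 100 times what the learning rate alone would give; the very small InitialLearnRate is partly offset by the high momentum.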
The test dataset was then assessed by the network to evaluate the performance of the trained model. When the 120 hidden test images were presented to the CNN model one by one, the correct age group was predicted with 30.0% accuracy (36 of 120 cases).
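The one-by-one evaluation amounts to a simple loop that predicts each held-out image and counts exact age-group matches. In this Python sketch `predict` is a hypothetical stand-in for the trained network, and the demonstration uses placeholder "images" and a fake predictor arranged only to reproduce the reported 36 of 120.

```python
from typing import Callable, Hashable, Sequence


def evaluate_one_by_one(images: Sequence, true_groups: Sequence[Hashable],
                        predict: Callable[[object], Hashable]):
    """Predict each test image individually and tally correct age groups.

    Returns (correct, total, accuracy). `predict` is a hypothetical
    stand-in for the trained network.
    """
    if len(images) != len(true_groups):
        raise ValueError("each image needs exactly one true label")
    correct = 0
    for image, truth in zip(images, true_groups):
        if predict(image) == truth:  # one image at a time
            correct += 1
    total = len(true_groups)
    return correct, total, correct / total


# Dummy demonstration: 120 placeholder "images" (just indices) and a fake
# predictor that happens to be right for the first 36 of them.
def fake_predict(i):
    return "40-49" if i < 36 else "50-59"


images = list(range(120))
truths = ["40-49"] * 120
print(evaluate_one_by_one(images, truths, fake_predict))  # (36, 120, 0.3)
```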
The clavicle is considered the bone of choice in age estimation because of its universal reliability compared with other bones. It has been shown to be the most reliable indicator for age estimation in radiological studies. The medial part of the clavicle is widely used in the forensic field because it is less affected by weight than the body and the acromial end. Postmortem, the medial clavicle is also the least damaged part, whereas the acromial end is highly fragile, especially in the elderly. The body of the clavicle is markedly affected by force and differs in cortical composition and properties from the acromial and sternal ends. Accordingly, it is the most frequent site of fracture (70%–80% of cases).
In the current study, a 3×3 cm area at the sternal end of the medial clavicle was selected as the region of interest because the medial end is the best-preserved part and would accurately depict the bone cortex and trabeculae that reflect bone porosity. Physiologically, the body balances bone degeneration against remodeling, and remodeling decreases with age. This results in increased porosity and decreased bone mass in the elderly. In our setting, anatomical landmarks on the radiographs, such as the costoclavicular ligament, could not be used to mark the region of interest because of technical limitations and radiograph quality.
Currently, the clavicle is used in age estimation by observing the gross appearance of epiphyseal closure, which is classified into 4 stages according to the completeness of epiphyseal union (Webb and Suchey classification). Although very accurate, this method can only estimate age in adolescents and young adults (15–29 years old in the Thai population). The Study Group on Forensic Age Diagnostics (Arbeitsgemeinschaft für Forensische Altersdiagnostik, AGFAD) recommends radiographic assessment of the medial clavicular epiphyses as one line of evidence for determining legal adult age [18, 20, 21], while estimation in other age ranges is still debated. Thus, clavicles from individuals aged 30–100 years were chosen, in search of an alternative way to estimate age accurately in this range, and clavicles from individuals under 30 years were excluded from this study.
According to Marera and Satyapal, age assessment using clavicular epiphyseal closure was more precise in males than in females. This is consistent with the fact that females are affected by osteoporosis more, and faster, than males because of their smaller bone size and estrogen-related bone resorption with advancing age. Estrogen acts mainly on trabecular bone; thus, males and females show different patterns of osteoporosis, which should also yield different radiographic patterns. For this reason, male and female bones were not combined into the same dataset for training the deep learning model.
Our study also showed that when female samples were combined with male samples, the accuracy of age estimation using deep learning was 25.9%, lower than with males alone (31.9%). Moreover, our Forensic Osteology Research Center houses a collection of 278 Thai Mongoloid males, twice as many as females. Consequently, unlike previous studies, this study focused on males.
Because bone porosity increases with age, much current research uses various imaging techniques to detect bone porosity or bone mass for age estimation. These methods could greatly widen the age range of interest and are more practical than the epiphyseal method. According to Benito et al., the average grey level and cortical thickness of the clavicle on radiographs show a significant negative correlation with age. Similar results were obtained by Botha et al. and Navega et al., who used BMD from DXA as an indicator. However, these two studies used the femur instead of the clavicle.
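A correlation like the one reported by Benito et al. can be quantified with Pearson's coefficient r. The text does not say which statistic those studies used, so Pearson's r is an assumption here; the ages and grey levels below are hypothetical values, chosen only to show how a declining trend gives r close to −1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


ages = [35, 45, 55, 65, 75, 85]             # hypothetical ages (years)
mean_grey = [150, 141, 133, 126, 118, 110]  # hypothetical mean grey levels
r = pearson_r(ages, mean_grey)
```

A strongly negative r means grey level falls steadily as age rises, which is the relationship an intensity-based age estimator, including a CNN, would try to exploit.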
Advances in deep learning in recent years, especially the development of a type of neural network called the CNN, have benefited the medical field through pattern recognition. In forensic work, CNNs have mainly been studied as a tool for age and sex assessment. Images from several modalities, including X-ray, DXA, MRI, and CT, have been applied to various networks, both newly developed and pre-trained, to obtain the best-performing model. Navega et al. successfully combined deep learning with DXA imaging of the femur to estimate age at death in Caucasian females. In Mongoloid populations, such studies are still limited.
In this study, left and right clavicles were analyzed together because the authors wanted the deep learning model to be able to estimate age at death from either side. This would allow the model to be applied in real-life situations where either clavicle might be left at a crime scene.
The results of our first few experiments using only original male clavicular images yielded only 68% validation accuracy despite 100% training accuracy. This indicates overfitting, which, in layman's terms, means that the AI model has learned in a way that applies only to the training sample and no longer generalizes to the overall population. Ideally, to overcome this problem, more training data should be collected until a general baseline of 1,000 training images per category is reached. However, in the medical field this is nearly impossible for various reasons, such as ethical and privacy concerns, the high cost of obtaining data, the need for specialists to label data, and the rare incidence of diseases [29-31]. Although several open-access databases offer medical images to researchers, our study, which focuses on a Thai population, may not benefit from them. To address the shortage of data, our study used data augmentation, a technical solution widely adopted in the medical imaging field [31-33].
A GAN, a machine learning framework consisting of a generator network and a discriminator network, is extensively used for medical image synthesis. Many studies have evaluated the quality of GAN-generated data by using it to train other CNNs for various tasks. Frid-Adar, whose study focused on image classification, reported that specificity increased from 88.4% to 92.4%. Another study, on lung cytology images, also reported a better outcome after using a GAN, with a 4.3% rise in accuracy. Other studies have shown promising results as well [32, 34, 35]. These findings illustrate the generalizability of GANs and suggest that they can be applied to clavicular radiographs. Our results are in concordance with these studies, as we found a 5% increase in validation accuracy.
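For reference, the original GAN formulation trains the generator $G$ and discriminator $D$ on the minimax objective below, where $D(x)$ is the probability that $x$ is a real radiograph and $G(z)$ maps random noise $z$ to a synthetic image. The text does not state which GAN variant or loss was used, so this is the standard formulation, not necessarily the exact one used in this study.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

$D$ is pushed to score real images near 1 and generated ones near 0, while $G$ is pushed to make $D(G(z))$ large; at the theoretical optimum the generated distribution matches the real one, which is what makes the synthetic clavicle images usable as extra training data.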
Pattern recognition is the core concept of image classification by a CNN, implying that higher-quality images allow distinguishing features to be extracted more accurately. In his research, Samuel Dodge found that noise and blur have a negative impact on the performance of neural networks, including GoogLeNet. However, training a CNN with low-quality images may not always decrease network performance: if the trained network is tested on images of similar quality, its performance is likely to be better. Conversely, if the test set contains higher-resolution data, the network's accuracy tends to plummet. Our study falls into this latter scenario: the majority of the training images (3,186 of 3,684, or 86.5%) were generated by the GAN and had visibly lower resolution, while the test set comprised only original images.
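The 86.5% figure follows directly from the reported counts. The short check below reproduces it, along with the implied number of original (non-GAN) images in the training and validation pool, using only the counts given in the text.

```python
def proportion_percent(part, whole, digits=1):
    """Percentage of `whole` made up by `part`, rounded for reporting."""
    return round(100.0 * part / whole, digits)


generated, total = 3186, 3684   # counts reported in the text
original = total - generated    # original images in the training/validation pool
print(proportion_percent(generated, total), original)  # 86.5 498
```

Roughly six generated images for every original one means the network mostly learned from GAN output, which makes the quality gap between training and test images matter.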
This could explain the wide gap between our training-phase and test accuracy, a sharp decrease from 87% to 30%. The problem might be solved by fine-tuning the GAN to produce higher-resolution images, but this was not possible because of hardware and time limitations. Another explanation for our finding is the unequal distribution of data. The confusion matrix (Fig. 2) shows that our network performed best in the category with the most images, and that most false positives fell into that same category. Additional information not included in the assessment, such as socioeconomic status, nutrition, occupation, and other factors, would benefit our study.
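Class imbalance also changes what a given accuracy means. A useful reference point is the accuracy of a trivial classifier that always predicts the most frequent age group; comparing against it, and against uniform guessing over the 7 groups (about 14.3%), shows how much a model has learned beyond the class distribution. The sketch assumes plain lists of age-group labels; the example labels are hypothetical, not the study's test-set counts.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (most frequent class, accuracy of always predicting it)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


def uniform_chance(num_classes):
    """Expected accuracy of guessing uniformly at random."""
    return 1.0 / num_classes


# Hypothetical labels, not the study's data.
toy_labels = ["60-69"] * 6 + ["70-79"] * 3 + ["50-59"] * 1
print(majority_baseline(toy_labels))  # ('60-69', 0.6)
```

A network biased toward the largest group, as the confusion matrix suggests, can score close to this baseline without having learned age-related features.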
Our initial hypothesis was that decreasing trabecular density, a known indicator of increasing age, would correlate with the overall grey level of clavicular radiographs, allowing the network to recognize the different patterns of each age group. However, with our imaging protocol, digital radiography may not be able to capture the small variations in grey level between age groups, resulting in poor categorization. Further studies using other imaging techniques such as MRI or CT should be considered, as they would allow the CNN to be trained with higher-quality images.
This preliminary study gives insight into what CNNs could achieve in the field of age at death estimation. Even though the test set accuracy is quite low, the results show several possibilities for applying the CNN model to age estimation in a Thai population sample and for decreasing subjectivity and measurement errors. The results show the possibility of using a CNN as part of an identification tool, although the study faced the major limitation of a small dataset. The accuracy of the network is expected to increase as more data and resources become available. In the future, collaboration with other bone collections is expected to expand our samples and produce more accurate results. Apart from that, more extensive network modification with advanced techniques could result in an even more practical model. Successful further development will present an opportunity for real-life application of the technology.
This work was supported by the Faculty of Medicine, Chiang Mai University (research funding grant no. 069-2565). The authors are also grateful for the support of the Excellence Center in Osteology Research and Training Center (ORTC), with partial support from Chiang Mai University.
Conceptualization: PM. Data acquisition: PK, JC. Data analysis or interpretation: CM, NC. Drafting of the manuscript: PK, JC, CM, NC. Critical revision of the manuscript: PI, YM, AS, PP, SNL. Approval of the final version of the manuscript: all authors.
No potential conflict of interest relevant to this article was reported.