In biology, researchers compare genes or gene products using semantic similarity measures (SSMs). Continuous data growth and diversity in data characteristics produce what is called big data, and current biological SSMs cannot handle data at that scale. These measures therefore need a way to manage the size of the data they process. We used parallel and distributed processing: the data were split into multiple partitions and the SSMs were applied to each partition, which helped manage the scalability and computational problems of big data. Our solution involves three steps: splitting the gene ontology (GO), data clustering, and semantic similarity calculation. To test this method, split GO and data clustering algorithms were defined for the first two steps and assessed for performance. For the third step, three of the best SSMs in biology [Resnik, Shortest Semantic Differentiation Distance (SSDD), and SORA] were enhanced by introducing threaded parallel processing. Our results demonstrate that introducing threads reduced the time needed to calculate semantic similarity between gene pairs and improved the performance of all three SSMs. Average time was reduced by 24.51% for Resnik, 22.93% for SSDD, and 33.68% for SORA. Total time was reduced by 8.88% for Resnik, 23.14% for SSDD, and 39.27% for SORA. Using these threaded measures in the distributed system, with the split GO and data clustering algorithms dividing the input data by similarity, reduced the average time more than dividing the input data equally did. The time reduction increased with the number of splits. It was 24.1%, 39.2%, and 66.6% for Threaded SSDD and 33.0%, 78.2%, and 93.1% for Threaded SORA with 2, 3, and 4 slaves, respectively, and 92.04% for Threaded Resnik with four slaves.
http://bit.ly/2RVVeaz
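The partition-then-score idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy ontology, the term probabilities, and the helper names (`split`, `score_partition`, `threaded_scores`, `reduction_pct`) are all assumptions made for illustration, and the equal round-robin split stands in for the paper's similarity-based split GO and clustering steps, which the abstract does not specify. Only Resnik is shown, using its standard definition (the information content of the most informative common ancestor).

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy ontology (child -> parents); these are not real GO terms.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "a1": ["a"],
    "a2": ["a"],
    "b1": ["b"],
}

# Hypothetical annotation probabilities p(term); the root has probability 1.
PROB = {"root": 1.0, "a": 0.5, "b": 0.5, "a1": 0.25, "a2": 0.25, "b1": 0.25}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(t1, t2):
    """Resnik similarity: highest information content among common ancestors."""
    common = ancestors(t1) & ancestors(t2)
    return max(math.log(1.0 / PROB[t]) for t in common)


def split(pairs, n):
    """Divide the pairs into n near-equal partitions (round-robin)."""
    return [pairs[i::n] for i in range(n)]


def score_partition(partition):
    """Score every pair in one partition."""
    return [(a, b, resnik(a, b)) for a, b in partition]


def threaded_scores(pairs, workers=2):
    """Score each partition on its own thread and merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(score_partition, split(pairs, workers))
    return [row for part in results for row in part]


def reduction_pct(baseline, improved):
    """Percentage time reduction relative to a baseline time."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    pairs = [("a1", "a2"), ("a1", "b1"), ("a", "a1"), ("b", "b1")]
    for row in threaded_scores(pairs):
        print(row)
```

Because the split is round-robin, the merged rows come back grouped by partition rather than in input order. In CPython, threads do not speed up CPU-bound pure-Python code because of the global interpreter lock, so this sketch shows only the structure; `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface for CPU-bound work.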
Sunday, January 27, 2019
Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures
Αλέξανδρος Γ. Σφακιανάκης Medicine by Alexandros G. Sfakianakis,Anapafseos 5 Agios Nikolaos 72100 Crete Greece,00302841026182,0030693260717...