Shapiro–Senapathy algorithm

The Shapiro–Senapathy algorithm (S&S) is a computational method for identifying splice sites in eukaryotic genes. The algorithm employs a Position Weight Matrix (PWM) scoring formula to predict donor and acceptor splice sites in any given gene. This methodology has been used to discover splice sites and disease-causing splice site mutations in the human genome, and has become a standard tool in clinical genomics. The S&S algorithm has been cited in thousands of clinical studies, according to Google Scholar. It has also formed the basis of widely used software, including Human Splicing Finder, SROOGLE, and Alamut, which identify splice sites and splice site mutations that cause disease. The algorithm has uncovered splicing mutations in diseases ranging from cancers to inherited disorders, and has been used to predict the deleterious effects of these mutations, including exon skipping, intron retention, and cryptic splice site activation. == The algorithm == A splice site defines the boundary between a coding exon and a non-coding intron in eukaryotic genes. The S&S algorithm slides a window, equal in length to the splice site motif, along a gene sequence to detect potential splice sites. For each window position, the algorithm calculates a score by comparing the nucleotide sequence to a Position Weight Matrix (PWM) derived from known splice sites. This formula generates a percentile score, indicating the likelihood that a given sequence functions as a donor or acceptor splice site. Many disease-causing mutations in the human genome are located in splice sites. Clinical genomics studies analyze the splice site scores generated by the S&S algorithm to predict the consequences of splice site mutations, including exon skipping and intron retention. The algorithm's sensitivity to single-nucleotide changes allows it to identify mutations that may impact RNA splicing and contribute to disease.
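The sliding-window scoring described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three-position `TOY_PWM` and its frequencies are invented (real splice site motifs are longer and their matrices come from aligned known sites), the function names are hypothetical, and the 0 to 100 score uses a simple min-max normalisation that may differ in detail from the published S&S formula. Input is assumed to be upper-case A/C/G/T.

```python
# Toy position weight matrix: one dict per motif position, mapping
# nucleotide -> percent frequency. Values are invented for illustration.
TOY_PWM = [
    {"A": 10, "C": 10, "G": 70, "T": 10},
    {"A": 0, "C": 0, "G": 100, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 100},
]


def window_score(window, pwm):
    """Return a 0-100 score for one window (min-max normalised; assumed)."""
    raw = sum(column[base] for column, base in zip(pwm, window))
    lowest = sum(min(column.values()) for column in pwm)
    highest = sum(max(column.values()) for column in pwm)
    return 100.0 * (raw - lowest) / (highest - lowest)


def scan(sequence, pwm, threshold=80.0):
    """Slide a motif-length window along the sequence and collect hits."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = window_score(window, pwm)
        if score >= threshold:
            hits.append((start, window, round(score, 1)))
    return hits


reference_hits = scan("ACGGTAC", TOY_PWM)  # candidate site "GGT" at offset 2
mutant_hits = scan("ACGCTAC", TOY_PWM)     # G>C change at the second motif position
```

In this toy setting a single-nucleotide change lowers the window score below the threshold, so the candidate site disappears from the mutant scan; this mirrors, in a very simplified way, how score differences between reference and mutant sequences are used to flag possible splicing mutations.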
In addition to identifying real splice sites, the S&S algorithm has been used to discover cryptic splice sites — alternative splice sites activated by mutations — which may disrupt normal splicing. The algorithm detects mutations that lead to the activation of cryptic splice sites, which may be located proximal to real splice sites or deep within non-coding introns. It has thus been used to determine the causes of numerous diseases that are due to cryptic splicing. == Cancer gene discovery using S&S == The S&S algorithm has been used to identify splice-site mutations in genes associated with several cancers. For example, genes causing commonly occurring cancers including breast cancer, ovarian cancer, colorectal cancer, leukemia, head and neck cancers, prostate cancer, retinoblastoma, squamous cell carcinoma, gastrointestinal cancer, melanoma, liver cancer, Lynch syndrome, skin cancer, and neurofibromatosis have been found. In addition, splicing mutations in genes causing less commonly known cancers including gastric cancer, gangliogliomas, Li-Fraumeni syndrome, Loeys–Dietz syndrome, Osteochondromas (bone tumor), Nevoid basal cell carcinoma syndrome, and Pheochromocytomas have been identified. Specific mutations in different splice sites in various genes causing breast cancer (e.g., BRCA1, PALB2), ovarian cancer (e.g., SLC9A3R1, COL7A1, HSD17B7), colon cancer (e.g., APC, MLH1, DPYD), colorectal cancer (e.g., COL3A1, APC, HLA-A), skin cancer (e.g., COL17A1, XPA, POLH), and Fanconi anemia (e.g., FANC, FANA) have been uncovered. The mutations in the donor and acceptor splice sites in different genes causing a variety of cancers that have been identified by S&S are shown in Table 1. 
== Discovery of genes causing inherited disorders using S&S == Specific mutations in different splice sites have been uncovered in various genes that cause inherited disorders, including Type 1 diabetes (e.g., PTPN22, TCF1 (HCF-1A)), hypertension (e.g., LDL, LDLR, LPL), Marfan syndrome (e.g., FBN1, TGFBR2, FBN2), cardiac diseases (e.g., COL1A2, MYBPC3, ACTC1), and eye disorders (e.g., EVC, VSX1). A few example mutations in the donor and acceptor splice sites in different genes causing a variety of inherited disorders identified using S&S are shown in Table 2. == Genes causing immune system disorders == More than 100 immune system disorders affect humans, including inflammatory bowel diseases, multiple sclerosis, systemic lupus erythematosus, Bloom syndrome, familial cold autoinflammatory syndrome, and dyskeratosis congenita. The Shapiro–Senapathy algorithm has been used to discover genes and mutations involved in many immune disorders, including ataxia telangiectasia, B-cell defects, epidermolysis bullosa, and X-linked agammaglobulinemia. In xeroderma pigmentosum, an autosomal recessive disorder, the S&S algorithm identified a new preferred splice donor site that leads to faulty proteins and defective nucleotide excision repair. Type I Bartter syndrome (BS) is caused by mutations in the gene SLC12A1. The S&S algorithm helped reveal two novel heterozygous mutations, c.724+4A>G in intron 5 and c.2095delG in intron 16, leading to complete skipping of exon 5. Mutations in the MYH gene, which is responsible for removing oxidatively damaged DNA lesions, confer cancer susceptibility on affected individuals. The IVS1+5C variant plays a causative role in the activation of a cryptic splice donor site and the alternative splicing of intron 1; the S&S algorithm shows that guanine (G) at position IVS+5 is well conserved (at a frequency of 84%) among primates.
This also supported the finding that the G/C SNP in the conserved splice junction of the MYH gene causes the alternative splicing of intron 1 of the β type transcript. Splice site scores calculated according to S&S have also been used in studies of EBV infection in X-linked lymphoproliferative disease. Familial tumoral calcinosis (FTC), an autosomal recessive disorder characterized by ectopic calcifications and elevated serum phosphate levels, has likewise been attributed to aberrant splicing. == Application of S&S in hospitals for clinical practice and research == The Shapiro–Senapathy (S&S) algorithm has played a significant role in advancing the diagnosis and treatment of human diseases through its application in modern clinical genomics. With the widespread adoption of next-generation sequencing (NGS) technologies, the S&S algorithm is now routinely integrated into clinical practice by geneticists and diagnostic laboratories. It is implemented in various computational tools such as Human Splicing Finder (HSF), Splice Site Finder (SSF), and Alamut Visual, which assist in interpreting the functional impact of genetic variants on RNA splicing. The algorithm is particularly useful in identifying pathogenic splice site mutations in cases where the clinical presentation is unclear or where conventional diagnostic methods have failed to identify a causative gene. Its utility has been demonstrated across diverse patient cohorts, including individuals from different ethnic backgrounds with various cancers and inherited genetic disorders. The following are selected examples illustrating its application in clinical research. === Cancers === === Inherited disorders === == S&S - Algorithm for identifying splice sites, exons and split genes == The Shapiro–Senapathy algorithm (SSA) was developed to identify splice sites in uncharacterized genomic sequences, with early applications in the Human Genome Project.
The method introduced a Position Weight Matrix (PWM)-based approach to analyze splicing sequences across eukaryotic organisms, marking the first computational framework to systematically define splice sites using probabilistic scoring. Key innovations of the algorithm included: Exon Detection – Exons were defined as sequences bounded by acceptor and donor splice sites with S&S scores above a threshold, requiring an open reading frame (ORF) for validation. Gene Prediction – The method enabled the identification of complete genes by assembling predicted exons, forming a basis for later gene-finding tools. Mutation Analysis – The algorithm distinguishes deleterious splice-site mutations (which disrupt protein function by lowering S&S scores) from neutral variations. This capability allowed researchers to study disease-linked cryptic splice sites in humans, animals, and plants. SSA's PWM-based framework influenced subsequent computational methods, including machine learning and neural network approaches, for splice-site prediction and alternative splicing research. It remains a foundational tool in genomics and disease studies. == Discovering the mechanisms of aberrant splicing in diseases == The Shapiro–Senapathy algorithm has been used to determine the various aberrant splicing mechanisms in genes due to deleterious mutations in the splice sites, which cause numerous diseases. Deleterious splice site mutations impair the normal splicing of the gene transcripts, and thereby make the encoded protei

Variable data publishing

Variable-data publishing (VDP) (also known as database publishing) is a term referring to the output of a variable composition system. While these systems can produce both electronically viewable and hard-copy (print) output, the term "variable-data publishing" today often distinguishes output destined for electronic viewing from output destined for hard-copy print (e.g. variable data printing). Essentially the same techniques are employed in variable-data publishing as in variable data printing. The difference is in the interpretation for output. While variable-data printing may be interpreted to produce various print streams or page-description files (e.g. AFP/IPDS, PostScript, PCL), variable-data publishing produces electronically viewable files, most commonly in the forms of PDF, HTML, or XML. Variable-data composition involves the use of data to conditionally: exhibit text (static blocks and/or variable content); exhibit images; select fonts; select colors; and format page layouts and flows. Variable data may be as simple as an address block or salutation. However, it can be any or all of the document's textual content, including words, sentences, paragraphs, pages, or the entire document. In other words, it can make up as little or as much of the document as the composer desires. Variable data may also be used to exhibit various images, such as logos, products, or membership photos. Further, variable data can be used to build rule-based design schemes, including fonts, colors, and page formats. The possibilities are vast. The variable-data tools available today make it possible to perform variable-data composition at nearly every stage of document production. However, the level of control that can be achieved varies, based upon how far into the document production process a variable-data tool is deployed.
For example, if variable-data insertion occurs just prior to output, it is unlikely that the text flow or layout can be altered with nearly as much control as would be available at the time of initial document composition. Many organizations produce multiple forms of output (also known as multi-channel output) for the same document. This ensures that the published content is available to recipients via whatever access method they might require. When multi-channel output is utilized, integrity between those output channels often becomes important. Variable-data publishing may be performed on everything from a personal computer to a mainframe system. However, the speed and practical output volumes that can be achieved are directly affected by the computing power utilized. == Origin of the concept == The term variable-data publishing was likely an offshoot of the term "variable-data printing", first introduced to the printing industry by Frank Romano, Professor Emeritus, School of Print Media, at the College of Imaging Arts and Sciences at Rochester Institute of Technology. However, the concept of merging static document elements and variable document elements predates the term and has seen various implementations ranging from simple desktop 'mail merge' to complex mainframe applications in the financial and banking industry. In the past, the term VDP was most closely associated with digital printing machines. More recently, however, the application of this technology has spread to web pages, emails, and mobile messaging.

Cooperative coevolution

Cooperative Coevolution (CC) is an evolutionary computation method inspired by biological evolution. It divides a large problem into subcomponents and solves them independently in order to solve the large problem. The subcomponents are also called species. The subcomponents are implemented as subpopulations, and the only interaction between subpopulations is in the cooperative evaluation of each individual of the subpopulations. The general CC framework is nature-inspired: the individuals of a particular species mate amongst themselves, but mating between different species is not feasible. The cooperative evaluation of each individual in a subpopulation is done by concatenating the current individual with the best individuals from the rest of the subpopulations, as described by M. Potter. The cooperative coevolution framework has been applied to real-world problems such as pedestrian detection systems, large-scale function optimization and neural network training. It has also been further extended into another method, called Constructive cooperative coevolution. == Pseudocode ==
i := 0
for each subproblem S do
    Initialise a subpopulation Pop_0(S)
    calculate fitness of each member in Pop_0(S)
while termination criteria not satisfied do
    i := i + 1
    for each subproblem S do
        select Pop_i(S) from Pop_{i-1}(S)
        apply genetic operators to Pop_i(S)
        calculate fitness of each member in Pop_i(S)
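The pseudocode above can be made concrete with a small runnable sketch. The specifics are assumptions chosen for illustration and are not taken from the source: the objective is a sphere function, each variable forms its own subproblem, selection keeps the better half of a subpopulation, and the only genetic operator is Gaussian mutation. The names `sphere` and `cooperative_coevolution` are hypothetical, and `pop_size` is assumed to be even.

```python
import random


def sphere(vector):
    """Objective to minimise: sum of squares (optimum at all zeros)."""
    return sum(x * x for x in vector)


def cooperative_coevolution(dims=4, pop_size=10, generations=50, seed=0):
    rng = random.Random(seed)
    # One subpopulation ("species") per variable of the full problem.
    pops = [[rng.uniform(-5, 5) for _ in range(pop_size)] for _ in range(dims)]
    # Arbitrary initial representatives: the first member of each subpopulation.
    best = [pop[0] for pop in pops]

    def fitness(species, value):
        # Cooperative evaluation: combine the candidate with the current
        # best members of all other subpopulations, then score the whole.
        trial = best[:species] + [value] + best[species + 1:]
        return sphere(trial)

    for _ in range(generations):
        for s in range(dims):
            # Selection: keep the better half, with the current best in the pool.
            ranked = sorted(pops[s] + [best[s]], key=lambda v: fitness(s, v))
            parents = ranked[:pop_size // 2]
            # Genetic operator: Gaussian mutation of the selected parents.
            children = [p + rng.gauss(0, 0.5) for p in parents]
            pops[s] = parents + children
            best[s] = min(pops[s], key=lambda v: fitness(s, v))
    return best, sphere(best)


initial_best, initial_value = cooperative_coevolution(generations=0)
final_best, final_value = cooperative_coevolution(generations=50)
```

Because the current representative of each species is always kept in the selection pool, the combined fitness cannot get worse from one generation to the next in this sketch. On non-separable problems the choice of representatives matters much more than it does for the sphere function used here.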

Removal of Sam Altman from OpenAI

On November 17, 2023, OpenAI's board of directors ousted co-founder and chief executive Sam Altman. An official post on the company's website stated that "the board no longer has confidence in his ability to continue leading OpenAI". The removal was precipitated by employee concerns about his handling of artificial intelligence safety, and allegations of abusive behavior. Altman was reinstated on November 22 after pressure from employees and investors. The removal and subsequent reinstatement caused widespread reactions, including impacts felt in the financial markets and technology sector. Microsoft, a partner of OpenAI, received little notice of the removal and experienced a drop in its share price. The removal also prompted interest in investigations from regulatory agencies. == Background == === OpenAI === OpenAI is an artificial intelligence firm founded in December 2015 as a non-profit entity. The for-profit division of the organization released ChatGPT in November 2022, contributing to a resurgence in generative artificial intelligence funding. The board of directors of the controlling non-profit formerly comprised chief scientist Ilya Sutskever, as well as Adam D'Angelo, chief executive of Quora, entrepreneur Tasha McCauley, and Helen Toner, strategy director for the Center for Security and Emerging Technology. As of October 2023, the company was valued at US$80 billion and was set to bring in US$1 billion in revenue. Altman has described OpenAI's relationship with Microsoft as the "best bromance in tech". OpenAI is uniquely structured, an intentional decision to avoid investor control. A board of directors controls the non-profit OpenAI, Inc. The non-profit owns and controls a for-profit company, which in turn controls a capped-profit company, OpenAI Global, LLC, and a holding company owned by employees and other investors.
The holding company is the majority owner of OpenAI Global, LLC; Microsoft owns a minority stake in the capped-profit company. OpenAI's bylaws, enacted in January 2016, allow a majority of its board of directors to remove any director by written consent, without prior warning or a formal meeting. === Sam Altman === Sam Altman is a co-founder of OpenAI and its former chief executive; Altman took over the company following co-chair Elon Musk's resignation in 2018. Under Altman, OpenAI has shifted to becoming a for-profit entity. Altman is credited with convincing Microsoft chief executive Satya Nadella to invest US$10 billion in cash and computing credits in OpenAI and with leading several tender offer transactions that tripled the company's valuation. Altman testified before the United States Congress, speaking critically of artificial intelligence, and appeared at the 2023 AI Safety Summit. In the days leading up to his removal, Altman made several public appearances, announcing the GPT-4 Turbo platform at OpenAI's DevDay conference, attending APEC United States 2023, and speaking at an event related to Burning Man. == Events leading up to the removal == The resignations of LinkedIn co-founder Reid Hoffman, venture capitalist Shivon Zilis, and former Republican representative Will Hurd from the board allowed the remaining members to remove Altman. According to Kara Swisher and The Wall Street Journal, Sutskever was instrumental in Altman's removal. Disagreements over the safety of artificial intelligence divided employees prior to Altman's removal. The release of ChatGPT created divisions between OpenAI as a for-profit company without considerations for the safety of artificial intelligence and OpenAI as a non-profit cautious of artificial intelligence's capabilities; in a staff email sent in 2019 and obtained by The Atlantic, Altman referred to these divisions as "tribes".
Prior to his removal, Altman was seeking billions from Middle Eastern sovereign wealth funds to develop an artificial intelligence chip to compete with Nvidia and courted SoftBank chairman Masayoshi Son to develop artificial intelligence hardware with former Apple designer Jony Ive. Sutskever and his allies opposed these efforts, viewing them as unjustly using the OpenAI name. Altman reduced Sutskever's role in October 2023, furthering divisions; Sutskever successfully appealed to several members of the board. Swisher and The Verge reporter Alex Heath stated that opposition to Altman's profit-driven strategy culminated in the DevDay conference in which Altman announced custom ChatGPT instances. According to Axios, the removal was driven by growing discontent and distrust with Altman. On November 22, 2023, reports emerged suggesting that Altman's dismissal from OpenAI might be linked to his alleged mishandling of a significant breakthrough in the organization's secretive project codenamed Q*. According to sources within OpenAI, Q* is aimed at developing AI capabilities in logical and mathematical reasoning, and reportedly involves performing math on the level of grade-school students. Concerns about Altman's response to this development, specifically regarding the potential safety implications of the discovery, were reportedly raised to the company's board shortly before his firing. A report from The Washington Post in December stated that OpenAI's board of directors was concerned over Altman's allegedly abusive behavior; the complaints were purportedly a major factor in his removal. The Post had previously reported that an alleged pattern of deception and subversiveness by Altman, which ostensibly resulted in his removal from Y Combinator, ultimately led to the board's decision to remove him.
In April 2026, an investigative report from The New Yorker found that Sutskever and others, in response to the board's request, had compiled an approximately 70-page-long annotated dossier consisting of internal communications, documents, and photos. The dossier claimed that Altman "exhibits a consistent pattern of [...] Lying", and that Altman misrepresented information to the company's senior management and board, particularly regarding safety issues. == Removal == On November 17, 2023, at approximately noon PST, OpenAI's board of directors ousted Altman effective immediately following a "deliberative review process". The board concluded that Altman was not "consistently candid in his communications". Altman was informed of his removal five to ten minutes before it occurred on a Google Meet while watching the Las Vegas Grand Prix. Within thirty minutes, Sutskever invited OpenAI chairman and president Greg Brockman to a Google Meet to inform him of Altman's removal. According to an internal memo obtained by Axios, the removal was not due to "malfeasance", and OpenAI chief executive Emmett Shear denied accusations that the removal was due to disagreements. The board publicly announced Altman's removal thirty minutes later. Chief Technology Officer Mira Murati was immediately appointed to interim chief executive officer. Hours after Altman's removal, Brockman resigned as chairman, joined by director of research Jakub Pachocki and researchers Aleksander Mądry and Szymon Sidor. During an all-hands meeting, Sutskever defended the ouster and denied accusations of a hostile takeover. An OpenAI representative requested former board member Will Hurd's presence. == Reinstatement == According to The New Yorker, Altman retreated to his San Francisco home and enlisted the help of communications consultant Chris Lehane and Airbnb chief executive Brian Chesky, as well as former staff and a legal team, to plan his reinstatement. 
Lehane encouraged Altman to engage on social media, while Chesky sent a journalist negative information about the board. Altman told interim CEO Murati that his team was conducting opposition research on her and the individuals responsible for his removal; Altman later stated he did not remember saying this. Altman insisted multiple times that all board members who supported his removal should resign. Tiger Global Management and Sequoia Capital had attempted to reinstate Altman, according to The Information; Bloomberg News reported that Microsoft and Thrive Capital were seeking Altman's reinstatement. On November 18, The Verge reported that OpenAI's board of directors discussed reinstating Altman. The board agreed in principle to resign and to allow Altman to return, but missed the deadline. According to The Verge, Altman was ambivalent about returning and would seek significant changes to the company, including replacing the board. A list of directors had been prepared by investors in the event that the board steps down, and purportedly included former Salesforce executive Bret Taylor. According to chief strategy officer Jason Kwon, OpenAI was optimistic it could return Altman, Brockman, and other employees. On November 19, Altman and Brockman appeared at OpenAI's headquarters to negotiate, mediated by Nadella. According to Bloomberg News, Murati, Kwon, and chief operating officer Brad Lightcap were pushing for a new board of direc

Rifts (role-playing game)

Rifts is a multi-genre role-playing game created by Kevin Siembieda in August 1990 and published continuously by Palladium Books since then. It takes place in a post-apocalyptic future, deriving elements from cyberpunk, science fiction, fantasy, horror, western, mythology and many other genres. Rifts serves as a cross-over environment for a variety of other Palladium games with different universes connected through "rifts" on Earth that lead to different spaces, times, and realities that Palladium calls the "Rifts Megaverse". Rifts describes itself as an "advanced" role-playing game and not an introduction for those new to the concept. Palladium continues to publish books for the Rifts series, with about 80 books published between 1990 and 2011. Rifts Ultimate Edition was released in August 2005 and designed to update the game with Palladium's incremental changes to its system, changes in the game world, and additional information and character types. The web site is quick to point out that this is not a second edition but an improvement and expansion of the original role playing game. == Background == The RPG had the tentative title Boomers, named after the original name for the Glitter Boy power armor until Kevin Siembieda changed the name after finding out it was in use for Bubblegum Crisis. == Setting == The Rifts world is Earth, but hundreds of years into the future. Ley lines, lines of magic energy, criss-cross the earth forming supernatural geographic areas such as the Bermuda Triangle. Points where Ley Lines intersect, called a nexus, are places of powerful magic, such as the Pyramids of Giza and Stonehenge. If a Ley Line nexus energy surges or is purposely activated, the fabric of space and time can be torn, creating a rift - a hole in space-time leading to another place, time, or dimension. Ley lines contain magical energy called Potential Psychic Energy (PPE), which is found in various places, objects, and animals and is particularly strong in children. 
An adult's level of PPE can vary based on other factors. PPE also enables psionics, which use energy known as Inner Strength Points (ISP). Psychic phenomena (more commonly called psionics) also vary between individuals, ranging from none at all to Master-level abilities. Psychic abilities can manifest in virtually any way imaginable. Some psychics develop differently, such as psi-stalkers, human mutants that feed on psychic energy. === Earth === Rifts begins with two future-historical premises. First, a golden age of humanity occurs, with tremendous advances in science, technology, military, and society. Humanity as a whole is at peace as a majority of Earth's nations decide to cease world war and begin to share ideas and technology freely. Much of the Solar System is conquered, humanity's wars end, and harmony reigns. Second, this golden age is ended by an event of unknown cause near the winter solstice, during a rare planetary alignment, which triggers a disaster that cascades into tremendous destruction via a ripple effect. The cataclysm begins with unprecedented storms, earthquakes, tsunamis, and volcanic eruptions, which kill millions of people. The Ley Line networks that crisscross the globe are energized, causing rifts to open both on Earth and throughout the Megaverse. For hundreds of years after the holocaust, many creatures, both mythical beasts and aliens, come through the Rifts to wreak havoc. The old world gone, a new Dark Age dawns and humanity's population is reduced immeasurably by catastrophe and domestic failure. This period is covered in Palladium's Rifts Chaos Earth spin-off series. Rifts initially takes place in 101 P.A. (equivalent to the year 2387), 289 years after this event. The "Post-Apocalypse" calendar was established by the formation of the Coalition States in 2286. By this time, most of the disasters have quieted down, though Earth is still bathed in PPE.
The planet's mystical energy has attracted aliens from other dimensions, who continue to arrive through the Rifts both accidentally and deliberately. The humanoid creatures that arrive on Earth are referred to as Dimensional Beings (called D-Bees). Some resemble familiar fantasy races, such as elves and dwarfs, while others were created specifically for the game setting. Non-humanoid creatures have also arrived, including monstrous creatures and mystical demons. To cope with these natural, supernatural, and alien menaces, the human race has adapted in a variety of ways, many of them borrowed from the technological developments of the lost Golden Age. Powered armor suits and giant vehicles are frequently used to combat the dangers of Rifts, but more invasive augmentation is common. This has three basic categories: "Juicers" augment themselves chemically, the "Borgs" augment themselves mechanically, and "Crazies" use performance-enhancing brain implants. All such augmentations boost strength, speed, endurance, and dexterity to superhuman levels. However, all come at great cost. Chemicals cause the body to wear out faster, decreasing life span to a few years. Mechanical Borg augmentation causes a loss of humanity when those with multiple limb and organ replacements become more machine than human. Brain implants cause mental instability ranging from mild phobias to crippling neurosis or psychosis. ==== North America ==== The strongest power in North America is the Coalition States (CS), which is based in the arcological city of Chi-Town and lays claim to northern Illinois, all of Iowa, the Texas Panhandle, Missouri, and the eastern half of Ontario, Canada. The second greatest power is Free Quebec, a former Coalition State that seceded following a civil war with the other Coalition States. Mexico is ruled by a group of vampire kingdoms, who treat humans as little more than food. 
North of the Rio Grande, west of Texas and roaming most of the American Southwest are large nomadic bands/tribes of bandits who collectively form the Pecos Empire, consisting of El Paso, Los Alamos, and Houstown. Much of the western United States has more or less willingly reverted to a mix of modern and past technology akin to the Wild West. The Royal Canadian Mounted Police managed to survive the great cataclysm, though Canada itself did not. The Mounties have become an independent law enforcement force called the Tundra Rangers, patrolling the northern wilderness. The Midwest, both upper and central, is home to most of North America's population. The Manistique Imperium and Northern Gun in Michigan's Upper Peninsula, both Coalition allies, are among the largest weapons manufacturing areas on the continent. New Lazlo is one of the largest cities in Michigan's southern portion. Chillicothe in Missouri is a major center of food growing and processing for the Coalition. The city-states of Whykin (Poplar Bluff) and Kingsdale (West Plains), in Missouri's southern half, are in constant opposition to the CS and claim independence. Arkansas is home to the independent CS ally El Dorado. Southern Illinois and the Ohio Valley are home to the Federation of Magic. Also in the Ohio Valley is Psyscape, a city-state founded by psychics. Tolkeen was a major city in the former Minneapolis region in early Rifts books; the city welcomed users of magic. A military campaign by the Coalition States (the primary event of 109 PA) resulted in the magic-user kingdom being wiped off the map. In the Northeast, the city-state of Lazlo, named after supernatural researcher and writer Victor Lazlo, was built upon the ruins of Toronto. This major center of civilization is well known as a melting pot of humans, D-Bees and other beings, and is the home of Techno-Wizardry.
Mad Haven is the name given to the ruins of Manhattan; tectonic forces during the cataclysm have moved it into the coast, creating a peninsula. It is seen by most denizens of Rifts Earth as a refuge of demons and madness. ==== South America ==== The return of Atlantis caused the Amazon River basin to flood most of western South America, giving it the nickname The Land of a Thousand Islands. The Empire of the Sun, consisting of Cuzco, Nazca, Arequipa and Lima, created a wide range of technology and magic, including magic derived from the Nazca lines. In Argentina, the Silver River Republics of Cordoba (the South American Chi-Town), Santiago (one of the most tolerant human nations on Rifts Earth), Achilles (a nation founded by mutants), and New Babylon (a nation where humans and aliens coexist) have thrived and created nations whose strength rivals that of the CS. In Bolivia, freed humans and D-Bees formed the Megaversal Legion: a mercenary company with one of the highest levels of technology on Rifts Earth. ==== Europe ==== England has become a vast wilderness again, broken up by the occasional giant Millennium Tree or feudal kingdom, complete with a New Camelot and a new King Arthur, partially being manipulated by an alien intelligence disguised as Merlin. Also the magic of

Diagnostically acceptable irreversible compression

Diagnostically acceptable irreversible compression (DAIC) is the amount of lossy compression which can be used on a medical image to produce a result that does not prevent the reader from using the image to make a medical diagnosis. The term was first introduced at a workshop on irreversible compression convened by the European Society of Radiology (ESR) in Palma de Mallorca on October 13, 2010, the results of which were reported in a subsequent position paper.

== Determination ==

The "amount of compression" in irreversible compression used to be determined by the compression ratio, where the acceptable minimum is determined by the algorithm (typically JPEG or J2K) and the data type (body part and imaging method). Such a definition is easy to follow, and as of 2010 it was used by medical bodies around the world. However, its downside is obvious: the compression ratio says nothing about the real quality of the image, as different compressors can produce vastly different qualities at the same file size. For example, the JPEG format of 1992 can perform as well as many modern formats when encoded with the newer techniques exploited in mozjpeg and ISO libjpeg, yet such encoders would be lumped together with the legacy encoders in such a scheme.

The image compression community has long used objective quality metrics like SSIM to measure the effects of compression. In the absence of good data regarding SSIM, the ESR review of 2010 concluded that it is still difficult to establish a criterion for whether a particular irreversible compression scheme applied with particular parameters to a particular individual image, or category of images, avoids the introduction of some quantifiable risk of a diagnostic error for any particular diagnostic task. A 2017 study showed that, out of 16 SSIM variants, one called 4-G-r (4-component, gradient, structural component of SSIM) best reflects changes in images that affect the decisions of radiologists.
A 2020 study shows that visual information fidelity (VIF), feature similarity index (FSIM), and noise quality metric (NQM) best reflect radiologist preferences out of ten metrics. It also mentions that the original version of SSIM works as poorly as a basic root-mean-square distance (RMSD) for this purpose, a result echoed by the 2017 study. The 4-G-r modification is not tested in the study.
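To make the difference between these metrics concrete, the following is a minimal sketch, not drawn from the cited studies, that computes RMSD and a simplified SSIM for a pair of images given as flat sequences of pixel values. Several things here are assumptions made for illustration: the function names `rmsd` and `global_ssim` are hypothetical, the SSIM is evaluated once over the whole image rather than averaged over local windows as in the usual definition, the constants 0.01 and 0.03 are the commonly used defaults, and coarse quantization stands in for a real lossy codec. The 4-G-r variant is not implemented.

```python
import math


def rmsd(reference, distorted):
    """Root-mean-square distance between two equally sized pixel sequences."""
    n = len(reference)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(reference, distorted)) / n)


def global_ssim(reference, distorted, dynamic_range=255.0):
    """Simplified SSIM computed once over the whole image (no sliding window)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_d = sum(distorted) / n
    var_r = sum((a - mean_r) ** 2 for a in reference) / (n - 1)
    var_d = sum((b - mean_d) ** 2 for b in distorted) / (n - 1)
    cov = sum((a - mean_r) * (b - mean_d)
              for a, b in zip(reference, distorted)) / (n - 1)
    # Stabilising constants; 0.01 and 0.03 are the commonly used defaults.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mean_r * mean_d + c1) * (2 * cov + c2)
    denominator = (mean_r ** 2 + mean_d ** 2 + c1) * (var_r + var_d + c2)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical test image: an 8-bit intensity ramp.
    ramp = list(range(256))
    # Coarse quantization to 16 levels, standing in for lossy compression.
    quantized = [(p // 16) * 16 + 8 for p in ramp]
    print("RMSD:", rmsd(ramp, quantized))
    print("global SSIM:", global_ssim(ramp, quantized))
```

Both functions return a single number per image pair, which is what allows a quality threshold to be stated independently of the encoder, unlike a compression ratio. Whether any such threshold is diagnostically acceptable is a separate question that the studies above address with radiologist readers.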

Model collapse

Model collapse, also known by other names such as "AI inbreeding", "AI cannibalism", "Habsburg AI", and "model autophagy disorder" or "MAD", is a phenomenon noted in artificial intelligence studies, where machine learning models gradually degrade due to errors coming from uncurated synthetic data, or due to training on the outputs of another model such as prior versions of itself. It is unclear to what extent the phenomenon threatens the long-term development of such models, and some techniques have been proposed to mitigate the effect.

== Characteristics ==

Shumailov et al. coined the term to describe two specific stages in the degradation of machine learning models: early model collapse and late model collapse. In early model collapse, the model begins losing information about the tails of the distribution, mostly affecting minority data. Later work highlighted that early model collapse is hard to notice, since overall performance may appear to improve while the model loses performance on minority data. In late model collapse, the model loses a significant proportion of its performance, confusing concepts and losing most of its variance.

== Mechanism ==

Using synthetic data as training data can lead to issues with the quality and reliability of the trained model. Model collapse occurs for three main reasons: functional approximation errors, sampling errors, and learning errors. Importantly, it happens in even the simplest of models, where not all of the error sources are present. In more complex models the errors often compound, leading to faster collapse.

== Disagreement over real-world impact ==

Some researchers and commentators on model collapse warn that the phenomenon could fundamentally threaten future generative AI development: as AI-generated data is shared on the Internet, it will inevitably end up in future training datasets, which are often crawled from the Internet.
If training on "slop" (large quantities of unlabeled synthetic data) inevitably leads to model collapse, this could therefore pose a difficult problem. However, other researchers have recently disagreed with this argument, showing that if synthetic data accumulates alongside human-generated data, model collapse is avoided. The researchers argue that data accumulating over time is a more realistic description of reality than deleting all existing data every year, and that the real-world impact of model collapse may not be as catastrophic as feared. An alternative branch of the literature investigates the use of machine learning detectors and watermarking to identify model-generated data and filter it out.

== Mathematical models of the phenomenon ==

=== 1D Gaussian model ===

In 2024, a first attempt was made at illustrating collapse for the simplest possible model: a one-dimensional normal distribution fit using unbiased estimators of mean and variance, computed on samples from the previous generation.

To make this more precise, suppose the original data follow a normal distribution $X^{0} \sim \mathcal{N}(\mu, \sigma^{2})$, and that we possess $M_{0}$ samples $X_{j}^{0}$ for $j \in \{1, \dots, M_{0}\}$. Denoting a general sample $X_{j}^{i}$ as sample $j \in \{1, \dots, M_{i}\}$ at generation $i$, the next-generation model is estimated using the sample mean and variance:

$$\mu_{i+1} = \frac{1}{M_{i}} \sum_{j} X_{j}^{i}; \quad \sigma_{i+1}^{2} = \frac{1}{M_{i}-1} \sum_{j} \left(X_{j}^{i} - \mu_{i+1}\right)^{2}.$$

This leads to a conditionally normal next-generation model $X_{j}^{i+1} \mid \mu_{i+1}, \sigma_{i+1} \sim \mathcal{N}(\mu_{i+1}, \sigma_{i+1}^{2})$. In theory, this is enough to calculate the full distribution of $X_{j}^{i}$. However, even after the first generation, the full distribution is no longer normal: it follows a variance-gamma distribution. To continue the analysis, instead of writing the probability density function at each generation, it is possible to explicitly construct the samples in terms of independent random variables using Cochran's theorem. To be precise, $\mu_{1}$ and $\sigma_{1}$ are independent, with $\mu_{1} \sim \mathcal{N}\left(\mu, \frac{\sigma^{2}}{M_{0}}\right)$ and $(M_{0}-1)\,\sigma_{1}^{2} \sim \sigma^{2}\,\Gamma\left(\frac{M_{0}-1}{2}, \frac{1}{2}\right)$, following a Gamma distribution. Denoting with $Z$ Gaussian random variables distributed according to $\mathcal{N}(0,1)$ and with $S^{i}$ random variables distributed as $\frac{1}{M_{i-1}-1}\Gamma\left(\frac{M_{i-1}-1}{2}, \frac{1}{2}\right)$, it turns out to be possible to write samples at each generation as

$$X_{j}^{0} = \mu + \sigma Z_{j}^{0},$$

$$X_{j}^{1} = \mu + \frac{\sigma}{\sqrt{M_{0}}} Z^{1} + \sigma \sqrt{S^{1}}\, Z_{j}^{1},$$

and more generally

$$X_{j}^{n} = \mu + \frac{\sigma}{\sqrt{M_{0}}} Z^{1} + \frac{\sigma}{\sqrt{M_{1}}} \sqrt{S^{1}}\, Z^{2} + \dots + \frac{\sigma}{\sqrt{M_{n-1}}} \sqrt{S^{1} \times \dots \times S^{n-1}}\, Z^{n} + \sigma \sqrt{S^{1} \times \dots \times S^{n}}\, Z_{j}^{n}.$$

Note that these are not joint distributions, as $Z^{n}$ and $S^{n}$ depend directly on $Z_{j}^{n-1}$, but when considering $X_{j}^{n}$ on its own the formula above provides all the information about the full distribution.

To analyse the model collapse, we can first calculate the variance and mean of samples at generation $n$. This tells us what kind of distribution we expect to arrive at after $n$ generations. It is possible to find the exact values in closed form, but the mean and variance of the square root of a gamma distribution are expressed in terms of gamma functions, making the result quite clunky. Instead, it is possible to expand all results to second order in each of $1/M_{i}$, assuming each sample size to be large. It is then possible to show that

$$\frac{1}{\sigma^{2}} \operatorname{Var}(X_{j}^{n}) = \frac{1}{M_{0}} + \frac{1}{M_{1}} + \dots + \frac{1}{M_{n-1}} + 1 + \mathcal{O}\left(M_{i}^{-2}\right).$$

If all sample sizes $M_{i} = M$ are constant, this diverges linearly as $n \to \infty$:

$$\operatorname{Var}(X_{j}^{n}) = \sigma^{2}\left(1 + \frac{n}{M}\right); \quad \mathbb{E}(X_{j}^{n}) = \mu.$$

This is the same scaling as for a one-dimensional Gaussian random walk.
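The linear growth of the variance can be checked numerically. The following is a minimal sketch rather than code from the cited work: it repeatedly samples from the current Gaussian, refits the mean and unbiased variance, and then compares the variance of a sample drawn after $n$ generations, estimated over many independent chains, with $\sigma^{2}(1 + n/M)$. The function names, the seed, and the parameter values ($M = 50$, $n = 10$, 2000 chains) are assumptions chosen for illustration.

```python
import math
import random
import statistics


def refit_chain(mu, sigma, sample_size, generations, rng):
    """Sample from the current Gaussian and refit it, `generations` times.

    Returns the (mean, standard deviation) of the final fitted model.
    """
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = sum(samples) / sample_size  # sample mean
        # Unbiased variance estimator (divides by M - 1).
        var = sum((x - mu) ** 2 for x in samples) / (sample_size - 1)
        sigma = math.sqrt(var)
    return mu, sigma


def empirical_variance(mu, sigma, sample_size, generations, chains, seed=0):
    """Estimate Var(X_j^n) from one draw out of each of many independent chains."""
    rng = random.Random(seed)
    draws = []
    for _ in range(chains):
        mu_n, sigma_n = refit_chain(mu, sigma, sample_size, generations, rng)
        draws.append(rng.gauss(mu_n, sigma_n))
    return statistics.variance(draws)


def predicted_variance(sigma, sample_size, generations):
    """Closed-form value sigma^2 * (1 + n / M) for a constant sample size M."""
    return sigma ** 2 * (1 + generations / sample_size)


if __name__ == "__main__":
    M, n = 50, 10
    print("simulated:", empirical_variance(0.0, 1.0, M, n, chains=2000))
    print("predicted:", predicted_variance(1.0, M, n))
```

Because the estimate is a Monte Carlo average, the simulated value should land close to the prediction without matching it exactly; increasing `chains` narrows the gap, while increasing `generations` at fixed `sample_size` makes the drift away from $\sigma^{2}$ more visible.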
However, divergence of the variance of $X_{j}^{n}$ does not directly provide any information about the corresponding estimates of $\mu_{n+1}$ and $\sigma_{n+1}$, particularly how different they are from the original $\mu$ and $\sigma$. It turns out to be possible to calculate the distance between the true distribution and the approximated distribution at step $n+1$, using the Wasserstein-2 distance (which is also sometimes referred to as risk):

$$\mathbb{E}\left[\mathbb{W}_{2}^{2}\left(\mathcal{N}(\mu, \sigma^{2}), \mathcal{N}(\mu_{n+1}, \sigma_{n+1}^{2})\right)\right] = \frac{3}{2}\sigma^{2}\left(\frac{1}{M_{0}} + \frac{1}{M_{1}} + \dots + \frac{1}{M_{n}}\right) + \mathcal{O}\left(M_{i}^{-2}\right),$$

$$\operatorname{Var}\left[\mathbb{W}_{2}^{2}\left(\mathcal{N}(\mu, \sigma^{2}), \mathcal{N}(\mu_{n+1}, \sigma_{n+1}^{2})\right)\right] = \frac{1}{2}\sigma^{4}\left(\frac{3}{M_{0}^{2}} + \frac{3}{M_{1}^{2}} + \dots + \frac{3}{M_{n}^{2}} + \sum_{i \neq j} \frac{4}{M_{i}M_{j}}\right) + \mathcal{O}\left(M_{i}^{-3}\right).$$

This directly shows why model collapse occurs in this simple model. Due to errors from re-sampling the approximated distribution, each generation ends up corresponding to a