Controversy Over Upstage’s Solar Open 100B and Chinese GLM-4.5 AI Model

by 지식과 지혜의 나무 2026. 1. 5.


Background

Upstage, a South Korean AI startup, is a participant in the government-funded Independent AI Foundation Model Project, which mandates that models be developed “from scratch” without using foreign pre-trained weights. In late 2025, Upstage unveiled Solar Open 100B, a 100-billion-parameter large language model (LLM) intended as a homegrown foundation model. Around the same time, China’s Zhipu AI released GLM-4.5-Air, an open-source model of comparable scale: a Mixture-of-Experts (MoE) design with 106 billion total parameters (about 12B active per token), built for high performance at the 100B scale while remaining efficient to run. The emergence of Solar Open 100B quickly invited comparisons to GLM-4.5-Air, given their similar parameter counts and near-simultaneous release. Any substantial overlap between the two models would be problematic for Upstage, as the national project explicitly prohibits models derived from overseas systems in order to ensure genuinely domestic innovation.

Allegations of Similarity

In late 2025, Ko Seok-hyun, CEO of the rival startup Sionic AI, sparked controversy by alleging that Upstage’s Solar Open 100B was essentially built on a Chinese model rather than from scratch. Ko published a technical analysis (via GitHub and social media) claiming that Solar Open 100B shows “striking structural similarities” to Zhipu’s GLM-4.5-Air. Several key points of evidence were raised:
• LayerNorm Weight Similarity: Ko reported that certain normalization layers in Solar Open 100B exhibited an extremely high overlap with those in GLM-4.5-Air. In particular, the layer normalization parameters were said to show 96.8% cosine similarity (i.e. nearly identical in direction) between Solar and GLM-4.5. This unusually high similarity was presented as a red flag, since such alignment would be unlikely if Solar had been trained entirely independently. Ko described this selective parameter match as “decisive evidence of derivation,” suggesting Solar’s weights were copied and then only slightly modified. (A minimal sketch of how such a cross-checkpoint comparison might be reproduced appears after this list.)
• Token Embeddings & Vocabulary: The analysis also pointed to Solar’s token embeddings (the learned vector representations of its vocabulary) as being nearly identical to those of GLM-4.5-Air. This implies that a large portion of the two models’ vocabularies – or at least the distribution of token vector values – might overlap, hinting that Solar could be using the same tokenizer or even the same embedding weights as the Chinese model. The mere presence of many Chinese tokens in Solar’s vocabulary was viewed with suspicion by some observers, given that Solar was promoted as a Korean-led model. Overall, the overlapping vocabulary and similar embedding patterns were taken as additional clues that Solar Open 100B might not be an entirely original model.
• MoE Architecture Structure: Both Solar Open 100B and GLM-4.5 employ large-scale transformer architectures, but Ko argued that the structural configuration of Solar closely mirrors that of GLM-4.5-Air. GLM-4.5 is built with a Mixture-of-Experts design (multiple expert sub-networks in certain layers), which is not a universal choice among LLMs. The allegation was that Solar’s developers may have reused GLM’s distinctive architectural choices – such as the number and placement of MoE and dense layers – rather than designing their own from the ground up. This raised the question of whether Solar’s architecture was intentionally made to match GLM’s as a shortcut.
• Code and Metadata Clues: Beyond weight comparisons, Ko highlighted that Solar’s code repository contained references to Zhipu AI. For example, configuration files in Solar’s code included parameters and notations typical of GLM models, and the open-source license file even listed Zhipu AI in the copyright notice. These artifacts suggested that Upstage may have started from GLM’s publicly released code or model as a base. While using open-source code is legally permitted (GLM-4.5 was released under a permissive MIT/Apache-style license), Ko questioned whether leveraging a foreign model’s codebase aligns with the spirit of a “sovereign AI” project meant to foster independent development.
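
For readers who want to see what this kind of weight-level comparison involves, here is a minimal, heavily hedged Python sketch. The repository IDs, shard file names, and tensor names are placeholders (the real ones would come from each repository’s safetensors index), it assumes the two vectors share the same hidden dimension, and it is not the script Ko actually used.

```python
# Hedged sketch: pull one normalization weight vector from each public checkpoint
# and measure their cosine similarity. Repo IDs, shard names, and tensor names are
# placeholders, NOT the actual layouts of Solar Open 100B or GLM-4.5-Air.
import torch
from huggingface_hub import hf_hub_download
from safetensors import safe_open

def load_norm_vector(repo_id: str, shard: str, tensor_name: str) -> torch.Tensor:
    """Download a single checkpoint shard and extract one weight vector from it."""
    path = hf_hub_download(repo_id=repo_id, filename=shard)
    with safe_open(path, framework="pt") as f:
        return f.get_tensor(tensor_name).float()

def cosine(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity: measures direction-only agreement between two vectors."""
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

# Placeholder identifiers -- consult each repo's model.safetensors.index.json
# for the real shard and tensor names before running.
solar_norm = load_norm_vector("upstage/solar-open-100b",            # placeholder ID
                              "model-00001-of-00042.safetensors",   # placeholder shard
                              "model.layers.0.input_layernorm.weight")
glm_norm = load_norm_vector("zai-org/GLM-4.5-Air",                  # assumed ID
                            "model-00001-of-00042.safetensors",     # placeholder shard
                            "model.layers.0.input_layernorm.weight")
print(f"layer-0 norm cosine similarity: {cosine(solar_norm, glm_norm):.3f}")
```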

Ko’s public post expressing “regret that a model presumed to be a fine-tuned copy of a Chinese model was submitted for a taxpayer-funded project” quickly gained attention. If proven true, the implications for Upstage would be severe: the company could be disqualified from the national AI initiative for violating the from-scratch requirement. The controversy soon spread across tech communities and social media, prompting intense debate over the validity of these claims.

Technical Similarities Under Scrutiny

To assess the degree of overlap between Solar Open 100B and GLM-4.5-Air, experts and the companies involved examined several technical aspects in detail. The key areas of comparison were the vocabulary/tokenizer, the LayerNorm parameters, and the model architecture (MoE structure). Below we summarize findings for each aspect, along with interpretations of whether the similarities indicate true copying or coincidental alignment.
• Vocabulary & Tokenization: Despite speculation about shared vocabularies, the data show that Solar Open 100B and GLM-4.5 use notably different token sets. Solar Open 100B’s tokenizer encompasses roughly 196,000 tokens, whereas GLM-4.5’s vocabulary contains about 150,000. An analysis of the two vocabularies found only about 41% of tokens in common, meaning less than half of Solar’s tokens also appear in GLM’s lexicon. This overlap rate is relatively low – by comparison, models that intentionally build on the same tokenizer (e.g. one model fine-tuned from another) often share well over 70% of tokens. The modest overlap suggests that Upstage developed its tokenizer independently (likely to accommodate Korean, English, and the other languages relevant to its training data) rather than directly reusing GLM-4.5’s vocabulary. While Solar’s and GLM’s token sets may both include common Chinese and English words (both models are multilingual), the evidence points to different tokenization strategies. In short, the vocabulary similarity exists but is limited, which leans toward independent design rather than wholesale copying of GLM’s tokenizer. Ko’s claim that Solar’s token embeddings “appeared nearly identical” to GLM’s might be explained by both models learning similar semantic spaces for shared words, or by coincidental distributional resemblance, rather than literal reuse of embedding weights. No direct proof of copied embedding vectors has been demonstrated publicly, and the significant differences in vocabulary size and content undermine the case for a fully shared vocabulary.
• LayerNorm Weight Similarity: The most headline-grabbing claim was that Solar Open 100B’s LayerNorm parameters are 96.8% similar to those in GLM-4.5. This figure refers to a cosine similarity of 0.968 between the LayerNorm weight vectors of the two models. LayerNorm (layer normalization) layers each carry a pair of learned parameters (a gain γ and a bias β vector) that scale and shift the neuron activations. In a 100B-parameter model, these LayerNorm vectors account for only a microscopic portion (~0.00004%) of the total weights. Importantly, high cosine similarity in such vectors does not necessarily mean the actual values are the same – it only indicates that the two vectors point in nearly the same direction in parameter space. Upstage and independent experts have cautioned that this metric is misleading on its own. Because LayerNorm’s function is to normalize distributions, different models often converge on similarly oriented normalization vectors even when trained separately. In fact, Upstage demonstrated that Solar Open 100B’s LayerNorm vectors have comparably high cosine similarity (>0.98) with those of many other models, such as Meta’s LLaMA and Microsoft’s Phi series. If one naively took cosine similarity as proof of copying, “nearly every modern LLM would be implicated,” Upstage noted pointedly. To dig deeper, Upstage re-ran the comparison using Pearson correlation, which measures actual value alignment rather than direction alone. The result was a Pearson coefficient of about -0.01 – essentially zero correlation – between Solar’s and GLM’s LayerNorm weights. In other words, looking at the numerical values, one model’s LayerNorm parameters do not predict the other’s at all, an outcome incompatible with one being a fine-tuned copy of the other. They also found that the value distributions differed: Solar’s LayerNorm gains clustered around 0.3, whereas GLM’s concentrated near 0.9–1.0. This indicates that, despite the high cosine alignment (possibly due to similar training objectives or normalization behavior), the specific weight values are distinct. Most experts thus view the LayerNorm overlap as a curious statistical artifact rather than hard evidence of plagiarism. It illustrates how two large models can independently learn analogous normalization schemes, especially if they share a high-level architecture and are trained on vast, similarly diverse data. (The first sketch after this list shows, on synthetic data, how a very high cosine similarity and a near-zero Pearson correlation can coexist.)
• Model Architecture (MoE Structure): Given that GLM-4.5-Air uses a Mixture-of-Experts architecture, an important question was whether Solar Open 100B adopted the same structure. An exact match in hidden-layer configuration or MoE setup would be suspicious if not coincidental. Here, however, Upstage has highlighted clear architectural differences. Notably, Solar Open 100B does not include the expert (MoE) layers found in GLM’s design – in Solar’s training configuration, the MoE component was explicitly set to 0 (disabled). In contrast, GLM-4.5-Air uses multiple expert sub-networks in certain transformer layers, which is how it reaches 106B total parameters with only 12B active. This means Solar was implemented as a fully dense transformer without MoE, diverging from GLM’s approach. Furthermore, Solar Open 100B has 48 transformer layers, whereas GLM-4.5-Air uses 46. There may be other differences in hidden dimensions or feed-forward sizes that are not publicly detailed, but these known discrepancies already indicate the blueprints aren’t identical. Upstage’s CEO Kim Sung-hoon acknowledged that Solar’s overall architecture was influenced by common industry practices and the need to remain compatible with standard AI frameworks (for example, using transformer backbone patterns that work with Hugging Face and other open-source tooling). He argued that a certain degree of structural similarity is “unavoidable when designing models to work smoothly with widely used open source LLM serving tools,” and that this kind of similarity is about usability and scalability, not plagiarism. In summary, while Solar and GLM belong to the same general class of model (large decoder-only transformers) and thus share some high-level traits, Solar’s developers did not simply clone GLM’s MoE structure. The absence of GLM’s expert layers in Solar points to independent architectural decisions, undermining claims that Solar must have been built by fine-tuning GLM’s exact model code. (The second sketch after this list shows how such configuration-level differences can be checked directly from the published config files.)
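
To make the cosine-versus-Pearson point concrete, here is a tiny illustrative sketch using synthetic data only (not the real Solar or GLM weights). Because LayerNorm gain vectors are overwhelmingly positive, any two of them point in roughly the same direction regardless of their individual values, so cosine similarity can be very high even when the values are statistically unrelated; the means below loosely echo the reported clusters (~0.3 vs ~0.9–1.0) but are otherwise arbitrary.

```python
# Synthetic illustration: two strictly positive "gain" vectors score near-1.0 cosine
# similarity while their values remain essentially uncorrelated (Pearson ~ 0).
import numpy as np

rng = np.random.default_rng(0)
dim = 4096  # arbitrary hidden dimension for the toy example

solar_like = 0.30 + 0.05 * rng.standard_normal(dim)  # gains clustered near 0.3
glm_like = 0.95 + 0.05 * rng.standard_normal(dim)    # gains clustered near 0.95

cos = float(solar_like @ glm_like /
            (np.linalg.norm(solar_like) * np.linalg.norm(glm_like)))
pearson = float(np.corrcoef(solar_like, glm_like)[0, 1])

print(f"cosine similarity  : {cos:.3f}")       # ~0.98: both vectors point the same way
print(f"Pearson correlation: {pearson:+.3f}")  # ~0.00: values do not track each other
```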

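Architecture-level claims like the ones above can also be checked without downloading any weights, just from each repository’s published configuration. Below is a hedged sketch using the Hugging Face transformers AutoConfig API; the Solar repository ID is a placeholder, the GLM-4.5-Air ID is an assumption to be verified on the Hub, and MoE-related field names vary between model families, so treat the output as a starting point rather than a definitive audit.

```python
# Hedged sketch: compare published config.json files. Repo IDs are placeholders /
# assumptions, and expert-count attribute names differ across model families.
from transformers import AutoConfig

def summarize(repo_id: str) -> dict:
    cfg = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
    experts = next((getattr(cfg, k) for k in
                    ("n_routed_experts", "num_experts", "moe_num_experts")
                    if hasattr(cfg, k)), "none (dense model)")
    return {
        "repo": repo_id,
        "layers": getattr(cfg, "num_hidden_layers", "n/a"),
        "hidden_size": getattr(cfg, "hidden_size", "n/a"),
        "vocab_size": getattr(cfg, "vocab_size", "n/a"),
        "experts_per_moe_layer": experts,
    }

for repo in ("upstage/solar-open-100b",   # placeholder repo ID
             "zai-org/GLM-4.5-Air"):      # assumed repo ID -- verify on the Hub
    print(summarize(repo))
```
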
Expert Opinions on Replication vs. Coincidence

The debate over Solar Open 100B has drawn a range of opinions from AI researchers and industry experts. While the initial allegations sounded alarming, many experts urge a cautious, evidence-based reading of the similarity metrics. LayerNorm parameters, for instance, are considered a very limited and potentially misleading basis for accusing a model of being copied. As one specialist noted, these are “relatively simple auxiliary values,” and different models trained on similar objectives can end up with statistically alike norms “even without direct reuse of weights.” In other words, high cosine similarity in tiny subsets of weights can occur by chance or through convergent learning patterns, so it does not automatically prove one model is a clone. Experts emphasize that to establish whether one model was derived from another, one must look at a broader array of evidence: overall weight distributions, attention patterns (the Q, K, V matrices), training loss curves, and ideally the entire training trajectory. A single metric or an isolated layer comparison is not enough.
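
As an illustration of one such broader check, the sketch below compares the value distributions of corresponding weight tensors with a two-sample Kolmogorov–Smirnov test, using synthetic stand-ins rather than real checkpoints. The working assumption here is that a fine-tuned copy tends to preserve its parent’s distribution almost exactly, while an independently trained model usually does not; like the LayerNorm metric, this is supporting evidence only, not proof in either direction.

```python
# Synthetic stand-ins only: distribution-level comparison of weight tensors.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
parent = rng.normal(0.0, 0.020, size=50_000)               # stand-in "parent" tensor
fine_tuned = parent + rng.normal(0.0, 0.001, size=50_000)  # parent plus small drift
independent = rng.normal(0.0, 0.015, size=50_000)          # separately trained stand-in

for label, other in (("fine-tuned copy  ", fine_tuned),
                     ("independent model", independent)):
    stat, p = ks_2samp(parent, other)
    print(f"{label}  KS statistic = {stat:.3f}  p-value = {p:.3g}")
```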

Seunghyun Lee, Vice President of AI at FortyTwoMaru (a Korean AI firm), commented that based on the publicly available information, “it is difficult to conclude that Upstage reused a Chinese model.” He and others point out that licensing and code-reuse issues should be conceptually separated from the question of weight origin. Using open-source code (with proper attribution) is legal and common in AI development; it does not equate to copying the actual model weights, which is what the project rules forbid. The consensus among many neutral observers is that there is no smoking gun yet – the similarities noted can plausibly be explained by coincidence or standard practices in model building, rather than an illicit fine-tuning of GLM-4.5.

That said, Ko Seok-hyun (the original accuser) has not fully recanted his concerns. He later clarified that his intent was to highlight the spirit of the sovereign AI initiative: even if Upstage’s model weights were not literally taken from a Chinese system, heavy reliance on a foreign open-source architecture or codebase might still run against the project’s goals. In his view, a truly “independent from the ground up” model should minimize borrowing from external blueprints, especially from potential competitor nations. This perspective has broadened the controversy into a policy discussion: what does “homegrown AI” really mean? Is it acceptable to base a model on open-source designs from abroad as long as it is trained from scratch, or must every component be original? These questions have no easy answers, and the Upstage case has become a catalyst for redefining standards in Korea’s AI effort.

Observers also note that this incident will likely put pressure on all five teams in the national AI project (which include major players such as Naver, LG, and SK Telecom) to increase transparency. Going forward, companies may be expected to document their training process in detail, share intermediate checkpoints, and prove the provenance of their models to avoid similar controversies. In summary, the expert community has by and large not condemned Upstage – instead, it sees this as a valuable test case underscoring the importance of robust verification methods before drawing conclusions about model plagiarism.

Upstage’s Response and Verification Efforts

Figure: Upstage released training metrics from Solar Open 100B’s development to substantiate that it was trained from scratch. The chart shows Solar’s MMLU (multi-task accuracy) performance over the course of training, rising steadily from a near-random baseline (~0.25) at the start to roughly 0.77 after long-duration training. The gradual curve indicates the model learned incrementally over 9 trillion tokens, which is what one expects when training from random initialization. In contrast, a model merely fine-tuned from a pre-trained 100B model would typically begin at a much higher starting performance and would not exhibit such a slow, continuous improvement. This evidence supports Upstage’s claim that Solar Open 100B was not “jump-started” from any existing weights, but built up its knowledge from scratch through sustained training.

From the moment the allegations surfaced, Upstage firmly denied any plagiarism or model “theft.” CEO Kim Sung-hoon stated unequivocally that “Solar Open 100B was built entirely from scratch and not copied or fine-tuned from any existing foreign model.” On January 2, 2026, Upstage took the unusual step of holding a public verification session to address the claims head-on. Kim invited Ko Seok-hyun and other industry experts to attend in person, and the event was livestreamed on YouTube for transparency. Rather than simply asking everyone to take its word, Upstage opted to “open its development process to public scrutiny.” The company presented actual training artifacts – including intermediate model checkpoints and detailed Weights & Biases training logs – so that independent observers could examine the evidence directly. These logs recorded metrics throughout Solar’s training run, providing a timestamped training trajectory that would be very hard to fabricate after the fact.

During the session, Upstage’s engineers showed Solar’s early training loss curve, which had the signature shape of a model starting from random weights: an initially very high loss that drops rapidly in the first few hundred steps, then trends slowly downward over a long period. Such a curve is “a classic learning pattern of randomly initialized models,” not one you would see if the model had begun from a pre-trained state. They also shared the evolution of Solar’s performance on knowledge benchmarks (such as MMLU) over time, which, as shown above, began near chance and improved gradually and consistently. This stands in stark contrast to a fine-tuning scenario, in which a model starts off already performing well (thanks to prior knowledge) and gains only modest improvements. According to Kim, these pieces of evidence are “the clearest indicator of a from-scratch model,” since no pre-trained model underlay Solar Open 100B’s initial state. By disclosing the full training curve and statistics, Upstage essentially challenged skeptics to explain how such behavior could arise if Solar were merely a tweaked GLM-4.5 – a scenario that would have shown a very different training profile.
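
For a sense of what a third-party audit of such logs could look like, here is a hedged sketch using the public Weights & Biases API. The run path and metric key names are placeholders rather than Upstage’s actual log schema; the check itself simply looks for the two from-scratch signatures described above – an early loss far above its final value, and a first benchmark evaluation near the random baseline.

```python
# Hedged sketch: pull a published W&B run and check for "from-scratch" signatures.
# The run path and metric keys are placeholders, not Upstage's actual logs.
import wandb

RUN_PATH = "some-entity/solar-open-100b/run-id"   # placeholder W&B run path
LOSS_KEY, MMLU_KEY = "train/loss", "eval/mmlu"    # placeholder metric names

api = wandb.Api()
run = api.run(RUN_PATH)
history = run.history(keys=[LOSS_KEY, MMLU_KEY])  # returns a pandas DataFrame

first_loss, last_loss = history[LOSS_KEY].dropna().iloc[[0, -1]]
first_mmlu = history[MMLU_KEY].dropna().iloc[0]

# Signature 1: random initialization implies an early loss far above the final loss.
print(f"loss falls from {first_loss:.2f} to {last_loss:.2f}")
# Signature 2: the first benchmark eval should sit near the ~0.25 chance level,
# not at the already-high score a fine-tuned 100B model would start from.
print(f"first MMLU eval: {first_mmlu:.2f} (chance level is ~0.25)")
```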

Upstage also directly addressed the specific similarities that had been questioned. On the LayerNorm issue, Kim reiterated that focusing on one small metric was “not sound” and provided context by comparing against multiple models (as noted earlier, Solar’s LayerNorm cosine similarity with models like LLaMA was in the same range as with GLM). The configuration files referencing Zhipu AI were explained as well: Upstage clarified that some of its inference code was adapted from open-source repositories (including Zhipu’s Hugging Face implementation), which is why names like “zhipu” appeared in comments and variable names. Crucially, this was runtime code, not the model weights, and it was properly attributed under an Apache 2.0 open-source license. Upstage acknowledged that the presence of those references caused understandable confusion, but emphasized that it was an attribution and licensing matter rather than evidence of any hidden dependency on a Chinese model. The company even updated the code license attributions to make clearer which parts came from external open sources. In summary, Upstage’s response was one of full transparency and data-driven refutation: it did not merely assert innocence, but opened up logs and technical details for third-party validation.

The South Korean government, via the Ministry of Science and ICT, took note of the dispute but remained neutral pending verification. A ministry official stated that they were “aware that various opinions are being presented” in the community and would observe the public verification session, leaving it to experts to evaluate the evidence afterwards. This suggests the authorities will rely on the judgment of technical evaluators to decide whether any rules were broken. As of the latest updates, there has been no announcement of Upstage being disqualified or penalized; instead, the emphasis has shifted to how well Upstage could substantiate its claims and whether any doubt remains.

Conclusion and Ongoing Developments

The Solar Open 100B controversy has evolved from a narrow technical quarrel into a broader discussion about AI model transparency and sovereignty. On the specific question of whether Upstage’s model is a repackaged Chinese AI model, the evidence presented so far leans heavily against the notion of direct plagiarism. Upstage’s thorough release of training records and the analyses by independent reviewers have shown no concrete proof that Solar Open 100B’s weights were copied from GLM-4.5-Air. The similarities in LayerNorm and certain structures appear to be explainable by common design principles and coincidence, rather than illicit reuse. The expert consensus is that nothing disclosed publicly indicates a secret model swap or fine-tune; rather, Solar Open 100B appears to be an original model that happened to converge on some comparable features due to similar goals and constraints in training.

However, this incident has underscored the importance of clearly defining “from scratch” development and of ensuring public trust in nationally funded AI projects. It is no longer enough for a team to simply claim its model is independent – it may need to prove it with artifacts such as training checkpoints, logs, and configuration transparency. The Korean AI community is already calling this a “turning point” that will likely lead to new verification standards for such projects. Project organizers may begin requiring participants to submit detailed training documentation or even intermediate model dumps, to allow forensic checks for any sign of pre-trained initialization. In the words of one industry insider, the case made it clear that “defining and verifying what counts as a truly independent national AI model is now a practical challenge… not just a theoretical question.”

Upstage’s handling of the controversy – inviting scrutiny and publicly demonstrating its process – has been cited as a positive example of transparency. It set a precedent that when serious questions arise about a model’s originality, opening the books and letting the data speak is the best course. The company’s reputation and its stake in the national project rode on convincing skeptics, and by most accounts its robust defense succeeded in shifting opinions. Even Ko Seok-hyun expressed regret at how the situation escalated and engaged with the evidence presented (reports indicate he later refuted some of the wilder allegations and focused on the policy discussion instead). No official verdict has been announced publicly as of this writing, but it appears that Upstage has managed to address the core technical doubts.

In conclusion, while Solar Open 100B and GLM-4.5-Air share some surface similarities in vocabulary, normalization, and architecture, the current analysis suggests these are likely due to common techniques rather than one model copying the other’s weights. The controversy has ultimately served to highlight an emerging norm: AI developers, especially those in high-stakes “sovereign AI” programs, must be ready to back their claims with hard evidence. Going forward, the legacy of this episode may be a more rigorous verification regime and a clearer understanding of how to balance building on open-source advances with building truly independent AI systems.

Sources: This report draws on recent news analysis and official statements, including Seoul Economic Daily, Korea Metaverse Journal (KMJ) technical reviews, and statements from Upstage’s CEO. These sources provide detailed comparisons between Solar Open 100B and GLM-4.5, expert commentary on the findings, and documentation of Upstage’s public rebuttal and evidence. The information reflects the state of the debate as of January 2026, covering both sides of the argument and the steps taken to verify the model’s originality.
