VibeCrypto: Crypto Watch

TAO Daily · June 26, 7:00 PM

Score (SN44) Enters the Vision Language Model Race with Satori 1.0

The Score subnet is moving from classic computer vision to a compact vision language model that runs directly on the camera.

Score (SN44), a decentralized network where miners compete to build the best computer vision skills (CCTV, dashcams, drones), is expanding its scope. The subnet is now training its own Vision Language Model, Satori 1.0 2B, available on Manako. This compact model folds nine perception primitives into a single checkpoint designed to run directly on the device rather than in a data center.

Satori 1.0 covers detection, open-vocabulary detection by name, segmentation, natural-language description, reasoning, OCR, counting, action recognition, and temporal video understanding (still emerging). The competition among miners stays live and feeds directly into improving each generation of the model.


Details

Source: TAO Daily
Published: June 26, 7:00 PM
Direct link: https://taodaily.io/score-enters-the-vision-language-model-race-with-satori-1-0/

Source content (raw)

<p class="wp-block-paragraph"><a href="https://taodaily.io/8-takeaways-from-crypto-millies-interview-with-max-sebti-of-score-sn44/">Score (SN44)</a> has spent its first year as a decentralized network where miners compete to build the best computer vision skills across CCTV, dashcams, and drones.</p>
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="511" src="https://taodaily.io/wp-content/uploads/2026/06/image-196-1024x511.png" alt="" class="wp-image-20868" srcset="https://taodaily.io/wp-content/uploads/2026/06/image-196-1024x511.png 1024w, https://taodaily.io/wp-content/uploads/2026/06/image-196-300x150.png 300w, https://taodaily.io/wp-content/uploads/2026/06/image-196-768x383.png 768w, https://taodaily.io/wp-content/uploads/2026/06/image-196.png 1866w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><a href="https://www.wearescore.com/">Score’s Vision</a></figcaption></figure>
<p class="wp-block-paragraph">That scope just expanded, and the subnet is now training its own Vision Language Model, Satori 1.0 2B, available on <a href="https://taodaily.io/taoweave-just-moved-into-physical-ai-through-manako-labs/">Manako</a>, which folds all nine perception primitives into a single compact model that runs on the camera itself rather than in a data center.</p>
<p class="wp-block-paragraph">The skill competition stays live and feeds directly back into making each Satori generation sharper than the last.</p>
<h2 class="wp-block-heading"><span class="ez-toc-section" id="What_a_VLM_Is"></span>What a VLM Is<span class="ez-toc-section-end"></span></h2>
<p class="wp-block-paragraph">The simplest way to understand Satori is to start with what a Vision Language Model is and how it differs from a Large Language Model.</p>
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="502" src="https://taodaily.io/wp-content/uploads/2026/06/image-195-1024x502.png" alt="" class="wp-image-20867" srcset="https://taodaily.io/wp-content/uploads/2026/06/image-195-1024x502.png 1024w, https://taodaily.io/wp-content/uploads/2026/06/image-195-300x147.png 300w, https://taodaily.io/wp-content/uploads/2026/06/image-195-768x377.png 768w, https://taodaily.io/wp-content/uploads/2026/06/image-195.png 1276w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><a href="https://www.couchbase.com/blog/vision-language-models/" class="broken_link">Couchbase: Review of Language Model</a></figcaption></figure>
<p class="wp-block-paragraph">1. <strong>An LLM has read but never seen.</strong> It reasons fluently across text but cannot interpret an image on its own.</p>
<p class="wp-block-paragraph">2. <strong>A VLM gives that brain eyes.</strong> It takes pixels and words together, so you can point it at any scene and ask in plain language what is there and what is happening.</p>
<p class="wp-block-paragraph">A VLM is not a vision skill stitched to a chatbot. 
It is one model that understands the full picture natively.</p>
<h2 class="wp-block-heading"><span class="ez-toc-section" id="What_Satori_10_2B_Covers"></span>What Satori 1.0 2B Covers<span class="ez-toc-section-end"></span></h2>
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="584" src="https://taodaily.io/wp-content/uploads/2026/06/image-197-1024x584.png" alt="" class="wp-image-20869" srcset="https://taodaily.io/wp-content/uploads/2026/06/image-197-1024x584.png 1024w, https://taodaily.io/wp-content/uploads/2026/06/image-197-300x171.png 300w, https://taodaily.io/wp-content/uploads/2026/06/image-197-768x438.png 768w, https://taodaily.io/wp-content/uploads/2026/06/image-197.png 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><a href="https://x.com/MaxScore/status/2069506482341552619/photo/1">Satori’s Dashboard</a></figcaption></figure>
<p class="wp-block-paragraph">Satori is built through distillation, which compresses what the best frontier models know into one compact checkpoint. The result runs nine perception primitives in a single forward pass with no teacher needed at inference time.</p>
<figure class="wp-block-table"><table class="has-fixed-layout"><tbody>
<tr><td><strong>Primitive</strong></td><td><strong>What It Does</strong></td></tr>
<tr><td>Detection</td><td>Identifying objects in a frame</td></tr>
<tr><td>Open-vocabulary detect-by-name</td><td>Finding objects by free-form description</td></tr>
<tr><td>Segmentation</td><td>Pixel-level masks for objects</td></tr>
<tr><td>Description</td><td>Generating natural-language descriptions of scenes</td></tr>
<tr><td>Reasoning</td><td>Answering questions about what is happening</td></tr>
<tr><td>OCR (Optical Character Recognition)</td><td>Reading text inside images</td></tr>
<tr><td>Counting</td><td>Quantifying instances of objects</td></tr>
<tr><td>Action</td><td>Recognizing what subjects are doing</td></tr>
<tr><td>Temporal video</td><td>Understanding sequence and change across frames (still emerging)</td></tr>
</tbody></table></figure>
<p class="wp-block-paragraph">Coverage is the point. Strong vision specialists top one or two benchmarks but cover only four or five jobs total. Even GPT-4o natively covers two. Satori covers all nine, posts the highest TextVQA score in its peer set at 83, and handles action