<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Datadef Blog]]></title><description><![CDATA[Datadef Blog]]></description><link>https://blog.datadef.io</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1757448822881/6fc3609a-765c-447c-82d0-fb75e6e891ba.png</url><title>Datadef Blog</title><link>https://blog.datadef.io</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 24 Apr 2026 11:31:09 GMT</lastBuildDate><atom:link href="https://blog.datadef.io/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Building a ChatGPT App with Next.js 16.1 and MCP]]></title><description><![CDATA[I spent the last few days building a ChatGPT App for Datadef. The Model Context Protocol (MCP) is powerful—it basically lets you turn ChatGPT into a host for your own interactive apps—but the documentation is still a bit scattered.
This is a breakdow...]]></description><link>https://blog.datadef.io/building-a-chatgpt-app-with-nextjs-161and-mcp</link><guid isPermaLink="true">https://blog.datadef.io/building-a-chatgpt-app-with-nextjs-161and-mcp</guid><category><![CDATA[app development]]></category><category><![CDATA[chatgpt]]></category><category><![CDATA[sdk]]></category><category><![CDATA[mcp]]></category><category><![CDATA[dataengineering]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[openai]]></category><category><![CDATA[dns]]></category><category><![CDATA[Next.js]]></category><category><![CDATA[Vercel]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Théophile Louvart]]></dc:creator><pubDate>Mon, 29 Dec 2025 17:32:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/R_W_9D-53lw/upload/62fe906e75db94ac2544f61da9ce8d15.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I spent the last few days building a ChatGPT App for <a target="_blank" href="https://datadef.io">Datadef</a>. The Model Context Protocol (MCP) is powerful—it basically lets you turn ChatGPT into a host for your own interactive apps—but the documentation is still a bit scattered.</p>
<p>This is a breakdown of what I learned while getting it to work, from the initial setup to the weird debugging hurdles you'll likely hit.</p>
<hr />
<h2 id="heading-core-resources">Core Resources</h2>
<p>These are the docs I kept open the whole time. They’re the best source of truth right now:</p>
<ul>
<li><p><a target="_blank" href="https://developers.openai.com/apps-sdk/quickstart/"><strong>OpenAI Apps SDK Quickstart</strong></a>: Start here.</p>
</li>
<li><p><a target="_blank" href="https://developers.openai.com/apps-sdk/build/mcp-server"><strong>Building an MCP Server</strong></a>: The technical details of the protocol.</p>
</li>
<li><p><a target="_blank" href="https://developers.openai.com/apps-sdk/app-submission-guidelines"><strong>App Submission Guidelines</strong></a>: Good to read early so you don't have to rewrite things later.</p>
</li>
</ul>
<hr />
<h2 id="heading-how-it-works-the-architecture">How it works: The Architecture</h2>
<p>There are two main parts to a ChatGPT App:</p>
<ol>
<li><p><strong>Tools:</strong> These are the actions ChatGPT can take. When a user asks for something, ChatGPT sends a JSON request to your <code>/mcp</code> endpoint.</p>
</li>
<li><p><strong>Resources:</strong> This is the UI. Your tool returns a URI, and ChatGPT fetches the HTML to render it in a sandbox.</p>
</li>
</ol>
<p><strong>The Flow:</strong></p>
<ul>
<li><p><strong>User prompt</strong> triggers a tool call.</p>
</li>
<li><p><strong>Your server</strong> runs the handler and returns <code>structuredContent</code> (data for the widget) and <code>_meta</code>.</p>
</li>
<li><p><strong>ChatGPT loads the HTML</strong> (served as <code>text/html+skybridge</code>).</p>
</li>
<li><p><strong>The Widget Runtime</strong> passes the data into your code via <code>window.openai</code>.</p>
</li>
</ul>
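<p>To make the flow concrete, here is a sketch of one round trip. The field names follow the MCP JSON-RPC spec and mirror the tool defined later in this post, but the exact envelope you see depends on the SDK version:</p>
<pre><code class="lang-javascript">// Sketch of one round trip (shapes from the MCP JSON-RPC spec).
// ChatGPT POSTs something like this to /mcp:
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "create_something",
    arguments: { prompt: "a sales chart" },
  },
};

// Your handler answers with text for the model, structuredContent for
// the widget, and _meta pointing at the UI resource to render:
const result = {
  content: [{ type: "text", text: `Generating: ${request.params.arguments.prompt}` }],
  structuredContent: { prompt: request.params.arguments.prompt },
  _meta: { "openai/outputTemplate": "ui://widget/view.html" },
};

console.log(result._meta["openai/outputTemplate"]); // ui://widget/view.html
</code></pre>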
<p><strong>The Iframe Approach:</strong> The ChatGPT sandbox is very restrictive. To get around this, I serve a tiny HTML wrapper that just loads my "real" app inside an <code>&lt;iframe&gt;</code>. This lets me use my normal tech stack while still living inside the ChatGPT interface.</p>
<hr />
<h2 id="heading-setup-amp-tunneling">Setup &amp; Tunneling</h2>
<h3 id="heading-dependencies">Dependencies</h3>
<p>The MCP SDK is moving fast, so I'd recommend pinning these versions to avoid breaking changes:</p>
<pre><code class="lang-bash">npm install mcp-handler@1.0.2 @modelcontextprotocol/sdk@1.20.0
</code></pre>
<p><em>Note: There is a <a target="_blank" href="https://github.com/vercel-labs/chatgpt-apps-sdk-nextjs-starter">Vercel starter</a> out there, but I found it easier to build from scratch since the SDK updates so frequently.</em></p>
<h3 id="heading-local-development">Local Development</h3>
<p>ChatGPT needs an HTTPS endpoint to talk to. I use a VPS with nginx and an SSH reverse tunnel to route traffic to my local machine. The chain looks like this: ChatGPT → Cloudflare DNS record (XXXX.domain.com) → VPS → nginx forwarding to a specific local port → my machine via an SSH reverse tunnel.</p>
<p><strong>The Tunnel:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Run this locally to map your VPS port to your local dev server</span>
ssh -R 8888:localhost:3000 -N -o ServerAliveInterval=30 user@your-vps-ip
</code></pre>
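<p><strong>The nginx side:</strong> on the VPS, all nginx has to do is proxy the subdomain to the tunneled port. A minimal sketch (server name, certificates, and ports are placeholders for your own setup):</p>
<pre><code class="lang-nginx"># Sketch: VPS nginx config forwarding HTTPS traffic into the SSH tunnel
server {
    listen 443 ssl;
    server_name XXXX.domain.com;      # the record Cloudflare points here
    # ssl_certificate / ssl_certificate_key lines go here

    location / {
        proxy_pass http://127.0.0.1:8888;   # the port the -R tunnel binds
        proxy_set_header Host $host;
    }
}
</code></pre>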
<h3 id="heading-nextjs-config">Next.js Config</h3>
<p>You need to set specific headers so OpenAI can embed your app. Here’s what my <code>next.config.js</code> looks like:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">async</span> headers() {
  <span class="hljs-keyword">return</span> [
    {
      <span class="hljs-attr">source</span>: <span class="hljs-string">'/mcp'</span>,
      <span class="hljs-attr">headers</span>: [
        { <span class="hljs-attr">key</span>: <span class="hljs-string">'Access-Control-Allow-Origin'</span>, <span class="hljs-attr">value</span>: <span class="hljs-string">'*'</span> },
        { <span class="hljs-attr">key</span>: <span class="hljs-string">'Access-Control-Allow-Methods'</span>, <span class="hljs-attr">value</span>: <span class="hljs-string">'GET,POST,OPTIONS'</span> },
        { <span class="hljs-attr">key</span>: <span class="hljs-string">'Access-Control-Allow-Headers'</span>, <span class="hljs-attr">value</span>: <span class="hljs-string">'*'</span> },
      ],
    },
    {
      <span class="hljs-attr">source</span>: <span class="hljs-string">'/:path*'</span>, 
      <span class="hljs-attr">headers</span>: [
        { <span class="hljs-attr">key</span>: <span class="hljs-string">'X-Frame-Options'</span>, <span class="hljs-attr">value</span>: <span class="hljs-string">'ALLOWALL'</span> },
        {
          <span class="hljs-attr">key</span>: <span class="hljs-string">'Content-Security-Policy'</span>,
          <span class="hljs-attr">value</span>: <span class="hljs-string">"frame-ancestors 'self' https://*.openai.com https://*.oaiusercontent.com https://*.web-sandbox.oaiusercontent.com"</span>
        },
      ],
    },
  ];
}
</code></pre>
<h2 id="heading-the-mcp-route-handler">The MCP Route Handler</h2>
<p>In Next.js, you'll handle everything in <code>app/mcp/route.ts</code>. This is where you define your tools and resources.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { createMcpHandler } <span class="hljs-keyword">from</span> <span class="hljs-string">"mcp-handler"</span>;
<span class="hljs-keyword">import</span> { z } <span class="hljs-keyword">from</span> <span class="hljs-string">"zod"</span>;

<span class="hljs-keyword">const</span> handler = createMcpHandler(<span class="hljs-keyword">async</span> (server) =&gt; {
  <span class="hljs-comment">// 1. The UI Resource</span>
  server.registerResource(
    <span class="hljs-string">"my-app-ui"</span>,
    <span class="hljs-string">"ui://widget/view.html"</span>,
    {
      title: <span class="hljs-string">"App View"</span>,
      mimeType: <span class="hljs-string">"text/html+skybridge"</span>, 
      _meta: {
        <span class="hljs-string">"openai/widgetPrefersBorder"</span>: <span class="hljs-literal">true</span>,
        <span class="hljs-string">"openai/widgetDescription"</span>: <span class="hljs-string">"Interactive viewer"</span>,
        <span class="hljs-string">"openai/widgetDomain"</span>: <span class="hljs-string">"https://your-app.com"</span>,
        <span class="hljs-string">"openai/widgetCSP"</span>: {
          connect_domains: [<span class="hljs-string">"https://api.your-app.com"</span>],
          resource_domains: [<span class="hljs-string">"https://static.your-app.com"</span>],
          frame_domains: [<span class="hljs-string">"https://your-app.com"</span>], 
        }
      }
    },
    <span class="hljs-keyword">async</span> () =&gt; ({
      contents: [{
        uri: <span class="hljs-string">"ui://widget/view.html"</span>,
        mimeType: <span class="hljs-string">"text/html+skybridge"</span>,
        text: createIframeWrapper(<span class="hljs-string">"https://your-app.com/embed"</span>),
      }]
    })
  );

  <span class="hljs-comment">// 2. The Tool</span>
  server.registerTool(
    <span class="hljs-string">"create_something"</span>,
    {
      description: <span class="hljs-string">"Creates a visualization"</span>,
      inputSchema: { prompt: z.string() },
      _meta: {
        <span class="hljs-string">"openai/outputTemplate"</span>: <span class="hljs-string">"ui://widget/view.html"</span>,
        <span class="hljs-string">"openai/toolInvocation/invoking"</span>: <span class="hljs-string">"Generating..."</span>,
        <span class="hljs-string">"openai/toolInvocation/invoked"</span>: <span class="hljs-string">"Ready!"</span>,
        <span class="hljs-string">"openai/resultCanProduceWidget"</span>: <span class="hljs-literal">true</span>,
        <span class="hljs-string">"openai/widgetAccessible"</span>: <span class="hljs-literal">true</span>,
      }
    },
    <span class="hljs-keyword">async</span> ({ prompt }) =&gt; ({
      content: [{ <span class="hljs-keyword">type</span>: <span class="hljs-string">"text"</span>, text: <span class="hljs-string">`Generating: <span class="hljs-subst">${prompt}</span>`</span> }],
      structuredContent: { prompt }, 
      _meta: { <span class="hljs-string">"openai/outputTemplate"</span>: <span class="hljs-string">"ui://widget/view.html"</span> }
    })
  );
});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> GET = handler;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> POST = handler;
</code></pre>
<hr />
<h2 id="heading-the-iframe-wrapper-amp-widget-runtime">The Iframe Wrapper &amp; Widget Runtime</h2>
<p>The <code>createIframeWrapper</code> function returns the HTML that ChatGPT renders. It grabs the data from ChatGPT and passes it to your app via URL parameters.</p>
<h3 id="heading-using-windowopenai">Using <code>window.openai</code></h3>
<p>Inside the sandbox, you get a <code>window.openai</code> object. This is how you talk to the host:</p>
<ul>
<li><p><code>toolOutput</code>: The data your tool just returned.</p>
</li>
<li><p><code>widgetState</code>: State that persists between turns.</p>
</li>
<li><p><code>callTool(name, args)</code>: Trigger other tools from your UI.</p>
</li>
<li><p><code>openExternal(url)</code>: Open links outside the chat.</p>
</li>
</ul>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">createIframeWrapper</span>(<span class="hljs-params">appUrl</span>) </span>{
  <span class="hljs-keyword">return</span> <span class="hljs-string">`
    &lt;!DOCTYPE html&gt;
    &lt;html&gt;
      &lt;body style="margin:0; padding:0; overflow:hidden;"&gt;
        &lt;iframe id="frame" style="width:100vw; height:100vh; border:none;"&gt;&lt;/iframe&gt;
        &lt;script&gt;
          const frame = document.getElementById('frame');

          // Get data from OpenAI
          const output = window.openai?.toolOutput || {};
          const prompt = output.structuredContent?.prompt || '';
          frame.src = \`\${appUrl}?prompt=\${encodeURIComponent(prompt)}\`;

          // Listen for updates if the user asks for changes
          window.addEventListener('openai:set_globals', (e) =&gt; {
            const newPrompt = e.detail?.globals?.toolOutput?.structuredContent?.prompt;
            if (newPrompt) frame.src = \`\${appUrl}?prompt=\${encodeURIComponent(newPrompt)}\`;
          });
        &lt;/script&gt;
      &lt;/body&gt;
    &lt;/html&gt;
  `</span>;
}
</code></pre>
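<p>Your embedded app can talk back to the host too. Here's a hedged sketch of widget-side calls using the <code>window.openai</code> methods listed above (guard every call, since the object only exists inside the ChatGPT sandbox; <code>setWidgetState</code> is the Apps SDK call for persisting <code>widgetState</code>):</p>
<pre><code class="lang-javascript">// Sketch: widget-side interaction with the host via window.openai.
// Assumes the callTool/setWidgetState API from the Apps SDK docs.
async function regenerate(newPrompt) {
  if (!window.openai) return; // running outside ChatGPT, do nothing
  // Persist state so it survives the next conversation turn
  await window.openai.setWidgetState({ lastPrompt: newPrompt });
  // Trigger the server-side tool again, straight from the UI
  await window.openai.callTool("create_something", { prompt: newPrompt });
}
</code></pre>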
<hr />
<h2 id="heading-development-workflow">Development Workflow</h2>
<p>Testing can be slow if you keep starting new chats. Here’s how I sped it up:</p>
<ol>
<li><p><strong>Developer Mode:</strong> Enable this in ChatGPT settings to add your local server.</p>
</li>
<li><p><strong>The Reload Button:</strong> You don't need a new chat for every change. Just click the <strong>"Reload"</strong> icon on your app in the sidebar to refresh tool definitions.</p>
</li>
<li><p><strong>Inspect Element:</strong> You can right-click the widget in ChatGPT and "Inspect" it. This is the only way to see console logs from your wrapper.</p>
</li>
</ol>
<hr />
<h2 id="heading-common-pitfalls">Common Pitfalls</h2>
<ul>
<li><p><strong>Blank Widgets:</strong> Usually a CSP issue. Check the console for <code>frame-ancestors</code> errors.</p>
</li>
<li><p><strong>CORS Errors:</strong> ChatGPT sends <code>OPTIONS</code> requests. If your proxy or server isn't handling preflight, it will fail.</p>
</li>
<li><p><strong>MIME Types:</strong> If you don't use <code>text/html+skybridge</code>, ChatGPT will just show your HTML as a code block.</p>
</li>
<li><p><strong>Auth:</strong> If your app needs a login, the iframe might get stuck. I usually serve a public "viewer" route and pass a token in the URL.</p>
</li>
</ul>
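<p>For the preflight pitfall specifically, the cheapest fix is to answer <code>OPTIONS</code> yourself in the route. A sketch, assuming your handler doesn't already do this (mirror the headers from <code>next.config.js</code>):</p>
<pre><code class="lang-javascript">// Sketch: explicit preflight handler for app/mcp/route.ts, so the
// OPTIONS request doesn't fall through to a 405.
export function OPTIONS() {
  return new Response(null, {
    status: 204,
    headers: {
      "Access-Control-Allow-Origin": "*",
      "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
      "Access-Control-Allow-Headers": "*",
    },
  });
}
</code></pre>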
<hr />
<h2 id="heading-submission-amp-launch">Submission &amp; Launch</h2>
<p>When you're ready to submit to the OpenAI App Store, keep these in mind:</p>
<h3 id="heading-metadata-hints">Metadata Hints</h3>
<p>OpenAI uses these to decide how to treat your tools:</p>
<ul>
<li><p><code>readOnlyHint</code>: For tools that don't change data.</p>
</li>
<li><p><code>openWorldHint</code>: For tools that touch external APIs.</p>
</li>
<li><p><code>destructiveHint</code>: For tools that delete things (triggers a confirmation).</p>
</li>
</ul>
<h3 id="heading-requirements">Requirements</h3>
<ol>
<li><p><strong>Domain Verification</strong>: You need to host a token at <code>/.well-known/openai-apps-challenge</code>.</p>
</li>
<li><p><strong>Tool Descriptions</strong>: Be very specific. The model needs to know exactly when to use your tool.</p>
</li>
<li><p><strong>Privacy Policy</strong>: You'll need a hosted policy on your domain.</p>
</li>
</ol>
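<p>For domain verification, one low-effort option is a plain-text route handler. A sketch (I'm assuming a route handler at that path works in your Next.js setup; dropping a static file into <code>public/.well-known/</code> is an alternative, and the token value comes from the submission flow):</p>
<pre><code class="lang-javascript">// Sketch: serve the challenge token as plain text.
// TOKEN is a placeholder: paste the value OpenAI gives you.
const TOKEN = "your-challenge-token";

export function GET() {
  return new Response(TOKEN, {
    status: 200,
    headers: { "Content-Type": "text/plain" },
  });
}
</code></pre>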
<hr />
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>Building an MCP app is a bit different from standard web dev, but once you get the tunneling and the iframe wrapper working, it opens up a lot of possibilities.</p>
<p>If you're looking for more examples, check out the <a target="_blank" href="https://github.com/vercel/mcp-starter-nextjs">Vercel MCP Starter</a> or the official <a target="_blank" href="https://platform.openai.com/docs/guides/apps/quickstart">OpenAI docs</a>.</p>
<p>Happy building!</p>
<p><em>Theo - Founder of</em> <a target="_blank" href="https://datadef.io"><em>Datadef</em></a></p>
]]></content:encoded></item><item><title><![CDATA[What I learned debugging a factory at 5 AM]]></title><description><![CDATA[The call came at 5:14 AM. A bearing on production line 7 had failed, and the plant manager wanted to know why our "predictive maintenance system" hadn't caught it. I spent the next four hours tracing data from the dashboard back through our pipelines...]]></description><link>https://blog.datadef.io/what-i-learned-debugging-a-factory-at-5-am</link><guid isPermaLink="true">https://blog.datadef.io/what-i-learned-debugging-a-factory-at-5-am</guid><category><![CDATA[datadef]]></category><category><![CDATA[MedallionArchitecture]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Databricks]]></category><category><![CDATA[data-engineering]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Industrial Automation]]></category><category><![CDATA[industrial iot]]></category><category><![CDATA[Governance]]></category><category><![CDATA[project management]]></category><category><![CDATA[data]]></category><category><![CDATA[data structures]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Pipeline]]></category><category><![CDATA[dbt]]></category><dc:creator><![CDATA[Théophile Louvart]]></dc:creator><pubDate>Mon, 01 Dec 2025 20:51:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/QRKJwE6yfJo/upload/b53dbf2d565d66fa60769308c48e87e9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The call came at 5:14 AM. A bearing on production line 7 had failed, and the plant manager wanted to know why our "predictive maintenance system" hadn't caught it. I spent the next four hours tracing data from the dashboard back through our pipelines, and what I found changed how I think about industrial data architecture.</p>
<p>The sensor had actually detected the anomaly. The temperature readings showed a clear upward trend starting six hours before the failure. But somewhere between the raw sensor data and the predictive model, we'd lost the signal. A validation rule, one I'd written myself trying to be clever about filtering outliers, had flagged the rising temperatures as "implausible" and excluded them from the clean dataset. The model never saw the warning signs because I'd accidentally taught the system to ignore them.</p>
<p>That night taught me something no architecture diagram ever could: in manufacturing, the cost of being too aggressive with data cleaning is measured in broken machinery and production downtime. And it completely reshaped how I approach industrial IoT pipelines.</p>
<hr />
<h2 id="heading-scada-data-is-different-from-everything-else">SCADA Data is different from everything else</h2>
<p><a target="_blank" href="https://datadef.io"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764621390442/6f609922-28a6-4972-80a7-53f9f8d18e9e.png" alt="Medaillon architecture for factories" class="image--center mx-auto" /></a></p>
<p>Let me tell you about SCADA data, because until you've worked with it, you don't really understand why factory floor data is different from everything else in the data engineering world.</p>
<p>SCADA systems were designed in an era when storage was expensive and networks were unreliable. They compress aggressively, they batch opportunistically, and they have their own ideas about what timestamps mean. I once spent two weeks debugging a data quality issue that turned out to be a SCADA system reporting timestamps in local time during winter and UTC during summer — not because of daylight saving time, but because someone had configured it that way a decade ago and nobody remembered why.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764621552628/463b5827-950f-481c-8478-9a533a55d442.png" alt class="image--center mx-auto" /></p>
<p>The instinct when you first encounter this is to normalize everything immediately. Parse the timestamps, convert to UTC, standardize the schemas. Get it into a clean format as fast as possible so you can work with it like normal data. This instinct is wrong, and it took me years to understand why.</p>
<p>The problem is that normalization destroys information. When you convert that quirky timestamp to UTC, you lose the ability to debug timestamp-related issues. When you parse a numeric value from a string, you can no longer see that the source was sending "23.5°C" with a unit suffix that your parser silently dropped. When you unify SCADA data and PLC data into a single table because "they're both sensor readings," you lose the ability to reason about their different reliability characteristics.</p>
<p>I now keep SCADA and PLC data completely separate all the way through ingestion. They land in different tables, with different schemas, preserving all the quirks of their respective sources. When something goes wrong, I can look at exactly what each system sent, in exactly the format it was sent, without wondering whether my transformation logic introduced the bug.</p>
<p>This might sound obvious, but I've reviewed dozens of industrial data architectures, and at least half of them try to unify everything too early. "But it's more efficient!" they say. "We don't want to maintain multiple pipelines!" And then they spend weeks debugging issues that would have been obvious if they could just see what the source systems actually sent.</p>
<hr />
<h2 id="heading-flag-dont-filter">Flag, Don't Filter</h2>
<p>The real art in industrial data isn't ingestion. It's knowing when to validate and when to preserve.</p>
<p>Here's the tension: sensors lie. They drift out of calibration, they malfunction, they send garbage when there's electrical interference. You absolutely need validation logic to catch these issues. But sensors also tell the truth in ways that look like lies. A temperature spike that seems "implausible" might be the first sign of a bearing failure. A pressure reading outside normal ranges might indicate a process change that operations made without telling anyone.</p>
<p>The validation rule that bit me at 5 AM was checking whether temperature readings fell within two standard deviations of the rolling average. Statistically sound, right? The problem is that equipment failure doesn't follow normal distributions. It starts with small anomalies that grow over time, exactly the kind of gradual deviation that a rolling-average filter is designed to smooth out.</p>
<p>I've since moved to a different approach: flag, don't filter. When a reading looks suspicious, I add a warning flag and a severity level, but I keep the data. The downstream models can decide whether to include or exclude flagged readings based on their specific use case. Real-time alerting might want to include everything and tolerate some false positives. Historical analysis for predictive maintenance definitely wants the "anomalous" readings — those are often the most valuable.</p>
<p>The implementation is simple but the discipline is hard. Every time I write validation logic, I ask myself: "If this rule had existed last month, what real events would it have filtered out?" Usually I don't know the answer, which is exactly why I shouldn't be dropping data.</p>
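<p>The pattern itself is tiny, whatever your stack. A toy sketch in JavaScript (thresholds are illustrative, not from any real equipment spec):</p>
<pre><code class="lang-javascript">// Sketch of "flag, don't filter": every reading survives; suspicious
// ones carry a flag and a severity instead of being dropped.
function flagReading(reading, limits) {
  const flags = [];
  // Out of range exactly when clamping to [min, max] changes the value
  const clamped = Math.min(Math.max(reading.value, limits.min), limits.max);
  if (clamped !== reading.value) {
    flags.push({ rule: "range", severity: "high" });
  }
  // Note what's missing: no dropped rows, no nulls.
  return Object.assign({}, reading, { flags: flags, suspect: flags.length !== 0 });
}

const limits = { min: -40, max: 200 }; // from the sensor's spec sheet
const rows = [
  { sensor: "temp-12", value: 85 },  // normal
  { sensor: "temp-12", value: 512 }, // implausible, but kept
].map(function (r) { return flagReading(r, limits); });

console.log(rows.length); // 2: both readings survive
</code></pre>
<p>Downstream consumers then decide per use case how to treat <code>suspect</code> rows, instead of the pipeline deciding for everyone.</p>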
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764621625904/c5f76341-7234-4e02-b024-1743a0667b4b.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-anomaly-detection-needs-context">Anomaly detection needs context</h2>
<p>Let's talk about where anomaly detection actually belongs, because I see this done wrong constantly.</p>
<p>The temptation is to put anomaly detection as early as possible in the pipeline. "We want to alert quickly! We need low latency!" So teams run anomaly detection on raw sensor streams, or on lightly cleaned data, and they wonder why the false positive rate is so high.</p>
<p>The problem is context. A temperature reading of 85°C from a sensor on machine 12 is completely normal during one shift and a critical warning during another. A pressure spike in isolation might be noise, but a pressure spike correlated with a temperature drop across multiple sensors is a clear pattern. Anomaly detection without context is just noise detection.</p>
<p>In my current architecture, anomaly detection happens after dimensional enrichment. By the time the model runs, each sensor reading has been joined with factory metadata, shift information, machine operating mode, and historical baselines for that specific machine under those specific conditions. The model isn't asking "is this reading unusual?" It's asking "is this reading unusual for this machine, during this shift, given what we know about normal operation?"</p>
<p>This requires more data movement and more compute. It adds latency. But the alternative is an alerting system that cries wolf so often that operators learn to ignore it. I'd rather have accurate alerts that arrive 30 seconds later than instant alerts that nobody trusts.</p>
<p>The one exception is obvious threshold violations. If a temperature sensor reads 500°C when the physical maximum for that sensor is 200°C, you don't need a sophisticated model to know something is wrong. Those checks happen early, flagged as data quality issues rather than operational anomalies. But the subtle stuff, the gradual degradation, the unusual patterns, the early warning signs, that needs the full context of enriched data.</p>
<hr />
<h2 id="heading-operations-and-maintenance-need-different-views">Operations and Maintenance need different views</h2>
<p>Something I wish someone had told me when I started building industrial dashboards: operations teams and maintenance teams need completely different views of the same data.</p>
<p>Operations wants to know what's happening right now. They need a dashboard that refreshes every minute or two, shows current status at a glance, and highlights any machine that needs immediate attention. The design principle is minimize cognitive load. A quick glance should tell them whether to worry.</p>
<p>Maintenance wants to know what's going to happen. They need historical trends, degradation curves, predictions about when components will need replacement. They don't care about minute-by-minute updates; they care about weekly and monthly patterns. The design principle is maximize insight. Give them the data to make proactive decisions.</p>
<p>I used to try to build one dashboard that served both. It was a disaster. Operations found it too cluttered with historical charts they didn't need. Maintenance found it too focused on real-time metrics that weren't useful for planning. Both teams complained that they couldn't find what they needed.</p>
<p>Now I build separate consumption endpoints, each optimized for its specific use case. The production dashboard is wide tables, pre-aggregated, heavily cached, designed for fast reads. The predictive maintenance reports are deeper, more detailed, with drill-down capabilities and longer historical windows. They're fed by different aggregations of the same underlying data, but they're shaped completely differently.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764621482223/126306ba-bd55-461d-bd82-b9294fe6d6bf.png" alt class="image--center mx-auto" /></p>
<p>The third consumption pattern I've learned to build explicitly is real-time alerting. This isn't a dashboard. It's a feed that goes directly into the plant's notification systems. When an anomaly flag goes true on a critical machine, operators get a push notification immediately. This has different requirements than either dashboard: it needs lower latency, higher reliability, and much stricter filtering to avoid alert fatigue.</p>
<p>Three consumption patterns, three different designs, one underlying data platform. Trying to serve all three from a single "gold table" is how you end up with something that works poorly for everyone.</p>
<hr />
<h2 id="heading-simple-beats-clever">Simple Beats Clever</h2>
<p>The most counterintuitive lesson I've learned about industrial data: simple transformations are almost always better than clever ones.</p>
<p>Early in my career, I was proud of my sophisticated data quality rules. Statistical outlier detection! Machine learning for anomaly classification! Automated schema inference! The pipelines were elegant and impressive in architecture reviews. They were also impossible to debug when they broke, which was often.</p>
<p>These days, my transformation logic is boring. Parse timestamps using explicit format strings. Cast numeric values with fallback to null on failure. Check range limits against hardcoded thresholds from equipment specifications. Flag records that fail validation instead of dropping them. Every transformation can be explained in one sentence.</p>
<p>The sophistication moves to the end of the pipeline, where it belongs. The enrichment logic that adds factory context? Simple joins against dimension tables. The anomaly detection that catches equipment failures? A dedicated model with its own testing and monitoring, not embedded in a transformation job. The aggregations that feed dashboards? Explicit SQL that anyone can read and verify.</p>
<p>This approach makes debugging possible. When the plant manager calls at 5 AM asking why something went wrong, I can trace the data flow in minutes. I can show exactly what the source sent, what transformations were applied, what validation checks passed or failed, and how the final result was calculated. No mysteries, no black boxes, no "the model decided this was an outlier but I'm not sure why."</p>
<hr />
<p>I want to close with something that has nothing to do with specific technologies or patterns, but has fundamentally changed how I work: I sketch every architecture before I build it.</p>
<p>Not as documentation. Documentation comes after. I mean as a design tool, a way of thinking through the data flows before writing any code. When I can see the whole system on a single page, sources on the left, consumption on the right, all the transformations in between, problems become obvious that would be invisible in code. "Wait, where does the factory dimension data come from?" "Why are there two paths from this source to this output?" "What happens when this source is unavailable?"</p>
<p>I use <a target="_blank" href="http://Datadef.io">Datadef.io</a> for this, though the tool matters less than the habit. The diagram I've been implicitly describing throughout this article — SCADA and PLC sources, cleaning and validation, factory enrichment, production dashboards and maintenance reports — I drew it before I wrote a single line of Spark. And I caught three significant design issues just by looking at the picture.</p>
<p>The diagram also becomes the onboarding artifact. When new team members join, I don't hand them thousands of lines of code. I show them the picture. "Data flows from left to right. These are our sources, these are their quirks. This is where cleaning happens. This is where we detect anomalies. These are the outputs and who consumes them." Ten minutes, and they have a mental model that makes everything else easier to understand.</p>
<p>Factory floor data is unforgiving. The sensors are unreliable, the volumes are massive, the latency requirements are real, and the cost of errors is measured in broken equipment and production downtime. I've learned to build systems that are simple enough to debug at 5 AM, flexible enough to preserve the signals I might not recognize as important yet, and explicit enough that I can explain every transformation to a skeptical plant manager.</p>
<p>Draw first. Build simple. Debug fast. The rest follows.</p>
<p>T.L</p>
]]></content:encoded></item><item><title><![CDATA[Datadef v0.1.2 — Starter Templates, Drawing Revolution, and Professional Dashboard]]></title><description><![CDATA[Ready to build data lineage diagrams that actually get used? This release is all about getting you and your team productive instantly. We've shipped production-ready templates, made drawing as intuitive as Draw.io, and created a dashboard experience ...]]></description><link>https://blog.datadef.io/datadef-v012-starter-templates-drawing-revolution-and-professional-dashboard</link><guid isPermaLink="true">https://blog.datadef.io/datadef-v012-starter-templates-drawing-revolution-and-professional-dashboard</guid><category><![CDATA[Data Science]]></category><category><![CDATA[data]]></category><category><![CDATA[canvas]]></category><category><![CDATA[projects]]></category><category><![CDATA[project management]]></category><category><![CDATA[team]]></category><category><![CDATA[Collaboration Tools]]></category><dc:creator><![CDATA[Théophile Louvart]]></dc:creator><pubDate>Thu, 25 Sep 2025 21:25:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/vZJdYl5JVXY/upload/e73cfc660420ae9ff8881ea696435f4c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ready to build data lineage diagrams that actually get used? This release is all about getting you and your team productive instantly. We've shipped production-ready templates, made drawing as intuitive as <a target="_blank" href="http://Draw.io">Draw.io</a>, and created a dashboard experience that feels like the modern tools you already love.</p>
<h2 id="heading-whats-new">What's New</h2>
<h3 id="heading-start-with-production-ready-templates">Start with Production-Ready Templates</h3>
<p>Stop staring at blank canvases. We've built three comprehensive starter templates based on real-world data architectures:</p>
<p><strong>Banking Data Platform</strong> — Complete financial infrastructure with payments processing, fraud detection, and compliance reporting. Everything from raw transactions to executive dashboards.</p>
<p><strong>Marketplace Analytics</strong> — End-to-end two-sided marketplace intelligence. Track supply and demand, optimize matching algorithms, and monitor transaction health across your platform.</p>
<p><strong>Low-Code Analytics Platform</strong> — Perfect for no-code tools and workflow platforms. Monitor app builder usage, feature adoption, and user activation flows.</p>
<p>Each template comes with dozens of pre-built nodes, realistic data flows, and comprehensive documentation. Click "Use Template" and you're minutes away from a professional diagram.</p>
<h3 id="heading-drawing-that-just-works">Drawing That Just Works</h3>
<p>Remember the joy of using <a target="_blank" href="http://Draw.io">Draw.io</a>? We've brought that same intuitive experience to data lineage diagrams.</p>
<p><strong>Professional Shape Tools</strong> — Create arrows, rectangles, circles, and text boxes that look crisp and professional. Every element has consistent styling, smooth interactions, and the visual polish your stakeholders expect.</p>
<p><strong>Dedicated Text Editing</strong> — No more cramped text boxes. Edit text in a dedicated sidebar with full typography controls. Choose fonts, adjust sizes, align text, and see changes in real-time.</p>
<p><strong>Streamlined Interface</strong> — We removed the confusing accordion menus and put everything you need right at your fingertips. Color picking, sizing, and styling are now effortless.</p>
<h3 id="heading-dashboard-that-scales-with-your-team">Dashboard That Scales With Your Team</h3>
<p>Your dashboard should help you work faster, not slower. The new experience puts your most important projects front and center.</p>
<p><strong>Smart Recent Activity</strong> — See your last three projects with node counts, role indicators, and update timestamps. Jump back into any diagram with one click.</p>
<p><strong>Better Project Cards</strong> — Clean, modern design that shows what matters: project details, your role, and recent activity. Everything loads fast and looks professional.</p>
<p><strong>Team-Ready Onboarding</strong> — New users get clear guidance and quick access to templates. No more wondering "what should I build first?"</p>
<h3 id="heading-built-for-team-collaboration">Built for Team Collaboration</h3>
<p>Great diagrams are built by teams, not individuals.</p>
<p><strong>Project Invitations</strong> — Invite teammates by email or username with specific roles (Viewer, Editor, Admin). They'll get a clean notification and can start collaborating immediately.</p>
<p><strong>Public Sharing</strong> — Make any project publicly viewable with read-only access. Perfect for sharing templates, documentation, or showcasing your work.</p>
<p><strong>Role-Based Access</strong> — Fine-grained permissions ensure the right people can edit while others can view and comment.</p>
<h2 id="heading-why-this-matters">Why This Matters</h2>
<p>Data lineage shouldn't require a PhD in data engineering. Whether you're documenting existing systems, planning new architectures, or onboarding team members, Datadef now gets you from idea to professional diagram in minutes, not hours.</p>
<p>The templates alone will save you weeks of work. The improved drawing experience means you'll actually enjoy creating diagrams. And the collaboration features ensure your whole team can contribute.</p>
<h2 id="heading-get-started-today">Get Started Today</h2>
<ol>
<li><p><strong>Explore Templates</strong> — Browse our starter gallery and find one that matches your use case</p>
</li>
<li><p><strong>Try the New Drawing Tools</strong> — Create shapes, add text, and see how intuitive professional diagramming can be</p>
</li>
<li><p><strong>Invite Your Team</strong> — Share projects and collaborate in real-time with role-based access</p>
</li>
</ol>
<p>Ready to transform how your team thinks about data? Dive into v0.1.2 and see the difference.</p>
<h2 id="heading-whats-coming-next">What's Coming Next</h2>
<p>We're already working on the next wave of improvements:</p>
<ul>
<li><p><strong>Advanced Template Customization</strong> — Fork templates and create your own organizational standards</p>
</li>
<li><p><strong>Enhanced Collaboration</strong> — Comments, suggestions, and real-time editing</p>
</li>
<li><p><strong>API Integration</strong> — Connect your diagrams to live data sources and monitoring tools</p>
</li>
<li><p><strong>Enterprise Features</strong> — SSO, advanced permissions, and audit trails</p>
</li>
</ul>
<h2 id="heading-community-amp-feedback">Community &amp; Feedback</h2>
<p>Love the new features? Have ideas for what's next? We'd love to hear from you:</p>
<ul>
<li><p>Share your diagrams on Twitter with #datadef</p>
</li>
<li><p>Send feedback directly through the app</p>
</li>
</ul>
<h2 id="heading-test-it-out">Test it out!</h2>
<p>v0.1.2 is live for all users. Existing projects will automatically benefit from the improved dashboard and collaboration features. Ready to try the new templates and drawing tools?</p>
<p><a target="_blank" href="https://datadef.io"><strong>Start Building →</strong></a></p>
]]></content:encoded></item><item><title><![CDATA[Why the best innovations often hide in the most boring features]]></title><description><![CDATA[I've been building software for long enough to know that the features you think will change everything usually don't, and the ones that actually do change everything are the ones you barely thought about when you shipped them. Case in point: Datadef'...]]></description><link>https://blog.datadef.io/why-the-best-innovations-often-hide-in-the-most-boring-features</link><guid isPermaLink="true">https://blog.datadef.io/why-the-best-innovations-often-hide-in-the-most-boring-features</guid><category><![CDATA[AI]]></category><category><![CDATA[json]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[data]]></category><category><![CDATA[canvas]]></category><category><![CDATA[innovation]]></category><category><![CDATA[tech ]]></category><category><![CDATA[technology]]></category><dc:creator><![CDATA[Théophile Louvart]]></dc:creator><pubDate>Tue, 23 Sep 2025 14:15:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/HfFoo4d061A/upload/3d4f2fb6ecd3d1919846d8fd947be527.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've been building software for long enough to know that the features you think will change everything usually don't, and the ones that actually do change everything are the ones you barely thought about when you shipped them. Case in point: Datadef's JSON export feature.</p>
<p>Six months ago, if you'd told me that a mundane "Export Project" button would become the cornerstone of an entirely new way of thinking about data modeling, I would have laughed. Today, I'm watching myself spin up complex data lineage diagrams in under 30 seconds, and I'm still a little stunned by how we got here.</p>
<h2 id="heading-a-geometry-problem">A geometry problem</h2>
<p>Let me start with a fundamental truth about AI that took me way too long to internalize: <strong>AI doesn't see the world the way we do</strong>.</p>
<p>When you're building a visual tool like Datadef, where users drag nodes around a canvas, connect them with edges, and build these beautiful, sprawling maps of their data architecture, you naturally think in terms of spatial relationships. Node A goes here, Node B connects to it there, the whole thing needs to look balanced and readable.</p>
<p>But ask an LLM to generate a canvas layout directly, and you'll get something that looks like it was designed by someone wearing a blindfold. The models are incredible at understanding relationships, dependencies, and hierarchies, but they're terrible at translating that understanding into X/Y coordinates that actually make visual sense.</p>
<p>This became painfully obvious during my early experiments with AI-generated templates. I'd feed GPT-4 a description like "create a data pipeline for an e-commerce analytics system," and it would dutifully generate all the right components: customer data, order processing, inventory management, reporting layers. But the spatial arrangement would be completely nonsensical: nodes overlapping, edges crossing at bizarre angles, the whole thing looking like a bowl of spaghetti.</p>
<h2 id="heading-using-simple-json">Using simple JSON</h2>
<p>The breakthrough came from an unexpected direction. Months earlier, I'd built what I thought was just a basic utility feature: the ability to export entire Datadef projects as JSON files. The use case seemed straightforward: users wanted to share their work, back it up, maybe import it into different environments.</p>
<p>The technical implementation was relatively simple: take everything that makes up a project (nodes, edges, documentation, glossary terms, metadata) and serialize it into a structured format:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"nodes"</span>: [
    {
      <span class="hljs-attr">"id"</span>: <span class="hljs-string">"customer_data"</span>,
      <span class="hljs-attr">"type"</span>: <span class="hljs-string">"data_source"</span>,
      <span class="hljs-attr">"position"</span>: {<span class="hljs-attr">"x"</span>: <span class="hljs-number">100</span>, <span class="hljs-attr">"y"</span>: <span class="hljs-number">200</span>},
      <span class="hljs-attr">"properties"</span>: {
        <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Customer Database"</span>,
        <span class="hljs-attr">"description"</span>: <span class="hljs-string">"Core customer information..."</span>,
        <span class="hljs-attr">"schema"</span>: {...}
      }
    }
  ],
  <span class="hljs-attr">"edges"</span>: [
    {
      <span class="hljs-attr">"source"</span>: <span class="hljs-string">"customer_data"</span>,
      <span class="hljs-attr">"target"</span>: <span class="hljs-string">"analytics_pipeline"</span>,
      <span class="hljs-attr">"relationship_type"</span>: <span class="hljs-string">"feeds_into"</span>
    }
  ],
  <span class="hljs-attr">"documentation"</span>: {...},
  <span class="hljs-attr">"glossary"</span>: {...}
}
</code></pre>
<p>What I didn't realize at the time was that I'd accidentally solved the AI geometry problem. JSON is a language that LLMs speak fluently. They can read it, understand its structure, learn patterns from it, and, most importantly, generate new, valid JSON that follows those same patterns.</p>
<p>Instead of asking AI to think spatially, I could ask it to think structurally. Instead of "place these nodes on a canvas," I could ask "generate a JSON representation of this data architecture." The visual layout could be handled separately, using algorithms that actually understand geometry.</p>
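<p>To make that division of labor concrete, here's a minimal sketch of the layout side. This is illustrative only, not Datadef's actual layout engine: the model emits only structure (nodes and edges), and a simple layered algorithm assigns the coordinates.</p>

```typescript
// Illustrative only -- not Datadef's layout engine. The LLM emits structure
// (nodes + edges); a layered algorithm turns that structure into coordinates.
type GraphNode = { id: string };
type GraphEdge = { source: string; target: string };

// Layer of a node = longest path from any source node (assumes a DAG).
function computeLayers(nodes: GraphNode[], edges: GraphEdge[]): Map<string, number> {
  const layers = new Map<string, number>(nodes.map((n): [string, number] => [n.id, 0]));
  // Relaxing |nodes| times is enough to settle longest paths in a DAG.
  for (let pass = 0; pass < nodes.length; pass++) {
    for (const e of edges) {
      const candidate = (layers.get(e.source) ?? 0) + 1;
      if (candidate > (layers.get(e.target) ?? 0)) layers.set(e.target, candidate);
    }
  }
  return layers;
}

// One column per layer, stacked rows within each column.
function layout(nodes: GraphNode[], edges: GraphEdge[]): Map<string, { x: number; y: number }> {
  const layers = computeLayers(nodes, edges);
  const rows = new Map<number, number>();
  const positions = new Map<string, { x: number; y: number }>();
  for (const n of nodes) {
    const layer = layers.get(n.id) ?? 0;
    const row = rows.get(layer) ?? 0;
    rows.set(layer, row + 1);
    positions.set(n.id, { x: layer * 250, y: row * 120 });
  }
  return positions;
}
```

<p>Whatever position values the model hallucinates can simply be discarded and recomputed this way, which is exactly why asking it for coordinates in the first place is unnecessary.</p>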
<h2 id="heading-the-30-second-miracle">The 30-Second miracle</h2>
<p>Once this clicked, everything changed. I started feeding sample Datadef projects to GPT-5, watching it learn the patterns of how we represent different types of data architectures in JSON. Then I flipped it around: give it a natural language description, ask it to generate the JSON for a complete project.</p>
<p>The results were genuinely stunning. A user could type something like:</p>
<p><em>"I'm building a recommendation engine for a streaming platform. I need to track user viewing behavior, content metadata, generate embeddings for similarity matching, and feed recommendations back to the frontend."</em></p>
<p>And within seconds, they'd have a complete workspace: data ingestion nodes, processing pipelines, ML model components, API endpoints, all properly connected and documented. Not just the high-level architecture, but the detailed schemas, the relationship types, even suggested glossary terms for domain-specific concepts.</p>
<p>What used to take experienced data architects hours of careful planning and layout now happened in the time it takes to grab a coffee.</p>
<h2 id="heading-beyond-templates-the-conversational-workspace">Beyond templates: The conversational workspace</h2>
<p>But here's where it gets really interesting. Once you have AI that can generate complete project structures from natural language, you're not limited to just starter templates. You can have <em>conversations</em> with your workspace.</p>
<p>"Add a real-time fraud detection component to this payment processing pipeline."</p>
<p>"Reorganize this to follow a medallion architecture pattern."</p>
<p>"Generate documentation for all the nodes in the customer journey section."</p>
<p>Each request results in precise JSON modifications that the system can apply instantly. The workspace becomes fluid, responsive to natural language in a way that traditional GUI interactions never could be.</p>
<p>I've watched users iterate on complex data architectures by simply describing what they want to change, rather than clicking through dozens of property panels and relationship dialogs. It's like having a conversation with your diagram, and the diagram actually understands what you're saying.</p>
<h3 id="heading-json-limitation-and-whats-next">JSON limitation and what's next</h3>
<p>Now, let me be honest about where we are versus where we're heading. Right now, this conversational interaction is still pretty basic: it's essentially the AI generating modified JSON blobs that get imported back into Datadef. It works, but it's a bit like having a brilliant architect who can only communicate through detailed blueprints rather than walking around the construction site with you.</p>
<p>The real future lies in proper API integration. What I'm excited about is implementing something like the Model Context Protocol (MCP) or building dedicated API endpoints that let AI interact directly with Datadef's internal systems. Instead of "generate new JSON and replace everything," we'd have granular operations like:</p>
<ul>
<li><p><code>addNode(type, position, properties)</code></p>
</li>
<li><p><code>updateDocumentation(nodeId, content)</code></p>
</li>
<li><p><code>createRelationship(source, target, type)</code></p>
</li>
<li><p><code>reorganizeLayout(pattern, constraints)</code></p>
</li>
</ul>
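<p>To make the contrast with "regenerate everything" concrete, here's a hypothetical sketch of what such granular operations could look like. The names and shapes are illustrative, not a real Datadef or MCP API: each call mutates one thing and records it, which is what makes incremental edits, undo history, and audit trails possible.</p>

```typescript
// Hypothetical sketch only -- not a real Datadef API. Each operation mutates
// one piece of the project and logs itself, instead of replacing the whole
// JSON blob.
type Project = {
  nodes: Map<string, { type: string; position: { x: number; y: number }; doc?: string }>;
  edges: { source: string; target: string; type: string }[];
};

class Workspace {
  private history: string[] = [];
  constructor(public project: Project = { nodes: new Map(), edges: [] }) {}

  addNode(id: string, type: string, position: { x: number; y: number }) {
    this.project.nodes.set(id, { type, position });
    this.history.push(`addNode:${id}`);
  }

  updateDocumentation(nodeId: string, content: string) {
    const node = this.project.nodes.get(nodeId);
    if (!node) throw new Error(`unknown node: ${nodeId}`);
    node.doc = content;
    this.history.push(`updateDocumentation:${nodeId}`);
  }

  createRelationship(source: string, target: string, type: string) {
    this.project.edges.push({ source, target, type });
    this.history.push(`createRelationship:${source}->${target}`);
  }

  // Incremental operations leave an audit trail that wholesale JSON
  // replacement never could.
  auditLog(): string[] {
    return [...this.history];
  }
}
```

<p>The interesting property is the log: because every change is a named operation rather than a diff between two blobs, user customizations survive and undo/redo falls out almost for free.</p>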
<p>This isn't just cleaner architecturally; it opens up entirely new possibilities. AI could make incremental changes while preserving user customizations, maintain undo/redo history properly, and even collaborate with multiple users in real-time. The conversational workspace stops being a party trick and becomes a genuine new interface paradigm.</p>
<p>Think about it: instead of learning complex software interfaces, users could just <em>talk</em> to their tools. The JSON approach proved the concept works, but proper API integration will make it feel natural.</p>
<h2 id="heading-datadef-usage-some-compound-effect">Datadef usage: a compounding effect</h2>
<p>What's fascinating is how this creates a compounding effect, though not in the way you might initially think. Let me be clear upfront: Datadef doesn't access user project data. Your architectures, your business logic, your sensitive information, all of that stays completely private.</p>
<p>But here's what's interesting: the <em>patterns</em> of how people structure data architectures are surprisingly consistent across industries, even when the specific content is completely different. The AI learns these structural patterns from publicly available examples, documentation, and architectural best practices, not from user data.</p>
<p>When a user describes building "a real-time fraud detection system for payment processing," the AI draws from its understanding of common fraud detection patterns, typical payment processing flows, and industry-standard architectures.</p>
<p>The result is still remarkably contextual. A user working on healthcare data pipelines gets suggestions that naturally incorporate HIPAA compliance patterns, proper audit logging, and the data segregation approaches common in that space. Someone building financial analytics systems gets architectures that reflect standard risk management frameworks and regulatory reporting structures.</p>
<p>The AI doesn't just generate generic templates; it generates contextually appropriate architectures that feel like they were designed by someone who actually understands your problem space, while keeping your actual implementation details completely private.</p>
<h2 id="heading-accidental-innovation-from-a-boring-feature-to-exciting-opportunities">Accidental innovation: from a boring feature to exciting opportunities</h2>
<p>This whole experience has shifted how I think about the relationship between AI and creative tools. We often imagine AI as this separate assistant that we consult for specific tasks. But what I'm seeing with Datadef is something more integrated: AI as a medium for creative expression, not just a helper.</p>
<p>When you can describe complex systems in natural language and have them materialize as structured, visual representations, you're operating at a different level of abstraction. You're thinking about the <em>what</em> and <em>why</em> of your architecture, and letting AI handle the <em>how</em> of representing it.</p>
<p>It's the difference between being a painter who has to mix their own pigments and one who can focus entirely on the artistic vision because all the tools just work.</p>
<p>The most important lesson here might be about how innovation actually happens in software. The features that transform user experiences aren't always the ones you plan for. Sometimes they emerge from the intersection of two seemingly unrelated decisions: choosing JSON as a serialization format, and experimenting with AI integration.</p>
<p>That boring export feature? It wasn't just about portability. It was accidentally creating a universal language that both humans and AI could speak fluently. And that changes everything.</p>
<p>Six months ago, I thought I was building a better diagramming tool. Today, I realize I've stumbled into something much more interesting: a new way of thinking about the relationship between human creativity and machine intelligence. And it all started with a simple JSON export button.</p>
<p>I will finish with a personal quote:</p>
<blockquote>
<h3 id="heading-sometimes-the-most-powerful-architectures-are-the-ones-you-never-meant-to-design">Sometimes the most powerful architectures are the ones you never meant to design.</h3>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[How to approach project management when the project is data?]]></title><description><![CDATA[Introduction
In most companies, the architecture of project management instruments usually revolves around elements like tasks, milestones, and timelines. These tools, though very effective in delivering software products or handling general business...]]></description><link>https://blog.datadef.io/how-to-approach-project-management-when-the-project-is-data</link><guid isPermaLink="true">https://blog.datadef.io/how-to-approach-project-management-when-the-project-is-data</guid><category><![CDATA[project management]]></category><category><![CDATA[data]]></category><category><![CDATA[engineering]]></category><category><![CDATA[engineering-management]]></category><category><![CDATA[visualization]]></category><category><![CDATA[canvas]]></category><dc:creator><![CDATA[Théophile Louvart]]></dc:creator><pubDate>Sun, 21 Sep 2025 13:26:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1758460837550/6846b156-2f84-4d8c-93f4-37ed2d741c3b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>In most companies, project management tools revolve around tasks, milestones, and timelines. These tools are very effective for delivering software products or handling general business operations, but they fall short of capturing what makes data projects different.</p>
<h2 id="heading-data-project-management-challenges">Data project management challenges</h2>
<p>Data projects are not only about tasks; they are about flows. Data moves from multiple sources through several layers (transformation, modeling, quality control) before it reaches a business user, and along the way it accumulates definitions, assumptions, and ownership. So to "manage" a data project really means to manage the data lineage, the documentation for every piece of data, and the governance structures that ensure compliance and trust.</p>
<p>My impression is that many teams piece this together from different tools: Miro shows the architecture at a very abstract level, Jira tracks implementation tasks, Confluence or Notion stores definitions, and spreadsheets hold the indicators. This fragmentation always leads to the same problem: the map is out of sync with the territory. Diagrams are not updated when pipelines change. Glossaries do not match the metrics that are in production. Documentation turns into a graveyard of half-finished pages. Compliance inspections become detective work.</p>
<p>With <a target="_blank" href="http://Datadef.io">Datadef.io</a>, I am trying to bridge this gap. The objective is to connect the three (diagrams, glossaries, and documentation) instead of keeping them fundamentally separate. The core metaphor is a canvas: teams map their data as it actually flows, from origin through transformation to destination. But it is not a pure whiteboard; every object on the canvas carries depth. Once a table, an indicator, or a connection is described, that description is reused across the glossary, documentation, and project inventory without being rewritten. When something changes, the change propagates.</p>
<p>Documentation is the second layer. Since each element of a diagram carries metadata, the project documentation (whether a wiki, a PDF, or a Word file) can be generated automatically. The headache of "keeping documentation up to date" becomes a solved problem: documentation is no longer a parallel activity, it is simply a byproduct of the project work. AI assistance turns the team's raw descriptions into readable narrative documents, cutting down the time spent writing for stakeholders.</p>
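<p>As a toy sketch of "documentation as a byproduct" (the field names here are made up, not Datadef's actual schema), generating a doc from canvas metadata is little more than a render pass over the objects:</p>

```typescript
// Toy sketch: render a documentation page from canvas metadata.
// Field names are illustrative, not Datadef's real schema.
type CanvasNode = { name: string; description: string; owner: string };

function renderDoc(title: string, nodes: CanvasNode[]): string {
  const sections = nodes.map(
    (n) => `## ${n.name}\n\nOwner: ${n.owner}\n\n${n.description}`
  );
  return `# ${title}\n\n${sections.join("\n\n")}`;
}
```

<p>Because the doc is derived rather than hand-written, it can never drift from the canvas: regenerate it and it matches, by construction.</p>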
<p>Finally, governance is part of the design rather than an afterthought. Ownership, quality rules, examples, and compliance requirements can all be attached to objects, so an audit or an ISO inspection draws on the very same knowledge base the team uses for development.</p>
<h2 id="heading-practical-application-and-comparaison-with-other-solutions">Practical application and comparison with other solutions</h2>
<p>A common case is a team building a pipeline that feeds a Power BI dashboard. Instead of describing the architecture in Miro, defining indicators in Excel, and writing a Word document for compliance, they can open the canvas, map the complete flow, annotate each element with definitions and responsibilities, and let the system generate consistent documentation across all outputs. The whole organization works from the same definitions, and governance happens alongside the workflow rather than as a later stage.</p>
<p>Some teams rely on heavyweight solutions like Collibra or Informatica, which are very powerful but usually too cumbersome for small teams. Others use generalist tools like Confluence, Notion, Jira, and Miro, which are agile but neither synchronized with each other nor tailored for data lineage. And many simply fall back on workarounds, stitching together spreadsheets, diagrams, and text documents.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In short, the way most organizations handle data projects today is fragmented: architecture lives in one place, definitions in another, and documentation somewhere else entirely. This separation creates friction, slows down collaboration, and undermines trust in the data itself. What I am building with <a target="_blank" href="https://datadef.io">Datadef.io</a> is an attempt to unify those layers—lineage, documentation, and governance—into a single, living workspace where a change in one place is reflected everywhere.</p>
<p>The project is still young, but I believe the foundation is solid enough to be useful already. If you have faced the same pain points—out-of-date diagrams, scattered glossaries, compliance reviews that feel like detective work—I would love for you to try it, challenge it, and tell me what works or doesn’t. Datadef is not meant to replace all your tools, but to give data teams a space designed specifically for the realities of managing complex data projects.</p>
<p>You can explore it here: <a target="_blank" href="https://datadef.io">Datadef.io</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Datadef v0.1.1 — Seeing Lineage, Shipping Confidence, Writing Smoothly]]></title><description><![CDATA[This week was about turning sharp corners into smooth curves. I wanted Datadef to feel more alive when you explore lineage, more reliable when you move projects around, and more pleasant when you write docs. v0.1.1 brings all three.
TL;DR

Lineage hi...]]></description><link>https://blog.datadef.io/datadef-v011-seeing-lineage-shipping-confidence-writing-smoothly</link><guid isPermaLink="true">https://blog.datadef.io/datadef-v011-seeing-lineage-shipping-confidence-writing-smoothly</guid><category><![CDATA[Data lineage]]></category><category><![CDATA[engineering]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[data]]></category><category><![CDATA[canvas]]></category><category><![CDATA[project management]]></category><dc:creator><![CDATA[Théophile Louvart]]></dc:creator><pubDate>Sun, 21 Sep 2025 10:56:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1758451856833/7577521c-b40f-4794-88fa-3b0518031785.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This week was about turning sharp corners into smooth curves. I wanted Datadef to feel more alive when you explore lineage, more reliable when you move projects around, and more pleasant when you write docs. v0.1.1 brings all three.</p>
<h2 id="heading-tldr">TL;DR</h2>
<ul>
<li><p>Lineage highlights across nodes, edges, and table columns for instant context</p>
</li>
<li><p>Export/Import now includes indicators (with handle remapping), lineage relationships, wiki documents, and columnIndex</p>
</li>
<li><p>Copy/Paste and Duplicate fixed for indicator edges with correct handle/ID remapping</p>
</li>
<li><p>Cleaner image export (no selection borders or resize handles)</p>
</li>
<li><p>Polished edge tips on highlighted edges</p>
</li>
<li><p>Wiki: default “Project Documentation” title + stable title editing (no more UI resets)</p>
</li>
</ul>
<h3 id="heading-check-all-the-changes-at-datadefiohttpsdatadefio">Check all the changes at <a target="_blank" href="https://datadef.io">datadef.io</a></h3>
<h2 id="heading-why-this-release-matters">Why this release matters</h2>
<p>When you select a node in the Lineage panel, you should immediately feel how data flows. In v0.1.1, connected nodes and edges light up while everything else takes a back seat. For tables, even the relevant columns highlight. It’s a tiny interaction that changes how fast you can think with the canvas.</p>
<p>Project portability also took a big step forward. Export/Import now includes indicators (with per-node cloning and handle remapping), lineage relationships, wiki documents (with hierarchy and links), and columnIndex. You can move a project and trust it’ll look and behave the same.</p>
<p>And for writing: the Wiki now starts with a sensible “Project Documentation” title and your title edits stick—even when you’re in the notion-style editor body. Simple, but it removes friction.</p>
<h2 id="heading-whats-new">What’s new</h2>
<h3 id="heading-lineage-highlight-system">Lineage highlight system</h3>
<ul>
<li><p>A shared highlight state in the canvas context</p>
</li>
<li><p>Unified Lineage panel dispatches rich highlight metadata (nodes, edges, and column maps)</p>
</li>
<li><p>Nodes and edges dim/highlight smoothly; TableNode supports column-level highlighting</p>
</li>
<li><p>Table refactor: faster, cleaner column rendering and interaction to make column-level lineage practical and performant</p>
</li>
</ul>
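<p>Under the hood the idea is straightforward. Here's a minimal sketch (not the actual implementation) of deriving a highlight set from a selected node: walk downstream to its descendants and upstream to its ancestors, and everything else gets dimmed.</p>

```typescript
// Minimal sketch of the highlight computation (not Datadef's actual code):
// from a selected node, walk downstream (descendants) and upstream
// (ancestors); everything collected lights up, the rest dims.
type LineageEdge = { id: string; source: string; target: string };

function highlightSet(selected: string, edges: LineageEdge[]) {
  const nodes = new Set<string>([selected]);
  const hotEdges = new Set<string>();

  const walk = (dir: "down" | "up") => {
    const frontier = [selected];
    while (frontier.length) {
      const current = frontier.pop()!;
      for (const e of edges) {
        const [from, to] = dir === "down" ? [e.source, e.target] : [e.target, e.source];
        if (from !== current) continue;
        hotEdges.add(e.id);
        if (!nodes.has(to)) {
          nodes.add(to);
          frontier.push(to);
        }
      }
    }
  };

  walk("down");
  walk("up");
  return { nodes, edges: hotEdges };
}
```

<p>Column-level highlighting is the same traversal with (node, column) pairs instead of bare node IDs, which is why the table refactor mattered for performance.</p>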
<p><img src="https://pbs.twimg.com/media/G1TmRswWkAEiAqT?format=jpg&amp;name=large" alt="Image" /></p>
<h3 id="heading-exportimport-coverage">Export/Import coverage</h3>
<ul>
<li><p>Indicator edges: source/target handles remapped via index tokens</p>
</li>
<li><p>Lineage relationships: exported/index-mapped and correctly re-linked on import</p>
</li>
<li><p>Wiki documents: hierarchy (parentIndex), linked nodes (linkedNodeIndex), and metadata</p>
</li>
<li><p>nodeIndicators.columnIndex persisted to keep column-level lineage fidelity across environments</p>
</li>
</ul>
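<p>The handle-remapping idea can be sketched like this (heavily simplified; the real export format is richer): on export, concrete node IDs are replaced by positional index tokens, and on import, fresh IDs are minted and the tokens resolved against them, so edges stay correctly linked in the new environment.</p>

```typescript
// Simplified sketch of index-token remapping -- the real Datadef export
// format is richer. Export swaps concrete IDs for positional tokens; import
// resolves those tokens against freshly minted IDs.
type PortableEdge = { source: string; target: string };

function exportEdges(nodeIds: string[], edges: PortableEdge[]): PortableEdge[] {
  const tokens = new Map<string, string>(
    nodeIds.map((id, i): [string, string] => [id, `$node:${i}`])
  );
  return edges.map((e) => ({ source: tokens.get(e.source)!, target: tokens.get(e.target)! }));
}

function importEdges(tokenized: PortableEdge[], freshIds: string[]): PortableEdge[] {
  const resolve = (token: string) => freshIds[Number(token.split(":")[1])];
  return tokenized.map((e) => ({ source: resolve(e.source), target: resolve(e.target) }));
}
```

<p>Indicator handles and columnIndex follow the same pattern: anything that references an ID by value gets tokenized on the way out and re-linked on the way in.</p>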
<h2 id="heading-ux-polish">UX polish</h2>
<h3 id="heading-clipboard-duplication">Clipboard + duplication</h3>
<ul>
<li><p>Indicator-aware copy/paste/duplicate: indicators are cloned per-node; handles remapped; edges point to the right indicators</p>
</li>
<li><p>Table copy/duplicate respects column associations so column-indexed indicators remain coherent after paste/duplicate</p>
</li>
</ul>
<h3 id="heading-image-export">Image export</h3>
<ul>
<li>Selection borders and resize handles are hidden during capture for cleaner images</li>
</ul>
<h3 id="heading-edge-visuals">Edge visuals</h3>
<ul>
<li>The highlight glow overlay no longer renders a marker on top of the arrow. The tip now feels smaller and more precise without changing the underlying edge.</li>
</ul>
<h3 id="heading-wiki">Wiki</h3>
<ul>
<li><p>Default title “Project Documentation” on first creation</p>
</li>
<li><p>Title stays put when editing the body in the notion-style editor</p>
</li>
</ul>
<h2 id="heading-fixes">Fixes</h2>
<ul>
<li><p>Multi-select duplication used to lose indicator edges. It now clones indicators and remaps edge handle IDs correctly.</p>
</li>
<li><p>Export/Import used to miss parts of your project (indicator handles, lineage, wiki, columnIndex). It’s all in now.</p>
</li>
<li><p>TableNode stability fixes: removed edge cases where column-bound indicators could desync during complex edits.</p>
</li>
</ul>
<h1 id="heading-see-you-next-week">See you next week</h1>
<p>Datadef v0.1.1 lays another brick in the foundation: we’re smoothing the edges now so we can build faster later. The goal isn’t just polish, but momentum: every release should make the canvas feel more alive and the workflow more natural. From here, it’s about scaling the vision with bigger projects, richer context, and more ways for Datadef to grow with you.</p>
<h3 id="heading-check-all-the-changes-at-datadefiohttpsdatadefio-1">Check all the changes at <a target="_blank" href="https://datadef.io">datadef.io</a></h3>
]]></content:encoded></item></channel></rss>