<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Tech on Degang Wang</title><link>https://degangwang.com/categories/tech/</link><description>Recent content in Tech on Degang Wang</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 10 May 2026 00:00:00 -0500</lastBuildDate><atom:link href="https://degangwang.com/categories/tech/index.xml" rel="self" type="application/rss+xml"/><item><title>From Databricks to Your Analytics Tool: A RWD Workflow</title><link>https://degangwang.com/2026/05/10/databricks-rwd-workflow/</link><pubDate>Sun, 10 May 2026 00:00:00 -0500</pubDate><guid>https://degangwang.com/2026/05/10/databricks-rwd-workflow/</guid><description>&lt;h2 id="why-not-just-load-everything-locally"&gt;Why Not Just Load Everything Locally?&lt;/h2&gt;
&lt;p&gt;Real-world data (RWD) — claims, EHR, registries — is massive. A single claims database can have billions of rows across diagnosis, procedure, and pharmacy tables. You cannot &lt;code&gt;pd.read_csv()&lt;/code&gt; your way through it.&lt;/p&gt;
&lt;p&gt;The workflow that works:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Manipulate data in Databricks&lt;/strong&gt; — filter, join, aggregate using SQL on distributed compute&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Create an analytic-ready table&lt;/strong&gt; — a focused cohort with only the columns and rows you need&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bring the result to your analytics tool&lt;/strong&gt; — Python, R, or SAS for statistical analysis and visualization&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This post walks through that workflow using a diabetes cohort as an example. We use Databricks here, but the same pattern applies to any big data compute engine, such as Snowflake, Impala, Amazon Athena, or Google BigQuery: reduce the data remotely, then bring the result to your analytics tool.&lt;/p&gt;</description></item><item><title>Databricks Connect: Run Spark Code from Your Local IDE</title><link>https://degangwang.com/2026/05/08/databricks-connect/</link><pubDate>Fri, 08 May 2026 00:00:00 -0500</pubDate><guid>https://degangwang.com/2026/05/08/databricks-connect/</guid><description>&lt;h2 id="why-databricks-connect"&gt;Why Databricks Connect?&lt;/h2&gt;
&lt;p&gt;The Databricks web notebook is great for interactive exploration, but sometimes you want to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use your preferred local IDE (VS Code, PyCharm, etc.)&lt;/li&gt;
&lt;li&gt;Version control your code with Git&lt;/li&gt;
&lt;li&gt;Run scripts in CI/CD pipelines&lt;/li&gt;
&lt;li&gt;Debug with local tools&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Databricks Connect lets you do all of this while still using a remote Databricks cluster for compute.&lt;/p&gt;</description></item><item><title>Authoring mathematical formulae</title><link>https://degangwang.com/2025/07/06/mathematical-formulae/</link><pubDate>Sun, 06 Jul 2025 00:00:00 -0500</pubDate><guid>https://degangwang.com/2025/07/06/mathematical-formulae/</guid><description>&lt;h2 id="authoring-mathematical-and-chemical-equations"&gt;Authoring mathematical and chemical equations&lt;/h2&gt;
&lt;p&gt;Cleanwhite theme now has built-in &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;KaTeX&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding="application/x-tex"&gt;\KaTeX&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class="katex-html" aria-hidden="true"&gt;&lt;span class="base"&gt;&lt;span class="strut" style="height:0.8988em;vertical-align:-0.2155em;"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord textrm"&gt;K&lt;/span&gt;&lt;span class="mspace" style="margin-right:-0.17em;"&gt;&lt;/span&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist" style="height:0.6833em;"&gt;&lt;span style="top:-2.905em;"&gt;&lt;span class="pstrut" style="height:2.7em;"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord textrm mtight sizing reset-size6 size3"&gt;A&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace" style="margin-right:-0.15em;"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord textrm"&gt;T&lt;/span&gt;&lt;span class="mspace" style="margin-right:-0.1667em;"&gt;&lt;/span&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist" style="height:0.4678em;"&gt;&lt;span style="top:-2.7845em;"&gt;&lt;span class="pstrut" style="height:3em;"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord textrm"&gt;E&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist" style="height:0.2155em;"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace" style="margin-right:-0.125em;"&gt;&lt;/span&gt;&lt;span class="mord textrm"&gt;X&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; support, so that you can easily include
complex mathematical formulae in your web page, either inline or centred
on its own line. The theme uses Hugo&amp;rsquo;s embedded instance of the KaTeX
display engine to render mathematical markup to HTML at build time.
With this server-side rendering of formulae, the same output is produced,
regardless of your browser or your environment.&lt;/p&gt;</description></item></channel></rss>