<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data Science &amp; ML on Saturn Cloud</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/categories/data-science-ml/</link><description>Recent content in Data Science &amp; ML on Saturn Cloud</description><generator>Hugo -- gohugo.io</generator><lastBuildDate>Thu, 11 Dec 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://deploy-preview-1991--saturn-cloud.netlify.app/blog/categories/data-science-ml/index.xml" rel="self" type="application/rss+xml"/><item><title>Choosing an MLOps Platform in 2026</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/choosing-an-mlops-platform-in-2026/</link><pubDate>Thu, 11 Dec 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/choosing-an-mlops-platform-in-2026/</guid><description>Pick the wrong MLOps platform, and you&amp;rsquo;ll spend the next two years babysitting custom infrastructure. Pick the right one and your teams can actually ship models.
This guide is for people making that decision: what actually matters, what the options look like, and where Saturn Cloud fits if you need something that runs where your GPUs are.
Figure out your constraints first Before you look at any vendor, answer these questions. They&amp;rsquo;ll eliminate most options:</description></item><item><title>SageMaker vs. Saturn Cloud: Which One Is Better for Your Team?</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/sagemaker-vs-saturn-cloud-which-one-is-better-for-your-team/</link><pubDate>Wed, 10 Dec 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/sagemaker-vs-saturn-cloud-which-one-is-better-for-your-team/</guid><description>SageMaker and Saturn Cloud both give ML teams managed infrastructure for notebooks, training, and deployment. But they&amp;rsquo;re built on different assumptions, and depending on how your team works, one is probably a better fit than the other.
This post walks through:
What each platform does How they compare on developer experience, GPU access, and flexibility When SageMaker is the right call When Saturn Cloud makes more sense If you&amp;rsquo;re evaluating both or wondering whether to stick with SageMaker, this should help you decide.</description></item><item><title>Production Inference at Scale with Saturn Cloud &amp; Nebius Token Factory</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/production-inference-at-scale-with-saturn-cloud-nebius-token-factory/</link><pubDate>Mon, 01 Dec 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/production-inference-at-scale-with-saturn-cloud-nebius-token-factory/</guid><description>If you&amp;rsquo;re deploying models to production and handling high request volumes while managing costs and latency, Nebius offers two complementary approaches. Where AWS has Bedrock and SageMaker, Nebius has Saturn Cloud for MLOps and orchestration, and Token Factory for managed inference-as-a-service.
This post covers what Nebius provides for inference workloads, how Saturn Cloud simplifies deployment and management, and the joint integration between the two.
Comparing GPU cloud providers? Download our GPU Cloud Comparison Report analyzing 17 providers across pricing, InfiniBand networking, storage, and enterprise readiness.</description></item><item><title>Top 15 Cloud Platforms for AI/ML Teams in 2026</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-15-cloud-platforms-for-ai-ml-teams-in-2026/</link><pubDate>Thu, 27 Nov 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-15-cloud-platforms-for-ai-ml-teams-in-2026/</guid><description>AI and machine learning teams need reliable access to on-demand GPU computing without breaking their budgets. This guide highlights 15 cloud platforms for AI/ML teams with varying needs. Whether you need bare-metal GPUs, managed notebooks, or complete ML platforms, this list covers options for different priorities and workflows.
</description></item><item><title>Saturn Cloud on Neoclouds: Setting Up a Portable AI Development Platform</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/saturn-cloud-on-neoclouds-setting-up-a-portable-ai-development-platform/</link><pubDate>Thu, 09 Oct 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/saturn-cloud-on-neoclouds-setting-up-a-portable-ai-development-platform/</guid><description>AI teams are increasingly moving to Neoclouds such as Nebius, Crusoe, and Vultr for better GPU availability and lower costs. However, infrastructure alone doesn&amp;rsquo;t get you to production, which can mean weeks of internal platform work before ML work can begin.
Saturn Cloud bridges that gap as a single-tenant, portable AI platform that installs directly into your Neocloud account and runs on your managed Kubernetes cluster. You get enterprise-ready MLOps tooling—workspaces, scheduled jobs, model deployments, and team management—without having to build it yourself or relinquish control over your infrastructure and data.</description></item><item><title>Finetune Llama with Affordable On-Demand H100 and H200 GPU Instances</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/finetune-llama-with-affordable-on-demand-h100-and-h200-gpu-instances/</link><pubDate>Sun, 21 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/finetune-llama-with-affordable-on-demand-h100-and-h200-gpu-instances/</guid><description>Accessing enterprise-grade GPUs like NVIDIA H100s and H200s has traditionally meant choosing between expensive on-demand pricing or navigating long reservation queues with cloud providers like AWS. Through Saturn Cloud’s MLOps platform with Nebius’ flexible GPUs, teams can now get instant access to high-performance GPUs at a significantly lower cost, all without compromising on availability or flexibility.
ML teams can spin up distributed GPU clusters in seconds using familiar tools like Dask, Ray, or PyTorch Distributed.</description></item><item><title>Introducing Saturn Cloud’s Pro Plan for Teams</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/introducing-saturn-clouds-pro-plan-for-teams/</link><pubDate>Wed, 27 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/introducing-saturn-clouds-pro-plan-for-teams/</guid><description>Supporting Your AI and ML Journey for Any Team Building the proper infrastructure for AI and machine learning (ML) workflows can be challenging. Many AI engineers are stepping up to create these foundational systems, and having the right tools can make a difference. That’s why we’re eager to introduce our enhanced Pro Plan —an annual plan perfect for small to medium teams looking for a reliable and scalable solution.
Small AI/ML teams can work in the cloud securely without setting up any backend infrastructure.</description></item><item><title>Automating Deployment with CI/CD for data science</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/automating-deployment-with-cicd-for-data-science/</link><pubDate>Wed, 20 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/automating-deployment-with-cicd-for-data-science/</guid><description>In the world of data science, the ability to quickly and reliably deploy models and applications is crucial. This article explores how Continuous Integration and Continuous Deployment (CI/CD) pipelines can streamline this process on Saturn Cloud, ensuring that data-driven insights and innovations reach production environments with speed and consistency. It walks through a real-world use case of CI/CD deployments that we use in production at Saturn Cloud.</description></item><item><title>Easily Build LLMs With Saturn Cloud</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/easily-build-llms-with-saturn-cloud/</link><pubDate>Tue, 22 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/easily-build-llms-with-saturn-cloud/</guid><description>LLMs are creating ripples in today&amp;rsquo;s digital landscape due to their immense capabilities, from enhancing customer interactions and bridging language barriers to producing creative content and facilitating adaptive learning. However, they also come with their fair share of complexities. Given the sheer size of these models, working with them is demanding, often exceeding standard workstations' capabilities.
At Saturn Cloud, we’re providing every data science professional with the tools required to harness the potential of LLMs.</description></item><item><title>Top 5 Platforms to Build AI Applications With NVIDIA NIM</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-5-platforms-to-build-ai-applications-with-nvidia-nim/</link><pubDate>Wed, 16 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-5-platforms-to-build-ai-applications-with-nvidia-nim/</guid><description>NVIDIA NIM helps simplify the challenges of building AI applications with industry-standard APIs and libraries in popular large language model (LLM) development frameworks, making integrating AI models into your application easy.
Here&amp;rsquo;s a look at the top five platforms that have embedded NIM into their tool stack.
1. Saturn Cloud Saturn Cloud is a platform for data scientists and ML engineers who need a powerful, scalable, and easy-to-use cloud infrastructure.</description></item><item><title>Top 5 Platforms to Run NVIDIA BioNeMo</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-5-platforms-to-run-nvidia-bionemo/</link><pubDate>Thu, 19 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-5-platforms-to-run-nvidia-bionemo/</guid><description>NVIDIA’s BioNeMo framework serves researchers in their challenges across computational biology and drug discovery research and development. Teams can predict molecular properties, enhance protein engineering, and streamline the drug development process for faster and more accurate scientific discoveries.
The table below displays the different models included in the BioNeMo framework: DNABERT, OpenFold, MolMIM, EquiDock, DiffDock, ESM-2nv, ESM-1nv, ProtT5, MegaMolBART, and Geneformer. Learn how to integrate them into your own BioNeMo workflow here.</description></item><item><title>Managing Cloud Cost for ML Teams</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/cloud-cost-ds-ml/</link><pubDate>Tue, 18 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/cloud-cost-ds-ml/</guid><description>Managing cloud costs is critical. ML is expensive. An AWS instance with H100 GPUs costs $98 per hour. If you forget to turn this machine off over the weekend, you just wasted $5000 - that&amp;rsquo;s more than rent for most people. That doesn&amp;rsquo;t even include data storage and networking costs associated with these workloads. Your cloud cost will increase every single day as your company collects more data, and as you hire more data scientists and machine learning engineers.</description></item><item><title>How To Write Unmaintainable Code (Naming)</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-write-unmaintainable-code-naming/</link><pubDate>Thu, 13 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-write-unmaintainable-code-naming/</guid><description>Introduction Never ascribe to malice that which can be explained by incompetence. - Napoleon
In the interests of creating employment opportunities in Python and data science, I am passing on these tips from the masters on how to write code that is so difficult to maintain that the people who come after you will take years to make even the simplest changes. Further, if you follow all these rules religiously, you will even guarantee yourself a lifetime of employment, since no one but you has a hope in hell of maintaining the code.</description></item><item><title>GitHub Actions and Continuous Integration for Data Scientists</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/datas-science-continuous-integration/</link><pubDate>Fri, 07 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/datas-science-continuous-integration/</guid><description>Continuous Integration (CI) is pretty critical when working with code to ensure that
Code is always consistently formatted Any automated testing is always passing Code is error free These checks provide a great quality-of-life improvement to any data scientist working with the codebase. These automated checks give team members confidence that the code is mostly error-free and bug-free. Automating continuous integration is critical because once tests aren&amp;rsquo;t passing, it can take a lot of work to bring a repository back into compliance.</description></item><item><title>Simple (and Ugly) Reporting with Jupyter Without Having to Learn Anything New</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/reporting-with-jupyter-notebooks/</link><pubDate>Thu, 06 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/reporting-with-jupyter-notebooks/</guid><description>I&amp;rsquo;ve built a lot of dashboards in my life with Streamlit, Plotly Dash, Bokeh, Voila and Shiny. These tools all produce superior results, but I use them so infrequently that there is significant friction for me to re-learn how to use them. On the other hand, I use Jupyter Notebooks at least weekly, and on a recent project I started thinking: well, this notebook is good enough. Can we just keep it up to date and hosted somewhere I can point people to?
This has led to the emergence of MLOps as a way to standardize and streamline the ML workflow. MLOps emphasizes the need for continuous integration and continuous deployment (CI/CD) in the ML workflow, so that models are updated in real time to reflect changes in data or ML algorithms.</description></item><item><title>A Detailed Guide to Amazon SageMaker</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-detailed-guide-to-amazon-sagemaker/</link><pubDate>Wed, 31 Jan 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-detailed-guide-to-amazon-sagemaker/</guid><description>In this blog post, we&amp;rsquo;re diving into the world of Amazon SageMaker, providing a detailed overview of all its components. Amazon SageMaker is a comprehensive machine learning service from Amazon Web Services (AWS), designed to cater to the needs of data scientists, developers, and businesses. Our goal is to clarify how each part of SageMaker functions and interacts, covering everything from data labeling with Ground Truth to the complexities of model training and deployment.</description></item><item><title>10 Best Computational Biology Platforms 2024</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/10-best-computational-biology-platforms-2024/</link><pubDate>Tue, 19 Dec 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/10-best-computational-biology-platforms-2024/</guid><description>Computational biology is a fast-growing field, as are its challenges. Storing, organizing, and analyzing large amounts of data requires the right platforms and tools to support it.
Some of the top fields of biology that are generating large amounts of data include genomics, proteomics, metabolomics, and transcriptomics. While some teams prefer to build their own analytics infrastructure, it is also worth looking at the landscape of existing platforms.</description></item><item><title>How to Use Multiple GPUs in PyTorch</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-use-multiple-gpus-in-pytorch/</link><pubDate>Fri, 08 Dec 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-use-multiple-gpus-in-pytorch/</guid><description>Introduction to PyTorch In the world of data science and software engineering, it&amp;rsquo;s a familiar scenario: you&amp;rsquo;re knee-deep in data - mountains of it - and you&amp;rsquo;re trying to build a complex model. You can feel your single GPU struggling under the workload, the clock ticking louder, and your deadlines creeping closer. Times like these call for a lifeline - that&amp;rsquo;s where harnessing the power of multiple GPUs comes in.</description></item><item><title>7 of the Best Alternatives to Google Colab For 2024 (With Free Compute!)</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/7-of-the-best-alternatives-to-google-colab-for-2023-with-free-compute/</link><pubDate>Sun, 05 Nov 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/7-of-the-best-alternatives-to-google-colab-for-2023-with-free-compute/</guid><description>Google Colab is a great tool for new and experienced data scientists. Free cloud compute with some of your favorite tools sounds pretty great.
That said, Colab has had some drawbacks, and if you found this article, you may have experienced them. Whether it&amp;rsquo;s a sudden session restart that lost your data or a Python package that isn&amp;rsquo;t available in Colab, we&amp;rsquo;re here to help.</description></item><item><title>How to Solve 'CUDA out of memory' in PyTorch</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-solve-cuda-out-of-memory-in-pytorch/</link><pubDate>Mon, 23 Oct 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-solve-cuda-out-of-memory-in-pytorch/</guid><description>If you’ve ever worked with large datasets in PyTorch, chances are you’ve encountered the dreaded ‘CUDA out of memory’ error. This error message occurs when your GPU runs out of memory while trying to allocate space for tensors in your PyTorch model. Out-of-memory errors can be frustrating, especially when you’ve spent so much time fine-tuning your model and optimizing your code.
🚀 Streamline your model training with Saturn Cloud, from monitoring memory to leveraging multiple GPUs.</description></item><item><title>7 of the Best Alternatives to Domino Data Lab 2024 (With Free Compute)</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/7-of-the-best-alternatives-to-domino-data-lab-2022-with-free-compute/</link><pubDate>Sun, 22 Oct 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/7-of-the-best-alternatives-to-domino-data-lab-2022-with-free-compute/</guid><description>Domino Data Lab is an enterprise MLOps platform for data scientists. It includes major analytics tools in Python and R, enabling scale and collaboration.
That said, Domino Data Lab is known to be expensive and has had some drawbacks. If you found this article, you may be looking for alternatives.
In this list, we went out to find several alternatives to Domino Data Lab. Some offer free tiers and enterprise plans, and others just a free trial.</description></item><item><title>How to Use GPUs from a Docker Container</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-use-gpu-from-a-docker-container-a-guide-for-data-scientists-and-software-engineers/</link><pubDate>Thu, 05 Oct 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-use-gpu-from-a-docker-container-a-guide-for-data-scientists-and-software-engineers/</guid><description>Introduction to Docker In the rapidly evolving field of machine learning, one challenge consistently surfaces - reproducing environments for consistent model training and prediction. This is where Docker, an open-source platform, has proven transformative. Docker provides a streamlined process to &amp;ldquo;containerize&amp;rdquo; machine learning environments, eliminating the classic problem of inconsistent results due to varying dependencies, library versions, and system variables. Imagine creating a model that runs perfectly on your local machine, only to have it behave unexpectedly when deployed to other environments for testing or production.</description></item><item><title>What is Assertion Error: Torch not compiled with CUDA enabled?</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-is-assertionerror-torch-not-compiled-with-cuda-enabled/</link><pubDate>Tue, 26 Sep 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-is-assertionerror-torch-not-compiled-with-cuda-enabled/</guid><description>If you’re a data scientist or a software engineer working with deep learning frameworks, you may have received an error message stating, “AssertionError: Torch not compiled with CUDA enabled.” In programming, an assertion is a statement that a programmer confidently declares true.
When this condition fails or doesn&amp;rsquo;t hold true, an AssertionError is triggered.
In this context, the error message implies that the Torch framework was expected to be compiled with CUDA (Compute Unified Device Architecture) support, a crucial requirement for certain deep learning operations.</description></item><item><title>Building a Question and Answering Bot with Llama, Vicuna, and semantic search with Bert</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/building-qa-chat-with-llama-vicuna-and-semantic-search-with-bert/</link><pubDate>Fri, 21 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/building-qa-chat-with-llama-vicuna-and-semantic-search-with-bert/</guid><description>In this blog post, we will walk you through the process of building a Question and Answering chatbot using Llama, Vicuna and Bert. This is similar to our previous blog post that was building a pure chatbot, however this application will search through a corpus of documents, which the language model will use as context for answers. We have a number of enterprise customers that are looking for easy ways to search and chat with proprietary research documents in their organization.</description></item><item><title>Building a ChatBot with Llama, Vicuna, FastChat and Streamlit</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/building-a-chatbot-with-llama-vicuna-fastchat-streamlit/</link><pubDate>Mon, 12 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/building-a-chatbot-with-llama-vicuna-fastchat-streamlit/</guid><description>In this blog post, we will walk you through the process of building a chatbot using Llama, Vicuna and FastChat. Llama is a foundational large language model released by Meta. The Vicuna model was created by fine-tuning Llama on user-shared conversations collected from ShareGPT. FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
Note: the PyPI package name for FastChat is fschat. There is also a fastchat package, which is unrelated.</description></item><item><title>Evaluating Machine Translation Models: Traditional and Novel Approaches</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/evaluating-machine-translation-models-traditional-and-novel-approaches/</link><pubDate>Mon, 05 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/evaluating-machine-translation-models-traditional-and-novel-approaches/</guid><description>Introduction Machine translation has enormous potential to promote cross-cultural communication. Over the past 70 years, the quality of machine translation has evolved, culminating in systems currently powered by Neural Machine Translation. Improvements to these systems are rooted in the evaluation of system strengths and weaknesses. However, the complexity of language makes it difficult to create rules that separate high- and low-level language.
A native speaker can naturally evaluate language quality. However, human evaluation of text is slow, expensive and difficult to reproduce.</description></item><item><title>40 Jupyter Notebook Tips, Tricks, and Shortcuts for Data Science</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/40-jupyter-notebook-tips-tricks-and-shortcuts-for-data-science/</link><pubDate>Thu, 01 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/40-jupyter-notebook-tips-tricks-and-shortcuts-for-data-science/</guid><description>40 Jupyter Notebook Tips, Tricks, and Shortcuts for Data Science If you are a data scientist, you are probably well aware of Jupyter Notebook, a powerful tool for data analysis, visualization, and exploration. Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. It supports over 40 programming languages, including Python, R, and Julia.
In this blog post, we will share 40 Jupyter Notebook tips, tricks, and shortcuts that will help you become more productive and efficient in your data science work.</description></item><item><title>A Guide to MLOps</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-comprehensive-guide-to-mlops/</link><pubDate>Thu, 01 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-comprehensive-guide-to-mlops/</guid><description>Introduction ML models have grown significantly in recent years, and businesses increasingly rely on them to automate and optimize their operations. However, managing ML models can be challenging, especially as models become more complex and require more resources to train and deploy. This has led to the emergence of MLOps as a way to standardize and streamline the ML workflow. MLOps emphasizes the need for continuous integration and continuous deployment (CI/CD) in the ML workflow, ensuring that models are updated in real-time to reflect changes in data or ML algorithms.</description></item><item><title>Will Large Language Models (LLMs) Transform the Future of Language?</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/will-large-language-models-llms-transform-the-future-of-language/</link><pubDate>Thu, 18 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/will-large-language-models-llms-transform-the-future-of-language/</guid><description>Introduction It&amp;rsquo;s not often that a chatbot makes headlines, but ChatGPT is no ordinary chatbot. With its ability to generate coherent and contextually appropriate responses, it quickly became a favorite of users worldwide (and a nightmare for Chegg stockholders and teachers). ChatGPT is spearheaded by the private research company OpenAI.
The artificial intelligence lab in San Francisco was founded in 2015 as a non-profit organisation attempting to create &amp;ldquo;artificial general intelligence,&amp;rdquo; or AGI, which is essentially software that is as intelligent as humans.</description></item><item><title>Breaking the Data Barrier: How Zero-Shot, One-Shot, and Few-Shot Learning are Transforming Machine Learning</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/breaking-the-data-barrier-how-zero-shot-one-shot-and-few-shot-learning-are-transforming-machine-learning/</link><pubDate>Wed, 17 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/breaking-the-data-barrier-how-zero-shot-one-shot-and-few-shot-learning-are-transforming-machine-learning/</guid><description>Photo credit: Allison Saeng via Unsplash
Introduction In today&amp;rsquo;s fast-changing world, technology is improving every day, and Machine Learning and Artificial Intelligence have revolutionized a variety of industries with the power of process automation and improved efficiency. However, humans still have a distinct advantage over traditional machine learning algorithms because these algorithms require thousands of samples to learn the underlying correlations and identify an object.
Imagine the frustration if unlocking your smartphone with fingerprints or facial recognition required 100 scans before the algorithm worked.</description></item><item><title>Tutorial: Understanding Jupyter Notebook Widgets</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/understanding-jupyter-notebook-widgets/</link><pubDate>Mon, 15 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/understanding-jupyter-notebook-widgets/</guid><description>Introduction: Jupyter Notebook Widgets are interactive elements that can be added to Jupyter notebooks to enhance the user experience by allowing users to manipulate and visualize data dynamically. Widgets provide a way to create interactive dashboards, data exploration tools, and user interfaces directly within Jupyter notebooks.
In this tutorial, we will explore what Jupyter Notebook Widgets are, what they can be used for, how to use them, and provide an example with code.</description></item><item><title>Getting Started with Jupyter Online: A Simple Guide</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/getting-started-with-jupyter-online-a-simple-guide/</link><pubDate>Sat, 13 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/getting-started-with-jupyter-online-a-simple-guide/</guid><description>Photo Credit: Christopher Burns via Unsplash
What are Jupyter Notebooks? Jupyter notebooks have emerged as a leading tool in data science and programming, providing an interactive and user-friendly environment that combines code, text, images, and other multimedia elements in a single document.
What is Jupyter Online? The power of Jupyter notebooks is further amplified when used online, making them accessible from anywhere and shareable with ease. Jupyter Online refers to Jupyter notebooks hosted on a cloud-based platform like Saturn Cloud, providing all the benefits of Jupyter notebooks without the need for local installation or maintenance.</description></item><item><title>How to Write Data To Parquet With Python</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-write-data-to-parquet-with-python/</link><pubDate>Sat, 13 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-write-data-to-parquet-with-python/</guid><description>Photo credit: Google DeepMind via UnSplash
Introduction Apache Parquet is a language-agnostic, open-source file format that was built to handle flat columnar storage data formats. Parquet operates well with complex data in large volumes. It is known both for its performant data compression and for its ability to handle a wide variety of encoding types.
Parquet files are highly compatible with OLAP systems and provide an efficient way to store and access data; hence, they are very useful for big data processing.</description></item><item><title>Parsing Data with ChatGPT</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/parsing-data-with-chatgpt/</link><pubDate>Mon, 08 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/parsing-data-with-chatgpt/</guid><description>Photo credit: Ilgmyzin on Unsplash
In today&amp;rsquo;s rapidly changing world, artificial intelligence (AI) has become increasingly important in various industries. An interesting development in AI is the creation of language models, such as ChatGPT, capable of parsing and understanding natural language data.
In this article, we will talk about data parsing, how ChatGPT parses data, and examine its advantages and limitations as a tool for data parsing. We will also look at some of the applications of ChatGPT&amp;rsquo;s data parsing capabilities.</description></item><item><title>Access Denied Issues with Cross Account S3 Buckets</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/fixing-s3-object-ownership-cross-account-access-denied/</link><pubDate>Fri, 05 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/fixing-s3-object-ownership-cross-account-access-denied/</guid><description>Cross account S3 buckets are pretty important for large organizations, however if managed improperly can cause permissions nightmares. AWS now recommends that S3 buckets be created with ACLs (access control lists) disabled, however older S3 buckets may still have this setting enabled. with ACLs, each object in the S3 bucket can have a different owner (AWS account). If another AWS account is writing to your bucket, you may not have permissions to modify or delete the objects, even if you own the bucket.</description></item><item><title>Setting Up Kubeflow on AWS</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/setting-up-kubeflow-on-aws/</link><pubDate>Thu, 04 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/setting-up-kubeflow-on-aws/</guid><description>Photo credit: Uriel SC on Unsplash
Introduction Kubeflow is an open-source platform designed to manage the lifecycle of machine learning (ML) models in a Kubernetes environment. Kubeflow enables the creation, deployment, and management of machine learning workloads in a scalable and efficient way.
Data science and machine learning (ML) are rapidly growing fields. In recent years, there has been a significant increase in companies that use machine learning to generate all sorts of intelligence.</description></item><item><title>How to Setup AWS Batch</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-setup-aws-batch/</link><pubDate>Thu, 27 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-setup-aws-batch/</guid><description>Photo Credit: Unsplash (Shubham Dhage)
Introduction In today’s world of technology, running large-scale computing workloads can be challenging and time-consuming. The demand for compute-intensive workloads increases daily, and organizations constantly look for efficient ways to process and manage their batch jobs. Batch processing refers to a series of programs, typically defined by IT teams through scripts, command lines, or a programming language, that execute without human intervention, making sequencing and scheduling those programs especially important.
Introduction OpenAI&amp;rsquo;s GPT-3 models have taken the world by storm with their impressive capabilities straight out of the box. However, it&amp;rsquo;s important to note that these models are primarily generalized and may require additional training to excel in specific use cases. While the base models perform well with well-crafted prompts and contextual examples, fine-tuning can eliminate the need for providing such examples and produce even better results.</description></item><item><title>A Simple Guide to Jupyter Notebook Extensions</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-simple-guide-to-jupyter-notebook-extension/</link><pubDate>Tue, 25 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-simple-guide-to-jupyter-notebook-extension/</guid><description>Jupyter Notebook is a popular open-source web application that allows users to create and share documents that contain live code, equations, visualizations, and narrative text. It has become an indispensable tool for data scientists, researchers, and educators in various fields. One of the main advantages of Jupyter Notebook is its extensibility, which allows users to customize and enhance their workflow with various extensions.
Jupyter Notebook extensions are software components that add new functionality to the Jupyter Notebook interface.</description></item><item><title>The Impact of Transfer Learning and Fine-Tuning on ChatGPT's Conversational Performance</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/the-impact-of-transfer-learning-and-fine-tuning-on-chatgpts-conversational-performance/</link><pubDate>Mon, 24 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/the-impact-of-transfer-learning-and-fine-tuning-on-chatgpts-conversational-performance/</guid><description>Source: Freesoundslibraries
ChatGPT became one of the most popular NLP products between December 2022 and February 2023, adding about 100 million users globally and generating 1 billion monthly visitors. That explosion was partly caused by its conversational interface, which appeals to many people with varied levels of AI competence. It is simple to use, improves most people&amp;rsquo;s productivity, and embodies everything that makes a great product.
In case you missed it—ChatGPT is a conversational AI product powered by state-of-the-art large language models (LLMs) that have revolutionized the field of natural language processing (NLP).</description></item><item><title>A Quick Guide to Exploratory Data Analysis Using Jupyter Notebook</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-quick-guide-to-exploratory-data-analysis-using-jupyter-notebook/</link><pubDate>Tue, 18 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-quick-guide-to-exploratory-data-analysis-using-jupyter-notebook/</guid><description>Getting started First, we need to install the necessary packages. We will use Pandas for data manipulation, Matplotlib for data visualization, and Scikit-Learn for machine learning. To install these packages, we can use the following command in the terminal:
pip install pandas matplotlib scikit-learn Let&amp;rsquo;s start Jupyter Notebook from your terminal:
jupyter notebook After installing the necessary packages, we can create a new Jupyter Notebook and import the packages in the first code cell.</description></item><item><title>Apache Airflow for Data Science: How to Setup Airflow</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/apache-airflow-for-data-science-how-to-setup-airflow/</link><pubDate>Tue, 18 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/apache-airflow-for-data-science-how-to-setup-airflow/</guid><description>Introduction Have you ever found yourself juggling a complex web of tasks and dependencies, struggling to keep track of everything to ensure everything runs smoothly?
Apache Airflow, an open-source platform, will relieve stress when developing, scheduling, and monitoring workflows. Due to its simplicity and extensibility, it has gained popularity, especially in pipeline orchestration in the Python ecosystem. Whether you&amp;rsquo;re a data scientist, engineer, or analyst, Airflow can revolutionize your work.</description></item><item><title>A Comprehensive Guide to Jupyter Notebook</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-comprehensive-guide-to-jupyternotebook/</link><pubDate>Thu, 13 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-comprehensive-guide-to-jupyternotebook/</guid><description>Introduction to Jupyter Notebook Jupyter Notebook has become a popular tool among data scientists and analysts for its flexibility, ease of use, and ability to combine code, data, and documentation in a single document. In this comprehensive tutorial, we&amp;rsquo;ll cover everything you need to know to get started with Jupyter Notebook, from installation to advanced features.
Looking for an easy solution for cloud-based Jupyter Notebooks? Saturn Cloud offers seamless collaboration with cloud-based Jupyter notebooks designed for smooth teamwork and high-performance computing.</description></item><item><title>How To Fine-Tune GPT-3 For Custom Intent Classification</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-fine-tune-gpt-3-for-custom-intent-classification/</link><pubDate>Wed, 12 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-fine-tune-gpt-3-for-custom-intent-classification/</guid><description>Photo Credit: Andrew Neel on Unsplash
Introduction OpenAI&amp;rsquo;s GPT-3 is a state-of-the-art language model that has made groundbreaking strides in natural language processing (NLP). It can generate human-like text that is coherent, contextually appropriate, and grammatically accurate. While GPT-3 is an incredibly versatile tool, fine-tuning the model for specific tasks can significantly improve its performance. In this comprehensive blog post, we will delve deep into the process of fine-tuning GPT-3 for custom intent classification – a crucial component in developing intelligent chatbots and voice assistants.</description></item><item><title>How to Set up a Dask Cluster</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-set-up-a-dask-cluster/</link><pubDate>Sat, 08 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-set-up-a-dask-cluster/</guid><description>Photo Credit: JJ YIng on Unsplash
Outline Introduction
Clusters, Parallel &amp;amp; Distributed Computing
Dask + What makes Dask a unique parallel computing product
Different ways to set up the cluster (this tutorial uses the command line)
Requirements for setup (if any)
Setting up the cluster with Cloud VM
Spin up the VMs
Enable communication between the VMs
Port forward
SSH into the VMs
Install Dask and other dependencies on the VMs
Use Dask to spin up parallel computing within the cluster
Introduction Complex and expensive computing tasks, like big data analytics, machine learning, and deep learning algorithms, require advanced techniques and resources to handle their computation efficiently.</description></item><item><title>How to Install PyTorch on the GPU with Docker</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-install-pytorch-on-the-gpu-with-docker/</link><pubDate>Fri, 07 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-install-pytorch-on-the-gpu-with-docker/</guid><description>Introduction PyTorch is one of the most popular open-source deep-learning frameworks in Python that provides efficient tensor computation on both CPUs and GPUs. PyTorch is also available in the R language, and the R package torch lets you use Torch from R in a way that has similar functionality to PyTorch in Python while still maintaining the feel of R.
PyTorch allows developers to build complex deep-learning models easily and quickly. Using PyTorch with GPU can significantly speed up the training and inference process, especially for large-scale models.</description></item><item><title>What are Large Language Models and How Do They Work?</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-are-large-language-models-and-how-do-they-work/</link><pubDate>Fri, 07 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-are-large-language-models-and-how-do-they-work/</guid><description>Credit: Fabio (Unsplash)
What are Large Language Models? Large language models are a type of artificial intelligence (AI) model designed to understand, generate, and manipulate natural language. These models are trained on vast amounts of text data to learn the patterns, grammar, and semantics of human language. They leverage deep learning techniques, such as neural networks, to process and analyze the textual information.
The primary purpose of large language models is to perform various natural language processing (NLP) tasks, such as text classification, sentiment analysis, machine translation, summarization, question-answering, and content generation.</description></item><item><title>Document Segmentation using BERT</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/document-segmentation-using-bert/</link><pubDate>Tue, 04 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/document-segmentation-using-bert/</guid><description>Introduction Documents are sometimes distributed as scans and combined into a single merged file. To retrieve a specific document, the user needs to locate it inside the concatenated PDF, which is not an easy task: the concatenation often runs to more than a hundred pages, and the individual documents differ in structure and page count.
Splitting complex concatenated documents apart is still a challenging problem, for several reasons: the merged file contains multiple documents, and each of them may be multi-themed and differ in length and structure.</description></item><item><title>What is K-Means Clustering and How Does its Algorithm Work?</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-is-k-means-clustering-and-how-does-its-algorithm-work/</link><pubDate>Tue, 04 Apr 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-is-k-means-clustering-and-how-does-its-algorithm-work/</guid><description>Photo Credit: Maria on Unsplash
Introduction Fundamentally, there are four types of machine learning algorithms: supervised algorithms, semi-supervised algorithms, unsupervised algorithms, and reinforcement learning algorithms. Supervised algorithms are those that work on data that has labels. Semi-supervised is where part of the data is labeled and another part is not. Unsupervised is where the data doesn’t have labels. Reinforcement learning is a type of machine learning where we have an agent that works towards a certain goal and does it through trial and error.
Introduction Who hasn&amp;rsquo;t been amazed by technological advancements, particularly in artificial intelligence, from Alexa to Tesla&amp;rsquo;s self-driving cars and a myriad of other innovations? I marvel at these advancements every other day, but what&amp;rsquo;s even more interesting is getting an idea of what underpins them. Welcome to Artificial Intelligence and to the endless possibilities of deep learning. If you&amp;rsquo;ve been wondering what it is, then you&amp;rsquo;re home.</description></item><item><title>Layer Pruning for Transformer Models</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/layer-pruning-for-transformer-models/</link><pubDate>Fri, 31 Mar 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/layer-pruning-for-transformer-models/</guid><description>Photo credit: ANIRUDH on Unsplash
Table of Contents Introduction Transformer models and various ways to shrink them What is pruning Types of pruning and their use cases Let’s prune a pre-trained model to reduce inference latency Introduction: As the demand for and applications of large language models grow, the development of efficient models becomes important as well. One of the most notable advancements in recent years in the machine learning ecosystem is Transformer models, which have set new performance benchmarks across various NLP tasks like translation, chatbots (ChatGPT, Dialogflow, etc.), classification, and computer vision.</description></item><item><title>PyTorch DataLoader: Features, Benefits, and How to Use it</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pytorch-dataloader-features-benefits-and-how-to-use-it/</link><pubDate>Fri, 31 Mar 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pytorch-dataloader-features-benefits-and-how-to-use-it/</guid><description>Photo credit: Maxim Berg
Introduction: Deep learning models require large amounts of data to train effectively. In many cases, the data is stored in separate files, databases, or other external sources that need to be preprocessed and loaded into memory before training.
PyTorch is a popular open-source machine learning library that is widely used in research and industry. It is known for its ease of use, flexibility, and speed, making it a popular choice for building deep learning models.</description></item><item><title>What Are Foundation Models and How Do They Work?</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-are-foundation-models-and-how-do-they-work/</link><pubDate>Thu, 30 Mar 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-are-foundation-models-and-how-do-they-work/</guid><description>Credit: Deepmind (Unsplash)
What Are Foundation Models? Foundation models are machine learning models pre-trained on vast amounts of data. They are a ground-breaking development in the world of artificial intelligence (AI). They serve as the base for various AI applications, thanks to their ability to learn from vast amounts of data and adapt to a wide range of tasks, and they can be fine-tuned to perform specific tasks, making them highly versatile and efficient.</description></item><item><title>What is Generative AI and How Does it Work?</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-is-generative-ai-and-how-does-it-work/</link><pubDate>Thu, 30 Mar 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-is-generative-ai-and-how-does-it-work/</guid><description>Credit: Midjourney
What Is Generative AI Generative AI refers to a class of artificial intelligence algorithms that can generate new and unique data, rather than simply making decisions based on existing data. It is a rapidly growing field within artificial intelligence, focusing on creating new data that mimics the underlying patterns and structures of existing data.
How Does Generative AI Work Generative AI works by using deep learning models to generate new and original content, such as text, images, or music, based on patterns and insights learned from training data.</description></item><item><title>Authenticate Box on JupyterHub on Kubernetes</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/authenticate-box-on-jupyterhub-on-kubernetes/</link><pubDate>Thu, 23 Mar 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/authenticate-box-on-jupyterhub-on-kubernetes/</guid><description>Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams. Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.
Introduction Imagine a data science team at a large beverage company that is working on a new project that involves machine maintenance prediction. The team includes researchers, data analysts, and machine learning engineers who are collaborating on a large dataset of logs from a machine.</description></item><item><title>How to Install Tensorflow on the GPU with Docker</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-install-tensorflow-on-the-gpu-with-docker/</link><pubDate>Mon, 20 Mar 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-install-tensorflow-on-the-gpu-with-docker/</guid><description>This tutorial will discuss setting up Tensorflow on GPUs with docker.
Introduction The pace at which deep learning has risen is speedy and spectacular. It has led to significant innovations and several new research and training methods.
An example is the popular deep learning library used to build and construct models to find solutions to numerous tasks, i.e., Tensorflow. It is regarded as one of the best libraries which can solve almost any question related to deep learning and neural networks.</description></item><item><title>Getting Started with Metaflow</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/getting-started-with-metaflow/</link><pubDate>Thu, 16 Mar 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/getting-started-with-metaflow/</guid><description>Introduction Meet Emily, a data scientist at a healthcare startup. Emily realized she spends significant time on non-coding tasks such as setting up infrastructure, managing dependencies, and keeping track of experiments. Emily needed a tool that would help her streamline her workflow and allow her to focus on actual data science and analysis.
That&amp;rsquo;s when she discovered Metaflow, an open-source framework for building and managing data science workflows. With Metaflow, Emily created reusable and reproducible workflows that automated much of the drudgery associated with data science.</description></item><item><title>Getting Started With Ray Clusters</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/getting-started-with-ray-clusters/</link><pubDate>Mon, 06 Mar 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/getting-started-with-ray-clusters/</guid><description>Ray is a framework for developing and running parallel and distributed applications emphasizing ML tasks. Ray enables users to harness the power of distributed computing without much effort.
Introduction Meet Alice, a machine learning engineer at a large tech company. Alice works on developing and training deep learning models to solve complex problems, often requiring many computational resources to train. When working on a single machine, the training process could take weeks on the strongest workstations.</description></item><item><title>How to Set Up Luigi</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-set-up-luigi/</link><pubDate>Mon, 06 Mar 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-set-up-luigi/</guid><description>Introduction Automating different workflows is necessary with most projects and processes. For example, getting data from one point to another (ETL/ELTs), running machine learning models, or general workflow automation. In this article, we will setup Luigi: a tool that can automate workflows and much more. We will use it to orchestrate downloading a CSV file, transforming it, and warehousing it.
What is Luigi? Luigi is an open-source Python package for building complex and long-running data pipelines and scheduling and monitoring tasks or batch jobs.</description></item><item><title>Deploying Bokeh Applications</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploying-bokeh-applications/</link><pubDate>Thu, 02 Mar 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploying-bokeh-applications/</guid><description>Introduction Bokeh is a powerful and versatile Python library for creating interactive data visualizations and dashboards in a web browser. With Bokeh, you can quickly and easily build a variety of visualizations, ranging from simple line charts to complex, multi-layered plots. One of the key strengths of Bokeh is its ability to create interactive visualizations that allow users to explore and interact with the data in real-time. This makes it an ideal tool for building interactive dashboards, which can provide a dynamic, responsive interface for exploring complex datasets.</description></item><item><title>Github Action + ECR + Optimizing Disk Space</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/github-action-ecr-optimizing-disk-space/</link><pubDate>Fri, 24 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/github-action-ecr-optimizing-disk-space/</guid><description>Table of Contents Introduction Github Action AWS ECR(Elastic Container Registry) How ECR(Elastic Container Registry) Works Build simple Github actions to build and push docker container image to ECR(Elastic Container Registry) Handling or maximizing Github actions runner resources Introduction: When building a data-intensive application, setting up Github Actions workflow for it can be challenging especially when dealing with issues such as Github runner running out of disk space</description></item><item><title>How to Authenticate With Google Drive 
From JupyterHub</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-authenticate-with-google-drive-from-jupyterhub/</link><pubDate>Wed, 22 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-authenticate-with-google-drive-from-jupyterhub/</guid><description>Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams. Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.
Table of Contents Google Drive Create a Google Drive account Service Account Authenticate Google Drive from Jupyterhub Create a service account Authorize access to a Google Drive Folder using Service Account Deploy Jupyterhub on AWS EKS Store your service account credentials using Kubernetes Secrets and configure your Kubernetes yaml file to obtain the secret on every pod Connect to Google Drive folder from Jupyterhub using Google client library for Python Resources Introduction: Google Drive is a popular and free cloud-based storage provided by Google that allows users to store, manage, organize, sync, and access their files online.</description></item><item><title>How to Authenticate With BigQuery From JupyterHub</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-authenticate-with-bigquery-from-jupyterhub/</link><pubDate>Thu, 16 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-authenticate-with-bigquery-from-jupyterhub/</guid><description>Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams. Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.
Table of content BigQuery Service Account Create a BigQuery account and a service account Connect Bigquery with Jupyterhub using Python Client SDK Conclusion Introduction If you&amp;rsquo;re working with a lot of data, it can be challenging to manage and make sense of all of it.</description></item><item><title>Introduction to Docker for Data Scientists</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/introduction-to-docker-for-data-scientists/</link><pubDate>Tue, 14 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/introduction-to-docker-for-data-scientists/</guid><description>Introduction As a data scientist, have you ever struggled with reproducing the results of your experiments and projects? Whether it’s due to differences in library versions or system configurations, the challenges of reproducibility can be a frustrating and time-consuming obstacle.
Imagine a scenario where you’ve developed your code in a Python 3.6 environment, but the production server is configured with Python 3.10. As the open-source world is constantly evolving, there may be functions that have been deprecated in newer versions of libraries.</description></item><item><title>Deploying Plotly Applications</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploying-plotly-applications/</link><pubDate>Tue, 07 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploying-plotly-applications/</guid><description>Introduction Plotly is a data visualization library for creating interactive, web-based graphs and charts. It supports a wide variety of chart types: line graphs, scatter plots, bar charts, pie charts, and more. Plotly allows you to easily customize your visualizations. Plotly also allows for real-time updates to the data and provides the ability to zoom, pan, and hover over the data points for deeper analysis.
Some of the key advantages of using Plotly over conventional visualization tools like Matplotlib and Seaborn are:</description></item><item><title>Using AWS SageMaker Input Modes: Amazon S3, EFS, or FSx</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/using-aws-sagemaker-input-modes-amazon-s3-efs-or-fsx/</link><pubDate>Sun, 05 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/using-aws-sagemaker-input-modes-amazon-s3-efs-or-fsx/</guid><description>In this blog post, we discuss how to use AWS SageMaker Input modes for Amazon S3 and file systems in Amazon EFS and Amazon FSx for Lustre.
Introduction One persistent issue in ML training is reading training data easily, flexibly, and in a performant manner. With various effective, high-throughput data ingestion mechanisms known as data sources and their corresponding input modes, AWS SageMaker makes the process of ingesting data simpler.</description></item><item><title>Using AWS SageMaker Input Modes: Amazon S3, EFS, or FSx</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/using-aws-sagemaker-input-modes-amazon-s3-efs-or-fsx/</link><pubDate>Sun, 05 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/using-aws-sagemaker-input-modes-amazon-s3-efs-or-fsx/</guid><description>In this blog post, we discuss how to use AWS SageMaker input modes for Amazon S3 and file systems in Amazon EFS and Amazon FSx for Lustre.
Introduction One persistent issue in ML training is reading training data easily, flexibly, and in a performant manner. With various effective, high-throughput data ingestion mechanisms known as data sources and their corresponding input modes, AWS SageMaker makes the process of ingesting data simpler.</description></item><item><title>How to Work with Custom S3 Buckets and AWS SageMaker</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-work-with-custom-s3-buckets-and-aws-sagemaker/</link><pubDate>Wed, 25 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-work-with-custom-s3-buckets-and-aws-sagemaker/</guid><description>Introduction AWS SageMaker is a managed service in the AWS public cloud.
It’s used to create, train, and deploy machine learning models, but it’s also great for doing exploratory data analysis and prototyping.
One of the advantages of working with AWS SageMaker is that it provides a convenient way to store your data privately in an S3 bucket, which can hold any type of data, such as CSV, pickle, or ZIP files, photos, and videos.</description></item><item><title>How to Work with Custom S3 Buckets and AWS SageMaker</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/how-to-work-with-custom-s3-buckets-and-aws-sagemaker/</link><pubDate>Wed, 25 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/how-to-work-with-custom-s3-buckets-and-aws-sagemaker/</guid><description>Introduction AWS SageMaker is a managed service in the AWS public cloud.
It’s used to create, train, and deploy machine learning models, but it’s also great for doing exploratory data analysis and prototyping.
One of the advantages of working with AWS SageMaker is that it provides a convenient way to store your data privately in an S3 bucket, which can hold any type of data, such as CSV, pickle, or ZIP files, photos, and videos.</description></item><item><title>Getting Started With MLFlow</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/getting-started-with-mlflow/</link><pubDate>Mon, 23 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/getting-started-with-mlflow/</guid><description>Introduction Three major components can define the machine learning (ML) lifecycle at an abstract level: data preparation, model building, and model deployment. However, even these three steps involve enormous collaboration across multiple teams. These teams can include data engineers, data scientists, machine learning engineers, and software engineers. Depending on the organization and the specific project, other stakeholders may be involved, such as product managers, business analysts, and domain experts.
With different contributors specializing in specific areas of the pipeline, handoffs and transitions to each stage of the cycle might not be as streamlined as teams would wish.</description></item><item><title>How to Set up Snowflake on JupyterHub</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-set-up-snowflake-on-jupyterhub/</link><pubDate>Fri, 20 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-set-up-snowflake-on-jupyterhub/</guid><description>Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams. Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.
Table of contents JupyterHub
Snowflake
Key pair authentication
Create a Snowflake account and use key pair auth
Deploy JupyterHub on AWS EKS
Connect Snowflake with JupyterHub
Conclusion</description></item><item><title>Deploying FastAPI Applications</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploying-fastapi-applications/</link><pubDate>Wed, 18 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploying-fastapi-applications/</guid><description>Introduction FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints. It is built on top of Starlette for the web parts and Pydantic for the data parts. With FastAPI, you can start building your API quickly and easily with minimal setup. It provides a simple and easy-to-use interface for routing, validation, and documentation of your API&amp;rsquo;s endpoints. Some of the key features of FastAPI include</description></item><item><title>How to Build Custom Docker Images For AWS SageMaker</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-build-custom-docker-images-for-aws-sagemaker/</link><pubDate>Tue, 17 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-build-custom-docker-images-for-aws-sagemaker/</guid><description>In this post, we show how to create a custom Docker container image for AWS SageMaker.
Introduction The adoption of containers for cloud-native apps, and of Docker in particular, has grown rapidly.
According to the Docker Index report in 2020, there were 8 billion pulls in the month of November, an increase from 5.5 billion a month the previous year. But July left that number in the dust, with 11 billion pulls.</description></item><item><title>How to Build Custom Docker Images For AWS SageMaker</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/how-to-build-custom-docker-images-for-aws-sagemaker/</link><pubDate>Tue, 17 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/how-to-build-custom-docker-images-for-aws-sagemaker/</guid><description>In this post, we show how to create a custom Docker container image for AWS SageMaker.
Introduction The adoption of containers for cloud-native apps, and of Docker in particular, has grown rapidly.
According to the Docker Index report in 2020, there were 8 billion pulls in the month of November, an increase from 5.5 billion a month the previous year. But July left that number in the dust, with 11 billion pulls.</description></item><item><title>Top 33 JupyterLab Extensions 2023</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-33-jupyterlab-extensions-2024/</link><pubDate>Tue, 10 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-33-jupyterlab-extensions-2024/</guid><description>JupyterLab is a browser-based interactive development environment (IDE) for notebooks, code, and data maintained by Project Jupyter. It supports Julia, Python, and R, as well as MATLAB, Scala, and many more programming languages.
In this article, we will share the top JupyterLab extensions in our 2023 survey of the landscape.
To shortcut and turbocharge your JupyterLab capabilities, get started on Saturn Cloud&amp;rsquo;s robust and customizable platform and jump into hosted Jupyter notebooks right away.</description></item><item><title>How to Work With Pycharm and AWS SageMaker Using AWS SageMaker Python SDK</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-work-with-pycharm-and-aws-sagemaker-using-aws-sagemaker-python-sdk/</link><pubDate>Fri, 06 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-work-with-pycharm-and-aws-sagemaker-using-aws-sagemaker-python-sdk/</guid><description>In this blog, we are going to discuss how to make use of AWS SageMaker services locally on PyCharm using the AWS SageMaker Python SDK.
Amazon SageMaker, which is a fully managed ML service, has made it easier for organizations to put their ML ideas into production faster, and it has significantly improved the productivity of data science teams. Many teams are able to easily and quickly train models, tune the models for better results, and deploy the models to production-ready environments.</description></item><item><title>How to Work With Pycharm and AWS SageMaker Using AWS SageMaker Python SDK</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/how-to-work-with-pycharm-and-aws-sagemaker-using-aws-sagemaker-python-sdk/</link><pubDate>Fri, 06 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/how-to-work-with-pycharm-and-aws-sagemaker-using-aws-sagemaker-python-sdk/</guid><description>In this blog, we are going to discuss how to make use of AWS SageMaker services locally on PyCharm using the AWS SageMaker Python SDK.
Amazon SageMaker, which is a fully managed ML service, has made it easier for organizations to put their ML ideas into production faster, and it has significantly improved the productivity of data science teams. Many teams are able to easily and quickly train models, tune the models for better results, and deploy the models to production-ready environments.</description></item><item><title>8 Popular Alternatives to JupyterHub 2023</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/8-popular-alternatives-to-jupyterhub-2024/</link><pubDate>Wed, 04 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/8-popular-alternatives-to-jupyterhub-2024/</guid><description>JupyterHub is an open-source solution that serves as a platform for data science and machine learning teams.
There are many ways to set up JupyterHub for your team, depending on your security and customization needs. But this can lead to a lot of engineering headaches, which may not interest you if you are trying to spin up quickly and securely.
In this list, we’ll share the best alternatives to setting up JupyterHub.</description></item><item><title>How to Set up JupyterHub Authentication with Okta</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-set-up-jupyterhub-authentication-with-okta/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-set-up-jupyterhub-authentication-with-okta/</guid><description>Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams. Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.
Introduction With the innovation in cloud technologies, more organizations are adopting more apps in their workspace. Some of these applications require a username and password.
Usernames and passwords are the primary targets of cyber attacks in most organizations.</description></item><item><title>Install JupyterHub in a VPN with AWS</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/install-jupyterhub-in-a-vpn-with-aws/</link><pubDate>Thu, 29 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/install-jupyterhub-in-a-vpn-with-aws/</guid><description>Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams. Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.
Introduction
AWS VPN
Install JupyterHub in a VPN with AWS
Introduction It is essential for every data team to protect their information, data and code from unauthorized access and other malicious threats.</description></item><item><title>Setting up S3 Buckets For Data Science</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/setting-up-s3-buckets-for-data-science/</link><pubDate>Mon, 26 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/setting-up-s3-buckets-for-data-science/</guid><description>Introduction Amazon Simple Storage Service, also known as Amazon S3, is an object storage service that offers industry-leading scalability, data availability, security, and performance. Amazon S3 is so widely used that companies like Netflix store all of their video content to stream to the masses. Even Dropbox has built its entire cloud storage infrastructure on top of S3!
In S3, files are stored in buckets, similar to how your files are stored on your computer in folders.</description></item><item><title>How to Set up AWS SageMaker for Multiple Users</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-set-up-aws-sagemaker-for-multiple-users/</link><pubDate>Fri, 23 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-set-up-aws-sagemaker-for-multiple-users/</guid><description>Introduction Amazon SageMaker is a fully managed service that provides every machine learning (ML) developer and data scientist with the ability to build, train, and deploy ML models quickly. Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for ML that lets you build, train, debug, deploy, and monitor your ML models. Amazon SageMaker Studio provides all the tools you need to take your models from experimentation to production while boosting your productivity.</description></item><item><title>How to Set up AWS SageMaker for Multiple Users</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/how-to-set-up-aws-sagemaker-for-multiple-users/</link><pubDate>Fri, 23 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/how-to-set-up-aws-sagemaker-for-multiple-users/</guid><description>Introduction Amazon SageMaker is a fully managed service that provides every machine learning (ML) developer and data scientist with the ability to build, train, and deploy ML models quickly. Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for ML that lets you build, train, debug, deploy, and monitor your ML models. 
Amazon SageMaker Studio provides all the tools you need to take your models from experimentation to production while boosting your productivity.</description></item><item><title>How to change column type in Pandas</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pandas-tips-change-column-type/</link><pubDate>Tue, 20 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pandas-tips-change-column-type/</guid><description>Changing a column&amp;rsquo;s data type is often a necessary step in the data cleaning process. There are several options for changing types in pandas - which to use depends on your data and what you want to accomplish.
to_numeric() and astype() To convert one or more columns to numeric values, pandas.to_numeric() is often a good option. This function will attempt to convert non-numeric values, such as strings, to either float64 or int64 depending on the input data.
If you have a relatively small dataset and/or need to specify a custom column order, you can simply reassign columns in the order you want them (note the double brackets):
import pandas as pd data = pd.</description></item><item><title>How to drop Pandas DataFrame rows with NAs in a specific column</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-drop-na-rows-in-pandas/</link><pubDate>Mon, 19 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-drop-na-rows-in-pandas/</guid><description>During the data cleaning process, you may find that you need to discard rows from your pandas DataFrame based on whether or not they have NA values in a certain column. While this task is slightly more complex than dropping rows containing any NA values, there are some quick and easy ways to go about it.
The first is to manually subset your DataFrame, keeping only rows where your column of interest contains non-null values using DataFrame.</description></item><item><title>How to get a list of column names from a Pandas DataFrame</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pandas-tips-list-of-column-headers/</link><pubDate>Mon, 19 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pandas-tips-list-of-column-headers/</guid><description>Pandas makes it easy to obtain a list of column names from a DataFrame. For an extremely concise solution, you can simply call list() on your DataFrame object, which will return a list of header names:
import pandas as pd data = pd.DataFrame({&amp;#39;a&amp;#39;: [1, 2, 3, 4, 5], &amp;#39;b&amp;#39;: [10, 20, 30, 40, 50]}) list(data) There are also two built-in tolist() methods for Index objects. If performance is a priority, the first method listed below is faster than the second, but either works:</description></item><item><title>Running Julia on Dask with Saturn Run</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/julia-on-dask/</link><pubDate>Tue, 13 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/julia-on-dask/</guid><description>This article goes over dispatching parallel Julia code over dask clusters using Saturn Run. We use a toy example (fibonacci computation), but the approaches here can generalize to most real world problems.
Fibonacci with Julia To start, I used ChatGPT to figure out how to write a Julia CLI for computing Fibonacci numbers. If you&amp;rsquo;re a Julia programmer you can do this from scratch. I am not, but I aspire to be one day.
import pandas as pd data = pd.DataFrame({&amp;#39;a&amp;#39;: [1, 2, 3, 4, 5], &amp;#39;b&amp;#39;: [10, 20, 30, 40, 50]}) #three different ways to count rows len(data) len(data.index) data.shape[0] All three commands above return a row count. If you&amp;rsquo;re looking to shave milliseconds off of your computation time, len(data.index) is the fastest of the three, but the difference is negligible in most cases as all are constant time operations.</description></item><item><title>How to delete a Pandas DataFrame column</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pandas-tips-delete-column/</link><pubDate>Wed, 07 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pandas-tips-delete-column/</guid><description>For getting, setting, and deleting columns from a pandas DataFrame, you can treat a DataFrame like a dict-like collection of Series objects. So, it&amp;rsquo;s possible to delete columns using the familiar del and pop operations, just like a regular Python dictionary. Note that both modify the DataFrame in-place.
You can use del to delete a column by name:
import pandas as pd data = pd.DataFrame({&amp;#39;one&amp;#39;: [1, 2, 3], &amp;#39;two&amp;#39;: [10, 20, 30], &amp;#39;three&amp;#39;: [100, 200, 300]}) del data[&amp;#34;two&amp;#34;] data Or, if you want to return the deleted column, use pop:</description></item><item><title>How to rename DataFrame columns in Pandas</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pandas-tips-renaming-columns/</link><pubDate>Tue, 06 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pandas-tips-renaming-columns/</guid><description>Whether you&amp;rsquo;re changing the name of one or several columns or completely reassigning your DataFrame&amp;rsquo;s header, renaming columns in pandas is very simple.
Renaming specific columns can be easily accomplished with the built-in rename() method. This method takes a dictionary of columns to be renamed and their new names, in the format old: new. Remember to specify that you want to change column names with the axis argument, as this method can also be used to rename rows.</description></item><item><title>Using SSH with AWS SageMaker and Ngrok</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/using-ssh-with-aws-sagemaker-and-ngrok/</link><pubDate>Tue, 06 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/using-ssh-with-aws-sagemaker-and-ngrok/</guid><description>Amazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production-ready hosted environment. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don’t have to manage servers. (Sagemaker documentation)
All SageMaker content is organized under SageMaker Studio.
All SageMaker content is organized under SageMaker Studio.
One straightforward method is boolean indexing. Pandas.DataFrame.loc allows you to simply select rows by value:
import pandas as pd data = pd.DataFrame({&amp;#39;Color&amp;#39;: &amp;#39;Tabby Black Calico Tabby Tabby Black&amp;#39;.</description></item><item><title>How to Set up JupyterHub Authentication with Azure Active Directory(AD)</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/jupyterhub-and-azure-ad/</link><pubDate>Thu, 01 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/jupyterhub-and-azure-ad/</guid><description>Introduction As an ML engineer or Data Scientist, you should be familiar with JupyterHub.
While many organizations or data science teams prefer to host their JupyterHub production environment on their on-premises or cloud servers, there is a risk of it being compromised when authentication relies on a single set of user credentials (username and password). SSO (single sign-on) plays a significant role in helping teams add a layer of security for their code and data.</description></item><item><title>How to Set up JupyterHub on Azure</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-setup-jupyterhub-on-azure/</link><pubDate>Thu, 01 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-setup-jupyterhub-on-azure/</guid><description>Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams. Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.
Introduction JupyterHub is an Open-Source solution to provide access to computational environments without having users actively manage DevOps challenges. System Administrators can customize and manage JupyterHub to provide isolated or shared resources to data science teams. Not only is it scalable and customizable but also provides the option to improve privacy, by providing users with their own workspaces.</description></item><item><title>How to iterate over rows in Pandas</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pandas-tips-iterate-over-rows/</link><pubDate>Wed, 30 Nov 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pandas-tips-iterate-over-rows/</guid><description>Whether you’re a veteran data scientist or trying out the Python package pandas for the first time, chances are good that at some point you’ll need to access elements in your data frame by row. Luckily, Pandas provides the built-in iterators DataFrame.iterrows and DataFrame.itertuples to help you achieve just that.
iterrows() allows you to iterate over rows as (index, Series) pairs, while itertuples() allows you to iterate over rows as namedtuples.</description></item><item><title>Setting up HTTPS and SSL for JupyterHub</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/securing-jupyterhub/</link><pubDate>Tue, 22 Nov 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/securing-jupyterhub/</guid><description>JupyterHub provides a shared computational environment for Data Science teams and other groups of users, allowing for customized collaboration that scales for big data. Importantly, it also allows for a single place to implement security protocols. In this post, we will go over some basic measures you can take to secure your JupyterHub deployments.
In our previous blog post on JupyterHub, we walked through the basic deployment steps for The Littlest JupyterHub (TLJH) and Zero-to-JupyterHub (ZTJH).</description></item><item><title>The Busy Data Scientist's Guide to Data Science Resources</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/the-busy-data-scientists-guide-to-data-science-resources-2022/</link><pubDate>Fri, 04 Nov 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/the-busy-data-scientists-guide-to-data-science-resources-2022/</guid><description>There are plenty of places to start when building your list of data science resources &amp;ndash; but you&amp;rsquo;re a busy data scientist. We’ve collected a handful of resources for different needs, all serving the purpose of making your work easier and more productive.
Here is a reference guide to the top resources you need to know about, organized into a few lists that meet a variety of needs.
Machine Learning and Deep Learning Tools TensorFlow PyTorch XGBoost LightGBM scikit-learn Free &amp;amp; Enterprise Data Science and Compute Platforms Saturn Cloud Domino RStudio And a small shoutout to AWS SageMaker, AzureML, Google Vertex Data Visualization Tools Bokeh Plotly D3.</description></item><item><title>Using JupyterHub with a Private Container Registry</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/using-jupyterhub-with-a-private-container-registry/</link><pubDate>Fri, 04 Nov 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/using-jupyterhub-with-a-private-container-registry/</guid><description>In our previous blog post on JupyterHub, we walked through the basic deployment steps for The Littlest JupyterHub (TLJH) and Zero-to-JupyterHub (ZTJH). Our recommendation for anyone looking to deploy JupyterHub as a Data Science platform in production was to use ZTJH. We’ll assume you’re using that for this blog post.
Once you have Zero-JupyterHub up and running, security is the top priority. You should feel confident that your data science platform is safe and that your users can access it easily.</description></item><item><title>Top 10 Data Science Platforms And Their Customer Reviews 2022</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-10-data-science-platforms-and-what-their-customers-say-2022/</link><pubDate>Wed, 02 Nov 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-10-data-science-platforms-and-what-their-customers-say-2022/</guid><description>Data science platforms are software products that enable data science teams to run code, train models, and deploy APIs, and can replace a data scientist having to manually set up their programming environment themselves. In this article, we’re rounding up some of the best platforms from the voices of the users themselves, sharing the pros and cons below:
Features of Data Science Platforms Accessible Computing Environments Data scientists get access to prebuilt computation environments - high-memory notebooks, GPUs, etc. - each connected to hardware in the backend and ready to use.</description></item><item><title>Top 10 GPU Computing Platforms</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-10-gpu-computing-platforms/</link><pubDate>Wed, 02 Nov 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-10-gpu-computing-platforms/</guid><description>GPU technology has led to massive performance gains for machine learning tasks, as well as enabling us to solve more complex and difficult Data Science problems. By applying GPUs to data science problems judiciously and thoughtfully, you can accelerate your work and your productivity substantially.
In this article, we’re exploring the top 10 cloud GPU computing platforms and services, focusing on factors including pricing, infrastructure, design, performance, support, and more. Dive in below to consider your cloud GPU needs.</description></item><item><title>Setting up JupyterHub with Single Sign-on (SSO) on AWS</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/setting-up-jupyterhub-with-single-sign-on-sso-aws/</link><pubDate>Thu, 27 Oct 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/setting-up-jupyterhub-with-single-sign-on-sso-aws/</guid><description>Single sign-on (SSO) is a method of authenticating to multiple services with a single set of user credentials. SSO adds a layer of security for your data science team, code, and data by reducing the attack surface to a single set of user credentials. It is considered a standard enterprise feature for any software used in modern corporate environments. Below, we will discuss how to set up JupyterHub as well as how to set up SSO to meet your team&amp;rsquo;s security needs.
If you are setting up JupyterHub for a business, you will also likely want to do this with a security-first approach. In this article, we will share the top tutorials to provide you with a comprehensive guide to setting up a JupyterHub that works well for teams and businesses.</description></item><item><title>How to Setup Jupyter Notebooks on EC2</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-setup-jupyter-notebook-on-ec2/</link><pubDate>Wed, 12 Oct 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/how-to-setup-jupyter-notebook-on-ec2/</guid><description>Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams. Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.
You have been tasked with getting Jupyter notebooks working on EC2. Where do you start? You are probably reading AWS documentation, googling alternatives, and facing a bit of information overload.
If you are running Jupyter Notebooks for a business, we’re sharing several ways to get set up quickly!</description></item><item><title>Saturn Cloud VS. SageMaker</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/saturn-cloud-vs-sagemaker/</link><pubDate>Mon, 03 Oct 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog2/saturn-cloud-vs-sagemaker/</guid><description>Check out a side-by-side comparison here
When data scientists are choosing a data science platform, they look for several qualities and capabilities that will ensure their work is productive and valuable. They seek a platform that is easy to use, scales with their team, and allows them to use their preferred tools and languages.
Saturn Cloud and SageMaker are two popular platforms among data scientists. They both allow data scientists to do their work in the cloud using hosted notebooks, but they differ significantly in their features and ease of use.</description></item><item><title>Top 10 Free Machine Learning Platforms in 2022</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-10-free-machine-learning-platforms-in-2022/</link><pubDate>Thu, 22 Sep 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/top-10-free-machine-learning-platforms-in-2022/</guid><description>Machine learning platforms are software products that help machine learning engineers train and deploy machine learning models. Machine learning platforms help you automate the full machine learning lifecycle: from training and testing a model through deploying and running it. Even someone who is not an expert in machine learning or coding can utilize built-in tools to help bring models to production.
Companies are seeking AI-driven solutions in almost all sectors of industry like healthcare, finance, manufacturing, etc.</description></item><item><title>Build or Buy Data Science Tools</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/build-vs-buy-data-science-tools/</link><pubDate>Wed, 17 Aug 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/build-vs-buy-data-science-tools/</guid><description>We’re kicking off a series of blog posts on setting up data science infrastructure. Infrastructure decisions around data science often fall to the first data scientist hired by the company. The company may have hired a new head of data science or a single data scientist who reports up to the head of analytics. These articles will be written with that audience in mind - people who need functional data science infrastructure and who may not have substantial DevOps knowledge.</description></item><item><title>Docker for Data Scientists</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/ds-deploy-docker/</link><pubDate>Mon, 01 Aug 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/ds-deploy-docker/</guid><description>Want to learn a lot more about using Docker for data science? Check out this in-depth presentation on the topic. There are many ways to deploy data science applications like dashboards or APIs, which are helpfully enumerated in my earlier blog post. 
However, one of the downsides of deploying code is that you have to set up dependencies, install the right version of the programming language (like R or Python), and ensure the environment is exactly how you want it.</description></item><item><title>Deploying Your Data Science Code</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/ds-deploy-methods/</link><pubDate>Sun, 31 Jul 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/ds-deploy-methods/</guid><description>One of the surprising challenges for data scientists is figuring out how to deploy your code. You may have made a cool dashboard with R and Shiny, or want to deploy a machine learning model as an API with a framework in Python like Flask or FastAPI. While these tools are often easy to get running on your local machine, if you run them locally then your application is usually only available on your local machine (and will stop running the moment you power off your machine).
While it can be easy to get data science code running on your local machine, if you run it locally then your application is usually only available on your local machine and will stop running the moment you power off your machine.</description></item><item><title>PyTorch for Natural Language Processing - Building a Fake News Classification Model</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pytorch-for-natural-language-processing-building-a-fake-news-classification-model/</link><pubDate>Tue, 14 Jun 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/pytorch-for-natural-language-processing-building-a-fake-news-classification-model/</guid><description>PyTorch for Natural Language Processing: Building a Fake News Classification Model Source
The proliferation of fake news has become a pressing concern in today&amp;rsquo;s digital age. To combat the spread of misinformation, you need effective tools and techniques. Specifically, Natural Language Processing (NLP) techniques help you extract meaning from vast amounts of text data, which can help address the issue of misinformation.
In this article, we explore the power of PyTorch, a popular deep-learning framework, in building a fake news classification model.</description></item><item><title>6 Powerful Scalable Computing Platforms 2024 Edition</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/6-powerful-computing-platforms-2024-edition/</link><pubDate>Sun, 12 Jun 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/6-powerful-computing-platforms-2024-edition/</guid><description>As data volumes grow, the demand for scalable computing tools does as well. Fortunately, the open source community has responded with a plethora of new tools to parallelize code, accelerate computation with GPUs, and deliver faster time-to-value for teams with big data.
While some teams have the DevOps resources and budget to create the infrastructure to host open source tools securely, others simply do not have the time, budget, or resources. We have compiled a list of the top scalable compute platforms that provide hosted solutions that work securely with enterprise data.</description></item><item><title>Structured Vs Unstructured Data</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/structured-vs-unstructured-data/</link><pubDate>Tue, 07 Jun 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/structured-vs-unstructured-data/</guid><description>Data can be broadly thought of as two different types: structured data and unstructured data. Structured data is data that is stored in a set of tables with rows and columns&amp;ndash;think of Excel spreadsheets or CSV files. The data may be spread over multiple sheets, but by using indices you can connect the data together. Unstructured data is data that is not stored in a tabular format, meaning it isn&amp;rsquo;t coerced into a set of tables.</description></item><item><title>Setting up JupyterHub Securely on AWS</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/jupyterhub_security/</link><pubDate>Sat, 04 Jun 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/jupyterhub_security/</guid><description>In our previous blog post on JupyterHub, we walked through the basic deployment steps for The Littlest JupyterHub (TLJH) and Zero-to-JupyterHub (ZTJH). Our recommendation for anyone looking to deploy JupyterHub as a data science platform in production was to use ZTJH. We&amp;rsquo;ll assume you&amp;rsquo;re using that for this blog post.
Once you have Zero-to-JupyterHub up and running, security is the top priority. You should feel confident that your data science platform is safe and that your users can access it easily.</description></item><item><title>An Introduction to Data Science Platforms</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/why-data-science-platform/</link><pubDate>Mon, 30 May 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/why-data-science-platform/</guid><description>A data science platform is a set of centralized tools for data scientists to do their work, and it can be immensely valuable to a data science organization. Data science platforms are infrastructure for data scientists to run code, train models, and deploy APIs, and can replace the need for a data scientist to manually set up their own programming environment. Some examples of data science platforms are Saturn Cloud, SageMaker, and Databricks.
At their best, data science platforms can help the team work closely together, use more sophisticated hardware and analyses, and keep work more reproducible.</description></item><item><title>Setting up JupyterHub on AWS</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/jupyterhub_aws/</link><pubDate>Wed, 13 Apr 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/jupyterhub_aws/</guid><description>Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams. Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.
Dealing with large datasets and tight security can make it difficult for a data science team to get their work done. A shared computational environment that scales for big data, and a single place to establish security protocols, makes it much easier.</description></item><item><title>Introduction to GPUs</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/intro-to-gpus/</link><pubDate>Wed, 23 Mar 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/intro-to-gpus/</guid><description>GPU technology has led to massive performance gains for machine learning tasks, as well as enabling us to solve more complex and difficult data science problems. By applying GPUs to data science problems judiciously and thoughtfully, you can accelerate your work and your productivity substantially - but before this, you&amp;rsquo;ll need to understand how a GPU works and why it makes such a difference.
GPU Tutorials If you already know how GPUs work, and you&amp;rsquo;d just like to try some examples of GPUs in Saturn Cloud, check out our tutorials:</description></item><item><title>Bring Old Photos Back to Life Using Saturn Cloud</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/bringing-old-photos-back-to-life/</link><pubDate>Mon, 14 Feb 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/bringing-old-photos-back-to-life/</guid><description>My mother recently asked me to take a look at some old family photos that she wanted to scan, print, and frame as gifts. Unfortunately, they were not in great condition. Some had suffered a bit over time&amp;ndash;colors had faded and scratches had been added&amp;ndash;and some were originally out of focus. My father had attempted to clean them up on his personal computer using Photoshop, but to little avail. Despite his efforts, he wasn&amp;rsquo;t able to significantly improve the condition of the images.</description></item><item><title>What it's like being Prefect</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/whats-its-like-being-prefect/</link><pubDate>Mon, 13 Dec 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/whats-its-like-being-prefect/</guid><description>Prefect is a data pipeline automation tool based on a simple premise: &amp;ldquo;Your code probably works. But sometimes it doesn’t.&amp;rdquo; If you check the Prefect documentation, you will find the terms positive data engineering and negative data engineering often mentioned to describe what Prefect is for. To understand Prefect, let us first understand what each of these terms means.
Positive Data Engineering: Writing your code and expecting it to run successfully.</description></item><item><title>Dask DataFrame is not Pandas</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/dask-is-not-pandas/</link><pubDate>Tue, 26 Oct 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/dask-is-not-pandas/</guid><description>This article is the second article of an ongoing series on using Dask in practice. Each article in this series will be simple enough for beginners, but provide useful tips for real work. The next article in the series is about parallelizing for loops, and other embarrassingly parallel operations with dask.delayed
The Allure You start with medium-sized data sets. Pandas does quite well. Then the data sets get larger, and so you scale up to a larger machine.</description></item><item><title>My First Experience Using RAPIDS</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/my-experience-of-first-time-using-rapids/</link><pubDate>Thu, 07 Oct 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/my-experience-of-first-time-using-rapids/</guid><description>If you&amp;rsquo;ve ever heard of RAPIDS, the tool from NVIDIA that uses GPUs for machine learning, you may have asked yourself: How tough is it to accelerate machine learning workflows? Will I need to take courses to learn this or watch some tutorials to understand scalability? Don&amp;rsquo;t worry, it turns out RAPIDS is very straightforward to learn.
A week earlier, I had similar questions about RAPIDS, and after researching, I was surprised by how simple it was to get started.</description></item><item><title>Host a Jupyter Notebook as an API</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/notebook-apis/</link><pubDate>Thu, 09 Sep 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/notebook-apis/</guid><description>Oftentimes data scientists have Jupyter notebooks that they want to run in a recurring manner. Often this is a Jupyter notebook with an analysis that needs to be rerun frequently with newer data. Another situation where having a notebook rerun may be useful is if the notebook trains a machine learning model and the model needs to be retrained on more recent data points. There are a number of ways that such a notebook could be rerun, depending on the use case:
I&amp;rsquo;ve been chatting with many data scientists who&amp;rsquo;ve heard of Dask, the Python framework for distributed computing, but don&amp;rsquo;t know where to start.</description></item><item><title>Multi-GPU TensorFlow on Saturn Cloud</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/tensorflow_intro/</link><pubDate>Mon, 23 Aug 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/tensorflow_intro/</guid><description>TensorFlow is a popular, powerful framework for deep learning used by data scientists across industries. However, sometimes its efficacy can be hamstrung by a lack of compute resources. You might start with training on a CPU, and when that&amp;rsquo;s too slow for your needs, bump up to a GPU. But that can still be insufficient! When you&amp;rsquo;re sitting there for an hour, or two, or three waiting for your model to train, you&amp;rsquo;re wasting time when you could be making progress.</description></item><item><title>Speeding up Neural Network Training With Multiple GPUs and Dask</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/dask-with-gpus/</link><pubDate>Thu, 19 Aug 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/dask-with-gpus/</guid><description>The talk this blog post was based on.
A common moment when training a neural network is when you realize the model isn&amp;rsquo;t training quickly enough on a CPU and you need to switch to using a GPU. A less common, but still important, moment is when you realize that even a large GPU is too slow to train a model and you need further options.
One option is to connect multiple GPUs together across multiple machines so they can work as a unit and train a model more quickly.</description></item><item><title>Dealing with Long Running Jupyter Notebooks</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/long-running-notebooks/</link><pubDate>Thu, 15 Jul 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/long-running-notebooks/</guid><description>We&amp;rsquo;ve gotten a number of customers struggling with long running Jupyter notebooks&amp;ndash;ones that take several hours or more to execute. Often, they would come to us because these long running notebooks would, at some point, lose connectivity between the server and the browser, as is common with cloud services. Normally, cloud services gracefully reconnect and there are no issues. In the case of Jupyter, if the connection is lost, then Jupyter stops saving any output.</description></item><item><title>Just Start with the Dask LocalCluster</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/local-cluster/</link><pubDate>Wed, 14 Jul 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/local-cluster/</guid><description>This article is the first article of an ongoing series on using Dask in practice. Each article in this series will be simple enough for beginners, but provide useful tips for real work. The next article in the series is about parallelizing for loops, and other embarrassingly parallel operations with dask.delayed
There are many ways to run Dask clusters. This article urges users to start as simply as possible, and runs through easy ways of doing just that.</description></item><item><title>Deploy Your Machine Learning Model - Part 3 (Flask API or Web App)</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploy-ml-models-3/</link><pubDate>Sun, 30 May 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploy-ml-models-3/</guid><description>This post continues our series about deploying machine learning models with Saturn Cloud - if you missed it, read Part 1 here and Part 2 here.
Our Toolkit Saturn Cloud (so you can deploy easily!) Flask Plotly (python and JS) Scikit-learn (for our model) Other Helpful Links codebook for the dataset plotly.js cheat sheet Jinja (helpful for Flask) See it in Action! If you have a Saturn Cloud account, you can see the Flask web application running live now, or see the Flask REST API.</description></item><item><title>Deploy Your Machine Learning Model - Part 2 (Voila Web App)</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploy-ml-models-2/</link><pubDate>Sat, 29 May 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploy-ml-models-2/</guid><description>This post continues our series about deploying machine learning models with Saturn Cloud - if you missed it, read Part 1 here. If you&amp;rsquo;d rather deploy using Flask, either as a web app or an API, move on to Part 3.
Our Toolkit Saturn Cloud (so you can deploy easily!) Voila Plotly (python and JS) Scikit-learn (for our model) Other Helpful Links ipywidgets (helpful for Voila) codebook for the dataset plotly.</description></item><item><title>Deploy Your Machine Learning Model - Part 1 (The Model)</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploy-ml-models-1/</link><pubDate>Fri, 28 May 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploy-ml-models-1/</guid><description>Data science model deployment can sound intimidating if you have never had a chance to try it in a safe space. Do you want to make a REST API, or a full frontend app? What does it take to do either of these? It’s not as hard as you might think!
In this three-part series of posts, we’re going to go through how you can take a model and deploy it to a web app or a REST API (using Saturn Cloud), so that others can interact with it.</description></item><item><title>Deploying Data Pipelines at Saturn Cloud with Dask and Prefect</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploying-data-pipelines-at-saturn/</link><pubDate>Wed, 19 May 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/deploying-data-pipelines-at-saturn/</guid><description>Let&amp;rsquo;s talk about how we deploy data pipelines internally at Saturn Cloud. This article will discuss how we do that and some lessons learned. It also assumes that you&amp;rsquo;re already a fan of Prefect and Dask.
Use Dask! But only when you need it Scaling up should be progressive. The more you scale, the more inherent complexity you deal with. I believe most jobs should be written with Pandas first, then Dask on a local cluster, and finally Dask on a multi-node cluster (if you really need it).</description></item><item><title>What is Dask and How Does it Work?</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-is-dask/</link><pubDate>Tue, 27 Apr 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/what-is-dask/</guid><description>Check out Dask in 15 Minutes by Dan Bochman for a video introduction to Dask
Dask is an open-source Python library that lets you work on arbitrarily large datasets and dramatically increases the speed of your computations. It is available on various data science platforms, including Saturn Cloud.
This article will first address what makes Dask special and then explain in more detail how Dask works. So: what makes Dask special?</description></item><item><title>Dask and pandas: There’s No Such Thing as Too Much Data</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/dask-and-pandas-theres-no-such-thing-as-too-much-data/</link><pubDate>Mon, 01 Feb 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/dask-and-pandas-theres-no-such-thing-as-too-much-data/</guid><description>Do you love pandas, but hate when you reach the limits of your memory or compute resources? Dask gives you the chance to use the pandas API with distributed data and computing. In this article, you&amp;rsquo;ll learn how it really works, how to use it yourself, and why it&amp;rsquo;s worth the switch.
Introduction Pandas is the beloved workhorse of the PyData toolkit &amp;mdash; it makes incredibly diverse data analysis and data science tasks possible, using a user-friendly and robust API.</description></item><item><title>Lazy Evaluation with Dask</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-data-scientist-s-guide-to-lazy-evaluation-with-dask/</link><pubDate>Tue, 26 Jan 2021 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-data-scientist-s-guide-to-lazy-evaluation-with-dask/</guid><description>We talk a lot about Dask and parallel computing, but sometimes we don&amp;rsquo;t do enough to explain the concepts that make it possible. Read on to learn how lazy evaluation works, how Dask uses it, and how it makes parallelization not only possible, but easy!
What is Dask? Dask is an open-source framework that enables parallelization of Python code. This can be applied to all kinds of Python use cases, not just machine learning.</description></item><item><title>Combining Dask and PyTorch for Better, Faster Transfer Learning</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/combining-dask-and-py-torch-for-better-faster-transfer-learning/</link><pubDate>Tue, 01 Dec 2020 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/combining-dask-and-py-torch-for-better-faster-transfer-learning/</guid><description>Introducing a new Python package: dask-pytorch-ddp This tutorial is run on the Saturn Cloud platform, which makes Dask clusters available at the click of a button to users. If you need access to clusters so you can try out the steps below, we have a free version
Data parallelism within a single machine is a reasonably well-documented method for optimizing deep learning training performance, particularly in PyTorch. However, taking the step from one machine to training a single neural net on many machines at once can seem difficult and complicated.</description></item><item><title>Handy Dandy Guide to Working With Timestamps in pandas</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/handy-dandy-guide-to-working-with-timestamps-in-pandas/</link><pubDate>Fri, 20 Nov 2020 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/handy-dandy-guide-to-working-with-timestamps-in-pandas/</guid><description>This article is your handy dandy guide to working with timestamps in pandas. We&amp;rsquo;ll cover the most common problems people deal with when working with pandas as it relates to time. Specifically, we&amp;rsquo;ll cover:
reading Timestamps from CSVs working with timezones comparing datetime objects resampling data moving window functions datetime accessors Reading Timestamps From CSV Files One of the most common things is to read timestamps into pandas via CSV.</description></item><item><title>Computer Vision at Scale With Dask and PyTorch</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/computer-vision-at-scale-with-dask-and-py-torch/</link><pubDate>Tue, 03 Nov 2020 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/computer-vision-at-scale-with-dask-and-py-torch/</guid><description>Applying deep learning strategies to computer vision problems has opened up a world of possibilities for data scientists. However, to use these techniques at scale to create business value, substantial computing resources need to be available &amp;ndash; and this is just the kind of challenge Saturn Cloud is built to solve!
In this tutorial, you&amp;rsquo;ll see the steps to conducting image classification inference using the popular Resnet50 deep learning model at scale using NVIDIA GPU clusters on Saturn Cloud.</description></item><item><title>Cross-Entropy Loss Function</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/cross-entropy-loss-function/</link><pubDate>Tue, 06 Oct 2020 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/cross-entropy-loss-function/</guid><description>Photo by Fatos Bytyqi on Unsplash
When working on a machine learning or deep learning problem, loss/cost functions are used to optimize the model during training. The objective is almost always to minimize the loss function. The lower the loss, the better the model. Cross-entropy loss is one of the most important cost functions. It is used to optimize classification models. Understanding cross-entropy depends on understanding the softmax activation function.</description></item><item><title>Random Forest on GPUs: 2000x Faster than Apache Spark</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/random-forest-on-gp-us-2000-x-faster-than-apache-spark/</link><pubDate>Thu, 30 Jul 2020 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/random-forest-on-gp-us-2000-x-faster-than-apache-spark/</guid><description>If you prefer to watch a video demo, click here.
Random forest is a machine learning algorithm trusted by many data scientists for its robustness, accuracy, and scalability. The algorithm trains many decision trees through bootstrap aggregation, then predictions are made from aggregating the outputs of the trees in the forest. Due to its ensemble nature, a random forest is an algorithm that can be implemented in distributed computing settings. Trees can be trained in parallel across processes and machines in a cluster, resulting in significantly faster training time than using a single process.</description></item><item><title>Supercharging Hyperparameter Tuning with Dask</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/supercharging-hyperparameter-tuning-with-dask/</link><pubDate>Mon, 20 Jul 2020 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/supercharging-hyperparameter-tuning-with-dask/</guid><description>TL;DR: Dask improves scikit-learn parameter search speed by over 16x, and Spark by over 4x.
Hyperparameter tuning is a crucial, and often painful, part of building machine learning models. Squeezing every bit of performance out of your model may mean the difference of millions of dollars in ad revenue, or life and death for patients in healthcare applications. Even if your model takes one minute to train, you can end up waiting hours for a grid search to complete (think a 10×10 grid, cross-validation, etc.</description></item><item><title>Practical Issues Setting up Kubernetes for Data Science on AWS</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/practical-issues-setting-up-kubernetes-for-data-science-on-aws/</link><pubDate>Sun, 19 Jul 2020 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/practical-issues-setting-up-kubernetes-for-data-science-on-aws/</guid><description>Kubernetes provides a ton of useful primitives for setting up your own infrastructure. However, the standard way of provisioning Kubernetes isn&amp;rsquo;t well suited to data science workflows. This article describes those problems and how we think about them.
UPDATE: The EIP limits are still an issue with the standard aws-cni. We&amp;rsquo;ve updated our Kubernetes clusters to use the calico-cni, which avoids these issues. This is now what we recommend.</description></item><item><title>Setting Up Your Data Science &amp; Machine Learning Capability in Python</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/setting-up-your-data-science-machine-learning-capability-in-python/</link><pubDate>Wed, 15 Jul 2020 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/setting-up-your-data-science-machine-learning-capability-in-python/</guid><description>Why Python? Python is the clear winner among programming languages for data science &amp;amp; machine learning (DSML). With its rich and dynamic open-source software ecosystem, Python stands unmatched in adoptability, reliability, and functionality. If you disagree with this premise, please take a quick detour here.
The Purpose of Your Data Science &amp;amp; Machine Learning Capability Your goal as the lead of a DSML team is to deliver the best return on investment to the business.</description></item><item><title>Snowflake and Dask</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/snowflake-and-dask/</link><pubDate>Wed, 15 Jul 2020 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/snowflake-and-dask/</guid><description>Snowflake is the most popular data warehouse among our Saturn users. This article covers efficient ways to load Snowflake data into Dask so you can do non-SQL operations (think machine learning) at scale.
UPDATE: Snowflake is working on exposing the underlying shards of result sets so that we can efficiently load them into Dask without having to partition the data manually. Follow that work here! This blog post is still our recommended approach until that work is merged and released.</description></item><item><title>3 Ways to Schedule and Execute Python Jobs</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/3-ways-to-schedule-and-execute-python-jobs/</link><pubDate>Sat, 18 Apr 2020 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/3-ways-to-schedule-and-execute-python-jobs/</guid><description>Why would anyone want 3 ways to schedule and execute Python jobs? For many reasons! In particular, moving from one-time tasks that generate some value for your business to reusable, automated tasks that produce sustainable value can be a game-changer for companies. This holds true whether those tasks are ETL, machine learning, or other functions entirely.
For example, training a model one time and predicting on one test sample might be academically interesting.</description></item><item><title>A Guide to Convolutional Neural Networks — the ELI5 way</title><link>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way/</link><pubDate>Sat, 15 Dec 2018 00:00:00 +0000</pubDate><guid>https://deploy-preview-1991--saturn-cloud.netlify.app/blog/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way/</guid><description>Artificial Intelligence has been witnessing monumental growth in bridging the gap between the capabilities of humans and machines. Researchers and enthusiasts alike work on numerous aspects of the field to make amazing things happen. One of many such areas is the domain of Computer Vision.
The goal of this field is to enable machines to view the world as humans do, perceive it in a similar manner, and even use that knowledge for a multitude of tasks such as Image &amp;amp; Video Recognition, Image Analysis &amp;amp; Classification, Media Recreation, Recommendation Systems, Natural Language Processing, etc.</description></item></channel></rss>