The main aim of this blog to show, how you can configure R, Python, and SQL in a single R-markdown file. Most of time we have to use data from databases and python code along with R functions, and having a setup that bring goodness of all the tool in one place comes really handy.

Setup Python in Rstudio

To set up R Python And SQL in the Rstudio you have to first install miniconda. Miniconda helps to create python virtual environments and let you organize it. Such as for, data cleaning you might use a data cleaning environment for web scraping you will use a web scraping environment and so on.

Commands to manage conda environment inside RStudio:

Once you install miniconda You have to use R-reticulate in Rstudio, it is REPL (Read-eval-print loop) for Python in R, and helps to run Python code inside Rstudio. You can add miniconda into your windows terminal, but to use in Rstudio it is recommended to create and select environment from Rstudio itself.

Reticulate package provides all the necessary functionalities to run and manage python commands, such as

  • load reticulate package with install.packages(reticulate) command in R console.
  • command conda_ create() will create a new python environment.
  • use_condaenv(condaenv = "ENVNAME") to select a newly created environment or existing python environment
  • to know the existing environments that you have created earlier, use conda_list() function.
  • once you select a environment now you are ready to install python packages with py_install("PackageName").

Use conda prompt

I use conda prompt to create environment but some how those environment doesn’t work with my R-reticulate. But if you are using python only then you can configure conda with the following commands (run in system terminal):

  1. conda help for help.
  2. conda create --name your_environment to create a conda environment.
  3. conda env list to list all the environment in conda.
  4. conda activate your_environment to activate an specific conda environment.
  5. Use conda config to add other channels for package providers, but I tried pip and it worked just fine.
    1. conda config --add channels conda-forge (conda-forge is one of many channel)
    2. conda config --set channel_priority strict (set preference of channel)
  6. Install a few packages: conda install pandas scikit-learn matplotlib

How to use SQL with R markdown

Use dplyr, dbplyr, and RSQLite packages

Example of R, Python, and SQL with Rmarkdonw (*.Rmd)

if(!require(dplyr)){install.packages("dplyr");library(dplyr)}
if(!require(dbplyr)){install.packages("dbplyr");library(dbplyr)}
if(!require(RSQLite)){install.packages("RSQLite");library(RSQLite)}

conn <- src_memdb() # create a SQLite database in memory

# Similar way you can add other database connections see `?dbplyr`
con_iso <- conn$con
copy_to(conn, 
        storms,     # this is a dataset built into dplyr
        overwrite = TRUE)
tbl(conn, sql("SELECT * FROM storms LIMIT 5"))
## # Source:   SQL [5 x 13]
## # Database: sqlite 3.38.5 [:memory:]
##   name   year month   day  hour   lat  long status       category  wind pressure
##   <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>        <chr>    <int>    <int>
## 1 Amy    1975     6    27     0  27.5 -79   tropical de… -1          25     1013
## 2 Amy    1975     6    27     6  28.5 -79   tropical de… -1          25     1013
## 3 Amy    1975     6    27    12  29.5 -79   tropical de… -1          25     1013
## 4 Amy    1975     6    27    18  30.5 -79   tropical de… -1          25     1013
## 5 Amy    1975     6    28     0  31.5 -78.8 tropical de… -1          25     1012
## # … with 2 more variables: tropicalstorm_force_diameter <int>,
## #   hurricane_force_diameter <int>

Now create SLQ code chunk to directly run SQL queries from R.

```{sql connection=con_iso}
SELECT * FROM storms LIMIT 5;
```

This will print 5 entire form database table storms

Table 1: 5 records
name year month day hour lat long status category wind pressure tropicalstorm_force_diameter hurricane_force_diameter
Amy 1975 6 27 0 27.5 -79.0 tropical depression -1 25 1013 NA NA
Amy 1975 6 27 6 28.5 -79.0 tropical depression -1 25 1013 NA NA
Amy 1975 6 27 12 29.5 -79.0 tropical depression -1 25 1013 NA NA
Amy 1975 6 27 18 30.5 -79.0 tropical depression -1 25 1013 NA NA
Amy 1975 6 28 0 31.5 -78.8 tropical depression -1 25 1012 NA NA

SQL code chunk can ouput the data as r-variable i.e. storm_preview

```{sql connection=con_iso, output.var="storm_preview"}
SELECT * FROM storms LIMIT 5;
```
```{r}
class(storm_preview)
storm_preview 
```
## [1] "data.frame"
##   name year month day hour  lat  long              status category wind
## 1  Amy 1975     6  27    0 27.5 -79.0 tropical depression       -1   25
## 2  Amy 1975     6  27    6 28.5 -79.0 tropical depression       -1   25
## 3  Amy 1975     6  27   12 29.5 -79.0 tropical depression       -1   25
## 4  Amy 1975     6  27   18 30.5 -79.0 tropical depression       -1   25
## 5  Amy 1975     6  28    0 31.5 -78.8 tropical depression       -1   25
##   pressure tropicalstorm_force_diameter hurricane_force_diameter
## 1     1013                           NA                       NA
## 2     1013                           NA                       NA
## 3     1013                           NA                       NA
## 4     1013                           NA                       NA
## 5     1012                           NA                       NA

Now, load reticulate to run python codes

```{r}
# Below code will check if `reticulate` is installed or not, if not then it will install and load in the R-session. 
if(!require(reticulate)){install.packages("reticulate");library(reticulate)}
```

A example of Python code inside R markdown.

```{python}
from matplotlib import pyplot as plt    
# Importing Numpy Library    
import numpy as np    
plt.style.use('fivethirtyeight')    
    
mu = 50    
sigma = 7    
x = np.random.normal(mu, sigma, size=200)    
fig, ax = plt.subplots()    
    
ax.hist(x, 20)    
ax.set_title('Historgram')    
ax.set_xlabel('bin range')    
ax.set_ylabel('frequency')    
    
fig.tight_layout()    
plt.show() # Comment out if you are using blogdown-sites to render the site. 
```
## (array([ 2.,  1.,  2.,  8.,  9., 12., 22., 25., 31., 24., 15., 15., 13.,
##         9.,  7.,  2.,  1.,  0.,  1.,  1.]), array([31.4398407 , 33.56751485, 35.695189  , 37.82286315, 39.95053731,
##        42.07821146, 44.20588561, 46.33355976, 48.46123392, 50.58890807,
##        52.71658222, 54.84425638, 56.97193053, 59.09960468, 61.22727883,
##        63.35495299, 65.48262714, 67.61030129, 69.73797544, 71.8656496 ,
##        73.99332375]), <BarContainer object of 20 artists>)
A figure python output of Histogram plot.

Figure 1: A figure python output of Histogram plot.