The main aim of this blog to show, how you can configure R, Python, and SQL in a single R-markdown file. Most of time we have to use data from databases and python code along with R functions, and having a setup that bring goodness of all the tool in one place comes really handy.
Setup Python in Rstudio
To set up R Python And SQL in the Rstudio you have to first install miniconda. Miniconda helps to create python virtual environments and let you organize it. Such as for, data cleaning you might use a data cleaning environment for web scraping you will use a web scraping environment and so on.
Commands to manage conda environment inside RStudio:
Once you install miniconda You have to use R-reticulate in Rstudio, it is REPL (Read-eval-print loop) for Python in R, and helps to run Python code inside Rstudio. You can add miniconda into your windows terminal, but to use in Rstudio it is recommended to create and select environment from Rstudio itself.
Reticulate package provides all the necessary functionalities to run and manage python commands, such as
- load reticulate package with
install.packages(reticulate)
command in R console. - command
conda_ create()
will create a new python environment. use_condaenv(condaenv = "ENVNAME")
to select a newly created environment or existing python environment- to know the existing environments that you have created earlier, use
conda_list()
function. - once you select a environment now you are ready to install python packages with
py_install("PackageName")
.
Use conda prompt
I use conda prompt to create environment but some how those environment doesn’t work with my R-reticulate. But if you are using python only then you can configure conda with the following commands (run in system terminal):
conda help
for help.conda create --name your_environment
to create a conda environment.conda env list
to list all the environment in conda.conda activate your_environment
to activate an specific conda environment.- Use
conda config
to add other channels for package providers, but I triedpip
and it worked just fine.conda config --add channels conda-forge
(conda-forge is one of many channel)conda config --set channel_priority strict
(set preference of channel)
- Install a few packages:
conda install pandas scikit-learn matplotlib
How to use SQL with R markdown
Use dplyr
, dbplyr
, and RSQLite
packages
- Use SQL block to view the query output in ln-line mode.
- A blogpot to learn R and SQL in Rstudio https://irene.rbind.io/post/using-sql-in-rstudio/
Example of R, Python, and SQL with Rmarkdonw (*.Rmd)
if(!require(dplyr)){install.packages("dplyr");library(dplyr)}
if(!require(dbplyr)){install.packages("dbplyr");library(dbplyr)}
if(!require(RSQLite)){install.packages("RSQLite");library(RSQLite)}
conn <- src_memdb() # create a SQLite database in memory
# Similar way you can add other database connections see `?dbplyr`
con_iso <- conn$con
copy_to(conn,
storms, # this is a dataset built into dplyr
overwrite = TRUE)
tbl(conn, sql("SELECT * FROM storms LIMIT 5"))
## # Source: SQL [5 x 13]
## # Database: sqlite 3.38.5 [:memory:]
## name year month day hour lat long status category wind pressure
## <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr> <chr> <int> <int>
## 1 Amy 1975 6 27 0 27.5 -79 tropical de… -1 25 1013
## 2 Amy 1975 6 27 6 28.5 -79 tropical de… -1 25 1013
## 3 Amy 1975 6 27 12 29.5 -79 tropical de… -1 25 1013
## 4 Amy 1975 6 27 18 30.5 -79 tropical de… -1 25 1013
## 5 Amy 1975 6 28 0 31.5 -78.8 tropical de… -1 25 1012
## # … with 2 more variables: tropicalstorm_force_diameter <int>,
## # hurricane_force_diameter <int>
Now create SLQ code chunk to directly run SQL queries from R.
```{sql connection=con_iso}
SELECT * FROM storms LIMIT 5;
```
This will print 5 entire form database table storms
name | year | month | day | hour | lat | long | status | category | wind | pressure | tropicalstorm_force_diameter | hurricane_force_diameter |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Amy | 1975 | 6 | 27 | 0 | 27.5 | -79.0 | tropical depression | -1 | 25 | 1013 | NA | NA |
Amy | 1975 | 6 | 27 | 6 | 28.5 | -79.0 | tropical depression | -1 | 25 | 1013 | NA | NA |
Amy | 1975 | 6 | 27 | 12 | 29.5 | -79.0 | tropical depression | -1 | 25 | 1013 | NA | NA |
Amy | 1975 | 6 | 27 | 18 | 30.5 | -79.0 | tropical depression | -1 | 25 | 1013 | NA | NA |
Amy | 1975 | 6 | 28 | 0 | 31.5 | -78.8 | tropical depression | -1 | 25 | 1012 | NA | NA |
SQL code chunk can ouput the data as r-variable i.e. storm_preview
```{sql connection=con_iso, output.var="storm_preview"}
SELECT * FROM storms LIMIT 5;
```
```{r}
class(storm_preview)
storm_preview
```
## [1] "data.frame"
## name year month day hour lat long status category wind
## 1 Amy 1975 6 27 0 27.5 -79.0 tropical depression -1 25
## 2 Amy 1975 6 27 6 28.5 -79.0 tropical depression -1 25
## 3 Amy 1975 6 27 12 29.5 -79.0 tropical depression -1 25
## 4 Amy 1975 6 27 18 30.5 -79.0 tropical depression -1 25
## 5 Amy 1975 6 28 0 31.5 -78.8 tropical depression -1 25
## pressure tropicalstorm_force_diameter hurricane_force_diameter
## 1 1013 NA NA
## 2 1013 NA NA
## 3 1013 NA NA
## 4 1013 NA NA
## 5 1012 NA NA
Now, load reticulate
to run python codes
```{r}
# Below code will check if `reticulate` is installed or not, if not then it will install and load in the R-session.
if(!require(reticulate)){install.packages("reticulate");library(reticulate)}
```
A example of Python code inside R markdown.
```{python}
from matplotlib import pyplot as plt
# Importing Numpy Library
import numpy as np
plt.style.use('fivethirtyeight')
mu = 50
sigma = 7
x = np.random.normal(mu, sigma, size=200)
fig, ax = plt.subplots()
ax.hist(x, 20)
ax.set_title('Historgram')
ax.set_xlabel('bin range')
ax.set_ylabel('frequency')
fig.tight_layout()
plt.show() # Comment out if you are using blogdown-sites to render the site.
```
## (array([ 2., 1., 2., 8., 9., 12., 22., 25., 31., 24., 15., 15., 13.,
## 9., 7., 2., 1., 0., 1., 1.]), array([31.4398407 , 33.56751485, 35.695189 , 37.82286315, 39.95053731,
## 42.07821146, 44.20588561, 46.33355976, 48.46123392, 50.58890807,
## 52.71658222, 54.84425638, 56.97193053, 59.09960468, 61.22727883,
## 63.35495299, 65.48262714, 67.61030129, 69.73797544, 71.8656496 ,
## 73.99332375]), <BarContainer object of 20 artists>)