Chapter 2 Connecting to MongoDB
2.1 Mongo URI Format
The mongo() function initiates a connection object to a MongoDB server. For example:
library(mongolite)
m <- mongo("mtcars", url = "mongodb://readwrite:test@mongo.opencpu.org:43942/jeroen_test")
To get an overview of available methods, simply print the object to the terminal.
#> <Mongo collection> 'mtcars'
#> $aggregate(pipeline = "{}", options = "{\"allowDiskUse\":true}", handler = NULL, pagesize = 1000, iterate = FALSE)
#> $count(query = "{}")
#> $disconnect(gc = TRUE)
#> $distinct(key, query = "{}")
#> $drop()
#> $export(con = stdout(), bson = FALSE, query = "{}", fields = "{}", sort = "{\"_id\":1}")
#> $find(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0, handler = NULL, pagesize = 1000)
#> $import(con, bson = FALSE)
#> $index(add = NULL, remove = NULL)
#> $info()
#> $insert(data, pagesize = 1000, stop_on_error = TRUE, ...)
#> $iterate(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0)
#> $mapreduce(map, reduce, query = "{}", sort = "{}", limit = 0, out = NULL, scope = NULL)
#> $remove(query, just_one = FALSE)
#> $rename(name, db = NULL)
#> $replace(query, update = "{}", upsert = FALSE)
#> $run(command = "{\"ping\": 1}", simplify = TRUE)
#> $update(query, update = "{\"$set\":{}}", filters = NULL, upsert = FALSE, multiple = FALSE)
The R manual page for the mongo() function gives some brief descriptions as well.
The manual page tells us that mongo() supports the following arguments:
collection: name of the collection to connect to. Defaults to "test".
db: name of the database to connect to. Defaults to "test".
url: address of the MongoDB server in standard URI format.
verbose: if TRUE, emits some extra output.
options: additional connection options such as SSL keys/certs.
The url parameter contains a special URI format which defines the server address and additional connection options.
mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]
The Mongo Connection String Manual gives an overview of the connection string syntax and options. Below are the most important options for using mongolite.
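As a rough illustration of the format above, the following sketch assembles a connection string from its parts. The helper name build_mongo_uri, the host names, and the credentials are hypothetical and not part of mongolite. It assumes that reserved characters in the username and password should be percent-encoded, which base R's utils::URLencode() can do.

```r
# Hypothetical helper (not part of mongolite): assemble a MongoDB URI.
build_mongo_uri <- function(hosts, db = "", user = NULL, password = NULL,
                            options = list()) {
  # Percent-encode credentials so characters such as '@', ':' or '/'
  # are not mistaken for URI delimiters.
  enc <- function(x) utils::URLencode(x, reserved = TRUE)
  auth <- ""
  if (!is.null(user)) {
    auth <- enc(user)
    if (!is.null(password)) auth <- paste0(auth, ":", enc(password))
    auth <- paste0(auth, "@")
  }
  # Turn the named list of options into a key=value&key=value query string.
  query <- ""
  if (length(options) > 0) {
    query <- paste0("?", paste0(names(options), "=", unlist(options),
                                collapse = "&"))
  }
  paste0("mongodb://", auth, paste(hosts, collapse = ","), "/", db, query)
}

uri <- build_mongo_uri(
  hosts = c("host1.example.com:27017", "host2.example.com:27017"),
  db = "test", user = "alice", password = "p@ss:word",
  options = list(replicaSet = "rs0")
)
print(uri)
```

The resulting string can then be passed to mongo() through the url argument.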
2.1.1 DNS Seedlist Connection Format
New in mongolite 1.3 is support for seedlist URLs with the mongodb+srv:// prefix. This indicates that before connecting, the client should look up the actual host addresses and parameters from the DNS SRV and TXT records.
con <- mongo("mtcars", url = "mongodb+srv://readwrite:test@cluster0-84vdt.mongodb.net/test")
con$insert(mtcars)
#> List of 5
#> $ nInserted : num 32
#> $ nMatched : num 0
#> $ nRemoved : num 0
#> $ nUpserted : num 0
#> $ writeErrors: list()
The DNS seedlist allows for using a short and fixed URL for clusters consisting of multiple or dynamic servers and parameters.
2.2 Authentication
MongoDB supports several authentication modes.
2.2.1 LDAP
USER = "drivers-team"
PASS = "mongor0x$xgen"
HOST = "ldaptest.10gen.cc"
# Using plain-text
URI = sprintf("mongodb://%s:%s@%s/ldap?authMechanism=PLAIN", USER, PASS, HOST)
m <- mongo(url = URI)
m$find()
However it is recommended to use SSL instead of plain text when authenticating with a username/password:
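A minimal sketch of the SSL variant, reusing the placeholder host and credentials from the plain-text example. It assumes the server accepts ssl=true in the URI and that a CA certificate is available locally; the file name ca.crt and the wrapper function connect_ldap_ssl are hypothetical.

```r
# Same placeholder host and credentials as in the plain-text example;
# the only change in the URI is the added ssl=true option.
USER <- "drivers-team"
PASS <- "mongor0x$xgen"
HOST <- "ldaptest.10gen.cc"
URI <- sprintf("mongodb://%s:%s@%s/ldap?authMechanism=PLAIN&ssl=true",
               USER, PASS, HOST)

# Defined but not called here: connecting needs the mongolite package and a
# reachable server. The CA file path is a hypothetical placeholder.
connect_ldap_ssl <- function(uri, ca_file = "ca.crt") {
  mongolite::mongo(url = uri, options = mongolite::ssl_options(ca = ca_file))
}
# m <- connect_ldap_ssl(URI)
# m$find()
```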
2.2.2 X509
Let’s check if our server supports SSL:
certs <- openssl::download_ssl_cert('ldaptest.10gen.cc', 27017)
print(certs)
str(as.list(certs[[1]]))
To use X509 authentication the Mongo URI needs ssl=true&authMechanism=MONGODB-X509:
# Using X509 SSL auth
HOST <- "ldaptest.10gen.cc"
USER <- "CN=client,OU=kerneluser,O=10Gen,L=New York City,ST=New York,C=US"
URI <- sprintf("mongodb://%s@%s/x509?ssl=true&authMechanism=MONGODB-X509", USER, HOST)
OPTS <- ssl_options(cert = ".auth/client.pem", key = ".auth/key.pem", ca = ".auth/ca.crt", allow_invalid_hostname = TRUE)
m <- mongo(url = URI, options = OPTS)
m$find()
2.2.3 Kerberos
Note: Windows uses SSPI for Kerberos authentication, so this section does not apply to Windows.
Kerberos authentication on Linux requires installation of a Kerberos client. On macOS, Kerberos is already installed by default. On Ubuntu/Debian we need:
sudo apt-get install krb5-user libsasl2-modules-gssapi-mit
Next, create or edit /etc/krb5.conf and add our server under [realms], for example:
[realms]
  LDAPTEST.10GEN.CC = {
    kdc = ldaptest.10gen.cc
    admin_server = ldaptest.10gen.cc
  }
In a terminal, run the following (this only needs to be done once):
kinit drivers@LDAPTEST.10GEN.CC
klist
We should now be able to connect in R:
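A minimal sketch of such a connection, assuming the server accepts the GSSAPI mechanism and that a ticket was obtained with kinit as above. The database name kerberos and the wrapper function connect_kerberos are assumptions made for illustration. The principal contains an @ character, so it is percent-encoded before being placed in the URI.

```r
HOST <- "ldaptest.10gen.cc"
# The principal contains '@', which must be percent-encoded inside a URI.
USER <- utils::URLencode("drivers@LDAPTEST.10GEN.CC", reserved = TRUE)
# The database name "kerberos" is an assumption for this sketch.
URI <- sprintf("mongodb://%s@%s/kerberos?authMechanism=GSSAPI", USER, HOST)
print(URI)

# Defined but not called here: connecting needs the mongolite package,
# a valid Kerberos ticket and a reachable server.
connect_kerberos <- function(uri) mongolite::mongo(url = uri)
# m <- connect_kerberos(URI)
# m$find()
```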
2.3 SSH Tunnel
To connect to MongoDB via an SSH tunnel, you need to set up the tunnel separately with an SSH client. The mongolite manual contains the following example:
Assume we want to tunnel through dev.opencpu.org, which runs an SSH server on the standard port 22 with username jeroen. To initiate a tunnel from localhost:9999 to mongo.opencpu.org:43942 via the SSH server dev.opencpu.org:22, open a terminal and run:
ssh -L 9999:mongo.opencpu.org:43942 jeroen@dev.opencpu.org -vN -p22
Some relevant ssh flags:
-v (optional) show verbose status output
-f run the tunnel server in the background. Use pkill ssh to kill.
-p22 connect to ssh server on port 22 (default)
-i/some/path/id_rsa authenticate with ssh using a private key
Check man ssh for more ssh options.
It is also possible to run this command directly from R:
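A sketch of how this could look, assuming an ssh client is on the PATH and that key-based (non-interactive) login to the SSH server is configured. The wrapper function start_tunnel is hypothetical. The -f flag replaces -v so that ssh detaches into the background and control returns to R.

```r
# Same tunnel as above, but with -f so ssh detaches into the background.
tunnel_cmd <- "ssh -L 9999:mongo.opencpu.org:43942 jeroen@dev.opencpu.org -fN -p22"

# Defined but not called here: running it requires an ssh client and
# non-interactive access to the SSH server.
start_tunnel <- function(cmd = tunnel_cmd) {
  status <- system(cmd)  # returns the exit status of the command
  if (status != 0) stop("ssh exited with status ", status)
  invisible(status)
}
# start_tunnel()
```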
Once the tunnel has been established, we can connect to our SSH client, which will tunnel traffic to our MongoDB server. In our example the SSH client listens on localhost port 9999:
con <- mongo("mtcars", url = "mongodb://readwrite:test@localhost:9999/jeroen_test")
con$insert(mtcars)
#> List of 5
#> $ nInserted : num 32
#> $ nMatched : num 0
#> $ nRemoved : num 0
#> $ nUpserted : num 0
#> $ writeErrors: list()
If you want to set up a tunnel client on Windows and you do not have the ssh program, you can use an SSH client such as PuTTY to set up the tunnel. See this example.
2.4 SSL options
For security reasons, SSL options cannot be configured in the URI but have to be set manually via the options parameter. The ssl_options function shows the default values:
#> List of 6
#> $ pem_file : NULL
#> $ ca_file : NULL
#> $ ca_dir : NULL
#> $ crl_file : NULL
#> $ allow_invalid_hostname: logi FALSE
#> $ weak_cert_validation : logi FALSE
You can use this function to specify connection SSL options:
m <- mongo(url = "mongodb://localhost?ssl=true",
  options = ssl_options(cert = "~/client.crt", key = "~/id_rsa.pem"))
The MongoDB SSL client manual has more detailed descriptions on the various options.
2.5 Replica Options
The URI accepts a few special keys when connecting to a replica set. The connection-string manual is the canonical source for all parameters. Most users should stick with the defaults here; only specify these if you know what you are doing.
2.5.1 Read Preference
The Read Preference parameter specifies if the client should connect to the primary node (default) or a secondary node in the replica set.
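As a small sketch, the read preference can be set through the readPreference key in the URI. The helper read_pref_uri and the host name are hypothetical; the accepted modes listed are those documented in the MongoDB connection-string manual.

```r
# Hypothetical helper: build a URI with a validated read preference mode.
read_pref_uri <- function(host,
                          mode = c("primary", "primaryPreferred", "secondary",
                                   "secondaryPreferred", "nearest")) {
  mode <- match.arg(mode)  # rejects anything outside the documented modes
  sprintf("mongodb://%s/?readPreference=%s", host, mode)
}

uri <- read_pref_uri("host.example.com", "secondary")
print(uri)
# m <- mongolite::mongo("mtcars", url = uri)  # requires a reachable replica set
```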
2.5.2 Write Concern
The Write Concern parameter is used to specify the level of acknowledgement that the write operation has propagated to a number of server nodes. The url string parameter is the letter w.
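A brief sketch of a URI carrying a write concern. The helper write_concern_uri and the host name are hypothetical; it assumes w is either a number of nodes or the string "majority", as described in the MongoDB connection-string manual.

```r
# Hypothetical helper: w is either a number of nodes or "majority".
write_concern_uri <- function(host, w = 1) {
  stopifnot(identical(w, "majority") || (is.numeric(w) && w >= 0))
  sprintf("mongodb://%s/?w=%s", host, as.character(w))
}

uri <- write_concern_uri("host.example.com", w = 2)
print(uri)
# m <- mongolite::mongo(url = uri)  # requires a reachable replica set
```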
Note that setting this parameter to 2 on a server that is not a replica set will result in an error when trying to write:
#> Error: cannot use 'w' > 1 when a host is not replicated
2.5.3 Read Concern
Finally, Read Concern allows clients to choose a level of isolation for their reads from replica sets. The default value local returns the instance’s most recent data, but provides no guarantee that the data has been written to a majority of the replica set members (i.e. may be rolled back).
On the other hand, if we specify majority the server will only return data that has been propagated to the majority of nodes.
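The comparison between the two levels can be sketched as follows, assuming a MongoDB server on localhost and the readConcernLevel key in the URI. The function compare_read_concern is hypothetical and uses the default test collection; the inserted documents are illustrative.

```r
# readConcernLevel is set per connection through the URI.
uri_majority <- "mongodb://localhost/?readConcernLevel=majority"
uri_local    <- "mongodb://localhost/?readConcernLevel=local"

# Defined but not called here: running it needs the mongolite package and a
# MongoDB server on localhost.
compare_read_concern <- function() {
  m_major <- mongolite::mongo(url = uri_majority)
  m_major$insert('{"foo": 123}')
  m_major$insert('{"foo": 456}')
  m_local <- mongolite::mongo(url = uri_local)
  # Return how many documents each connection can see.
  c(majority = m_major$count(), local = m_local$count())
}
# compare_read_concern()
```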
#> List of 6
#> $ nInserted : int 1
#> $ nMatched : int 0
#> $ nModified : int 0
#> $ nRemoved : int 0
#> $ nUpserted : int 0
#> $ writeErrors: list()
#> List of 6
#> $ nInserted : int 1
#> $ nMatched : int 0
#> $ nModified : int 0
#> $ nRemoved : int 0
#> $ nUpserted : int 0
#> $ writeErrors: list()
On our local single-node server this condition is never met. Therefore we see that the server does not return any data that meets the majority level.
#> [1] 2
#> foo
#> 1 123
#> 2 456
The data is definitely there though; it just doesn’t meet the majority criterion. If we create a new connection with level local we do get to see our data:
#> [1] 2
#> foo
#> 1 123
#> 2 456
2.6 Global options
Finally, the mongo_options function allows for setting global client options that span across connections. Currently the following options are supported:
log_level: set the mongo log level, e.g. for printing debugging information.
bigint_as_char: set to TRUE to parse int64 numbers as strings rather than doubles (R does not support large integers natively).
date_as_char: set to TRUE to parse datetimes as strings rather than date-time objects.
The default values are:
#> $log_level
#> [1] "INFO"
#>
#> $bigint_as_char
#> [1] FALSE
#>
#> $date_as_char
#> [1] FALSE
See the manual page ?mongo_options for more details.
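To see why bigint_as_char can matter, the sketch below first shows in base R that doubles stop representing integers exactly beyond 2^53, and then toggles the option. The mongolite calls are guarded so the snippet also runs where the package is not installed; it assumes that calling mongo_options() without arguments returns the current settings, as the output above suggests.

```r
# Doubles represent integers exactly only up to 2^53, so a larger int64
# value can silently lose precision when converted.
lost <- (2^53 + 1 == 2^53)
print(lost)  # TRUE: the two values are indistinguishable as doubles

if (requireNamespace("mongolite", quietly = TRUE)) {
  # Parse int64 values as strings for all connections in this session.
  mongolite::mongo_options(bigint_as_char = TRUE)
  str(mongolite::mongo_options())
  # Restore the default shown above.
  mongolite::mongo_options(bigint_as_char = FALSE)
}
```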