Monday, May 30, 2016

Thread safety issues of OpenSSL when used with curl

When you use libcurl for SSL connections such as HTTPS or FTPS, you need to look at the underlying SSL library that libcurl is built against, because libcurl has no native SSL support of its own.

Per:
https://curl.haxx.se/libcurl/c/threadsafe.html
and
https://github.com/openssl/openssl/blob/OpenSSL_1_1_0-pre5/CHANGES
if you are using OpenSSL (libssl) older than 1.1.0, the library is not thread safe on its own. You therefore have to set up thread locks for the OpenSSL layer used by libcurl, following:
https://curl.haxx.se/libcurl/c/threaded-ssl.html
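
The idea behind that example is to give OpenSSL an array of mutexes plus a callback that locks or unlocks the n-th one. Below is a minimal sketch of that pattern. It is deliberately written against the standard library only, so it compiles without OpenSSL headers: kCryptoLock and kNumLocks are stand-ins (assumptions) for OpenSSL's CRYPTO_LOCK flag and CRYPTO_num_locks(), the function names are made up for this post, and the registration call appears only in a comment. The thread-id callback that the linked example also sets is left out.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Stand-ins (assumptions) for OpenSSL's CRYPTO_LOCK flag and CRYPTO_num_locks().
const int kCryptoLock = 1;
const int kNumLocks = 41;

// One mutex per lock slot that the SSL library may ask for.
static std::unique_ptr<std::mutex[]> g_sslLocks;

// Same shape as the locking callback of OpenSSL before 1.1.0:
// lock slot n when the lock flag is set in mode, otherwise unlock it.
void sslLockingCallback(int mode, int n, const char * /*file*/, int /*line*/)
{
    assert(g_sslLocks && n >= 0 && n < kNumLocks);
    if (mode & kCryptoLock)
        g_sslLocks[n].lock();
    else
        g_sslLocks[n].unlock();
}

// Call once, before any thread starts using libcurl.
void initSslLocks()
{
    g_sslLocks.reset(new std::mutex[kNumLocks]);
    // With real OpenSSL (< 1.1.0) headers, the callback would be registered here:
    //   CRYPTO_set_locking_callback(sslLockingCallback);
}

// Call once, after all threads have finished.
void cleanupSslLocks()
{
    // With real OpenSSL (< 1.1.0): CRYPTO_set_locking_callback(NULL);
    g_sslLocks.reset();
}
```

In a real program initSslLocks() would sit next to curl_global_init() and cleanupSslLocks() next to curl_global_cleanup().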

Alternatively, you could start using OpenSSL 1.1.0, although it has no stable release at the time of writing. libcurl versions later than 7.49.0 can be compiled against OpenSSL 1.1.0, as shown on:
https://curl.haxx.se/changes.html


Other notes: OpenSSL 1.1.0 does not compile with the latest MySQL C++ connector (1.1.7), because some symbols that the connector needs have been deprecated in OpenSSL 1.1.0.

Sunday, May 29, 2016

A C++ Pool class to reuse connections or pointers


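#include <ctime>      // time()
#include <functional> // std::function
#include <iostream>
#include <list>
#include <memory>     // std::shared_ptr
#include <mutex>
#include <sstream>    // std::stringstream
#include <string>
#include <thread>
#include <pthread.h>
#include <unistd.h>   // sleep()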

/**
 * A pool for caching connections for HTTP requests and MySQL queries.
 * This class has been adapted from http://www.codeproject.com/Articles/8108/Template-based-Generic-Pool-using-C
 */
template <class T>  
class Pool
{
private:
    typedef std::shared_ptr<T> ObjHolder_t;///< typedef for ObjHolder_t which is actually an std::shared_ptr
    typedef std::list<ObjHolder_t> ObjList_t;///< typedef for ObjList_t

    unsigned m_size;///< pool size : default 0
    unsigned m_waitTimeSec;///< wait time: How long calling function can wait to find object
    bool m_isTempObjAllowed;///< if pool is full, is temp object allowed
    ObjList_t m_reservedList;///< reserved object list
    ObjList_t m_freeList;///< free object list
    std::mutex m_dataMutex;///< mutex for Pool data
    std::shared_ptr<T> m_nullptr;///< a convenient nullptr std::shared_ptr
    long m_checkAbandonedIntervalSec;///< how often we should check the abandoned objects because some borrowers may fail to checkin the objects they borrowed (default: 3600 seconds)
    long m_lastCheckTimestampForAbandonedObjs;///< the last timestamp when we checked for abandoned objects
    std::function<std::shared_ptr<T>()> m_constructFunc;///< creates a new object
    std::function<bool(std::shared_ptr<T>&)> m_checkHealthFunc;///< returns true if the object is usable
    std::function<bool(std::shared_ptr<T>&)> m_reactiveFunc;///< tries to reactivate an object; returns true on success
    std::function<void(std::shared_ptr<T>&)> m_destructFunc;///< releases an object

    /**
     * Initialize this instance with default member variables.
     */
    void initialize()
    {
        std::lock_guard<std::mutex> scopelock(m_dataMutex);
        for (auto &it: m_freeList)
        {
            m_destructFunc(it);
            it.reset();
        }
        for (auto &it: m_reservedList)
        {
            m_destructFunc(it);
            it.reset();
        }
        m_reservedList.clear();
        m_freeList.clear();
        m_size = 0;
        m_isTempObjAllowed = true;
        m_waitTimeSec = 3;
        m_checkAbandonedIntervalSec=3600;
        m_lastCheckTimestampForAbandonedObjs=(long)time(NULL);// was left uninitialized before
    }
public:
    /**
     * A default constructor.
     */
    Pool()
    {
        initialize();
    }
    /**
     * The destructor. It destructs all the objects still held by the Pool.
     */
    ~Pool()
    {
        initialize();
    }
    /**
     * Reset the Pool.
     */
    void reset()
    {
        initialize();
    }
    /**
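     * Initialize the pool with specific parameters.
     * This method can be called only once per instance, unless reset() is called in between.
     * @param nPoolSize the maximum number of pooled objects
     * @param constructFunc creates a new object
     * @param checkHealthFunc returns true if the object is still usable
     * @param reactiveFunc tries to reactivate an unusable object; returns true on success
     * @param destructFunc releases an object
     * @param bTempObjAllowed whether temporary objects may be created when the pool is full
     * @param nWaitTime how many seconds checkout() sleeps between two attempts
     */
    void initialize(const unsigned nPoolSize,
            std::function<std::shared_ptr<T>()> constructFunc,
            std::function<bool(std::shared_ptr<T>&)> checkHealthFunc,
            std::function<bool(std::shared_ptr<T>&)> reactiveFunc,
            std::function<void(std::shared_ptr<T>&)> destructFunc,
            const bool bTempObjAllowed=true,
            const unsigned nWaitTime = 3)
    {
        std::lock_guard<std::mutex> scopelock(m_dataMutex);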
        if (m_size == 0)
        {
            m_size = nPoolSize;
            m_isTempObjAllowed = bTempObjAllowed;
            m_waitTimeSec = nWaitTime;
            m_constructFunc=constructFunc;
            m_checkHealthFunc=checkHealthFunc;
            m_reactiveFunc=reactiveFunc;
            m_destructFunc=destructFunc;
        }
        else
            throw FailureException("can't Initialize the pool again");
    }

    /**
     * Borrow an object from the Pool.
     * This method blocks until an object is available.
     * @return the object pointer
     */
    std::shared_ptr<T>& checkout()
    {
        while (true)
        {
            {
                std::lock_guard<std::mutex> scopelock(m_dataMutex);
                std::shared_ptr<T> &pObj=findFreeObject();
                if (pObj!=nullptr)
                {
                    return pObj;
                }
                // did not find a free one
                if (m_freeList.size() + m_reservedList.size() < m_size)
                    return createObject();
                else if ((long)time(NULL) - m_lastCheckTimestampForAbandonedObjs > m_checkAbandonedIntervalSec)
                {
                    collectAbandonedObjects();
                    std::shared_ptr<T> &pObj = findFreeObject();
                    if (pObj!=nullptr)
                        return pObj;
                }
                else if (m_isTempObjAllowed)
                    return createObject();
                collectAbandonedObjects();
                {
                    std::shared_ptr<T> &pObj = findFreeObject();
                    if (pObj!=nullptr)
                        return pObj;
                }
            }
            sleep(m_waitTimeSec);
        }
    }
    /**
     * Return an object to this Pool.
     * This method first removes the object from the reserved object list, then validates it, and finally either puts it in the free object list or destructs it.
     * The caller should not use pObj any more after this call.
     * @param pObj the object to return
     */
    void checkin(std::shared_ptr<T>& pObj)
    {
        std::lock_guard<std::mutex> scopelock(m_dataMutex);
        // keep a local copy: pObj may refer to the list element erased below,
        // and the destruct function may reset the pointer it is given
        std::shared_ptr<T> obj = pObj;
        // remove the object from the reserved list
        for (typename ObjList_t::iterator i=m_reservedList.begin(); i!=m_reservedList.end(); ++i)
        {
            if (*i==obj)
            {
                m_reservedList.erase(i);
                break;
            }
        }
        if (validateObject(obj))
        {
            m_freeList.push_back(obj);
        }
        else
        {// the object is bad, so destruct it
            m_destructFunc(obj);
        }
    }

private:
    /**
     * Create a new object and add it to the reserved object list.
     * @return the newly created object
     */
    std::shared_ptr<T>& createObject()
    {
        std::shared_ptr<T> newObj=m_constructFunc();
        if (newObj!=nullptr && m_checkHealthFunc(newObj))
        {
            m_reservedList.push_back(newObj);
            return m_reservedList.back();
        }
        else
        {
            throw FailureException("could not create Object");
        }
    }
    /**
     * It will move abandoned objects to the free object list from the reserved object list,
     * if they could be active.
     */
    void collectAbandonedObjects()
    {
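        for (typename ObjList_t::iterator it=m_reservedList.begin(); it!=m_reservedList.end(); )
        {
            ObjHolder_t &oHolder = *it;
            if (oHolder.unique())
            {// checks whether the managed object is managed only by the current shared_ptr instance
                if (validateObject(oHolder)==true)
                {
                    m_freeList.push_back(oHolder);
                }
                else
                {// the object could not be reactivated, so destruct it
                    m_destructFunc(oHolder);
                }
                it = m_reservedList.erase(it);// erase() already returns the next element
            }
            else
            {
                ++it;// advance only when nothing was erased
            }
        }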
        m_lastCheckTimestampForAbandonedObjs=(long)time(NULL);
    }

    /**
     * Validate object if it is still usable.
     * If not, try to make it usable.
     * @param obj the pointer to the object which needs check
     * @return true if obj is good; false otherwise
     */
    bool validateObject(std::shared_ptr<T> &obj)
    {
        if (obj==nullptr)
        {
            return false;
        }
        else if (m_checkHealthFunc(obj) || m_reactiveFunc(obj))
        {
            return true;
        }
        return false;
    }

    /**
     * Find a free object which could be active from the free object list.
     * The free object list is checked. If any free object is inactive, we will try to reactivate it.
     * If reactivating it fails, we will drop the object.
     * @return nullptr if no free object available which could be active
     */
    std::shared_ptr<T> &findFreeObject()
    {
        // find existing free Object
        while (!m_freeList.empty())
        {
            // copy the holder before pop_front(), which would invalidate a reference to it
            ObjHolder_t obj=m_freeList.front();
            m_freeList.pop_front();
            if (validateObject(obj))
            {
                m_reservedList.push_back(obj);
                return m_reservedList.back();
            }
            else// delete the Object
            {
                m_destructFunc(obj);
            }
        }
        return m_nullptr;
    }

public:
    /**
     * Print the info of this Pool to a string.
     * @return the string representation of this Pool
     */
    std::string toString() const
    {
        std::stringstream ss;
        ss << "Pool(size=" << m_size
                << " isTempObjAllowed=" << m_isTempObjAllowed
                << " reservedList=" << m_reservedList.size()
                << " freeList=" << m_freeList.size();
        ss << ")";
        return ss.str();
    }
};
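
Collecting abandoned objects relies on one idea: if the pool's shared_ptr is the only owner left, the borrower has dropped its copy without checking the object in. Here is a minimal standalone sketch of that idea on plain lists. The name collectAbandoned is made up for this illustration, and it uses use_count() == 1 instead of unique(), since unique() was deprecated in C++17. Note how the iterator is advanced only when nothing was erased.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>

typedef std::shared_ptr<int> Holder;

// Moves every entry of `reserved` that nobody else owns any more to `freeList`.
// Returns the number of entries moved.
std::size_t collectAbandoned(std::list<Holder> &reserved, std::list<Holder> &freeList)
{
    std::size_t moved = 0;
    for (std::list<Holder>::iterator it = reserved.begin(); it != reserved.end(); )
    {
        if (it->use_count() == 1)
        {
            freeList.push_back(*it);
            it = reserved.erase(it);   // erase() already returns the next element
            ++moved;
        }
        else
        {
            ++it;                      // advance only when nothing was erased
        }
    }
    return moved;
}
```

In the Pool class the same loop additionally validates each object before moving it to the free list.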




/////////////////////////////////////////////////////////////////////////////////////////////////////////
// How to use Pool

#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <curl/curl.h>

int main(int argc, char *argv[])
{

    typedef CURL T;
    const unsigned nPoolSize=2;
    std::string url="https://datamarket.accesscontrol.windows.net/v2/OAuth2-13/";
    std::function<std::shared_ptr<T>()> constructFunc(
    [url]() -> std::shared_ptr<T>
    {
        CURL *curl=curl_easy_init();
        if (curl==NULL)
            throw MZFailureException("could not get a curl handle in "
                    +std::string(__FILE__)+"("+std::string(__FUNCTION__)+") on line "+std::to_string(__LINE__));

        /* First set the URL that is about to receive our POST. This URL can
           just as well be a https:// URL if that is what should receive the
           data. */
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());

        // this turns off certificate verification, which is only acceptable for testing
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
        // destructFunc below calls curl_easy_cleanup(), so give shared_ptr a no-op deleter. Otherwise, shared_ptr would call delete
        std::shared_ptr<T> p(curl, [](T*){});
        return p;
    });
    std::function<bool(std::shared_ptr<T>&)> checkHealthFunc=
    [](std::shared_ptr<T>& curlConn) -> bool
    {
        return true;
    };
    std::function<bool(std::shared_ptr<T>&)> reactiveFunc=
    [](std::shared_ptr<T>& curlConn) -> bool
    {
        return true;
    };
    std::function<void(std::shared_ptr<T>&)> destructFunc=
    [](std::shared_ptr<T>& curlConn)
    {
        curl_easy_cleanup(curlConn.get());
        curlConn.reset();
    };
    const bool bTempObjAllowed=true;
    const unsigned nWaitTime = 1;
    // get the pool instance
    Pool<T> pool;
    // initialize the pool
    pool.initialize(nPoolSize,
                constructFunc,
                checkHealthFunc,
                reactiveFunc,
                destructFunc,
                bTempObjAllowed,
                nWaitTime);
    // checkout the object
    std::shared_ptr<T> pObj=pool.checkout();
    std::cerr << "after checkout, the pool is: " << pool.toString() << std::endl;
    pObj=pool.checkout();
    std::cerr << "after checkout, the pool is: " << pool.toString() << std::endl;
    pObj=pool.checkout();
    std::cerr << "after checkout, the pool is: " << pool.toString() << std::endl;
    pObj=pool.checkout();
    std::cerr << "after checkout, the pool is: " << pool.toString() << std::endl;
    pObj=pool.checkout();
    std::cerr << "after checkout, the pool is: " << pool.toString() << std::endl;
    if(pObj!=nullptr)
    {
        std::cerr << "got an object which is not nullptr" << std::endl;
        pool.checkin(pObj); // checkin the object
        std::cerr << "after checkin, the pool is: " << pool.toString() << std::endl;
    }
    else
        std::cerr << "got an object which is nullptr" << std::endl;
    // reset the pool
    pool.reset();
    std::cerr << "after reset, the pool is: " << pool.toString() << std::endl;
}
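
Since some borrowers may fail to check in what they borrowed, a small RAII guard can take care of it. The following is only a sketch under assumptions: PoolGuard is a hypothetical helper that is not part of the Pool class above, and TinyPool is a stand-in with the same checkout()/checkin() shape so that the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>

// Hypothetical stand-in for Pool<T>; only checkout()/checkin() matter here.
template <class T>
class TinyPool
{
public:
    std::shared_ptr<T> checkout()
    {
        if (m_free.empty())
            return std::make_shared<T>();
        std::shared_ptr<T> obj = m_free.front();
        m_free.pop_front();
        return obj;
    }
    void checkin(std::shared_ptr<T> &obj) { m_free.push_back(obj); }
    std::size_t freeCount() const { return m_free.size(); }
private:
    std::list<std::shared_ptr<T> > m_free;
};

// Borrows an object in the constructor and returns it in the destructor,
// so the object is checked in even if the borrowing code throws.
template <class PoolT, class T>
class PoolGuard
{
public:
    explicit PoolGuard(PoolT &pool) : m_pool(pool), m_obj(pool.checkout()) {}
    ~PoolGuard() { m_pool.checkin(m_obj); }
    PoolGuard(const PoolGuard &) = delete;
    PoolGuard &operator=(const PoolGuard &) = delete;
    T *get() const { return m_obj.get(); }
private:
    PoolT &m_pool;
    std::shared_ptr<T> m_obj;
};
```

With the Pool above, this would be used as PoolGuard<Pool<CURL>, CURL> guard(pool); followed by curl_easy_perform(guard.get());. One caveat: checkin() runs user callbacks, and an exception escaping from a destructor terminates the program, so those callbacks should not throw.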

Wednesday, May 25, 2016

How to let MeCab library use a given dictionary directory

MeCab is a well-known morphological analysis tool that supports a few languages. I use it in my project to tokenize Japanese sentences into words. I installed it in its default system directories and everything worked well.

Recently I had to make it use a specific dictionary directory from my code, without installing MeCab on the target machine. I ran into a problem: MeCab did not use the dictionary directory I gave it and threw errors instead.

After reading the MeCab source code, I found that mecab-0.996/src/utils.cpp looks for the dictionary files using the code in Reference (3). The function is called load_dictionary_resource(), and it has to find mecabrc before it loads the actual dictionaries. mecabrc is a configuration file installed by MeCab that records the dictionary path and other settings. It looks like:
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
dicdir =  /home/your-name/local/lib/mecab/dic/ipadic

; userdic = /home/foo/bar/user.dic

; output-format-type = wakati
; input-buffer-size = 8192

; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
Here the dicdir could be a wrong path, so we want to tell MeCab to use a given dictionary directory instead.

The mecabrc path can be set with the option "--rcfile", and the dictionary directory with "--dicdir":
#include <iostream>
#include <mecab.h>

MeCab::Tagger *m_mecabTagger;
        m_mecabTagger=MeCab::createTagger("--rcfile /path/to/dummy/mecabrc -O wakati --dicdir /path/to/your/dictionary/dir");
        if (!m_mecabTagger)
        {
             // createTagger() returned NULL, so ask the library for the last error
             const char *e = MeCab::getLastError();
             std::cerr << "ERROR: " << e << std::endl;
        }
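
To see why "--rcfile" settles the matter, it helps to spell out the order in which load_dictionary_resource() in Reference (3) looks for mecabrc on non-Windows systems. The function below is only my summary of that code as a sketch: resolveRcFile and its parameters are names made up for this post, and the compiled-in default (MECAB_DEFAULT_RC in MeCab) is passed in as a plain string.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Mirrors the non-Windows lookup order of load_dictionary_resource().
// Everything is passed in explicitly so that the function is easy to test;
// pass nullptr for an environment variable that is not set.
std::string resolveRcFile(const std::string &rcfileOption,
                          const char *homeDir,
                          const char *mecabrcEnv,
                          const std::string &defaultRc)
{
    if (!rcfileOption.empty())
        return rcfileOption;                   // 1. --rcfile wins
    if (homeDir)
    {
        const std::string candidate = std::string(homeDir) + "/.mecabrc";
        std::ifstream ifs(candidate.c_str());
        if (ifs)
            return candidate;                  // 2. $HOME/.mecabrc, only if it can be opened
    }
    if (mecabrcEnv)
        return mecabrcEnv;                     // 3. $MECABRC, taken as it is
    return defaultRc;                          // 4. the compiled-in default
}
```

A call could look like resolveRcFile("", getenv("HOME"), getenv("MECABRC"), "/usr/local/etc/mecabrc"), where the last path is just an example value. Without "--rcfile", a stray ~/.mecabrc or MECABRC on the target machine would be picked up first.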

References
(1) MeCab: http://taku910.github.io/mecab/libmecab.html
(2) MeCab API: http://taku910.github.io/mecab/doxygen/classMeCab_1_1Tagger.html
(3) piece of mecab-0.996/src/utils.cpp:
bool load_dictionary_resource(Param *param) {
  std::string rcfile = param->get<std::string>("rcfile");

#ifdef HAVE_GETENV
  if (rcfile.empty()) {
    const char *homedir = getenv("HOME");
    if (homedir) {
      const std::string s = MeCab::create_filename(std::string(homedir),
                                                   ".mecabrc");
      std::ifstream ifs(WPATH(s.c_str()));
      if (ifs) {
        rcfile = s;
      }
    }
  }

  if (rcfile.empty()) {
    const char *rcenv = getenv("MECABRC");
    if (rcenv) {
      rcfile = rcenv;
    }
  }
#endif

#if defined (HAVE_GETENV) && defined(_WIN32) && !defined(__CYGWIN__)
  if (rcfile.empty()) {
    scoped_fixed_array<wchar_t, BUF_SIZE> buf;
    const DWORD len = ::GetEnvironmentVariableW(L"MECABRC",
                                                buf.get(),
                                                buf.size());
    if (len < buf.size() && len > 0) {
      rcfile = WideToUtf8(buf.get());
    }
  }
#endif

#if defined(_WIN32) && !defined(__CYGWIN__)
  HKEY hKey;
  scoped_fixed_array<wchar_t, BUF_SIZE> v;
  DWORD vt;
  DWORD size = v.size() * sizeof(v[0]);

  if (rcfile.empty()) {
    ::RegOpenKeyExW(HKEY_LOCAL_MACHINE, L"software\\mecab", 0, KEY_READ, &hKey);
    ::RegQueryValueExW(hKey, L"mecabrc", 0, &vt,
                       reinterpret_cast<BYTE *>(v.get()), &size);
    ::RegCloseKey(hKey);
    if (vt == REG_SZ) {
      rcfile = WideToUtf8(v.get());
    }
  }

  if (rcfile.empty()) {
    ::RegOpenKeyExW(HKEY_CURRENT_USER, L"software\\mecab", 0, KEY_READ, &hKey);
    ::RegQueryValueExW(hKey, L"mecabrc", 0, &vt,
                       reinterpret_cast<BYTE *>(v.get()), &size);
    ::RegCloseKey(hKey);
    if (vt == REG_SZ) {
      rcfile = WideToUtf8(v.get());
    }
  }

  if (rcfile.empty()) {
    vt = ::GetModuleFileNameW(DllInstance, v.get(), size);
    if (vt != 0) {
      scoped_fixed_array<wchar_t, _MAX_DRIVE> drive;
      scoped_fixed_array<wchar_t, _MAX_DIR> dir;
      _wsplitpath(v.get(), drive.get(), dir.get(), NULL, NULL);
      const std::wstring path =
          std::wstring(drive.get()) + std::wstring(dir.get()) + L"mecabrc";
      if (::GetFileAttributesW(path.c_str()) != -1) {
        rcfile = WideToUtf8(path);
      }
    }
  }
#endif

  if (rcfile.empty()) {
    rcfile = MECAB_DEFAULT_RC;
  }

  if (!param->load(rcfile.c_str())) {
    rcfile = "mecab_etc/mecabrc";
    if (!param->load(rcfile.c_str())) {
        return false;
    }
  }

  std::string dicdir = param->get<std::string>("dicdir");
  if (dicdir.empty()) {
    dicdir = ".";  // current
  }
  remove_filename(&rcfile);
  replace_string(&dicdir, "$(rcpath)", rcfile);
  param->set<std::string>("dicdir", dicdir, true);
  dicdir = create_filename(dicdir, DICRC);

  if (!param->load(dicdir.c_str())) {
    return false;
  }

  return true;
}

Tuesday, May 17, 2016

Openssl segfault bug and building Curl with new openssl libraries

Recently, I hit a bug in the OpenSSL library that causes a segfault when sending HTTPS requests with the curl library (which uses OpenSSL).
Everything in this post has been tested on Ubuntu 12.04.

When using GDB to backtrace the segfault, the segfault looks like:
(gdb) bt
#0  0x00007fbaf69a54cb in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#1  0xca62c1d6ca62c1d6 in ?? ()
#2  0xca62c1d6ca62c1d6 in ?? ()
#3  0xca62c1d6ca62c1d6 in ?? ()
#4  0xca62c1d6ca62c1d6 in ?? ()
#5  0xca62c1d6ca62c1d6 in ?? ()
#6  0xca62c1d6ca62c1d6 in ?? ()
#7  0xca62c1d6ca62c1d6 in ?? ()
#8  0xca62c1d6ca62c1d6 in ?? ()
#9  0x00007fbaf6d10935 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#10 0x00007fba3c12cb70 in ?? ()
#11 0x000000000000000a in ?? ()
#12 0x00007fbaf69a1900 in SHA1_Update () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#13 0x00007fbaf6a23def in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#14 0x00007fbaf69d75e5 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#15 0x00007fbaf69d73c8 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#16 0x00007fbaf69edf9b in EC_KEY_generate_key () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#17 0x00007fbaf6d2f2a4 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
#18 0x00007fbaf6d30c03 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
#19 0x00007fbaf6d3a373 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
#20 0x000000000049a015 in ossl_connect_common ()
#21 0x000000000046a428 in Curl_ssl_connect_nonblocking ()
#22 0x000000000046ee9e in https_connecting ()
#23 0x00000000004671ee in multi_runsingle ()
#24 0x0000000000467cd5 in curl_multi_perform ()
#25 0x0000000000462e9e in curl_easy_perform ()
If you look at the libraries that curl depends on, you can see which SSL libraries it is using:
$ ldd ./curl
    linux-vdso.so.1 =>  (0x00007fffd6039000)
    libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007f48cf957000)
    libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f48cf6f9000)
    libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f48cf31d000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f48cf106000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f48ceefe000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f48ceb3f000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f48ce93b000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f48ce71e000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f48cfb9d000)


After googling for a while, I found that this is most likely due to a bug in the OpenSSL library that has been fixed since 1.0.1c, as mentioned in Reference (1). The releases in this series are numbered 1.0.1, 1.0.1a, 1.0.1b, and so on, and Ubuntu 12.04 ships a buggy 1.0.1 version by default.
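
Because the letter suffix matters here (1.0.1 and 1.0.1b are affected, 1.0.1c is not), comparing versions needs a little care. Below is a minimal sketch that parses a string in the form curl reports it, such as "OpenSSL/1.0.2g", and compares it against a given release. The struct and function names are made up for this post. In a libcurl program the string should be available as curl_version_info(CURLVERSION_NOW)->ssl_version. Treat the result only as a hint, since a distribution may patch a bug without changing the version string.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

struct OpenSslVersion
{
    int majorNum, minorNum, fixNum;
    char patch;   // 'a', 'b', ... or 0 when there is no letter
};

// Parses strings such as "OpenSSL/1.0.2g"; returns false if the format does not match.
bool parseOpenSslVersion(const std::string &text, OpenSslVersion &out)
{
    char letter = 0;
    out.patch = 0;
    const int n = std::sscanf(text.c_str(), "OpenSSL/%d.%d.%d%c",
                              &out.majorNum, &out.minorNum, &out.fixNum, &letter);
    if (n < 3)
        return false;
    if (n == 4 && std::islower(static_cast<unsigned char>(letter)))
        out.patch = letter;
    return true;
}

// True if v is older than the release majorNum.minorNum.fixNum followed by patch.
bool isOlderThan(const OpenSslVersion &v, int majorNum, int minorNum, int fixNum, char patch)
{
    if (v.majorNum != majorNum) return v.majorNum < majorNum;
    if (v.minorNum != minorNum) return v.minorNum < minorNum;
    if (v.fixNum != fixNum) return v.fixNum < fixNum;
    return v.patch < patch;   // no letter (0) sorts before 'a'
}
```

For this bug, the check would be isOlderThan(v, 1, 0, 1, 'c').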

The next question is how to build curl against a newer OpenSSL version that does not have the segfault bug. After more googling, I found that it is not as easy as expected, and many people have come up with hacks to do it. I ended up with this fairly easy way:

# install openssl, as the default openssl on Ubuntu 12.04 is buggy about https connections, which could result in segfault
git clone https://github.com/openssl/openssl.git
cd openssl
git checkout OpenSSL_1_0_2g
./config --prefix=$LOCAL_DIR no-shared
make -j
make install

# install curl
git clone https://github.com/curl/curl.git
cd curl
git checkout curl-7_48_0
autoreconf -iv
CPPFLAGS="-I$LOCAL_DIR/include" \
LDFLAGS="-L$LOCAL_DIR/lib" \
LIBS="-ldl" \
./configure --disable-shared --prefix=$LOCAL_DIR --without-ldap-lib --without-librtmp --with-ssl
make -j 
make install

After installing curl, the new curl executable has the new libssl (OpenSSL/1.0.2g) linked in statically:
$ ldd curl
    linux-vdso.so.1 =>  (0x00007fff8f9fe000)
    libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007f5fdae42000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5fdac2b000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f5fdaa22000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5fda81e000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5fda460000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5fda242000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f5fdb080000)
$ ./curl --version
curl 7.48.0-DEV (x86_64-unknown-linux-gnu) libcurl/7.48.0-DEV OpenSSL/1.0.2g zlib/1.2.3.4 libidn/1.23
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP UnixSockets
Important note for developers: when you build your own program against the curl library (libcurl.a), you may end up with an executable that still requires the buggy OpenSSL libraries, like:
    libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f0338f94000)
    libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f0338bb8000)
but don't worry: I found that this probably happens only because the curl build makes you link your program against these OpenSSL libraries by default if you use curl-config to generate your g++ arguments:
$ ./curl-config --libs
-L$LOCAL_DIR/lib -lcurl -lidn -lssl -lcrypto -lssl -lcrypto -lz -lrt -ldl
even if you follow the steps in this post. However, your program will not actually use the buggy OpenSSL libraries on your Ubuntu 12.04 (I have tested my program, and the segfault no longer happens).









References
(1) https://bugs.launchpad.net/ubuntu/+source/s3cmd/+bug/973741

(2) https://curl.haxx.se/mail/lib-2014-12/0053.html

(3) https://github.com/openssl/openssl/tree/OpenSSL_1_0_2g

(4) https://github.com/curl/curl/tree/curl-7_48_0