A simple workflow to create a clean Python library based on Anaconda

The open-source Anaconda Distribution is an easy way to do Python/R data science and machine learning on Linux, Windows, and Mac OS X. With it, we can easily set up an isolated, reproducible environment anywhere from a single configuration file.

Considering the many strange errors caused by mismatched versions of Python and its libraries, this management tool makes deployment and collaboration much more comfortable.


A brief introduction to the inverted index and a simple implementation

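In the field of information retrieval, there is an important and fundamental technique, the inverted index, which is widely used in all kinds of full-text search. Back in my ignorant days, I once "invented" an awesome method of my own and called it "mapping". It worked surprisingly well on small datasets, and I was rather pleased with myself. Later, once I learned about the inverted index, I truly came to understand the saying "You think your idea is brilliant, but really you just haven't read enough literature"...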
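To make the idea concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration, not the implementation from the full post: the helper names (`tokenize`, `build_inverted_index`, `search`), the sample documents, and the regex-based lowercase tokenizer are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Anaconda makes Python environments reproducible",
    2: "An inverted index maps each term to the documents containing it",
    3: "Full-text search engines rely on an inverted index",
}

index = build_inverted_index(docs)
print(search(index, "inverted index"))
print(search(index, "python"))
```

A forward index goes from a document to its terms. The inverted index flips that mapping, so a query only has to intersect a few posting sets instead of scanning every document.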
