Extending urlencode in urllib

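This is a problem I ran into while using Python's urllib2 to simulate a POST request. I still can't tell whether it is a bug in urlencode or a question of how PHP handles POST data, so feel free to help me figure it out.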

Here is the situation: I need to POST data to an API written in PHP. The data looks like this:

{"items":{"001":["1","2"]},"title":"test"}

This is the format the API can handle. The equivalent HTML form looks like this:

<form action="/api/" method="post">
<span>Which of the following numbers are less than 3?</span>
<input type="checkbox" value="662469" id="item_662469" name="item[001][]">1 <br/>
<input type="checkbox" value="662470" id="item_662470" name="item[001][]">2<br/>
<input type="checkbox" value="662471" id="item_662471" name="item[001][]">3<br/>
<input type="checkbox" value="662472" id="item_662472" name="item[001][]">4<br/>
<input type="hidden" name="title" value="test"/>
<input type="submit"/>
</form>

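When this form is submitted to the PHP API and the received POST data is printed, it comes out in the format shown above. So I need Python to simulate a POST that sends that same format to the PHP API. The code: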

.. code:: python

    import urllib
    import urllib2

    params = {"items": {"001": ["1", "2"]}, "title": "test"}
    req = urllib2.Request(
        url='http://test.com/api',
        # urllib.urlencode() builds the application/x-www-form-urlencoded body
        data=urllib.urlencode(params),
        headers={},
    )
    urllib2.urlopen(req).read()

When the PHP side prints the POST data received from this request, it shows:

{"items":"{'001':['1','2']}","title":"test"}

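The value of items has been turned into a string. That points to the culprit: urllib.urlencode. It encodes the data into the same kind of percent-encoded key=value string used in a GET query, but it only processes the top level of key/value pairs. A nested value is simply converted with str() and then encoded, so the nesting is lost.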
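
To see this behavior in isolation, here is a small check. It is written for Python 3, where `urlencode` and `parse_qs` live in `urllib.parse` (in Python 2 they are `urllib.urlencode` and `urlparse.parse_qs`). Decoding the body again, roughly as a server would, gives back a plain string for `items` rather than a nested structure.

.. code:: python

    from urllib.parse import parse_qs, urlencode

    params = {"items": {"001": ["1", "2"]}, "title": "test"}

    # urlencode() calls str() on the nested dict and percent-encodes that text
    body = urlencode(params)
    print(body)
    # items=%7B%27001%27%3A+%5B%271%27%2C+%272%27%5D%7D&title=test

    # Decoding the body yields a string for "items", not a dict
    decoded = parse_qs(body)
    print(decoded["items"])
    # ["{'001': ['1', '2']}"]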

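Next I looked at the POST data sent by the form itself. Firebug shows both the form fields and the encoded body, and comparing the two encoded bodies shows they are quite different.

The form submission encodes the POST data as:

item%5B001%5D%5B%5D=1&item%5B001%5D%5B%5D=2&title=test

urllib.urlencode produces:

items=%7B%27001%27%3A%5B%271%27%2C%272%27%5D%7D&title=test

So although the PHP API prints the POST data it received as {"items":{"001":["1","2"]},"title":"test"}, that does not mean the client sent the data in that format. Perhaps PHP processes it on the way in? (If you know how this works, please let me know.)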
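
For reference, PHP does interpret square brackets in form field names when it builds `$_POST`: `item[001][]` means "key `001` inside the array `item`, then append a value". The sketch below imitates that convention in Python 3 to show how the flat form body turns into the nested structure the API prints. `php_style_parse` is a hypothetical helper written for illustration, not PHP's implementation; it assumes names are either plain (`title`) or bracketed (`item[001][]`) and ignores finer PHP details.

.. code:: python

    import re
    from urllib.parse import parse_qsl

    def php_style_parse(body):
        """Hypothetical helper: decode a urlencoded body using PHP-like bracket keys."""
        result = {}
        for name, value in parse_qsl(body, keep_blank_values=True):
            # "item[001][]" -> base "item", bracket parts ["001", ""]
            base = name.split("[", 1)[0]
            path = [base] + re.findall(r"\[([^\]]*)\]", name)
            node = result
            for i, part in enumerate(path[:-1]):
                if part == "":
                    # "[]" in the middle of a name is not handled in this sketch
                    raise ValueError("unsupported key: %r" % name)
                # the next part decides the container: "" means a list, else a dict
                node = node.setdefault(part, [] if path[i + 1] == "" else {})
            if path[-1] == "":
                node.append(value)      # trailing "[]" appends to a list
            else:
                node[path[-1]] = value  # plain or keyed assignment

        return result

    print(php_style_parse("item%5B001%5D%5B%5D=1&item%5B001%5D%5B%5D=2&title=test"))
    # {'item': {'001': ['1', '2']}, 'title': 'test'}
    print(php_style_parse("items=%7B%27001%27%3A%5B%271%27%2C%272%27%5D%7D&title=test"))
    # {'items': "{'001':['1','2']}", 'title': 'test'}

The first body (from the form) comes out nested, while the second (from urllib.urlencode) comes out with `items` as a string, which matches what the PHP API printed in each case.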

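Now that the difference is clear, I adapted urlencode; anything to get the job done. The result is the code below. [Update: fixed a bug I found while using it today.]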

.. code:: python

    # Copied from urllib (Python 2) and extended to encode one level of nested
    # dicts in PHP's bracket notation, e.g. items[001][]=1&items[001][]=2.
    # The dict handling only runs when doseq is true: urlencode(params, doseq=1)
    import sys
    from urllib import quote_plus, _is_unicode

    def urlencode(query, doseq=0):
        """Encode a sequence of two-element tuples or dictionary into a URL query string.
        If any values in the query arg are sequences and doseq is true, each
        sequence element is converted to a separate parameter.
        If the query arg is a sequence of two-element tuples, the order of the
        parameters in the output will match the order of parameters in the
        input.
        """
        if hasattr(query, "items"):
            # mapping objects
            query = query.items()
        else:
            # it's a bother at times that strings and string-like objects are
            # sequences...
            try:
                # non-sequence items should not work with len()
                # non-empty strings will fail this
                if len(query) and not isinstance(query[0], tuple):
                    raise TypeError
                    # zero-length sequences of all types will get here and succeed,
                    # but that's a minor nit - since the original implementation
                    # allowed empty dicts that type of behavior probably should be
                    # preserved for consistency
            except TypeError:
                ty, va, tb = sys.exc_info()
                raise TypeError, "not a valid non-string sequence or mapping object", tb

        l = []
        if not doseq:
            # preserve old behavior
            for k, v in query:
                k = quote_plus(str(k))
                v = quote_plus(str(v))
                l.append(k + '=' + v)
        else:
            for k, v in query:
                raw_k = str(k)    # unencoded key, used to build bracketed names
                k = quote_plus(raw_k)
                if isinstance(v, str):
                    v = quote_plus(v)
                    l.append(k + '=' + v)
                elif _is_unicode(v):
                    # is there a reasonable way to convert to ASCII?
                    # encode generates a string, but "replace" or "ignore"
                    # lose information and "strict" can raise UnicodeError
                    v = quote_plus(v.encode("ASCII", "replace"))
                    l.append(k + '=' + v)
                elif isinstance(v, dict):    # added by huyang
                    for _k, _v in v.items():
                        if isinstance(_v, (list, tuple)):
                            # list value -> items[001][]=1&items[001][]=2
                            # (built from raw_k so the key is encoded only once)
                            encode_k = quote_plus('%s[%s][]' % (raw_k, _k))
                            for inner_v in _v:
                                l.append(encode_k + '=' + quote_plus(str(inner_v)))
                        else:
                            # scalar value -> items[001]=x (encode _v, not the dict v)
                            encode_k = quote_plus('%s[%s]' % (raw_k, _k))
                            l.append(encode_k + '=' + quote_plus(str(_v)))
                else:
                    try:
                        # is this a sufficient test for sequence-ness?
                        len(v)
                    except TypeError:
                        # not a sequence
                        v = quote_plus(str(v))
                        l.append(k + '=' + v)
                    else:
                        # loop over the sequence
                        for elt in v:
                            l.append(k + '=' + quote_plus(str(elt)))
        return '&'.join(l)
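
The function above handles exactly one level of nesting (a dict whose values are lists or scalars). A different approach, sketched here for Python 3, is to flatten arbitrarily nested dicts and lists into (key, value) pairs using the same bracket notation, then hand the pairs to the standard `urllib.parse.urlencode`, which accepts a list of pairs, keeps their order and allows repeated keys. `flatten_php` and `php_urlencode` are hypothetical helper names, and the output format assumes the PHP bracket convention seen in the form above.

.. code:: python

    from urllib.parse import urlencode

    def flatten_php(value, prefix):
        """Hypothetical helper: yield (key, value) pairs in PHP bracket notation."""
        if isinstance(value, dict):
            for k, v in value.items():
                yield from flatten_php(v, "%s[%s]" % (prefix, k))
        elif isinstance(value, (list, tuple)):
            # Lists use "[]" like the form does; a list directly inside another
            # list is ambiguous in this notation, and this sketch does not guard
            # against that case.
            for v in value:
                yield from flatten_php(v, prefix + "[]")
        else:
            yield prefix, str(value)

    def php_urlencode(params):
        """Hypothetical helper: urlencode a nested dict using bracket keys."""
        pairs = []
        for k, v in params.items():
            pairs.extend(flatten_php(v, str(k)))
        # urlencode() percent-encodes the brackets and keeps repeated keys in order
        return urlencode(pairs)

    params = {"items": {"001": ["1", "2"]}, "title": "test"}
    print(php_urlencode(params))
    # items%5B001%5D%5B%5D=1&items%5B001%5D%5B%5D=2&title=test

In Python 3, `urllib.request.Request` expects `data` to be bytes, so the request body would be `php_urlencode(params).encode()`.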
