Overview

In a third-order Factorization Machine (FM), the third-order term is computed by taking the Hadamard product of three embedding vectors and then summing the elements of the resulting vector. The Compressed Interaction Network (CIN), by contrast, builds high-order combinations at the whole-vector level and then applies sum pooling to the combined results. This emphasis on deep combination of embedded features is where the model name "eXtreme Deep Factorization Machine (xDeepFM)" comes from.
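As a concrete illustration, here is a minimal NumPy sketch of the third-order FM term (the dimensions and values are made up for the example); CIN generalizes this pattern by building such vector-wise products layer by layer:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                 # embedding dimension (arbitrary)
v1, v2, v3 = rng.normal(size=(3, d))  # embeddings of three features

# Third-order FM term for one feature triple: Hadamard (element-wise)
# product of the three embeddings, then a sum over the resulting vector.
third_order = np.sum(v1 * v2 * v3)
print(third_order)
```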


CIN: Compressed Interaction Network

Advantages

  • Shared with DCN: a bounded maximum interaction order, automatic feature crossing, and parameter sharing. CIN and DCN follow a similar design principle in their cross layers, where each layer's input is built from the previous layer's output together with the original input layer.

  • xDeepFM brings the field-based, vector-wise idea into the Cross layer while retaining the Cross layer's advantages. The model structure is elegantly designed, and the reported experiments show clear improvements.

Disadvantages

The time complexity of xDeepFM is the main performance bottleneck for industrial deployment and requires dedicated optimization.

Key differences between Cross and CIN:

  • Cross operates bit-wise, as if each layer considered a separate combination for every element of every feature, whereas CIN operates on entire feature vectors (vector-wise).
  • At layer L, Cross contains all feature interactions from order 1 up to order L+1, whereas CIN's layer L contains only order-(L+1) interactions. As a result, Cross emits all accumulated results only at the output layer, while CIN emits an intermediate result at every layer (a sketch of both recurrences follows this list).
  • The root of both differences lies in the Cross formula: in addition to the product of the previous layer and the input layer, it adds the previous layer back in as a residual term. Both designs aim to cover feature interactions of every order; they simply pursue that goal with different strategies.
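To make the contrast concrete, here is a hedged NumPy sketch of one layer of each network; the shapes and weights are illustrative, not the papers' exact parameterizations:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 3, 4                      # fields and embedding dimension (arbitrary)
x0 = rng.normal(size=m * d)      # DCN input: flattened embeddings
X0 = rng.normal(size=(m, d))     # CIN input: one embedding row per field

# One DCN Cross layer (bit-wise): x_1 = x0 * (x0 . w) + b + x0.
# The residual "+ x0" is why layer L carries orders 1 through L+1.
w, b = rng.normal(size=m * d), rng.normal(size=m * d)
x1 = x0 * (x0 @ w) + b + x0

# One CIN layer (vector-wise): each of h output feature maps is a weighted
# sum of Hadamard products between rows of X^(L-1) and rows of X^(0), so
# layer L holds only order-(L+1) interactions.
h = 2
W = rng.normal(size=(h, m, m))   # W[k, i, j] weights X^(L-1)_i ∘ X^(0)_j
X1 = np.einsum('kij,id,jd->kd', W, X0, X0)

print(x1.shape, X1.shape)        # (12,) and (2, 4)
```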

Understanding the CIN Network

CIN can be understood through the DCN Cross network formula, except that here each layer is computed as X^(L) = (X^(L-1) ∘ X^(0)) W, where ∘ denotes the vector-wise Hadamard product (both terms are explained in the next section).

At each layer, CIN then applies sum pooling, aggregating each vector's elements into a weighted sum as the layer's output. In essence, CIN forms vector-wise high-order combinations and follows them with sum pooling.
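Below is a minimal sketch of a stacked CIN under assumed layer sizes (the `layer_maps` values are hypothetical), showing the recurrence and the per-layer sum pooling:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 3, 4                  # fields, embedding dimension (arbitrary)
layer_maps = [5, 5]          # hypothetical feature-map counts per CIN layer

X0 = rng.normal(size=(m, d))
Xk, pooled = X0, []
for h in layer_maps:
    # X^(k) = (X^(k-1) ∘ X^(0)) W: every pairwise Hadamard product of a
    # row of X^(k-1) with a row of X^(0), mixed by the weights W.
    W = rng.normal(size=(h, Xk.shape[0], m))
    Xk = np.einsum('hij,id,jd->hd', W, Xk, X0)
    # Sum pooling: each feature map collapses to a single scalar.
    pooled.append(Xk.sum(axis=1))

p_plus = np.concatenate(pooled)  # concatenated and fed to the output unit
print(p_plus.shape)              # (10,)
```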

Hadamard Product

Bit-wise vs. vector-wise:

  • In DCN (Deep & Cross Network), the Cross layer follows directly after the embedding layer and can automatically construct high-order features, but the interactions happen at the element level (bit-wise) rather than the feature-vector level (vector-wise). With 3-dimensional latent vectors, two features with vectors (a1, b1, c1) and (a2, b2, c2) interact in the form f(w1*a1*a2, w2*b1*b2, w3*c1*c2): the interaction proceeds element by element, each element with its own weight.
  • Conversely, if the interaction instead takes the form f(w*(a1*a2, b1*b2, c1*c2)), it happens at the level of the entire feature vector rather than per element (see the numeric sketch after this list).
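A tiny numeric sketch of the two interaction styles, with arbitrarily chosen weights:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # feature 1: (a1, b1, c1)
b = np.array([4.0, 5.0, 6.0])   # feature 2: (a2, b2, c2)

# Bit-wise (DCN Cross style): every element of the Hadamard product gets
# its own weight, i.e. f(w1*a1*a2, w2*b1*b2, w3*c1*c2).
w_bit = np.array([0.1, 0.2, 0.3])
bitwise = np.sum(w_bit * (a * b))

# Vector-wise (CIN style): the Hadamard product a ∘ b is kept intact and
# shares a single weight, i.e. f(w * (a1*a2, b1*b2, c1*c2)).
w_vec = 0.5
vectorwise = w_vec * np.sum(a * b)

print(bitwise, vectorwise)
```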

Complexity

CIN's space complexity is independent of the input dimension D. Its time complexity, however, is far less favorable and is CIN's main pain point.
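As a rough worked example (assuming the usual xDeepFM notation; the concrete numbers are arbitrary), the following shows why the parameter count avoids D while the per-layer computation does not:

```python
# Back-of-the-envelope counts with the paper's symbols: m fields, embedding
# dimension D, H feature maps per CIN layer; the numbers are arbitrary.
m, D, H, layers = 20, 16, 100, 3

# Each CIN layer owns an (H x H_prev x m) weight tensor. D never appears,
# so the parameter count (space) is independent of the embedding size.
params = H * m * m + (layers - 1) * H * H * m
print("parameters:", params)

# Computing one intermediate layer touches every (map, row, field, dim)
# combination, roughly O(H^2 * m * D) multiplications per sample: this is
# where the time-complexity bottleneck comes from.
mults_per_layer = H * H * m * D
print("multiplications per layer ~", mults_per_layer)
```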

Experimental comparison:

  • Metrics: Logloss, AUC (Area Under the ROC Curve), number of parameters, and model size.
